Michigan State University

This is to certify that the thesis entitled

THE DEVELOPMENT AND VALIDATION OF THE AFFECT RECOGNITION AND RESPONSE SCALE, A MEASURE OF EMPATHIC ABILITY

presented by

Margaret Ann Parsons

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Counseling, Personnel Services and Educational Psychology

Major professor
Date 5/7/7;

© 1979 MARGARET ANN PARSONS
ALL RIGHTS RESERVED

THE DEVELOPMENT AND VALIDATION OF THE AFFECT RECOGNITION AND RESPONSE SCALE, A MEASURE OF EMPATHIC ABILITY

By

Margaret Ann Parsons

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services and Educational Psychology

1979

ABSTRACT

THE DEVELOPMENT AND VALIDATION OF THE AFFECT RECOGNITION AND RESPONSE SCALE, A MEASURE OF EMPATHIC ABILITY

By

Margaret Ann Parsons

Empathy, the ability to recognize what another person is feeling and respond appropriately to those feelings, is a necessary part of the counseling relationship. Many attempts have been made to develop an instrument to adequately measure this ability. To date, however, there is no instrument available which has been adequately validated for use in graduate admissions programs in psychology. The purpose of this study was to develop such an instrument and provide initial validation data for it.

The Affect Recognition and Response Scale combines two areas of theory and research which have been previously studied separately: the ability to recognize emotions through facial expressions of affect and the measurement of the ability to respond empathically to another person.
The first subtest of the Affect Recognition and Response Scale uses slides of facial affect to measure the ability to recognize emotions. The second and third subtests use color videotape and written stimulus situations with multiple-choice answers to measure the ability to respond empathically.

Sixty-five subjects were selected from each of two populations, one population of students in professions which regularly require the use of empathic skills and a population of students in professions which do not require the regular use of empathic skills. Subjects in Group I were graduate students in clinical and counseling psychology at Michigan State University and Central Michigan University. Subjects in Group II were graduate students in engineering, mathematics and the physical sciences at Michigan State University.

A multiple measures design with two crossed factors of group and sex was used. In addition, supervisors' ratings of affective skills were obtained for subjects in Group I who were currently engaged in clinical work. The scale was administered to subjects in small groups, demographic data was obtained on a Biographical Data Sheet, and supervisors' ratings were obtained for subjects in Group I.

Reliability for the total scale was estimated to be .853. The average percent of agreement among expert judges for all items on the scale was .94. Factor analysis results indicated one main factor for the scale corresponding to the ability to respond empathically, regardless of stimulus situation format. The factor analysis structure did not correspond to the subtest structure of the scale, nor was there a secondary factor structure corresponding to the emotion categories for the slides of facial affect used in the first subtest.

The major results of the study were:

1.
Graduate students in clinical and counseling psychology scored significantly higher than graduate students in engineering, mathematics and the physical sciences on the second and third subtests. There were no significant differences between groups on the first subtest (slides of facial affect).

2. There was no relationship between subtest scores and supervisors' ratings of affective skills.

3. There were no significant differences between men and women on any of the subtests.

4. There was a slight but significant positive relationship between scores on each of the subtests and graduate grade-point average.

The scale was found to be within acceptable limits for reliability and some initial positive validation data was obtained. While the recognition of emotions may be a necessary condition for the ability to respond adequately, this ability seems to be widespread. Further study of how the expression of emotions is modified by display rules and how groups differ in the ability to recognize such modified expressions is needed. Further validation of the Affect Recognition and Response Scale is indicated, using different criterion groups and different behavioral measures of empathic ability.

DEDICATION

To Bill, John, and Mike

ACKNOWLEDGMENTS

Many people have provided help, support and understanding during the completion of this work: my family, especially, Chuck Bassos, Bill Hinds, Judith Krupka, Bill Farquhar, Don Grummon, Diane Singleton, Margaret Zerba, Martha Anderson and Doug Miller.

TABLE OF CONTENTS

LIST OF TABLES

Chapter

I. THE PROBLEM
    Need
    Purpose
    Theory
    Theories of Emotion
    Definitions of Emotion
    Nature of Emotion
    Facial Expressions of Emotion
    Recognition of Emotions
    Definitions of Empathy
    Measurement of Empathy
    General Hypotheses
    Definitions of Terms
    Delimitations
    Overview

II. RELATED RESEARCH
    Recognition of Emotion: Nonverbal Behavior
    Recognition of Emotion: Facial Expression of Affect
    Development of Stimulus Photographs
    Recognition-Labeling Experiments
    Definitions of Empathy
    Measures of Empathy
    Demographic Variables
    Use of Standardized Tests
    Criterion Groups
    Implications of Related Research

III. DESIGN OF THE STUDY
    Sample
    Development and Description of the Affect Recognition and Response Scale
    Item Pool
    Pilot Videotape
    Pilot Studies
    Description of Subtest 1: Slides of Facial Affect
    Development of Subtest 2: Videotape Vignettes
    Development of Subtest 3: Written Stimulus Situations
    Expert Judges' Data
    Subtest Score Correlation Matrix
    Reliability
    Factor Analysis
    Description of Other Instruments
    Research Design
    Apparatus
    Methodology
    Hypotheses
    Analysis
    Summary

IV. ANALYSIS OF RESULTS
    Hypothesis I: Differences Between Criterion Groups
    Hypothesis II: Prediction of Supervisors' Ratings
    Hypothesis III: Differences Between Male and Female Subjects
    Group by Sex Interaction Effects
    Analysis of Intelligence Effects
    Summary

V. SUMMARY AND CONCLUSIONS
    Summary
    Conclusions
    Assessment of the Scale
    Validation of the Scale
    Differences Between Men and Women
    Interaction Effects Between Group and Sex
    Intelligence Effects
    Future Use of the Scale
    Limitations of the Study
    Implications for Future Research

APPENDICES
    A. Sample Test Packet
    B. Percent of Agreement for Pictures of Facial Affect
    C. Inter-Item Correlation Matrix
    D. Item-Total Reliability Data
    E. Item Analysis Data
    F. Item-Factor Correlations

BIBLIOGRAPHY

LIST OF TABLES

Table
3.1 Demographic Data for Group I and Group II
3.2 Number of Expert Judges Choosing Each Emotion Category for Slides of Facial Affect (Subtest 1)
Number of Expert Judges Rating Each Response Choice Correct or Incorrect for Subtest 2 and Subtest 3
Subtest Score Correlation Matrix
Reliability and Item Analysis Data for Subtest Scores and Total Scale Scores
Varimax Rotated Factor Matrix for All Items with Minimum Eigen Value of 1.00
Varimax Rotated Factor Matrix for All Items with the Number of Factors Preset to Three
Varimax Rotated Factor Matrix for Subtest 1 (Slides of Facial Affect) with the Number of Factors Preset to Five
Design of the Study for Group and Sex Effects
Research Design for Supervisor's Ratings Analysis
Summary Data for Multivariate Analysis of Group Effects
Univariate Analyses of Variance for Group Effects
Stepwise Multiple Regression Summary Table for Prediction of Supervisors' Ratings from Subtest Scores
Summary Data for Multivariate Analysis of Sex Effects
Summary Data for Multivariate Analysis of Group by Sex Interaction Effects
4.6 Univariate Analyses of Variance for Groups on Undergraduate and Graduate Grade-Point Average
Stepwise Multiple Regression Summary Table for Prediction of Subtest Scores from Undergraduate and Graduate Grade-Point Average (G.P.A.)
Comparison of Univariate Analyses of Variance for Group Effects on Subtest Scores with Undergraduate and Graduate Grade-Point Average as Covariates and without Covariates
Percent of Judgments of Each Emotion for Each Photograph
Inter-Item Correlation Matrix
Reliability Analysis for Total Scale
Reliability Analysis for Subtest 1 (Slides of Facial Affect)
Reliability Analysis for Subtest 2 (Videotape Vignettes)
Reliability Analysis for Subtest 3 (Written Stimulus Situations)
Percent of Agreement for Slides of Facial Affect for All Subjects
Raw Score Distribution for Total Scale
Item Difficulty and Discrimination Indices
Item Response Summary Data
Item-Factor Correlations for All Items With No Preset Factors
Item-Factor Correlations for All Items With Three Preset Factors
Item-Factor Correlations for Subtest 1 with Five Preset Factors

CHAPTER I

THE PROBLEM

The ability to recognize what another person is feeling and respond appropriately to those feelings may be considered an essential part of the counseling relationship. Rogers (1951, 1957) has labeled this ability empathy, and his work has been expanded on by Truax and Carkhuff (1967), who have sought to operationalize definitions of empathy and study the component abilities included in this skill. Carkhuff (1969) defines empathy as the ability to determine what a client is feeling and to communicate this understanding to the client. He has labeled these two components the ability to discriminate and the ability to communicate.

A related field of research has developed which has sought to operationalize and measure the recognition of affect through the study of facial expressions of emotion. Izard (1971) has studied emotion recognition and labeling across cultures using a set of slides showing different affect categories which he has developed. Ekman and his associates (1972) have also developed slides to study the expression of facial affect and the rules which appear to govern the display of emotion and the recognition of facial expressions.

While it seems that the ability to respond empathically to another person rests on the prior ability to recognize that person's feelings and that one of the major cues to another's feelings is his/her facial expression, little work has been done to integrate these two fields of study. One of the intentions of this study will be to integrate the related research fields of empathy and the study of facial expressions of emotion.

Need

There have been periodic attempts in the literature to point out the necessity for making selection decisions in graduate programs in psychology based on criteria which include noncognitive dimensions, particularly empathic ability (Sub-Committee on Counselor Trainee Selection, Division of Counseling Psychology, American Psychological Association, 1954; Santavicca, 1959; Stripling and Lester, 1963; Patterson, 1962; Carkhuff, 1969d; Hurst, 1973; Jones, 1974; Hurst and Shatkin, 1974).

The need for an instrument to assess empathic skills is also evident in related areas. In the field of medical education, for example, increasing emphasis is being placed on the importance of the doctor-patient relationship (Schofield, 1966; Turner, et al., 1974) and the desirability of being able to screen applicants to medical school on noncognitive criteria. Chief among these criteria is empathic ability. As yet no adequate measure has been found for this purpose.

Admissions decisions, both in graduate psychology programs and in medical schools, continue to be made largely on the basis of cognitive variables (Rawls, Rawls, and Harrison, 1969; Hurst, 1973). Interviews have been used to assess noncognitive areas, but are generally unreliable and costly in terms of faculty and applicant time (Sax, 1968; Schwab, 1969; Austin, 1972). Several studies have shown that the traditional selection measures, i.e., undergraduate grade-point average, Graduate Record Examination scores, and letters of recommendation, although predictive of academic success, are unrelated to ratings of empathy (Bergin and Solomon, 1963; McGreevy, 1967; Wiggins and Blackburn, 1969).

There are uses for a valid and reliable measure of empathic ability in other areas. Many paraprofessional training programs, such as volunteer crisis centers, could use such an instrument for screening trainees or as a stimulus for training and discussion.
A variety of instruments are currently available for the measurement of empathy skills. None of these measures, however, has been widely validated or used for selection purposes. As Hurst and Shatkin point out:

    . . . direct new measures must be introduced and validated for admissions purposes if admissions procedures are to be defended on just about any level.¹

Purpose

The purpose of this study is to provide initial validity data for the Affect Recognition and Response Scale. This scale has been developed for use as an admissions screening device and consists of three subtests. Each subtest uses a different stimulus situation. Subtest 1 has a set of slides of facial affect, Subtest 2 has a color videotape, and Subtest 3 uses written stimulus situations. Two skills are measured: the ability to recognize affect and the ability to respond empathically.

¹ Hurst, Michael, and Shatkin, Stephen, "Relationship Between Standardized Admissions Variables and Certain Interpersonal Skills," Counselor Education and Supervision (September 1974), p. 32.

The study compared responses of two groups of graduate students, one group from a population of students in professions which regularly require the use of empathic skills, and one group from a population of students in professions which do not require regular use of empathic skills. In addition, supervisor's ratings of affective skills were obtained for students in the first group who were currently involved in direct clinical service, and these ratings were compared with their scores on the Affect Recognition and Response Scale.

Theory

Two main theoretical discussions will be presented here. The first is a theory of emotion, based largely on the work of Ekman, Izard, Tomkins, and Plutchik. Ekman's approach will be emphasized since it integrates much of the work of the other theorists. Secondly, a theoretical explanation of empathy as the recognition and response to emotion and the measurement of empathy will be presented.

Theories of Emotion

Several different theories of emotion have been developed in an attempt to explain affective experience. Physiological explanations for emotion focus on observed bodily states and changes. From this viewpoint emotions are described as changes in glandular secretions, neural activity, and movements of the musculature, particularly the facial musculature. Varying degrees of emotional intensity may be measured by observing physiological changes, but discrete emotions are not postulated.

Schlosberg (1954) added the concept of subjective experiencing of feelings in his theory of emotions but postulated only three different dimensions of emotion. These three dimensions, pleasant-unpleasant, attention-rejection, and activity-rest, were the only divisions of emotional experience which he made. Plutchik (1962) enlarged on Schlosberg's notions of emotional dimensions to develop discrete categories of emotions. Izard, Tomkins, and Ekman have all developed theories based on the concept of discrete emotion categories, although the categories used have varied from theorist to theorist.

Definitions of Emotion

Definitions of emotion generally focus on one or more of the following areas: neurophysiological changes; patterns of muscle activity, including movements of the facial musculature; subjective experience of feelings; eliciting stimuli; verbal responses; and behavioral or interactive consequences of the experience of emotions. Theorists differ chiefly according to which of these aspects they include and which they do not. Schlosberg, for example, focuses on the subjective experience as the chief component of emotion. Hebb (1946), in work with primates, defines emotion as neurophysiological states which are inferred from behavior and which have interactive and behavioral consequences.
Plutchik, while acknowledging the subjective experiencing of emotions, considers this to be a sufficient component, but not a necessary one, similar to the psychoanalytic view that emotional experience may be repressed and thus not experienced subjectively, even though the emotion may be obvious to an observer. He bases his definition on the components of neurophysiological changes, patterned muscle activity, eliciting stimuli, and behavioral consequences.

Tomkins focuses his definition of emotion on muscle and glandular responses, particularly the patterns of facial muscle changes associated with different emotions, and also includes the idea of feedback, which may modify or change the emotion or the subjective experience of the emotion.

Ekman defines emotion rather loosely, including the concepts of physiological responses, motor responses (including facial muscle patterning), verbal responses, and interactive consequences of certain behavior.

Izard incorporated all of the previously mentioned components of emotion in his definition, although emphasizing more than other theorists the subjective experience and the importance of the facial muscle responses. His definition is perhaps the most concise and inclusive of the theorists:

    When neurochemical activity, via innate programs, produces patterned facial and bodily activities, and the feedback from these activities is transformed into conscious form, the result is a discrete fundamental emotion which is both a motivating and a meaningful cue-producing experience.²

² Izard, Carroll, The Face of Emotion (New York: Appleton-Century-Crofts, 1971), p. 185.

For the purpose of this study, emotion is defined as the feeling state of a person, whether or not subjectively experienced, which is accompanied by specific neurophysiological and muscular responses.

Nature of Emotion

Much of what is known about the nature of emotions and the experience of emotions is contained in the various definitions which have been presented. In common usage, emotion most often refers to the subjective feeling state of a person, such as a feeling of anger, happiness, or sadness. This subjective experience, however, may or may not be present. Clinical experience, as well as personality theory, supports the assumption that a person may be experiencing an emotion while not consciously aware of it or willing to acknowledge it. In addition, feedback from the emotion, such as awareness of physiological changes, or awareness and analysis of the emotion itself, will often change the subjective experience of that emotion.

Specific neurophysiological changes take place in the body when an emotion is experienced, and these changes vary with the emotion. The hypothalamus seems to be a particular site for electrochemical activity during emotions and, in fact, injections of certain chemicals or electrical stimulation of specific brain areas will result in subjective emotional experiences.

Duchenne was the first to extensively map the changes in facial musculature associated with different emotional responses. Izard and Ekman have continued this work, with Izard focusing on the underlying muscle structures of the face which change with the different emotion categories and Ekman emphasizing the external changes in the appearance of the face with the different emotions.

An important aspect of emotion is that of behavior. Plutchik defines his emotion categories in terms of disposition to various behaviors; Darwin saw the different emotion expressions developing from different behaviors involved in the experience of these emotions, such as the infant's smile developing from the sucking movements of the mouth during feeding.
A simple division of emotion-related behavior is the familiar "fight or flight" response, with fighting associated with the emotion of anger and flight with the emotion of fear. Other emotions show similar characteristic patterns of behavior. However, as one moves up the phylogenetic scale the behaviors associated with different emotions become increasingly complex. As will be discussed in the section on display rules, man has learned many ways of disguising or inhibiting behavior which would ordinarily accompany the experience of a given emotion.

Contemporary theorists generally divide emotions into a number of discrete categories, although the exact number of categories and the labels for distinct emotions vary from theorist to theorist. Ekman has chosen a set of discrete emotion categories based on previous research, with each category chosen having been found by more than one researcher to be recognizable in a literate culture. These six categories were used as the basis for the recognition of facial affect in this study. The six categories are: anger, pleasure, fear, distress, surprise, and disgust.

Facial Expressions of Emotion

The face is the primary site for the expression of emotions in humans, partly because of the flexibility and responsiveness of the facial muscles, partly because of the large part played by face-to-face contact in human communication. Current debate centers on whether facial expression, and in fact, emotional experience, is universal and cross-cultural or varies from culture to culture. Many of the studies done in this area have looked at the ability of subjects in different cultures to agree on the facial expression of different emotions.

Ekman's position is that the experience and expression of emotion is both universal and culture-bound. The initial emotional experience and the facial expression of the primary emotions are generally the same across cultures, while the subsequent behavior varies widely from culture to culture.
The initial emotional response and expression is an innate, reflexive behavior, apparent from birth. Each member of a given society, however, learns from childhood a set of rules, which Ekman calls display rules, which govern emotional experience and expression. By the time a person reaches adulthood his/her emotional expression is to a large extent governed by learned display rules, even though the initial emotional experience is innate. These display rules are used to intensify, deintensify, neutralize, or mask the expression of an emotion. Thus, in a culture where aggressiveness is frowned upon, the display rule for anger may serve to mask its expression with a smile. In addition to culturally differing display rules, cultures may differ in the eliciting circumstances for different emotions, the behavioral consequences of the emotion, and attitudes about certain emotions.

Recognition of Emotions

Recognition of emotions is an important component of empathic ability. In order for a person to respond accurately and helpfully to the emotion of another, she/he must first be able to recognize and accurately label that emotion. Much of the work that has been done in the area of recognition of emotion from nonverbal clues has used facial expressions of emotion for stimulus cues. Two main problems remain in this area, that of generalizability of results and conclusions, and that of how to develop valid stimulus situations.

Generalizability covers several areas. Are results consistent across judges, across subjects, and across situations? This study tests the hypothesis that the ability to accurately recognize facial expressions of emotion is not consistent across judges but rather is a skill which varies from person to person. Previous research on the judgment of facial expression has often assumed that the ability to accurately label facial expression of emotion is consistent across the general population, at least within a given culture.
Generality across subjects assumes that there is little variation in the ability to portray facial expressions of emotion. While research indicates some generality for primary or "pure" expressions of emotion, the spontaneous expression of emotion is governed largely by culturally learned display rules, as discussed previously. Research using posed expressions suggests that there is in fact variation in the ability to accurately portray facial expressions.

The third aspect of generality is generality across situation, e.g., is the expression of fear the same when a person is alone or with others, with friends or strangers? Again, culturally learned display rules seem to restrict the generality of expression across situations, such that the expression of an emotion, for example, may be culturally acceptable when one is with family members but will be neutralized or masked when one is in public.

The problem of posed versus spontaneous expressions of emotion is related to difficulties of generality. While posed expressions appear to be "artificial," research findings indicate that they are more easily agreed upon by judges than spontaneous expressions. Aside from methodological difficulties inherent in obtaining spontaneous expressions of emotion, there seems to be little difference in the actual expression of posed and spontaneous emotions (Coleman, 1949). In addition, as Plutchik and Izard have pointed out, primary emotions occur infrequently in spontaneous situations. Learned behavior quickly overrides the pure expression of emotion, resulting in affect blends, substitutions, and masking.

Definitions of Empathy

Empathy is defined in this study as the ability to recognize another's emotion and respond to that emotion. This definition rests on the theoretical foundation of work begun by Rogers and continued by Truax and Carkhuff. Rogers defined empathy as ". . .
a state of perceiving the internal frame of reference of another with accuracy, and with the emotional components and meanings which pertain thereto, as if one were the other person, but without ever losing the 'as if' condition."³ Carkhuff further refined this definition to include two separate components, which he labeled the ability to discriminate and the ability to communicate. He considered both to be necessary for effective empathic ability, unlike Rogers, who put more emphasis on the ability to perceive emotion.

Tagiuri, in his review of the literature (1965), defines empathy as the ability to accurately perceive or judge others, and provides one of the few discussions of the link between the work done in judging facial affect and the work in the area of empathy measurement, although confining himself to studies which defined empathy as predictive ability. In his discussion, however, Tagiuri does go further to enlarge the definition of empathy to include several independent abilities, including the ability to discriminate distinct emotions, as in the studies of judging of facial affect.

Measurement of Empathy

The development of the Affect Recognition and Response Scale is based on the analog model of selection proposed by Carkhuff: "The best index of a future criterion is a previous index of that criterion."⁴ Much of the previous research on the measurement of empathy has attempted to measure empathy indirectly, through the use of personality inventories, standardized tests, or measures of related characteristics. Those instruments which have shown the most promise have been those most directly related to the actual use of the ability to be measured, i.e., some form of stimulus situation to which the subject can make an empathic response.

³ Rogers, Carl, On Becoming a Person (Boston: Houghton Mifflin Co., 1961), p. 284.

⁴ Carkhuff, Robert, Helping and Human Relations, Vol. 1 (New York: Holt, Rinehart and Winston, Inc., 1969), p. 85.
Just as empathic ability involves more than one dimension, so too does the stimulus to which subjects respond. Thus, the ability to recognize emotions will vary according to whether the stimulus cue involves the face or body, one or more than one person, or nonverbal as well as verbal cues, as in the use of a videotape or written stimulus situation.

General Hypotheses

The following general hypotheses are tested in this study:

1. Subjects in graduate programs which require the use of one-to-one interpersonal skills will have higher scores on a test of empathic ability than subjects in graduate programs which do not require the use of these skills.

2. Subjects with higher supervisor's ratings of affective skills in clinical settings will score higher on a measure of empathic ability.

3. There will be no difference between men and women in scores on a measure of empathic ability.

Definitions of Terms

Empathy. The ability to determine what another person is feeling and to communicate this understanding to the other person.

Affect. The feeling or emotional state of a person at a given time.

Facial Affect. The nonverbal communication of an emotional state or feeling through facial expression, involving patterned movements of the facial musculature.

Nonverbal Communication. The expression of affect without the use of words, primarily through facial expression, posture, gestures, and voice qualities such as pitch and volume.

Emotion. The feeling state of a person, whether or not subjectively experienced, which is accompanied by specific neurophysiological and muscular responses.

Stimulus Situation. A test item which presents a person or persons expressing an emotion, to which the subject can make a response. Stimulus situations used in the present study in the Affect Recognition and Response Scale are slides of facial affect, videotape vignettes, and written vignettes.

Criterion.
A direct and independent measure of the variable to be tested; in this study, the variable of empathic ability.

Criterion Groups. Groups selected because of differences on some criterion measure.

Delimitations

The samples used in this study are restricted to volunteer subjects, rather than being randomly drawn from their respective populations. The samples were generally nonminority (only three minority subjects in Group I and none in Group II) and contained a higher percentage of males than females. Although the percentage of female and minority subjects in the sample is probably representative of their numbers in the populations from which they were drawn, caution should be used in applying the results of this study to future admissions programs which may include larger numbers of both female and minority applicants.

The use of criterion groups assumes that a group of subjects identified as having high empathy skills will score higher on a measure of empathy than a group of subjects identified as low in empathy skills, if the scale is indeed a valid measure of empathic ability (Hambleton and Novick, 1973). Various attempts have been made to identify high and low empathy criterion groups, but this remains a major difficulty in empathy measurement. The assumption made in this study is that advanced graduate students in counseling and clinical psychology, the majority of whom have completed a supervised practicum, have more highly developed levels of empathic ability than do graduate students in fields not requiring training or practice in the use of empathic abilities.

In Chapter V, the implications of these limitations in interpreting the results of this study will be discussed more fully.

Overview

In Chapter II the literature relevant to facial affect and emotion and the measurement of empathic ability will be reviewed.
In Chapter III the design and analysis of the study will be presented, including a description of the samples and methodology and a description of the analyses used. The development of the Affect Recognition and Response Scale will also be described in Chapter III, along with reliability and factor analysis data. In Chapter IV the analysis of the results will be given. Chapter V will contain the summary and conclusions, as well as a discussion of the implications for future research.

CHAPTER II

RELATED RESEARCH

This chapter will present an overview of research in several areas. First, some general findings on recognition of emotion will be given. A more extensive summary of research findings specifically dealing with the area of recognition and labeling of the facial expression of affect will then be presented. A section on definitions of empathy and research in the measurement of empathic ability will then be given. Finally, implications of the related research for the present study will be discussed.

Recognition of Emotion: Nonverbal Behavior

Many authors (Barbara, 1956; Berger, 1958; Dittman and Wynne, 1961; Ekman, 1965) have pointed out the importance of the study of nonverbal behavior, particularly in a psychotherapy setting. Nonverbal behavior involves any aspect of communication other than verbal content, such as body movements, facial expression, voice rate, pitch, and length of speech pauses. Researchers have studied the usefulness of attending to nonverbal communication during interactions and the degree to which judges can agree on different aspects of nonverbal communication.

Davitz and Davitz (1959a, 1959b) in two separate studies looked at the ability of judges to accurately identify feelings expressed in content-free speech. They found a significant negative correlation between the similarity of feelings and the accuracy with which those feelings were discriminated.
In addition, some feelings were more frequently identified correctly, with anger being most frequently identified and pride least frequently identified correctly. They concluded that feelings which are subjectively experienced as similar will be more difficult to differentiate than feelings experienced as disparate.

Dittman and Wynne (1961), however, in their study of voice characteristics such as stress and pitch, while finding that such characteristics could be reliably coded by different judges, were unable to find consistent patterns to correspond with the different emotions expressed, using excerpts from a therapy interview and from a recorded radio program. Dittman (1962) also studied the relationship between body movements and moods (emotions) in therapy interviews and was able to find a relationship between moods and frequency of body movements, although he concluded that these patterns are unique to each individual.

Starkweather (1956) reviewed several studies of vocal cues in nonverbal communication and concluded that vocal cues are useful in indicating the presence of strong emotional states. Eldred and Price (1958) studied vocalization patterns in psychotherapy interviews and were able to find high agreement among judges on different patterns of vocal cues which correspond to different emotions of the client in the interview.

Ekman has done several studies focusing on the nonverbal communication value of body movements and body posture, in addition to his work in the area of recognition of facial affect. In one series of four experiments (Ekman, 1964), judges were asked to match verbal excerpts from interviews with photographs of body position taken during the interviews. Results showed that the judges could correctly match verbal with nonverbal behaviors significantly better than chance.
In a second series of experiments using stressed and unstressed interviews for stimulus situations (Ekman, 1965), Ekman used both videotapes and still photos as cues for judges. Ekman concluded that whole body stimulus photos were better than either face-only photos or body-only photos since different parts of the body transmit different types of nonverbal information. He found no difference, however, in accuracy between still photos and motion pictures. Judges were initially allowed to see only one photo from an interview to eliminate situational and contextual cues, but Ekman found that adding more photos from the interview did not improve accuracy. However, he did find that judges could not accurately distinguish between stressed and unstressed conditions when shown the subject only but could make the distinction when shown a photo of both the subject and the interviewer. He suggests that one reason for this may be that the subject tries to conceal his emotions, an idea he later expanded into his concept of display rules.

Recognition of Emotion: Facial Expression of Affect

Development of Stimulus Photographs

Many different sets of stimulus photographs of facial affect have been developed over the years and used in research in the recognition of facial expressions of emotion. One of the earliest sets was that developed by Ruckmick (1921) using a female drama student who practiced and posed various expressions. While the quality of the pictures is generally good, they are limited in the number of emotion categories represented and by the fact that only one stimulus subject was used.

Frois-Wittman (1930) also developed a set of photographs, using himself as the model, and posed various facial expressions. He attempted to eliminate situational cues, hand gestures, or other distractions from the expression portrayed. In addition, he developed a set of drawings of different facial expressions.
While his photos were an improvement over available pictures, they still represented only a limited number of emotion categories and used only one stimulus subject.

Coleman (1949) reviewed the literature on the facial expression of affect and concluded that posed pictures were too artificial to be useful. He developed a set of stimulus photographs using various stimulus conditions to elicit spontaneous expressions of emotion. The difficulty in this method can best be understood by describing the elicitors which Coleman used: subjects were given a sudden very loud blast on an electric horn, received a severe electric shock, and were required to gradually crush a snail through the use of both index fingers. Coleman includes a section on the extremely negative reactions of subjects used to make these photos. The actual value of such spontaneously obtained photos will be discussed further in the next section on recognition-labeling experiments. Coleman also asked each subject to act the emotions they originally experienced spontaneously and so obtained a second set of stimulus photos.

It has been difficult to obtain spontaneous examples of certain emotions, such as shame or fear, expressed with sufficient intensity because the experiencing of these emotions is naturally defended against. Based on previously learned display rules, primary emotions occur only infrequently in spontaneous expression. Affect blends and the masking of emotional expression are much more common. Inman (1976), in his study of facial expression using slow motion and normal speed videotape stimulus situations, found that raters recorded a greater number of emotions for the slow motion tape than for the normal speed tape. Ekman theorized that the initial expression of a primary emotion is often displayed for a few micro-seconds, but is quickly masked, thus making it difficult to obtain photographs of expressions of strong primary emotions.

More recently, both Izard (1971) and Ekman (1976) have developed sets of slides of facial expressions of affect for use in research. These slides have used trained actors to pose various expressions of emotions. Both sets have the advantage of including a variety of stimulus subjects, both male and female, and a complete representation for the categories of emotion being studied. Izard's slides appear somewhat dated today, and one recent study (Zerba, 1977) showed low homogeneity coefficients for items within each emotion category for his set of slides. The slides developed by Ekman (1976) were made using trained subjects who posed a variety of emotions. The technical quality is an improvement over other available pictures, and Ekman reports a high percentage of agreement among college students used as judges for each slide available in his final set.

Recognition-Labeling Experiments

Research in the area of recognition and labeling of emotions has gone on for many years. Early studies in the area of recognition and labeling of facial expression were often hindered by the lack of adequate stimulus materials. Nevertheless, some answers to questions about facial expression were provided. These questions were whether judges can accurately identify expressions of emotion, whether some emotions are easier to identify than others, whether the identification of emotion is dependent on situational or interactional cues, whether some persons are more accurate judges of emotions than others, and whether it is easier to judge spontaneous or posed expressions accurately.

In one of the early studies of labeling of facial expressions, Ruckmick (1921) asked observers to label the emotion expressed in each of his series of photographs of a young woman. Although he performed no statistical analysis of the results, he did find some agreement and accuracy of judgment. Primary emotions were judged more accurately, and with better agreement, than secondary emotions.
Accuracy was made more difficult by the fact that judges were asked to label thirty-five separate expressions, each supposedly different. The intended distinctions between such categories as resentment and sulkiness or haughtiness and defiance are difficult to draw in theory, much less as distinct facial expressions.

Frois-Wittman (1930) conducted a series of experiments in recognition-labeling using pictures of himself which he had developed and a set of pictures of facial expression. Judges in the experiments were college students. Each judge was given a list of forty-three terms compiled from previous researchers and asked to label the expressions presented in the photos and drawings. The median for agreement on the pictures was 37.5 percent, not low considering the possibility of forty-three different labels, each considered as a separate category. In general, Frois-Wittman found a wide scatter for labels, with one or more modal frequencies. Pictures with more than one modal label showed a logical relationship between the modes, e.g., anger and hate being the two modes for a given picture. Frois-Wittman also studied patterns of muscular involvement in each expression and found distinct patterns of muscle involvement for each expression which had appeared as a modal frequency for at least one picture. Frequent disagreements were found between judgments on the whole face and judgments of separate features, and Frois-Wittman concluded that the meaning of a given pattern of muscle involvement, e.g., raising of the eyebrows, differed, depending on the rest of the facial muscle pattern. Given the limitations of the stimulus pictures with which he worked and the large number of classifications, often representing fine distinctions of meaning, Frois-Wittman was able to show a significant agreement in the ability of observers to recognize an emotion expressed in a stimulus picture.
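
The agreement statistic reported for recognition-labeling studies of this kind (the percentage of judges choosing the modal label for each picture, summarized by the median across pictures) can be illustrated with a minimal sketch. The picture names and judgments below are hypothetical and are not data from Frois-Wittman or any other study cited here, and the function name `modal_agreement` is an assumption introduced only for illustration.

```python
from collections import Counter
from statistics import median


def modal_agreement(labels):
    """Return (modal label, percent of judges who chose it) for one picture."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, 100.0 * n / len(labels)


# Hypothetical judgments: four judges each label three pictures.
judgments = {
    "picture_1": ["anger", "anger", "hate", "anger"],
    "picture_2": ["fear", "surprise", "fear", "dismay"],
    "picture_3": ["joy", "joy", "joy", "joy"],
}

# Modal label and percent agreement for each picture.
per_picture = {pic: modal_agreement(lbls) for pic, lbls in judgments.items()}

# Median percent agreement across all pictures.
median_agreement = median(pct for _, pct in per_picture.values())
```

With a list of forty-three possible labels, agreement expected by chance alone would be roughly 1/43, or about 2.3 percent, which is one way of seeing why a median agreement of 37.5 percent is not considered low.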

In a follow-up study using the Frois-Wittman pictures, Hulin and Katz (1935) used seventy-two pictures and asked judges to sort the pictures into groups according to whether the pictures showed approximately the same facial expression. Results showed a wide scatter in agreement among judges, with some cases of high percentage of agreement. Unfortunately, Hulin and Katz did not report which emotion categories were chosen by observers or ask them to label the groupings they chose. Results are reported as the percent agreement of observers on the similarity between any two pictures, making it impossible to assess the agreement in labeling any one picture.

Coleman (1949), in addition to reviewing the literature on studies of facial expression of emotion, used both spontaneous and posed stimulus pictures. The situations used to elicit the spontaneous emotions have been previously described. Coleman used undergraduate students in psychology as judges and asked them to match the expression shown on the stimulus photo with the list of situations. He concluded that laughter was the most easily identified emotion and also found that the acted situations were equal to or better than the natural expressions in ease of accurate identification. There were no differences between male and female judges in accuracy of identification. In his review of other studies, however, Coleman cites both studies with no sex differences and studies in which females were more accurate than males. Differences in methodology reduce the comparability of these studies, however. Full-face photos were judged more accurately than either mouth-region or eye-region photos. Coleman's findings add support for the notion of discrete, identifiable emotion categories.

In 1965 Tagiuri reviewed studies in the area of recognition of emotion, including recognition of facial expressions of emotion.
He discusses five problems that have not only hindered research in this area, but also make comparison of results across studies difficult. These problems remain relevant to current research in this area. The first problem which Tagiuri discusses is the variability in stimulus situations presented, i.e., still photos, motion pictures, drawings. At the time of his review there was no accurate standardized set of pictures of facial expression available, and each researcher generally developed his own set of stimulus pictures, none of which was comparable to any other. Secondly, there is a difference in the task, either recognition or labeling of emotion categories. As Tagiuri points out, the task of labeling an expression without preselected categories is not the same task as that of selecting one label from a set which has been preselected by the researcher.

A related problem, and one which was particularly apparent in early research on recognition, is the variability in emotion categories or labels used. As previously mentioned, these range from Schlosberg's three dimensions to Frois-Wittman's list of forty-three different terms for emotional expressions. Studies have also varied in the contextual or situational cues provided for the judges. Tagiuri concludes that the more situational information available to a judge, the more accurate will be his/her judgment. The final problem which he mentions is that of the sampling of emotional expressions. Ruckmick and Frois-Wittman, for example, both used only a single subject in posing their sets of photographs. Thus, these photos are open to possible distortion based on idiosyncrasies of the subjects used. Izard and Ekman were the first to develop sets of stimulus slides using a number of different subjects of both sexes. Ekman also includes more of an age range in the subjects which he used.

Tagiuri presents several important conclusions in his review.
He finds no consistent sex differences in recognition of affect, but does conclude that there is some relationship between the ability to judge emotion and level of intelligence. His review of studies of subjects who are blind, and thus have not learned emotional expressions or cues from others, shows that there are some innate patterns of expression of emotion. He also concludes that some expressions are more easily discriminated than others and that the expression of a specific emotion varies with the sequence of emotions and the situation, as in the masking of an emotion to fit what is considered appropriate in a situation. Tagiuri cites studies to support both the cross-cultural and the universal positions of emotional expression and suggests that the universal similarities may reflect innate aspects of expression and recognition.

Tomkins and McCarter (1964) used a set of sixty-nine posed photographs and a sample of twenty-four urban firemen in a recognition experiment. Each subject was asked to identify the photos according to a set of nine emotion categories, including neutrality. An average correlation of .86 between judgments and the affect which was intended was found. Tomkins and McCarter also found some systematic confusion between emotion categories and some individual idiosyncrasies for individual subjects. Affects most likely to be confused were those most similar to each other, e.g., distress and shame, interest and neutrality. Tomkins and McCarter theorize that affects which are triggered by similar situations are most likely to be confused with each other.
Individual idiosyncrasies may develop because of learning, as when a child is taught that the display rule for anger is to mask its expression with a smile, or because of an individual tendency to continuously experience only one affect or expect only one affect from others, as when a person who is continuously hostile is not able to accurately express enjoyment or recognize its expression in others.

Ekman and Friesen (1971) conducted an experiment to measure the universality of facial recognition with a preliterate tribe from New Guinea. Each subject was told a story focused on one emotion and then asked to pick one of three pictures of facial expression which matched the story. The percent of subjects choosing the correct picture was generally better than 75 percent, with the exception of the fear and surprise categories, which were not accurately discriminated. In a subsequent experiment, American college students were shown videotaped facial expressions of the New Guinea natives and were able to accurately identify the emotions being portrayed.

Izard (1971) conducted a series of experiments in emotion labeling and recognition using a sample of American and foreign college students. His stimulus photos were a set of slides of facial affect which he had developed. In the Emotion Recognition Experiment each subject was asked to choose an emotion from a list of eight emotions provided. The average agreement for each slide was 78 percent, with a high degree of agreement across cultures (American, European, Oriental, and African). The same subjects were also asked to provide their own labels for the slides, before participating in the Emotion Recognition Experiment. The average agreement for labeling was 56 percent for females and 50 percent for males, with a lesser degree of similarity across cultures than for the recognition task.

Definitions of Empathy

There are many definitions of empathy which have been proposed over the years.
The first major definition was that of Dymond (1949), who defined empathy as the ability of the subject to accurately predict another's feelings, attitudes, or opinions. This definition was used both by Dymond and by Kerr and Speroff (1954) as the basis for tests of empathic ability. Smith (1966) used predictive empathy, which he called sensitivity to people, as the basis for both a training program and a measure of ability.

Cohen (1973) and Feshbach and Feshbach (1969) defined empathy as the ability to vicariously experience the emotions of another person and used slides as the stimulus situations for testing this concept with small children. Stotland and Dunn (1963) carried this idea one step further and defined the vicarious experiencing of emotion in physiological terms, measuring empathic ability by checking physiological changes in their subjects concomitant with the actual experiencing of the same emotion as another person. Chapin (1942) considered empathy to be the equivalent of social insight and devised a test to measure a person's knowledge of social skills in a variety of situations.

A major influence on research in this area has been the definition of empathy given by Rogers, which was discussed in Chapter I. For Rogers, empathic understanding was one of the three necessary and sufficient conditions for change in therapy (1957).

Another aspect of empathy which has been studied particularly by Izard (1971) and Ekman and Friesen (1975) is the nonverbal expression of affect, particularly facial expression and body posture as expressions of affect. The ability to correctly label the nonverbal cues to another's feelings is an important expansion of the definition of empathy.

Measures of Empathy

Although a large number of instruments in various forms have been developed, the difficulty of operationalizing the concepts, the multifaceted nature of the construct of empathy, the process-content distinction in interviews, and problems in identifying adequate criterion groups continue to pose difficulties for research. A number of authors have specifically addressed these problems (Wolf and Murray, 1936; Taft, 1955; Strunk, 1957; Carkhuff, 1969c; Gormally and Hill, 1974), but they have yet to be satisfactorily resolved.

Astin (1957) used two different measures of empathy, based on different definitions, and found that one measure discriminated between counselors and noncounselors, and the other did not. Similarly, Hayden (1955) used a measure of predictive empathy and ratings of group members' empathy by group leaders. His results were not significant, and he concluded that predictive ability is not the best definition of empathy. Hastorf and Bender (1952) discuss the confounding effects of projection and perceived similarity on predictive empathy measurement.

Truax and Carkhuff (1967) have been careful to differentiate between the ability to discriminate affect and the ability to communicate empathically, as have other investigators (Chandler, 1970; Jarrett, et al., 1972; and Jones, 1974). Both Chandler and Heilman (1972) factor-analyzed the data from several different empathy measures and concluded that empathy is a many-faceted concept, rather than a single construct.

The written test has had the most extensive use among instruments measuring empathic ability. Within that format there is considerable variation in the construction of the tests. Early measures of predictive ability (Dymond, 1949; Kerr and Speroff, 1954) involved rating, in written form, how others would respond to a given test or situation.
Dymond required the subject to predict others' self-ratings for six personality traits, whereas Kerr required that the subject predict how people in general would respond to music, magazine selections and interpersonal situations.

Attempts have also been made to measure the process and relationship aspect of interactions (Barrett-Lennard, 1962; Linden, et al., 1965; Dilley and Tierney, 1969). Often these measures have been used concurrently with other measures of therapist empathy (Truax, 1966; McWhirter, 1972; Kurtz and Grummon, 1972), but the results have generally shown little correlation between the empathy measures and the client's perceptions either of the therapy relationship or of the therapist's empathic ability.

An assortment of other empathy measures are based on the semantic differential (Bellucci, 1971), word association techniques (Kandler and Hyde, 1953), physiological measures (Stotland and Dunn, 1963), or developed from other tests such as the Minnesota Multiphasic Personality Inventory (MMPI) (Hogan, 1969; Hurst, 1973).

A variation of the written instrument has been utilized which requires that the subject respond in writing to a given stimulus (Astin, 1957; O'Hern, 1962; Carkhuff, 1969c). Sidman (1968) developed a test based on responses to questions about short stories. In addition, Carkhuff designed his Index of Discrimination and Index of Communication to be used either in a written form or with both the stimulus situations and the subject's responses recorded on tape and was able to demonstrate that the two forms were equivalent.

Instruments have also been designed which utilize situations with a number of possible responses provided in a multiple-choice format. This has the advantage that the responses can be more readily rated (Chapin, 1942; Porter, 1950; Kerr and Speroff, 1954; Ashby, et al., 1957; Craig, 1959). Still another variation is the Interaction Maze described by Gazda (1974).
In the Interaction Maze, the subject is presented with a series of stimulus situations centered on one problem. The subject moves back and forth through the series of responses, depending on which response he chooses. Thus the instrument more closely resembles a real-life interaction where an interviewer will elicit more information with a facilitative response or stop communication with a judgmental response. Bernstein and his associates (Bernstein, et al., 1954; Rasche, et al., 1973) have developed and validated an objectively scored instrument which is used with medical students in a doctor-patient relationship course.

Most other measures of empathic ability focus on the ability of the subject to perceive and discriminate affect and to communicate this perception to the client. There are several variations, but the general format involves a presentation of several stimulus situations (tape recordings, film) to which the subject responds. The responses are then rated using various scales. The most widely used rating scale is the Accurate Empathy Scale developed by Truax (Truax and Carkhuff, 1967; Walker, 1969; Spadone, 1974), with variations developed by Carkhuff (1969c), Smith (1971), and others (Chandler, 1970; Mickelson and Stevic, 1971; Gazda, 1974). Adler and Enelow (1966), Passons and Olson (1969), and Guerney, et al. (1968) have also developed scales for rating responses.

Carkhuff's rating scales have been criticized by a number of authors (Chinsky and Rappaport, 1970; Rappaport and Chinsky, 1972; Gormally and Hill, 1974; Horwitz, 1977; Thoresen, 1977). The rating scales require training judges and are not usable for large-scale testing. In addition, obtaining adequate inter-judge reliability has been a persistent problem. Many of Carkhuff's findings are biased by his use of the same rating scale for both pre- and post-training measures and for the actual training sessions.
Thus, his subjects were measured in empathic ability using the Accurate Empathy Scale before training, trained to give correct responses to the same scale, and then measured with the scale after training.

Demographic Variables

Demographic variables have been studied extensively to determine their effect on empathic ability, but results either do not reach significance, or are contradictory. Taft (1955) reviewed numerous studies involving the correlates of the ability to judge others (his definition of empathy). His variables include age, sex, family background and sibling rank, and intelligence and perception, but there were no consistent results for any of these variables. Some investigators report differences in results by sex (Cantrell, 1967; Johnson, et al., 1967; Sidman, 1968; Feshbach and Feshbach, 1969; Huber, 1972; Cohen, 1973; Veeser, 1974), while others report no differences (Taft, 1955; Cohen and Struening, 1962; O'Hern, 1962; Blumstein, 1972). Where sex differences are found, females display the greater empathic ability.

The effects of birth order have also been found to be contradictory (Stotland and Walsh, 1963; Stotland and Dunn, 1963; Cantrell, 1967; Cohen, 1973). Prior training or experience also appears to have an inconsistent effect. Cohen and Struening (1962), Greenberg, et al. (1969), Huber (1972), and Veeser (1974) all report positive effects of training and experience in the development of empathic ability. Campbell (1962), however, found no differential effects due to experience and training. And Carkhuff (1969a; Carkhuff, et al., 1970) reports a decrease in empathic ability as the result of professional training.

Use of Standardized Tests

Various standardized tests have been administered in an attempt to predict empathic ability using personality variables. One series of such studies is based on the work of Whitehorn and Betz.
In Whitehorn's original study (1960), psychiatrists were divided into two groups according to success rates with schizophrenic patients, and then administered the Strong Vocational Interest Blank (SVIB). Whitehorn found significantly different response patterns between the two groups, and he labeled these two groups the 'A' and 'B' therapists. Subsequent studies, however, have not upheld the clear-cut distinction of 'A' therapists as effective clinicians and 'B' therapists as ineffective (Boyd, 1970; Scott and Kemp, 1971).

A considerable amount of work has been done using the Minnesota Multiphasic Personality Inventory (MMPI) in an attempt to find correlations with empathic ability (Vesprani, 1969; Blumstein, 1972; Jones, 1974) or counselor effectiveness (Brams, 1961; Johnson, 1967; McGreevy, 1967). Studies using the MMPI show mixed results, with occasional significant correlations for some subscales. Hurst, in his review of the literature (1973), reports consistent negative correlations between the Depression and Psychasthenia Scales and empathy measures, but Brams (1961) and Jones (1967) did not find the same results. Their general conclusion was that the MMPI is not a useful measure for screening for empathic ability.

The Edwards Personal Preference Schedule (EPPS) has also been used extensively in research. Hogan (1969) reports a significant negative correlation between the Social Desirability subscale and scores on his empathy test. Morris (1971) reports that in the literature the results have been variable and sometimes contradictory. Results in general are similar to those found for the MMPI: some subscales of the EPPS correlate with the criterion measures used, but the significant subscales are not the same from study to study, and results are sometimes contradictory (Bergin and Solomon, 1963; Stefflre, et al., 1963; Lawton, 1965; Johnson, et al., 1967; Vesprani, 1969; Charles, 1973).

A third area of study has focused on the related concepts of authoritarianism, dogmatism, and openness, most often using the Rokeach Dogmatism Scale as the measure. Milliken and Paterson (1967) and Stefflre, et al. (1962) found significant discrimination for subscales of the Dogmatism scale, but Passons and Olsen (1969) report no correlation between empathic sensitivity and scores on the Dogmatism scale. Allen (1967) measured openness by a special scoring of Rorschach protocols and concluded that openness was related to effectiveness of therapy.

A large number of studies have used other instruments. Except for the three already discussed, however, no measure has been used extensively, or with consistent findings. Instruments used include the Guilford-Martin Inventory (Halpern, 1954), Personal Orientation Inventory (Winborn and Rowe, 1972), Strong Vocational Interest Blank (Stefflre, et al., 1962), Berkeley Public Opinion Questionnaire (Brams, 1961), Myers-Briggs Type Indicator (Gough, 1960; Hogan, 1969; Boles, 1975), and the Omnibus Personality Inventory (Gruberg, 1969).

Criterion Groups

There have been differing approaches to the use of criterion groups for empathy research. Criterion groups have been designated on such bases as self-report, peer, or faculty ratings (Bandura, 1956; Stefflre, et al., 1962; Lawton, 1965; Allen, 1967), or by using various measures of empathy to divide subjects into high and low empathy groups for concurrent validation (Sidman, 1968; Dilley and Tierney, 1969; Feshbach and Feshbach, 1969; Blumstein, 1972).

Sandler (1972) compared female nonprofessional mental health workers (high empathy group) with a control group of adult women on several measures, including the Hogan Empathy Scale, and found that the experimental group scored significantly higher on the empathy measure. O'Hern (1962) developed the Sensitivity Scale to measure empathic ability using taped client problems as stimulus situations.
The instrument was administered to counselor candidates and discriminated at a significant level between those judged most and least effective by staff ratings. It did not, however, discriminate between those judged most and least sensitive as counselors. Milliken and Paterson (1967) divided counselor candidates into two criterion groups ("good" and "bad" counselors) according to ratings by both supervisors and coached clients on their Counselor Effectiveness Scale.

Mickelson and Stivic (1971) divided counselors into facilitative or non-facilitative counselor groups according to rankings based on responses to taped stimulus situations. Their study tested the effectiveness of verbal reinforcement techniques in eliciting client information-seeking behavior, and results showed significant differences for the facilitative and non-facilitative counselors in the predicted direction.

Carkhuff, Kratochvil, and Friel (1968) used first and fourth year clinical and nonclinical graduate students as criterion groups to study the effects of training on counselor effectiveness but did not find significant results. Campbell (1962) used experienced and inexperienced counselors to study counseling subrole behaviors but found few differences between the two groups. Astin (1957) used counselors and non-counselors as criterion groups and administered a test of predictive empathy ability and a situational test of empathy ability. Results showed that the situational test discriminated between counselors and non-counselors, and the predictive test did not. Allen (1967) found a correlation between psychological openness (defined as self-awareness and awareness of one's own feelings) and supervisor's ratings of practicum students. Similarly, Bandura (1956) found a relationship between therapist's anxiety and supervisor's rating of competence.
Veeser (1974) developed an instrument to measure sensitivity to both verbal and nonverbal emotional cues and found that psychology graduate students scored higher than engineering graduate students or undergraduate students on both the verbal and nonverbal measures.

Implications of Related Research

The review of related research suggests that while many instruments have been developed in attempts to measure empathic ability, there is currently no instrument available which has been adequately validated or which includes the aspect of recognition of nonverbal expressions of emotion through facial affect. There is support for the concept of discrete emotion categories and for the use of an analog model of measurement as the most likely to prove valid. Posed stimulus situations have proven better than spontaneous expressions for accuracy of judgments, are more easily standardized, and may help to increase comparability of research results in the future.

Research findings on sex differences, both on measures of empathic ability and in recognition and labeling of facial affect, have been inconclusive, showing either no differences or higher ability for women. With the gradual elimination of sex role stereotypes and the greater acceptance of empathic behavior for men, any differences which may earlier have existed may be disappearing.

The use of criterion groups in previous research has focused on distinctions of training and experience or has used some measure of performance, such as supervisor's ratings, to designate high and low empathy groups. In this study the criterion groups are designated by both training and experience, and supervisor's ratings are used as an additional validity check.
CHAPTER III

DESIGN OF THE STUDY

The design of the study involved administering the Affect Recognition and Response Scale to subjects in the two designated groups and having each subject complete a Biographical Data Sheet which provided data on the demographic characteristics of the samples. Supervisor's ratings of affective skills were obtained for subjects in Group I (graduate students in majors requiring the use of one-to-one interpersonal skills) who were currently involved in clinical work. Tests of significance were applied to test scores and supervisor's rating scores to test the major hypotheses.

A description of the sample, design, methodology, and analysis used is presented in this chapter. A description of the development of the Affect Recognition and Response Scale, including expert judges' data, reliability and item analysis data, and factor analysis results, is also included.

Sample

A sample of sixty-five subjects was obtained from each of two populations. The population sampled for Group I consisted of graduate students majoring in counseling and clinical psychology from Michigan State University and Central Michigan University. The majority of the subjects had completed at least one year of coursework and were currently involved in clinical work, either in a practicum, internship, or job setting.

Students at Central Michigan University were contacted through announcements in classrooms and the department office. Ten students agreed to participate as subjects and were tested in one session. Subjects at Michigan State University were individually contacted by telephone. Lists of doctoral graduate students were obtained from the departments of counseling and clinical psychology. An attempt was made to contact each student on the list because of the large size of the sample required relative to the total available population.
Each student was asked to participate in a one and a half hour testing session for research purposes and told that the purpose of the study would be explained at the end of the testing session. Each subject was also paid for his/her participation. Fifty-five students from Michigan State University agreed to participate as subjects for the study.

The population sampled for Group II consisted of graduate students majoring in engineering, mathematics, and the physical sciences at Michigan State University. Lists of graduate students in the engineering and mathematics departments were obtained from the respective departments. All foreign students were eliminated from the lists to avoid the effect of cultural differences, particularly in the recognition of facial affect. Each of the remaining students was individually contacted by telephone and asked to participate in the study.

After contacting all students in these departments it was not possible to obtain enough subjects, so graduate students majoring in the physical sciences were also included. Names of these students were obtained through the Michigan State University student directory. Since the directory gives information on a student's class rank, major, and home address, in addition to name and telephone number, it was possible to identify non-foreign graduate students in the desired majors. Again an attempt was made to contact almost all of the available students due to the large sample size required and a high refusal rate for participation.

Due to the relative lack of both women and minority graduate students in the population sampled for Group II and the lack of minority students in the population sampled for Group I, it was not possible to obtain equal numbers of female and male subjects for either group, nor was it possible to obtain sufficient minority subjects to test any hypotheses about differences in test scores due to race. Table 3.1 presents demographic data for each sample.
Development and Description of the Affect Recognition and Response Scale

The Affect Recognition and Response Scale is a revised form of the Empathy Skills Rating Scale (Krupka and Parsons, 1978), which was developed as a measure of empathic ability for use in medical school admissions screening under a grant from the National Fund for Medical Education. The Affect Recognition and Response Scale consists of three subtests using a set of slides of facial expressions of emotions and a series of written and color videotape vignettes (see Appendix A for a copy of the test packet, with sample items). The Empathy Skills Rating Scale consists of five subtests, including a set of postural line drawings, a series of color videotape vignettes, and a series of written stimulus situations. Subtests 2 and 3 of the Affect Recognition and Response Scale, including the color videotape vignettes, are taken directly from the Empathy Skills Rating Scale. Subtest 1 was added to this, using slides and emotion categories developed by Ekman and Friesen (1976).

TABLE 3.1.--Demographic Data for Group I and Group II.

Variable                      Group I       Group II
Sex
  Male                        35            51
  Female                      29            14
  Non-response                1             0
Race
  Minority                    3             0
  Non-minority                62            65
Degree
  M.A./M.S.                   28            18
  Ph.D.                       35            45
  Non-response                2             2
Age
  Mean                        28            27
  Range                       22 to 39      21 to 46
Number of Children
  Mean                        .41           .31
  Range                       0 to 3        0 to 3
Undergraduate G.P.A.(a)
  Mean                        3.29          3.36
  S.D.(b)                     .46           .37
Graduate G.P.A.
  Mean                        3.80          3.64
  S.D.                        .16           .24

(a) Grade-point average.
(b) Standard deviation.

Item Pool

An initial pool of six hundred written stimulus situations was generated by the test developers for the Empathy Skills Rating Scale. These items consisted of brief statements, usually no more than two or three sentences, covering a wide range of expressed affect and subject matter, such as hostility, enjoyment, depression, and fear; and sexuality, death, and racial issues.
An attempt was made to develop stimulus situations which were brief, contained the expression of only one emotion, either overtly or covertly, and covered a wide range of topics, emotion categories, and levels of emotional intensity. A selection of stimulus subjects was also made so that they covered an age range from children to the elderly and included men, women, and minority as well as nonminority subjects.

Pilot Videotape

A black and white pilot videotape was produced using vignettes developed from selected stimulus situations taken from the original pool of items which had been generated. These pilot vignettes used trained role-players and included two people in an interaction in each vignette. Each of these vignettes lasted between thirty and forty-five seconds.

Pilot Studies

An initial form of the Empathy Skills Rating Scale was developed, including the pilot videotape, and administered to two separate groups with a total of ten subjects, five male and five female. Each group took the scale, filled out an extensive debriefing questionnaire and participated in a debriefing session which was recorded and transcribed. Based on results from these pilot studies, the scale was revised and a new color videotape was produced.

This revised form of the scale was administered to a third pilot group of subjects, consisting of sixteen male and female undergraduate students in introductory psychology courses. Subjects in all three pilot groups were presented written and videotape stimulus situations and asked to write their own responses. These responses were used to develop multiple-choice answers for Subtests 2 and 3 of the scale.

Description of Subtest 1: Slides of Facial Affect

Several attempts were made to develop slides of facial affect which would be suitable for the scale. An initial set of slides was reproduced from works of art, but there was little agreement among pilot subjects on the emotion expressed in each slide.
A second set of slides was then developed from the color videotape vignettes, using stop-action equipment for the videotape, but it was not possible to cover a range of distinct emotion categories. A third set of slides was made from a black and white videotape which portrayed a variety of subjects expressing emotions (videotape courtesy of Bob Wilson, College of Education, Michigan State University). Although this set of slides was of better technical quality, it still did not provide enough different clearly expressed poses for each emotion category.

The set of slides finally used for the scale was chosen from the Pictures of Facial Affect developed by Ekman and Friesen (1976). These are a set of 110 slides of facial expressions of emotion, using more than a dozen different persons who were trained to contract or relax different facial muscles associated with various facial expressions, so as to pose a specific facial expression for a given emotion category. The six emotion categories used were those which have generally been included by most theorists in the area of emotion: pleasure, distress, fear, anger, disgust, and surprise. Data on reliability and validity for the entire set of Pictures of Facial Affect are presented in a brochure which accompanies the set of slides (see Appendix B). All slides included in the set met a criterion of 70 percent or better agreement among observers.

From this set of 110 slides, 36 (six from each emotion category) were originally selected for Subtest 1 of the scale. A Table of Emotions (see Appendix A) was developed for use with the slides. Each emotion category contains the main emotion description and a subset of synonyms denoting varying degrees of intensity for the main emotion category.

These 36 slides were administered to a group of expert judges.
Based on their responses the "fear" category of emotions was dropped, since only two slides in this category met the criterion of 80 percent or better agreement among judges. Five slides which met this criterion were selected in each of the remaining categories for the final version of the scale, and the Table of Emotions was revised to eliminate the "fear" category. Thus, the final form of Subtest 1 contains five slides in each of five emotion categories, all of which met the criterion of 80 percent or better agreement among expert judges.

Development of Subtest 2: Videotape Vignettes

Based on results of the pilot studies using a black and white test videotape, a new color videotape was developed. Trained actors were used to enact short (15-20 seconds) vignettes using a script developed from the original item pool. Twenty different vignettes with ten different actors ranging in age from seven to sixty-five were filmed. This videotape was then edited, based on technical quality and realism of the vignettes, to produce a final version of the videotape containing fifteen vignettes, each followed by one minute of blank tape for response time.

The videotape vignettes, which were the stimulus situations for Subtest 2 of the scale, were administered to the expert judges with a set of multiple-choice response answers. Based on these expert judges' responses, four of the items were dropped from the subtest due to lack of agreement. Since it was not possible to edit these vignettes from the tape, they were administered to all subjects as part of the scale but were not scored or included in any of the data analyses. Thus, the final version of Subtest 2 contains eleven videotape vignette stimulus situations with multiple-choice responses, all of which met the criterion of 80 percent or better agreement among the expert judges.
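The 80 percent agreement criterion described above can be illustrated with a short Python sketch. It is an illustration only and not the procedure's original implementation: the function name `meets_agreement` is an assumption, and the judge choices shown are hypothetical, not data from the study. With five judges, the criterion amounts to at least four judges choosing the same answer.

```python
from collections import Counter


def meets_agreement(judge_choices, threshold=0.80):
    """Return (modal_choice, proportion, retained) for a single item.

    judge_choices: one label per judge, e.g. an emotion category.
    threshold: minimum proportion of judges who must give the same answer.
    """
    counts = Counter(judge_choices)
    # Most frequently chosen label and the number of judges choosing it.
    modal_choice, modal_count = counts.most_common(1)[0]
    proportion = modal_count / len(judge_choices)
    return modal_choice, proportion, proportion >= threshold


# Hypothetical choices from five judges for two slides (not study data).
slide_x = ["anger", "anger", "anger", "anger", "disgust"]    # 4 of 5 agree
slide_y = ["fear", "fear", "surprise", "fear", "surprise"]   # 3 of 5 agree

print(meets_agreement(slide_x))  # slide retained
print(meets_agreement(slide_y))  # slide not retained
```

Comparing a proportion against the threshold keeps the rule independent of the number of judges, so the same function applies if the panel size changes.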
Development of Subtest 3: Written Stimulus Situations

Fifteen written stimulus situations, taken from the original item pool, were included in the initial form of Subtest 3 which was administered to the expert judges. Based on their responses, one item was dropped from the scale. The final version of Subtest 3 contains fourteen items, all of which met the criterion of 80 percent or better agreement among the expert judges.

Expert Judges' Data

The Affect Recognition and Response Scale was individually administered to five expert judges, three female and two male. All had doctoral degrees in counseling psychology and were engaged in clinical work. Data from the expert judges' responses were used as the basis for the development of a scoring key for the scale.

The form of the scale given to the judges included 36 slides, six in each of the six emotion categories (Subtest 1), 15 videotape vignettes (Subtest 2), and 15 written stimulus situations (Subtest 3). Judges were asked to rank order the multiple-choice responses for Subtests 2 and 3 from one to four, with one being the least helpful response and four being the most helpful response.

Data from the expert judges' responses for Subtest 1 are shown in Table 3.2.

TABLE 3.2.--Number of Expert Judges Choosing Each Emotion Category for Slides of Facial Affect (Subtest 1).

[Columns: Slide Number; Emotion Category (Anger, Pleasure, Distress, Disgust, Fear(a), Surprise). Cell values are illegible in the source.]

(a) The "fear" category was dropped from the final form of Subtest 1.

A criterion of 80 percent agreement (four of the five expert judges) was used to retain slides for the scale. Thirty-one of the slides (86 percent) met this criterion. Only two of the six slides in the "fear" category met this criterion, however, so this category was dropped from the scale.
Since only five of the six slides in the "anger" category met the criterion, one slide was randomly dropped from each of the remaining categories to equalize the number of slides in each category. Thus, the final form of the scale contains a total of 25 slides, five in each of five emotion categories (anger, pleasure, distress, disgust, and surprise).

Data from the expert judges' responses for Subtests 2 and 3 are shown in Table 3.3. Using the criterion of 80 percent agreement, it was not possible to assign a ranking from one to four for responses to each of these items. A decision was made to aggregate rankings of three and four into a single high ranking (correct response) and rankings of one and two into a single low ranking (incorrect response). Using the criterion of 80 percent agreement on this high-low ranking, it was possible to retain eleven items for Subtest 2 and fourteen items for Subtest 3. Thus, for each of these items subjects received one point for either of the two correct responses and zero points for either of the two incorrect responses.

The average percent of agreement among the judges was calculated for each subtest and for the total scale. For the total scale this was 94 percent.

TABLE 3.3.--Number of Expert Judges Rating Each Response Choice Correct or Incorrect for Subtest 2 and Subtest 3.

Item                      Response Choice
Number    Rating          a     b     c     d
26        Correct         0     4     5     1
          Incorrect       5     1     0     4
27*       Correct         3     0     2     5
          Incorrect       2     5     3     0
28        Correct         0     5     5     0
          Incorrect       5     0     0     5
29        Correct         5     4     1     0
          Incorrect       0     1     4     5
30        Correct         0     0     5     5
          Incorrect       5     5     0     0
31        Correct         5     4     1     0
          Incorrect       0     1     4     5
32        Correct         0     0     5     5
          Incorrect       5     5     0     0
33        Correct         0     5     0     5
          Incorrect       5     0     5     0
34        Correct         5     5     0     0
          Incorrect       0     0     5     5
35*       Correct         3     2     5     0
          Incorrect       2     3     0     5
36*       Correct         5     0     3     2
          Incorrect       0     5     2     3
37        Correct         0     5     5     0
          Incorrect       5     0     0     5
38        Correct         0     0     5     5
          Incorrect       5     5     0     0
39*       Correct         5     2     0     3
          Incorrect       0     3     5     2
TABLE 3.3--continued.

Item                      Response Choice
Number    Rating          a     b     c     d
40        Correct         5     0     0     5
          Incorrect       0     5     5     0
41        Correct         5     0     5     0
          Incorrect       0     5     0     5
42        Correct         0     5     4     1
          Incorrect       5     0     1     4
43        Correct         0     0     5     5
          Incorrect       5     5     0     0
44        Correct         4     1     5     0
          Incorrect       1     4     0     5
45        Correct         5     0     0     5
          Incorrect       0     5     5     0
46        Correct         5     5     0     0
          Incorrect       0     0     5     5
47        Correct         0     0     5     5
          Incorrect       5     5     0     0
48        Correct         5     5     0     0
          Incorrect       0     0     5     5
49        Correct         0     5     0     5
          Incorrect       5     0     5     0
50        Correct         0     5     0     5
          Incorrect       5     0     5     0
51        Correct         4     5     0     1
          Incorrect       1     0     5     4
52        Correct         0     5     5     0
          Incorrect       5     0     0     5
53        Correct         5     0     1     4
          Incorrect       0     5     4     1
54        Correct         4     1     5     0
          Incorrect       1     4     0     5

*Item omitted in final form of scale.

Subtest Score Correlation Matrix

Pearson product-moment correlation coefficients were calculated for subtest scores for all subjects. The correlation matrix is presented in Table 3.4. The correlation matrix for individual item scores was also calculated and is given in Appendix C. For the subtest scores, all correlations are significant at the .001 level. The high correlation between Subtests 2 and 3 raises a question of the need for the two separate formats and whether these two subtests are in fact measuring different constructs. These questions will be discussed further in the section on factor analysis.

TABLE 3.4.--Subtest Score Correlation Matrix.

                          Subtest 1     Subtest 2      Subtest 3
                          (Slides)      (Videotape)    (Written)
Subtest 1 (Slides)        1.000
Subtest 2 (Videotape)      .264*        1.000
Subtest 3 (Written)        .332*         .752*         1.000

* All correlations significant at the p < .001 level.

Reliability

Reliability for the Affect Recognition and Response Scale was calculated using the Kuder-Richardson formula #20, which calculates an internal consistency coefficient using all possible split-half combinations of items. A reliability coefficient was calculated for the scale as a whole and for each of the individual subtests. For the entire scale the reliability estimate was .853.
Reliability estimates for each subtest were: Subtest 1, .416; Subtest 2, .799; Subtest 3, .804 (Table 3.5). Item-total reliability statistics were also calculated and are presented in Appendix D.

TABLE 3.5.--Reliability and Item Analysis Data for Subtest Scores and Total Scale Scores.

                                    Subtest    Subtest    Subtest    Total
Characteristic                      1          2          3          Scale
Mean Item Difficulty                13.6       24.1       24.6       18.6
Mean Item Discrimination            13.6       51.6       60.3       31.4
Kuder-Richardson #20
  Reliability Coefficient           .416       .799       .804       .853
Standard Error of Measurement       1.52       1.19       1.35       2.44

Reliability estimates for the scale as a whole and for Subtests 2 and 3 were within acceptable limits. The reliability estimate for Subtest 1, however, is quite low, particularly considering the number of items in the subtest. Zerba (1977) reported similarly low reliability for recognition and labeling tasks using slides and emotion categories developed by Izard. Indices of discrimination and difficulty for subtest scores indicate that items in Subtests 2 and 3 were more difficult than items in Subtest 1 and discriminated more highly between subjects in the upper and lower scoring groups. Indices of difficulty and discrimination for individual items are presented in Appendix E.

Factor Analysis

In order to examine the underlying structure of the scale, a factor analysis using all items was done. A principal components factor analysis with no assumptions about expected structure was first performed, followed by a varimax rotation with no preset number of factors to be extracted. An eigenvalue of 1.00 or greater was used as the criterion for determining the number of factors extracted by varimax rotation. A total of ten factors emerged for the scale. The minimum value for factor loadings was set at an absolute value of .40; factor loadings for items in each of the ten factors are shown in Table 3.6. Item-factor correlations for all of the 54 items with each of the 10 factors are presented in Appendix F.
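As an illustration of the internal consistency estimate reported in the Reliability section, the Kuder-Richardson formula #20 can be sketched in Python. This is a minimal sketch and not the program used in the study: the function name `kr20`, the use of the population variance of total scores, and the small response matrix are all assumptions introduced for illustration.

```python
def kr20(responses):
    """Kuder-Richardson #20 for a subjects-by-items matrix of 0/1 scores.

    KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores),
    where k is the number of items, p_i is the proportion of subjects
    answering item i correctly, and q_i = 1 - p_i.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])

    # Total score per subject and the (population) variance of the totals.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    # Sum of item variances p * q for dichotomous items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical scores for four subjects on three items (not study data).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(example))  # prints 0.75
```

Some texts use the sample variance (dividing by n - 1) for the total scores; that choice changes the coefficient slightly and is left here as a stated assumption.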
The factor structure which emerged from this initial analysis did not conform to the subtest structure of the test. Rather, there was one main factor with high factor loadings for most of the items in Subtests 2 and 3, a second factor with high loadings for slides 1 and 24 from the "pleasure" emotion category for Subtest 1, a third factor with high loadings for two items from Subtest 3, and additional factors with high loadings for one, two or three individual slides from Subtest 1. Thus, it appears that Subtests 2 and 3, rather than measuring different abilities or constructs, are essentially measuring the same thing. In addition, there seems to be [...]

[TABLE 3.6: Varimax Rotated Factor Matrix for All Items, minimum factor loading value of .40. Key: S = slides (Subtest 1), T = tape (Subtest 2), W = written (Subtest 3). Table body is illegible in the source.]

[...]

II. Prediction of Supervisor's Ratings

Null Hypothesis: There will be no relationship between subtest scores on the Affect Recognition and Response Scale and ratings of affective skills for subjects in Group I.

Alternative Hypothesis: There will be a positive relationship between subtest scores on the Affect Recognition and Response Scale and ratings of affective skills for subjects in Group I.

III. Differences Between Sexes

Null Hypothesis: There will be no differences between men and women in mean scores for each of the subtests of the Affect Recognition and Response Scale.

$H_0: M_M = M_W$

Alternative Hypothesis: Women will have higher mean scores than men for each subtest of the Affect Recognition and Response Scale.

$H_A: M_W > M_M$

Where,
$M_I$ = Mean subtest scores for subjects in Group I.
$M_{II}$ = Mean subtest scores for subjects in Group II.
$M_M$ = Mean subtest scores for male subjects.
$M_W$ = Mean subtest scores for female subjects.

Analysis

The main purpose of the study was to provide validation data for a measure of empathic ability. For this reason, several different aspects of scale validation were incorporated into the study. These included administering the test to a group of expert judges to determine agreement on correct item responses, performing factor analyses on the scale to examine the underlying structure, conducting a criterion-based test of construct validity using subjects in groups which differed on the dimension of empathic ability, and conducting a test of construct validity using an independent criterion measure (supervisor's ratings of affective skills). Data on expert judges' agreement and the factor analysis of the scale have been presented in a previous section. The following analysis was used to test the hypotheses related to criterion-based validation.

A two-way multivariate analysis of variance for multiple measures taken at one time was used to test the first and third hypotheses, with a significance level of .05 or less. For those hypotheses, univariate analyses of variance were then used to determine which subtest scores were significantly different for subjects in Group I and Group II. For the univariate analyses the alpha level was set at .017 by dividing the .05 alpha level equally among the three subtest scores. A regression analysis with a significance level of .05 or less was used to test the second hypothesis and determine the amount of variance accounted for by subtest scores.

Summary

Sixty-five subjects were obtained for each of the two criterion groups. Subjects in Group I were drawn from a population of graduate students majoring in counseling and clinical psychology at Michigan State University and Central Michigan University.
Subjects in Group II were drawn from a population of graduate students majoring in engineering, mathematics, and the physical sciences at Michigan State University. All subjects were paid volunteers, and no attempt was made to randomize selection from either population. Because of skewed distributions in the populations sampled, it was not possible to obtain equal numbers of female and male subjects.

The Affect Recognition and Response Scale was developed and tested for construct validity. Supervisor's ratings were obtained for subjects in Group I who were currently involved in clinical work, using the Supervisory Rating Scale, an eight-point scale measuring affective ability which was developed for the study. Data on sample characteristics was also obtained using a Biographical Data Sheet, also developed for the study.

A multiple measures design with two crossed factors of group and sex was used to test the main hypotheses. Scores on the three subtests of the Affect Recognition and Response Scale were the dependent variables. Research methodology involved administering the Affect Recognition and Response Scale to subjects, obtaining responses on the Biographical Data Sheet, and obtaining supervisor's ratings for subjects in Group I.

Statistical hypotheses were formulated to test the differences between subjects in Group I and Group II, the differences between male and female subjects, and the relationship between subtest scores and supervisor's ratings of affective skills. A two-way multivariate analysis of variance for multiple measures was used to test for group and sex differences. Additional univariate analyses of variance were used to determine which subtest scores were significantly different for groups and for sexes.

The development of the Affect Recognition and Response Scale, a description of the scale, and data on expert judges' ratings, reliability, and factor analysis results were also presented.
Expert judges' agreement for the scale was 94 percent. Overall scale reliability, estimated with the Kuder-Richardson formula #20, was .853. Factor analysis results indicated one main factor corresponding to empathic responding, regardless of stimulus situation format.

The results of the hypotheses tests and an interpretation of these results will be presented in Chapter IV.

CHAPTER IV

ANALYSIS OF RESULTS

The statistical hypotheses, an analysis of the data, and a summary of the results of the hypothesis tests are presented. The first and third hypotheses were tested by a multivariate analysis of variance. When the results of the multivariate analysis were significant at the .05 level, additional univariate analyses of variance were used to determine which subtest scores were significantly different. The second hypothesis was tested using stepwise multiple regression analysis.

Hypothesis I: Differences Between Criterion Groups

Null Hypothesis: There will be no differences between subjects in Group I and subjects in Group II in mean scores for each of the subtests of the Affect Recognition and Response Scale.

$H_0: M_I = M_{II}$

Alternative Hypothesis: Subjects in Group I will have higher mean scores than subjects in Group II for each of the subtests of the Affect Recognition and Response Scale.

$H_A: M_I > M_{II}$

Where,
$M_I$ = Mean subtest scores for subjects in Group I.
$M_{II}$ = Mean subtest scores for subjects in Group II.

Significant differences were found between groups for the three subtest scores (F = 77.13, p < .00001). Cell means, standard deviations, and approximate overall F value are shown in Table 4.1.

TABLE 4.1.--Summary Data for Multivariate Analysis of Group Effects.
Group and                     Standard       Confidence
Subtest Score      Mean       Deviation      Interval
Group I
  Subtest 1        21.80      1.82           21.34 to 22.25
  Subtest 2        10.02      1.12            9.74 to 10.30
  Subtest 3        12.86      1.33           12.53 to 13.19
Group II
  Subtest 1        21.06      2.36           20.48 to 21.64
  Subtest 2         6.53      1.88            6.07 to 6.99
  Subtest 3         8.32      2.52            7.70 to 8.94

Approximate overall F Value = 77.133
Significance of F Value = .00001

Univariate analyses of variance were performed to examine group differences individually for each of the three subtest scores. For the univariate analyses the alpha level was set at .017 by dividing the .05 overall alpha level equally among the three subtests. The results of the univariate analyses are presented in Table 4.2.

TABLE 4.2.--Univariate Analyses of Variance for Group Effects.

               Hypothesis                     Significance
Variable       Mean Squares      F Ratio      of F Ratio
Subtest 1       17.61              4.04       .04666
Subtest 2      394.70            161.86       .00001*
Subtest 3      670.07            165.10       .00001*

*Significant at the .017 level.

Differences in mean scores on Subtest 1 (slides of facial affect) were not significant at the .017 level. Thus, the null hypothesis of no differences between groups was not rejected for scores on Subtest 1. Differences in mean scores on Subtest 2 (videotape vignettes) and Subtest 3 (written stimulus situations) were significant at the .017 level. An examination of mean scores indicates that the differences were in the predicted direction. The null hypothesis of no difference in mean scores was therefore rejected in favor of the alternative hypothesis. Subjects in Group I did score higher than subjects in Group II on Subtest 2 and Subtest 3. The mean difference in scores for Subtest 2 was 3.49. The mean difference in scores for Subtest 3 was 4.54.

Hypothesis II: Prediction of Supervisors' Ratings

Null Hypothesis: There will be no relationship between subtest scores on the Affect Recognition and Response Scale and ratings of affective skills for subjects in Group I.
Alternative Hypothesis: There will be a positive relationship between subtest scores on the Affect Recognition and Response Scale and ratings of affective skills for subjects in Group I.

Hypothesis II was tested using a multiple regression equation with the three subtest scores as independent variables and supervisors' ratings of affective skills as the dependent variable. The results of the regression analysis were not significant at the .05 level, either for the three subtest scores entered together into the regression equation or for each subtest score entered independently in the stepwise regression analysis. The three subtest scores together accounted for only 2.0 percent of the variance in supervisors' ratings of affective skills. A summary of the multiple regression analysis results is presented in Table 4.3.

The null hypothesis was not rejected in favor of the alternative hypothesis. There appears to be no relationship between subtest scores on the Affect Recognition and Response Scale and ratings of affective skills for subjects in Group I.

TABLE 4.3.--Stepwise Multiple Regression Summary Table for Prediction of Supervisors' Ratings from Subtest Scores.

[Table values are illegible in the source scan. Rows: Subtest 1, Subtest 2, Subtest 3. Legible column headings: Independent Variable, Beta, Standard Error of Beta, Overall F Value, Significance of F Value.]
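The arithmetic behind the two hypothesis tests reported so far can be checked with a short Python sketch. It recomputes the per-test alpha level, the significance decisions, and the mean differences from the values printed in Tables 4.1 and 4.2, and it converts the reported 2.0 percent of explained variance into an overall F value using the standard formula for a multiple regression. The names used here are illustrative and are not part of the original analysis. The sample size for the regression is not stated in this chapter, so the sketch assumes, purely as a hypothetical upper bound, that all 65 Group I subjects had supervisors' ratings.

```python
# Illustrative check of figures reported in this chapter.
# All function and variable names are hypothetical.

OVERALL_ALPHA = 0.05

# Significance levels transcribed from Table 4.2.
P_VALUES = {"Subtest 1": 0.04666, "Subtest 2": 0.00001, "Subtest 3": 0.00001}

# (Group I mean, Group II mean) transcribed from Table 4.1.
GROUP_MEANS = {
    "Subtest 1": (21.80, 21.06),
    "Subtest 2": (10.02, 6.53),
    "Subtest 3": (12.86, 8.32),
}


def per_test_alpha(overall_alpha, n_tests):
    """Divide the overall alpha level equally among the follow-up tests."""
    return overall_alpha / n_tests


def overall_f_from_r2(r_squared, n_subjects, n_predictors):
    """Overall F for a multiple regression, computed from R squared."""
    df_model = n_predictors
    df_error = n_subjects - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_error)


alpha = per_test_alpha(OVERALL_ALPHA, len(P_VALUES))
significant = {name: p < alpha for name, p in P_VALUES.items()}
mean_differences = {
    name: round(group_1 - group_2, 2)
    for name, (group_1, group_2) in GROUP_MEANS.items()
}

# ASSUMPTION: 65 is a hypothetical upper bound on the number of Group I
# subjects with supervisors' ratings; the actual n is not given here.
ASSUMED_N = 65
f_value = overall_f_from_r2(0.02, ASSUMED_N, 3)

print("Per-test alpha:", round(alpha, 3))
print("Significant at per-test alpha:", significant)
print("Mean differences (Group I - Group II):", mean_differences)
print("Overall F under the assumed n:", round(f_value, 3))
```

Under the assumed sample size the overall F falls below 1, which is consistent with the nonsignificant regression reported for Hypothesis II. A smaller actual sample would reduce the error degrees of freedom and lower the F value further.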

TABLE 4.4.--Univariate Analyses of Variance for Groups on Undergraduate and Graduate Grade-point Average.

[Table values are illegible in the source scan. Rows: Undergraduate Grade-point Average, Graduate Grade-point Average. Legible column headings: Dependent Variable, Mean (Group I), Mean (Group II), Mean Squares Between Groups, F Ratio, Probability.]

*Significant at the .05 level.

TABLE 4.5.--Stepwise Multiple Regression Summary Table for Prediction of Subtest Scores from Undergraduate and Graduate Grade-point Average (G.P.A.).

[Table values are illegible in the source scan. Dependent variables: Subtest 1, Subtest 2, Subtest 3, each with the independent variables Graduate G.P.A. and Undergraduate G.P.A. Legible column headings: Dependent Variable, Independent Variable, Beta, Standard Error of Beta, Overall F Value, Significance of F Value.]

*Significant at the .05 level.