ABSTRACT

DEVELOPMENT OF THE INTERPERSONAL SKILLS INTERACTION ANALYSIS: AN INTERACTION ANALYSIS TECHNIQUE TO MEASURE INTERPERSONAL COMMUNICATION SKILLS IN SMALL GROUP SETTINGS

By

Randall M. Isaacson

An examination of the field of affective and socio-emotional education shows an increasing interest in these fields, particularly in relation to the classroom teacher's facilitation of human relations and interpersonal communication skills. The present study reviewed this research with an emphasis on those pre-service and in-service teacher education programs which instruct and evaluate interpersonal communication skills. The review of the research in the field pointed out the almost total lack of objective instruments available to researchers to measure group members' communication skills. The emphasis of the present study was the development of an interaction analysis observation instrument which would be capable of measuring interpersonal communication skills in small group settings. Three areas were examined: the instrument's reliability, its validity, and the interpretation of matrices and flow charts.

The reliability of the Interpersonal Skills Interaction Analysis (ISIA) was measured by Scott's π in three areas. The inter-rater reliability (coefficient of observer agreement) was estimated using Scott's π for four observers with inter-correlations ranging from .72 to .88. A live versus taped reliability coefficient was calculated to estimate the loss in reliability due to audio-tape recordings. These coefficients were .72 and .79, demonstrating the acceptability of the audio-tapes. A stability coefficient was also calculated, which demonstrated a greater within-group stability than between-group stability.

The validity of the ISIA was demonstrated by using participant and expert opinion ratings of each group.
The ISIA distinguished between those groups judged effective and ineffective (Wilcoxon matched-pairs = .002) and further indicated that the differences between the effective and ineffective groups were due to the communication skills under study. Using a non-parametric statistic (Friedman ANOVA), the effective and ineffective groups were found to differ on self-disclosure (.002), active listening (.035), feedback (.077), and affective interactions (.031). The validity of the ISIA was further supported by a rank order correlation which showed the individual group opinionnaire data to correlate with the ISIA category data. The findings were further discussed and illustrated through an examination of matrix and flow chart interpretation.

DEVELOPMENT OF THE INTERPERSONAL SKILLS INTERACTION ANALYSIS: AN INTERACTION ANALYSIS TECHNIQUE TO MEASURE INTERPERSONAL COMMUNICATION SKILLS IN SMALL GROUP SETTINGS

By

Randall M. Isaacson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services, and Educational Psychology

1976

Dedicated to Elmer P. Isaacson, my grandfather, who waited with more patience for a longer time than anyone else to call me Dr. Isaacson.

Thank you, Grandpa.

ACKNOWLEDGMENTS

During this ordeal I read one author who stated his disagreement with the frequent mentioning of wives as "last but not least" in acknowledgments. I agree. My wife, Christie Lee, has been more helpful to me than all the helpful people listed below. She helped me with ideas, with coding, with proofreading, and with other important odds and ends.
But, most important, she helped me keep my sanity and she believed in me when I didn't believe in me. Thank you, Chris.

Others who helped me in more ways than just being committee members were my committee:

Joe Byers, who was oftentimes hard to find and sometimes forgetful, but really right with me when the going got tough. A good chairman and friend.

Judy Henderson, who is now next to impossible to find, but who gave me ideas and guidance in my professional growth for many years, in Ed Zoo and other important areas. Thanks for encouraging my exploratory behavior.

Howard "Nolan" Teitelbaum, a friend who pinch hits in the late innings. A late addition to my committee who helped me modify my statistics after I had collected my data and tried, with a great deal of patience, to understand Ed Zoo. Thanks Nolan, you'll always be my ace in the bullpen.

John Lopis, a close friend who happened to be on my committee and also happened to be my expert judge. Thanks for the hundreds of things you've shown me both professionally and personally. I can best thank you by passing them on to others, and I do, every day.

Abbey Shur, the person who probably helped me more with my writing than anyone else. I'm sorry you weren't around at the finish, but I'm sure the students in Israel have gained from our loss.

There are, of course, hundreds of students, friends, and acquaintances to thank. Each helped me to become the professional I am. A few specifics immediately come to mind. To those teaching assistants who allowed me to invade the privacy of their IPL group, I thank you Ron Jones, Mike Radke, Mary Samuelson, and John Graves.
To two good friends who dealt with the frustration of learning the ISIA, thank you Felice Shulman and Laurie Moffot. And to a new friend who drew my flow charts and didn't know what he was getting into, thanks Dave Robinson.

And last, but not least, thanks to my family who supported my attempts all through school. I can't really say you were patient, but I always knew you were right with me.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES

Chapter

I. INTRODUCTION
   Purpose of the Study
   Rationale for the Research
   Definitions, Delimitations, and Terms

II. REVIEW OF RELEVANT AND RELATED RESEARCH
   Overview
   Introduction
   Group Work in Education
   Human Relations Research and Observation: Problems and Solutions
   Observation: Measurement of Communication Skills
   Human Relation Goals: Categories for Observation
   Conclusion

III. INSTRUMENT DESCRIPTION AND PILOT TEST
   Introduction
   Interpersonal Skills Interaction Analysis Procedure
   ISIA Category Description
   ISIA Pilot Test and Instrument Modification
   Modifications

IV. METHODOLOGY
   Introduction
   Data Collection
   Reliability
   Validity
   Data Analysis
   Conclusion

V. RESULTS
   Introduction
   Reliability
   Validity
   Conclusions

VI. DISCUSSION
   Introduction
   Review of Results: Solutions to Old Problems
   Procedural Recommendations
   Proposed Modification of ISIA and Training
   Suggestions for Future Research
   Conclusion

BIBLIOGRAPHY
APPENDIX

LIST OF TABLES

Table
1. List of all ISIA Categories
2. Reliability Matrix - Intercorrelations of the Five Observations Using Scott's π
3. In-Class Versus Taped Observations
4. Within-Group and Between-Group Stability Coefficient Estimates Using Scott's π
5. Wilcoxon Matched-Pairs Signed-Ranks Test--Effective Versus Ineffective Groups
6. Friedman Two-Way Analysis of Variance by Ranks--All Groups
7. Friedman Two-Way Analysis of Variance by Ranks--Effective Groups
8. Friedman Two-Way Analysis of Variance by Ranks--Ineffective Groups
9. Wilcoxon Matched-Pairs Signed-Ranks Test--Effective Versus Ineffective Groups, all Self-Disclosure
10. Spearman Rank Order Correlation Coefficient: Opinionnaire Data--ISIA Category Data

LIST OF FIGURES

Figure
1. Communication Flow in an Effective IPL Group (group 4) as Recorded by the ISIA
2. Communication Flow in an Ineffective IPL Group (group 9) as Recorded by the ISIA

LIST OF APPENDICES

Appendix
A. ISIA Training Manual
B. Interpersonal Process Lab Opinionnaire
C. Gazda's Global Scale
D. Interpersonal Process Lab Objectives
E. Flanders' System Interaction Analysis
F. Ober's Reciprocal Category System (RCS)
G. ISIA Categories
H. ISIA Observation Sheet
I. Validity Group Matrices and Flow Charts

CHAPTER I

INTRODUCTION

Since the first groups in Bethel, Maine, in the late 1940's, a great deal of change has come about in the field of group dynamics and interpersonal communication skills. From a few individuals whose primary concern was personal and social change in the area of education, group work has now expanded to many diverse areas of our society. In fact, it would be unusual for an individual born in the last 20 years not to be faced with the decision of participating or not participating in some sort of intensive group experience sometime in his life.
The general public has been bombarded by facts and fictions about the intensive group encounter through the media and from individuals who have experienced such groups in business, medicine, religion, weekend workshops, or the formal educational institutions.

An enormous amount of energy has been expended by a great many people to call attention to the importance of communication skills in man's everyday life. Some of the proponents of this movement have used their expertise in the field to work with marital problems, business (e.g., National Training Lab), parent-child communications (Gordon, 1970), psychological therapy sessions (Berne, 1961, 1966), and individual growth and self-actualization (e.g., Esalen).

Perhaps the fastest growing, and perhaps the largest, sector of the intensive group experience movement is attempting to bring about changes in our educational institutions (Christ, 1972). These changes encompass diverse experiences derived from a variety of educational viewpoints at all levels of our schools, from early elementary school through college. While the changes very often involve changes in educators' viewpoints and behavior toward students, they sometimes involve changes in curriculum. Interpersonal communication skills seem to have become a curricular area in and of themselves, not just a side issue to be dealt with when communication becomes a problem. Communication skills have become a subject matter to be taught in schools just as science or math.

In spite of the variety of settings in which group work is being used and the diversity of viewpoints which underlie the various approaches of the group leaders, one feature common to most group work is the lack of systematic evaluation and research.
In the preface to one of the most in-depth studies of sensitivity groups, it is stated that, "The explosive expansion of the use of groups for personal change has not been matched by corresponding concern for information about what such groups do and how well they do it. Innovation has exceeded evaluation" (Lieberman et al., 1973, p. vii). The lack of evaluation and research of group work leaves many with questions as to what group participation involves and what possible benefits or harm might accrue to those involved in such intense group work. This presents a real concern to many individuals who are faced with the choice of joining such groups. At this time, their only recourse is to seek out subjective reports from those who are, or have been, involved in such experiences.

Moreover, there exist greater problems for those individuals who are exposed to group work in a less than voluntary situation. With the increasing emphasis on communication skills and affective education, a substantial portion of the school population is now exposed to some sort of group work in their schools. Programs such as DUSO (Dinkmeyer, 1970) and FOCUS (Anderson and Miner, 1971) at the elementary school level and value clarification programs in high schools put a great deal of emphasis on teaching communication skills, and to the layman those programs may seem similar, if not identical, to sensitivity groups for children. This has caused concern for parents and educators, as much of the publicity and popularity associated with the sensitivity movement has centered around what might be called a Bob-and-Carol-and-Ted-and-Alice fantasy, the "touchy-feely" aspect of the intensive group experience. In light of the paucity of facts and the almost complete lack of research and evaluation, this fantasy has become a reality for a great many people.
For that portion of the society which chooses to voluntarily participate in intensive group work, research on the possible benefits or harm of such experiences would be helpful but perhaps not essential. If the choice is left to the individual, then even if the experience had no long-term benefit, many individuals might elect to participate in such a group as recreational adventure. In education, however, the value of an activity which lacks a specific goal is now being questioned. The public wants to know what their children are learning in schools and, more importantly, many parents want to know why their children are learning those facts and skills. The public wants schools to be held accountable for the "what and why" of learning.

Educators are looking more closely at what teachers need to know and do to be more successful in the classroom and have developed curricula which explicitly stipulate what skills have to be mastered (e.g., competency-based education). Educators are concerned with teachers' in-class performance: what skills teachers need to exhibit rather than simply what knowledge they possess about math or any other subject matter. One area in which teachers need new knowledge and skills is interpersonal communications.

Colleges of Teacher Education are beginning to change also. Standard methods courses do not fully prepare teachers to use programs such as DUSO (Dinkmeyer, 1970) and FOCUS (Anderson and Miner, 1971), which has created a growing need for pre-service teacher education course work in interpersonal communications, human relations, and value clarification training. Programs and course work in these areas are moving slowly, but the interest and need will certainly bring about more changes at a faster rate in the coming years in both schools and colleges of teacher education.

In all areas of education, evaluation and research on intensive group experiences are essential.
Accountability on a national as well as a local level calls for evaluation of specific objectives. Performance-based teacher education requires basic competencies for teachers, which demands an assessment of the criteria by which we shall judge teachers. New Teacher Education programs which emphasize interpersonal communication and human relation skills need to implement both formative and summative evaluation procedures (Scrivens, 1967). The sensitivity movement has had an influence on the recognition of the need for socio-emotional and affective education in school, but it has not helped develop the criteria for the competencies needed by teachers to deal with these domains, nor has it aided in developing a method of evaluating such skills.

A few colleges of Teacher Education, e.g., University of Georgia (Gazda et al., 1973a), University of Massachusetts (Allen & Cooper, 1967), and Michigan State University (Lopis, 1973), have begun to develop programs which focus on the need for teachers to be trained in interpersonal communication and human relation skills. Recently, two state legislatures, Minnesota and Wisconsin, have made it mandatory that schools of teacher education instruct their prospective teachers in human relation skills. Programs dealing with human relations, affective education, and interpersonal communication skills are increasing in number, but methods of evaluating such programs are still somewhat primitive. Evaluation is not keeping pace with development. The present research will begin to look at the evaluation of such intensive group experiences and propose one method of doing such evaluation and research: an observation instrument for interpersonal communication skills.

Purpose of the Study

It is time for those involved in groups whose goal is change to specify the desired change and assess that change.
Thus far, the emphasis has been on process, but in a society that is increasingly calling for accountability we must not neglect the product. To understand the group process as it unfolds is an important part of the group functioning; but to understand the product of group participation in terms of skill acquisition and changes in group members' behavior is also important. Societal pressure for such information adds significance to the need for greater understanding. This research will attempt to begin to look at groups--Interpersonal Process Laboratories (IPL) in particular--to examine the behavior of both the facilitator and group members while participating in the group, and to develop an observation system to measure those behaviors.

Rationale for the Research

The increasing interest in intensive small group involvement has created the need to evaluate such groups in terms of what participation entails and also of what long-term consequences participants can expect. In the preface to his book Beyond Words: The Story of Sensitivity Training and the Encounter Movement, Back (1972) states, "Renewed interest in formal evaluations and studies underway may soon relieve the gloomy picture of the state of the research shown here" (p. xi). But the gloomy picture painted by Back in 1972 is still with us and, as noted below, may be getting worse.

Recently the intensive group movement and the interest in communication skills have gone beyond the social movement phase and have become the interest of business and professional people. Professions which have as one of their tasks relating with people in the socio-emotional domain, that is, helping others relate effectively in the social world with others and with their own emotions, are particularly interested in communication skills and human relation training. Nurses, doctors, clergy, social workers, counselors, and teachers all need the skill of communicating with people in order to optimally deliver their services.
With this in mind, many professional schools are implementing programs which instruct the pre-service professional in communication skills. Nurses' training (Aiken, 1973), doctor-patient relationship courses, group counseling course work, and social work training all involve experiences in groups or direct communication skill training. These programs are being developed very rapidly, but they have put little if any emphasis on evaluation. Granted, most of these programs are too new to begin a summative evaluation program, but there is a place for formative evaluation in these programs.

Presently the most popular, if not the only, method of evaluation is the subjective opinion of an experienced facilitator or communications "expert." But the reliability of such evaluations can certainly be questioned. The vested interest such experts have in these new programs opens the possibility that they will spuriously find success in them. This is pointed out by Lieberman et al. (1973), who note that encounter group theorists often allow their perceptions to become self-fulfilling prophecies. Objective formative evaluation of such programs is imperative if the programs are to improve. Objective formative evaluation will also aid later summative evaluation, which must demonstrate the programs' success if such programs are to continue.

Group Work in Education

An evaluation technique for communication skills is urgently needed in education just as in the other professions. A teacher must communicate effectively with the students to discern their needs. Moreover, teachers have the added responsibility of facilitating effective communication between students, a responsibility that requires the teacher to have effective communication skills himself and also be able to teach these skills to others.
This points out the importance of evaluation in the educational profession: first, the teachers of teachers must be evaluated (i.e., are the instructors at the college level skilled in communicating with others?); second, the pre-service teachers should use effective communication skills in their college environment (i.e., are pre-service teachers demonstrating the appropriate skills in their communication classes?); third, a teacher who has experienced a communication skills program in college should use those skills in the public schools (i.e., does he or she model and instruct the pupils in those skills?).

With the need for communication skills programs in the public schools and colleges of teacher education, and the emphasis on accountability in education, one wonders what has caused the delay in initiating evaluation procedures in this area. One reason for the delay is that the movement in education is still in the development stage. Socio-emotional education is only a few years old as an academic discipline, and the leaders in the field have spent their energies on development; they want to finalize the product before they evaluate it for the public. This delay may also be influenced by the anti-research bias which Back (1972) notes: "In fact, an investigator's concern with assessment techniques is frequently taken as an expression of hostility" (p. 15).

In most school subjects the evaluation is done by standardized test or some other sort of paper and pencil test. The objective of the instruction is to teach knowledge or in some cases a particular skill which can be demonstrated by performing some task that is measured objectively through a paper and pencil test. While knowledge of group dynamics, which could be evaluated by means of a paper and pencil test, is important, the demonstration of the skill in a group situation is the true test of the instruction.
Here lies the roadblock to the evaluation of communication skills programs: no objective evaluation procedures exist to examine communication skills in group settings. Presently group evaluation is done by an experienced observer or even by the facilitator himself by means of subjective report. Since traditional testing cannot measure group functioning, subjective evaluation was (and is) seen as the only alternative by most people concerned with evaluation and research.

But there is an alternative which has been used to examine classroom interaction. Since the first Handbook of Research on Teaching (Gage, 1963), literally hundreds of observation instruments have been developed, a large portion of which have been developed for teaching (Simon and Boyer, 1970). Some of these instruments have been developed for a particular subject matter such as math or foreign language, but most simply examine general classroom interaction. These observation systems would seem to be a solution to objective evaluation of intensive small group experiences, but they have not been the complete cure-all.

Recently, William Childers (1973) used one of these observation systems (Flanders, 1970) to examine the effects of the Georgia program (Gazda, 1973) on student teacher behavior. He found few significant results and in his recommendations suggests that a new instrument be developed: "A more sensitive instrument should be developed that will more directly reflect differences in communication styles" (Childers, 1973, p. 72). These results are not surprising in light of the fact that Flanders' system examines pupil-teacher interaction at a very general level (e.g., lecturing, giving directions, or asking questions). These general behaviors may not change as a result of more effective communication skills.
The teacher may ask different types of questions after a human relations program, but the number of questions may be the same; therefore, the Flanders system would not discriminate those differences.

Observation systems are a promising answer to the question of objective evaluation; but now that interpersonal communications is a subject matter in and of itself, it seems necessary to develop an observation system that is directed at examining specific types of communication skills. Flanders (1970) noted that in developing an observation system, the first requirement is that you know what you're looking for: using an observation system is like walking in the park; if you're looking for birds, you won't notice the rabbits. Of all the hundreds of observation instruments (Simon and Boyer, 1970), none lists interpersonal communication skills as its focus of interest. Observation systems may well be one answer to objective evaluation and research, but if we are looking for rabbits, we cannot expect to find results using binoculars that see only birds. There is a specific need for an observation instrument whose objective is the examination of specific interpersonal communication skills.

Definitions, Delimitations, and Terms

Before moving into an examination of observation instruments and interaction analysis, the terms that delimit the area of interest examined in this research will be defined. Three general areas contain terms which have been defined in a multitude of ways (or left undefined): affective education, group work, and observation instruments.

I. Affective Education

The term affective education is in some ways a misleading term. For the purposes of this study affective education will refer to that position which encourages students to talk about how they feel (affective awareness) and understand their own feelings and the feelings of those around them (affective and cognitive awareness).
Although named affective education, this field also deals with the cognitive domain in that students must have a cognitive understanding of the process used in dealing with those feelings effectively. In the area of affective education, there are four terms or concepts which are commonly used but which may be so general (and misused) as to be confusing to the reader.

A. Cognitive. Cognitive refers to knowledge or facts. It is usually used in regard to what a person knows about a situation, person, incident, or body of information. Most teacher preparation is aimed at the cognitive domain, and prepares teachers to instruct students in a body of knowledge (facts). These facts cover a wide spectrum of information from historical dates to math skills. Knowledge of the specific criteria which must be included in particular communication skills would also constitute a cognitive area, i.e., knowledge about feelings (e.g., understanding group dynamics) is cognitive even though the content is feelings (affective).

B. Affective. Affective refers to feelings or emotions. It is usually used in regard to how a person feels about a situation, person, incident, or body of information. Very little time is spent in teacher education programs to prepare teachers to deal with the affective concerns found in public school classrooms.

C. Socio-Emotional Domain. Man's needs, as well as the subject matter taught in school, can be broken down into three interrelated areas or domains: intellectual domain, physical (or psycho-motor) domain, and socio-emotional domain. The first two domains are relatively well defined and researched and have received most of the attention of the educational community (e.g., the three "R's", hot lunch programs, school nurses, physical coordination, etc.). The socio-emotional domain deals with the social interaction of people.
This involves the cognitive knowledge of how to communicate effectively and get along with others in social situations and also the affective component of how one feels about himself or herself and those around him or her. A major aspect of the socio-emotional domain involves handling emotions constructively in social settings.

E. Interpersonal Communication Skills. Interpersonal communication skills constitute a subset of the socio-emotional domain and would probably be classified as affective education by many people. With the variety of programs being started, many terms are used to label a few general skills. Three skills that are included in nearly all the programs are active listening, self-disclosure, and feedback. These skills encompass nearly all the specific skills generated for any program (Allen, 1968; Becvar, 1974; Gazda, 1973; Gordon, 1970; and Lopis, 1974).

II. Group Work

The area of group work, or the intensive group experience, is very difficult to define because the terms do not have precise meanings that are accepted by even the experts in the field. One of the major difficulties in understanding the group work field is the distinction between the various sectors of the movement. Since the present study is concerned almost exclusively with group work in education, it is important to delineate exactly where group work and education fit together. Back (1972) addresses this problem in Beyond Words:

   In the early period of sensitivity training, however, the idea of making changes through a group experience multiplied in education as well as in the medical and social work fields where education was needed. This philosophy has permeated the whole group-work field to the extent that sensitivity training has become confused with all of group work. . . . The basis of sensitivity training still remains the strong experience, the subjective feeling of change, while group work is generally much more goal oriented and wary of strong emotions (p. 176).
To understand the group movement, Back (1972) has constructed a map to locate and distinguish the various types of groups using three dimensions: (1) experience for itself or goal directed, (2) strong or weak emotional impact on members, and (3) the individual versus group orientation.

[Figure: "Tentative Assignments of Experiences to the Scheme" (Back, 1972). The chart crosses Experience-Directed and Goal-Directed groups, each with Strong or Weak emotional impact, against individual and group orientation, giving cells A through H. Cell labels include Psych-Resorts, Therapeutic Methods, Training, Encounter, Recreation, Indoctrination, and Management. Back's accompanying note reads: "The accompanying chart shows the eight possibilities of sensitivity training according to this scheme. They are labeled A through H. . . . The boundary lines of the field of sensitivity training are not very definite and are continually shifting, especially as long as sensitivity training is expanding, for instance in the field of therapy" (p. 122).]

Using these three dimensions, Back constructs a matrix (a 2 x 2 x 2 cube) with eight areas which correspond to the various approaches of the group movement. The first dimension (experience for itself or goal directed) examines the purpose for group membership: is the experience itself the goal (i.e., self-expression), or does the experience have a goal beyond participation? Specifically, is the purpose of group membership change? The second dimension considers the emotional impact on the group members. The distinction is not clear cut, but the extremes point out the difference: a weekend marathon at Esalen aimed at sensory awareness arouses entirely different emotions than an afternoon workshop for businessmen on how to get along better with employees. The third dimension examines whether the group emphasis is on individual growth or group development: is the purpose of the group to better the individuals involved or the group as a whole?
The group movement involves many approaches and styles, all of which fit somewhere in Back's matrix depending upon how they meet the three criteria, but two types of groups in particular need further clarification.

A. Sensitivity, Encounter, T-Groups. Although the proponents of each of these movements would no doubt object to grouping them together, these three particular types of groups all occupy the same region of Back's matrix: experience-directed groups with a strong emotional impact that can have either a group or individual orientation (cells A & B). These groups have no definite aim beyond encouraging people to understand themselves and encouraging strong emotions and feelings. This is the area of the group movement where the Bob-and-Carol-and-Ted-and-Alice fantasy comes closest to becoming a reality. In the present study, this group will be referred to as the encounter group or movement.

B. Educational Process Group. This term is used to label those groups which have a definite aim or goal (i.e., there are objectives that extend beyond the time and space of the group) and also have individual development as a priority over group development. Their emotional impact would depend upon the particular group, although the experience would rarely have the emotional impact of an encounter group. The most critical difference between these groups and others in the movement is their strong emphasis on specific objectives and change within individuals. The most frequently stated objective for these groups is the acquisition of particular communication skills. These groups generally fall in cell G, although some may be close to cell E.

C. Interpersonal Process Laboratory. This study will explore one particular type of group, the interpersonal process laboratory (IPL), which exemplifies the educational process group. This group consists of approximately 15 students and one instructor (facilitator) meeting two hours twice a week for ten weeks.
This group is part of a course (Education 200: The Individual and the School) at Michigan State University which is a required course for education majors. The objectives for this course appear in Appendix D.

III. Observation Techniques

In Medley and Mitzel's (1963) discussion of systematic observation, they refer to the term observational technique as ". . . procedures which use systematic observations of classroom behavior to obtain reliable and valid measurements of differences in the typical behaviors which occur in different classrooms, or in different situations in the same classroom" (p. 250). This covers a broad range of observational techniques (or systems) of both the category and sign type. A category system includes a number of specifically defined categories into which all observable behavior of interest falls and also includes the number of behaviors classifiable in each category. Therefore, if verbal behavior is the area of interest, all verbal statements theoretically fall into one and only one category, and the instrument specifies how often they occur in a given segment of time. A sign system, on the other hand, specifies what behaviors an observer is to watch for and records only those behaviors. The two systems differ in that the category system is theoretically exhaustive of behaviors of the type recorded.

A. Interaction Analysis. Interaction analysis is a specific type of category system devised by Flanders (1960) which studies the chain of classroom events in such a way as to take into account each recorded event in sequence with every other recorded event. This is done by recording, in sequence, each event according to a specific category definition and then transferring the list of code symbols, one symbol to one event, onto a matrix which shows graphically the relationship between the events, i.e., what precedes and follows the individual events (see Flanders, 1970, pp. 54-75).
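The tallying step described above can be illustrated with a minimal sketch. It assumes that categories are numbered 1 through n and that each code is paired with the code that immediately follows it, the earlier code giving the row and the later code the column. The function names and the sample sequence are hypothetical and are not taken from the ISIA training materials; conventions such as adding a boundary code at the start and end of a session are left out.

```python
from typing import List


def build_matrix(codes: List[int], n_categories: int) -> List[List[int]]:
    """Tally overlapping pairs of sequential codes into an n x n matrix.

    Row = category of the earlier event, column = category of the event
    that immediately follows it. Categories are assumed to be 1..n.
    """
    for code in codes:
        if not 1 <= code <= n_categories:
            raise ValueError(f"code {code} is outside 1..{n_categories}")
    matrix = [[0] * n_categories for _ in range(n_categories)]
    # Pair each code with its successor: (c1, c2), (c2, c3), ...
    for earlier, later in zip(codes, codes[1:]):
        matrix[earlier - 1][later - 1] += 1
    return matrix


def row_proportions(matrix: List[List[int]]) -> List[List[float]]:
    """Express each row as proportions of that row's total tallies."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([cell / total if total else 0.0 for cell in row])
    return result


if __name__ == "__main__":
    sample_codes = [1, 2, 2, 3, 1]  # hypothetical sequence, three categories
    tallies = build_matrix(sample_codes, 3)
    for row in tallies:
        print(row)
    print(row_proportions(tallies))
```

Each row of the proportion table estimates how often a given category is followed by each of the other categories, which is the sequential information a simple frequency count does not provide.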
Although this system does not specify what precedes and follows each specific individual event, once the codes are tabulated on the matrix they do portray the probability of each category being followed or preceded by every other category. This increases the amount of information retrievable from the data, going beyond a simple frequency count by adding the dimension of time.

B. Interpersonal Skills Interaction Analysis (ISIA). The ISIA is the title of an interaction analysis technique derived from Flanders' Interaction Analysis Categories (FIAC; Flanders, 1970), from Ober's Reciprocal Category System (RCS; Ober, et al., 1970), and from programs in communication skills (e.g., Lopis, 1975). It is a multiple cluster category system that examines interpersonal communication skills in groups whose goals are specifically related to those skills. The clusters and categories are listed in Appendix A.

Procedure

The development of an observation instrument involves a number of steps. The first task in developing an observation instrument is to set broad limits on the types of behaviors to be investigated. In the present study this broad area of interest is interpersonal communication skills. The investigator must then review the literature to discover what instruments have already been developed in the field. This is followed by the specification of the type of observation procedure to be used (e.g., sign system, category system) and the categories to be included in the observation instrument. After the type of system and categories are decided upon, operational definitions must be written for each of the categories. This is perhaps the most important step, as reliability and validity rest on the extent to which the categories are behaviorally defined and mutually exclusive. After the categories are defined, the appropriate population must be delimited and pilot testing of the instrument must be carried out.
Following pilot testing, any needed modifications of the instrument must be included in the preparation of a training manual for observers. The training manual must include category definitions, ground rules for using the instrument, and some type of practice exercises for observers. Once the training manual is prepared, the instrument is ready for field testing. Field testing involves specifying the procedures for the data collection, verifying the reliability of the instrument (including the training of observers), and showing that the instrument is valid for the stated population. These steps will be used to develop the ISIA.

Population

The population to which the ISIA may be applied encompasses a great variety of environments but is restricted by the goals of the groups. The ISIA may be used to examine the communication skills of people (both adults and children) involved in groups whose goal is the development of more effective communication skills. Many programs have been developed recently which focus on these skills, and the ISIA may be an appropriate tool for evaluating and researching these groups. Public school programs (e.g., DUSO and FOCUS), parental training programs (e.g., Parent Effectiveness Training), teacher education programs, and any other academic programs whose goal is more effective communication are all possible populations in which the ISIA may be used. The population to be examined in the present study will be those individuals enrolled in an introductory course in education, The Individual and the School (ED 200), at Michigan State University during the Summer and Spring terms, 1974-75. A major segment of this course consists of participation in interpersonal process laboratories (IPL) whose objectives are self-disclosure, active listening, questioning, observation, and feedback skills (see IPL above in definitions, delimitations, and terms).
Data Collection

The actual data used in developing an interaction analysis technique is the sequence of codes recorded by trained observers. This data may be obtained in a number of ways, each of which has advantages and disadvantages related to how removed the coding is from the actual group interaction. The least removed method of collecting the data would involve actual in-class coding by trained observers. The advantage of this method of data collection is that the observer is exposed to all the verbal and non-verbal stimuli to aid in the coding. Due to the subtle nature of interpersonal communication, all these cues may be important and must be explored. But because of the possible effects the observer may have on the group (i.e., changes in behavior due to the presence of an observer) and the time limitations of the observers, in-class observation of all groups would be inefficient. For the present study, in-class observation would be examined only as a check for the reliability of other data collection procedures. That is to say that some in-class observation would be carried out and then compared to other methods of data collection with the aim of exploring the possible loss of data by other observation methods.

A second method of data collection would involve the use of videotape equipment to collect the group interactions. The advantage of using this method would be the recording of both verbal and non-verbal interactions, but the cost and possible interruption due to the recording equipment prohibit the collection of data by video recordings.

The use of tape recording is the third data collection method. Flanders' (1971) work and pilot testing by the author indicate this method to be both efficient and reliable. The majority of the data collected for the present study will be done by audio-tape recordings, as this procedure involves a minimal amount of group disruption while still retaining all the verbal interaction.
The data for the present study was collected by tape recording during the Summer Term, 1974, and Spring Term, 1975. To examine the validity of the instrument, student and instructor opinionnaire data were collected during Summer Term, 1974.

Reliability

The establishment of the reliability of an observation schedule is perhaps the most crucial element in developing such an instrument. Two types of reliability will be analyzed in the present study: the coefficient of observer agreement and the reliability of in-class observations versus tape recorded observations.

The coefficient of observer agreement (Medley and Mitzel, 1963) is the amount of inter-rater agreement and is defined as the correlation between scores based on observations made by different observers at the same time. This is the most common form of reliability when examining an observation instrument. For the present study, Scott's π (Scott, 1955) will be used to estimate inter-rater agreement. This method of estimating reliability can be interpreted as the extent to which the coding reliability exceeds chance. Research on interaction analysis has used this method of estimating reliability (Amidon and Hough, 1967).

In-class observation versus tape recorded observation will examine the possible decrease in reliability due to the loss of non-verbal cues through the use of tape recordings. This will be done by having an observer code a live group session at the same time as the group is being tape recorded. At a later date the same observer will recode the tape recording, and the reliability of the in-class versus tape recorded data will be estimated using Scott's π. Because of the verbal nature of the ISIA, it is felt that the loss will be minimal.

Validity

Validity measures of observation instruments are difficult to define and are seldom addressed in the literature.
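Returning briefly to the reliability estimate described in the preceding section, Scott's coefficient can be illustrated with a minimal sketch. The usual form is π = (Po − Pe) / (1 − Pe), where Po is the observed proportion of agreement and Pe is the agreement expected by chance, taken as the sum of the squared pooled category proportions. The sketch assumes two observers who coded the same sequence unit by unit; interaction analysis studies have sometimes compared category distributions instead, and the procedure actually followed in the present study may differ. The function name and the sample codes are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def scotts_pi(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Scott's pi for two observers coding the same units (illustrative sketch)."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both observers must code the same non-empty set of units")
    n = len(coder_a)
    # Observed agreement: share of units given the same code by both observers.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: squared proportion of each category, pooled over both observers.
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1.0:
        raise ValueError("pi is undefined when only one category was used")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    live = [1, 1, 2, 2, 3, 3]   # hypothetical codes from a live session
    taped = [1, 1, 2, 3, 3, 3]  # hypothetical recoding of the same session from tape
    print(round(scotts_pi(live, taped), 3))
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance, which is the sense in which the coefficient expresses how far coding reliability exceeds chance.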
Medley and Mitzel (1963) refer to the validity of an observation as the extent to which the observation data reflect actual differences in behavior as opposed to different impressions by different observers. Most developers of observation instruments seem to assume that a high degree of observer agreement demonstrates the similarity of impressions by observers which, in turn, indicates actual differences in behavior. This may be evidence of direct or primary validity (Ebel, 1972), but it would seem important to examine the derived or secondary validity, including the correlations of the observed behavior to "actual" behavior or secondary measures of that behavior.

In the study of the intensive group experience, no previous objective measures of the participants' behavior exist; supplying one is the purpose of developing the ISIA. This being the case, the researcher must look to a less reliable but useful subjective evaluation, the opinions of the participants. The IPL Evaluation (Appendix B) is an instrument developed to systematically collect the opinions of the group members as they relate to the skills measured by the ISIA. During the Summer Term, 1974, this instrument was administered to all group members immediately following the taped group meetings. The opinions of group members rating their own interactions are subjective and can be very biased. Because of this, an expert in interpersonal communications listened to the sessions and rated each session on the same opinionnaire (Appendix B). This data was used to validate the group members' opinionnaire data.

The opinionnaire data are used to choose extreme groups (effective versus ineffective; see questions #8 and #20, Appendix B) which are compared to examine the primary validity of the instrument.
If the instrument is, in fact, sensitive to the communication skills it purports to measure, it should be able to distinguish between those groups seen as effective by the members as opposed to those groups viewed as ineffective. The opinionnaire data will also be used to verify the quantity of particular skills used in the groups. For example, questions #7 and #19 refer to the amount of active listening demonstrated in the group: do groups who differ significantly on those questions show a difference in the ISIA categories which represent active listening?

Questions to be Addressed

Intensive group work in education is expanding at a rapid rate. A great deal of research and evaluation is needed in this area in the near future to point out the strengths and weaknesses of group work in all areas of formal and informal education. Before any research or evaluation can be started, the tools of evaluation must be developed. This study will focus on the development of one tool which, if shown to be reliable and valid, will be valuable in examining programs whose primary concern is communication skills. The answers to the following questions may offer more systematic, objective evaluation and research in the field of interpersonal communication skills programs.

Are observation techniques, interaction analysis in particular, suitable tools for examining communication skills in intense group experiences? Can observers be trained to code interpersonal communication skills reliably? Do audio-tape recordings disclose enough of the cues of communication to reliably code interpersonal interaction, or must observation be done live, in-class? Can an instrument, such as the ISIA, reflect the subjective judgments of the participants? Can such an instrument discriminate between groups judged effective and those judged ineffective? If so, what particular skills are evidenced in those effective groups? What behaviors occur less frequently?
What patterns of behavior occur in such groups?

CHAPTER II

REVIEW OF RELEVANT AND RELATED RESEARCH

During the past quarter century, man has made great strides towards understanding the behavior of individuals in groups. During the same period, education has undergone great changes, and innovations have become standard procedure. This chapter will explore the interaction of these two phenomena, specifically the impact of the sensitivity movement on the institution of education.

An examination of the literature reveals numerous attempts at research and program evaluation in the area of sensitivity training, human relations training, affective education and other related fields. But one is astounded at the number of researchers and reviewers in the field who cry out the same old song: we need research on human relations training, we need instruments to measure the outcomes of groups, we need better methodology to study the treatment effects (what happens in those "black boxes" called training groups?), or group goals must be behaviorally stated, to mention a few of the verses. But, like the proverbial weather problem, it seems everyone is talking about it, but no one ever does anything about it; the lack of adequate tools to study small group interactions still impedes research and evaluation in education as well as in other fields. For the sake of brevity, this review will only examine this problem as it relates to educational research and evaluation. But the problem is just as pressing in other disciplines.

Overview

By way of orientation, this chapter will begin with a general discussion of the role of group dynamics and interpersonal communication skills in education, relating the felt need for such programs, their integration into the schools, and a brief overview of the types of programs being instituted at various levels of schooling.
This will be followed by a discussion of the in-service human relations programs for teachers, and then a review of the research on the pre-service teacher training programs. The difficulties involved in researching the attitudes and behaviors of small group members will be reviewed, including a discussion of the methods of observing the outcomes of group participation. The last section of this chapter will focus on one method of quantifying group behavior, interaction analysis as developed by Flanders, which will be explained particularly as it relates to the development of the Interpersonal Skills Interaction Analysis.

Introduction

All of my professional life I have heard quotes of surveys which showed that 75-80% of human beings failed in the work-a-day world because they cannot relate effectively with other people; yet the major part of our educational effort is directed toward improving instruction--how to teach students more math earlier, interesting ways to present new and old facts . . . we have bigger and better reading programs, and we are producing so many non-readers that we are creating jobs in school after school for remedial reading teachers. (And I am happy to say that many of the remedial reading teachers that I know are finding that their best results occur when they set aside the textbook for awhile and relate to the child as a human being.) (Tatum, 1969)

With all the time and energy put into educational research on curriculum, learning, teacher education, etc., and all the changes that have been brought about in the classroom, a large proportion of our society is still disenchanted with the institution of education. The above quote by Tatum (1969) echoes a position that is a growing concern to many educators--the need for affective education, human relations, and communication skills in the classroom.
Recently there has been a great deal of controversy among the public, students, and professional educators concerning the direction education ought to be taking. For example, Ebel (1972) states ". . . it seems clear that the principal task of the school is to facilitate cognitive learning" (p. 33). But those educators of a more humanistic persuasion claim that the school's function is to bring about what Rogers (1969) called "significant" or "experiential learning." He defines this type of learning as having ". . . a quality of personal involvement--the whole person in both his feeling and cognitive aspects being in the learning event" (p. 5). As with any philosophical argument, no single fact can be brought to bear that will settle this issue. There is a great deal more to be said on both sides of this issue, but it seems clear that education is expanding beyond merely the facilitation of cognitive learning, and all signs seem to indicate it will continue to do so despite a great deal of resistance.

Despite the resistance, the fields of affective education and communication skills are being incorporated into an ever increasing number of programs. As Reece and Passmore (1971) point out, education has emphasized knowing and doing for the past four decades, but feelings may be the primary focus of the seventies. In the years to come, society may mandate a more humanizing educational experience, and part of that experience must include some instruction on relating to those around you. More than likely these experiences will include a human relations model similar to sensitivity training. Educators must be prepared to show the usefulness of such a program (what do the participants gain?) and also be aware of the skills teachers will need to facilitate such programs.
Although sensitivity training has always been connected with education (NTL is and always has been formally related to the National Education Association), the impact of the group movement was negligible up until the middle to late sixties. Even today much of what the public knows (teachers included) is based on rumor, subjective report, or sensationalism from the media. In an opinion poll in Nation's Schools (1970), half of the superintendents interviewed seemed to be saying they'd suspend judgment on sensitivity training until they received more information. But twice as many felt such experiences had a positive effect as felt they had a negative one. Their uncertainty and concern related to the proficiency of the group leaders and the conflicting information on the effects of the groups. These concerns are legitimate whether a person is deciding on attending a weekend marathon himself or deciding on implementing a human relations program in his school district. The competency of the trainer or facilitator is extremely important, particularly in a situation in which the participants are not volunteers, such as the public schools. This will be explored in depth in the later discussion of pre-service and in-service training, but one must note the risk a superintendent or principal takes when s/he implements a human relations program in his/her district or school if his/her teachers have no training in group work. The issue of conflicting information is also a great concern even today, as so little fact exists on the effects of groups other than the "fact" that most participants have a positive opinion of their group experience.

Although there is a shortage of objective knowledge concerning training groups, teachers have been exposed to group work through professional journal articles and various workshops run specifically for teachers.
In 1970, Educational Leadership devoted an entire issue to "Sensitivity Education: Problems and Promise." In that issue teachers, principals and other educators from various parts of the country shared information about their programs, including the opinions of their students and staff relating to sensitivity training. The enthusiasm generated by the programs comes through in one typical statement, "It is hard to imagine anything more important at the present time than the improvement of human relations, and that is what successful sensitivity education furthers. Our material wealth is unbelievable, but we often seem to be in the Dark Ages in our human relations" (Corey, 1970, p. 238). Other teacher education publications such as Childhood Education (Lippit, 1970, and Trubowitz, 1975) and Scholastic Teacher (Harrison, 1971) show this same enthusiasm. Summer workshops for teachers are offered at colleges and NTL (at Bethel) as well as programs which may be contracted by school districts. One contract program called Talent Awareness Training (Sponberg, 1969), which holds workshops mostly in the Rocky Mountain states, had already reached 20,000 elementary teachers as of 1969. These programs, designed to introduce teachers to sensitivity education, are numerous and the numbers are growing. This increase is also true in the types of programs which are being developed and used in the public schools.

Group Work in Education

The growing use of sensitivity training in business, industry, religion, and as recreation has been phenomenal in the past fifteen years. The areas of education in which groups are being used are equally diverse. From nursery school through graduate school, from nursing homes (Diekman, 1972) to campus police (Abramson, 1973), people who meet in groups are finding uses for sensitivity training and communication skills training.
Public School Programs

The most widely reported use of group work in education is in pre-service and in-service teacher training programs, but the implementation of group work is by no means limited to teacher training. Many programs have been developed at all levels of education, both public and private. A number of these programs have been developed simply to open up the communication or discussion aspect of a particular course, others have been used to develop related skills (e.g., speech and theatre), while others have been directed at particular problems of the school or students (e.g., racial tension or drug problems). The applications of the sensitivity training experiences have also varied; a number of the programs consist of one teacher reporting his personal application in his classrooms, while other programs are formal decisions by a school district to implement a city-wide program. One similarity among all these programs is the use of a subjective evaluation technique, if, in fact, any evaluation is performed.

Elementary School Programs

There is no lower or upper age limit for some form of sensitivity education. Children in nursery schools have been helped to become more aware of the effects of their interpersonal behavior on other children and on themselves. They can be helpful to keep in closer touch with the way they feel about and perceive what other people do to them and what they do to other people (Corey, 1970, p. 240).

An intensive group experience can be very involved and sometimes upsetting to the participants, and for this reason many people might feel that young children should not be exposed to it. But applications of sensitivity training have been used in nursery school (Human Development Program, HDP, Bessell, 1968) and elementary schools (MacDougal, 1973), and curricula have been developed in human relations for elementary school children (Van Camp, 1973).
In fact, Dinkmeyer (1970, 1972) has developed a packaged program which is used in elementary schools across the country. Developing Understanding of Self and Others (DUSO) (Dinkmeyer, 1970) is a human relations program which includes tapes, pictures, teachers' guide, and other materials for kindergarten through sixth grade. The objectives of the program are similar to those for groups of older participants and include listening skills, self-disclosure, and value clarification.

DUSO, HDP, FOCUS and programs that include techniques such as Glasser's classroom meetings (Glasser, 1969; O'Donnell and Maxwell, 1971) all derive a portion of their practices from sensitivity training. These programs are being implemented in an ever increasing number of schools without any systematic evaluation or research of the outcomes. This causes concern for educators and the public because the questions "What are my children being exposed to?" and "What can I expect my child to learn from these groups?" are still being answered very subjectively, if any attempt is made to answer them at all.

High School Programs. The programs in high school have been used to augment the normal classrooms as they have in elementary schools, but group work has taken on the added dimension of addressing particular problems of schools or districts such as drug problems or racial problems. In regular classrooms, group work has been used in speech (Heiman, 1974; Galvin, 1974) and English courses (Harrison, 1971; Simon and Sarkotich, 1967). There are also examples of schools for dropouts which use human relations training as one integral part of their program (Caine and Lindenaver, 1973). There are classes aimed more specifically at human relations and communications skills (Wells, 1970) and some governmental support for high school programs which focus on socio-emotional growth (Springport High School, 1967).
Specific problem areas in schools are a new curricular development, with interracial relations (Carkhuff and Banks, 1970; Price, 1969) and drug education (Deardon and Jekel, 1971; Southern Regional Education Board, 1974) receiving the most attention.

Human relations training and sensitivity education have been used in a number of settings in high schools across the country. Many personal accounts appear in professional teacher journals which indicate the variety of uses for group work. But other than questionnaire data from the participants or subjective observations of the teacher, very little has been done in the way of evaluation. High school programs exist, but at this time no definitive statement can be made in relation to their effectiveness.

College Programs

Outside of teacher education, a number of college related studies have been reported which deal with sensitivity training and human relations programs. The first course to implement sensitivity training was offered at Harvard in the early 1950's (Mann, 1967). These first studies generated a number of research reports (Bales, 1950; Hore, 1973) that laid the groundwork for later research. Today the vast majority of colleges and universities have some type of intensive group experience available, some of them similar to the program at Antioch College (Solomon, et al., 1970), which involves a cross disciplinary approach including social psychology, drama, and speech. At Antioch, as with many college programs, the evaluation was somewhat systematic, but the first paragraph of the summary of research findings forewarns the reader of the problems to be expected:

Our research measures and findings have been limited in their applicability and usefulness. This is partly the fault of our practice and partly due to the lack of valid and reliable testing instruments (p. 59).
Other areas in which sensitivity training has been used in college environments include training programs for counselors (Schroeder, et al., 1973; Perkins and Atkinson, 1973; Dendy, 1971) and counseling of patients (Arbes and Hubbel, 1973). Studies such as those have used global ratings of empathy and understanding as their dependent measure (e.g., Empathetic Understanding [EU], Carkhuff, 1969a), or they have used self-report or attitudinal changes as their measure. The choice of these subjective measurements resulted from the lack of established objective instruments. These difficulties will also be noted in carefully planned in-service and pre-service teacher education programs. The lack of adequate measurement tools has impeded the needed evaluation and research on the outcomes of group participation.

In-Service Teacher Training. In-service teacher education programs which involve sensitivity training techniques come in all shapes and sizes. From an uncontrolled study of three small Manitoba (Canada) high schools (Benmen and Capelle, 1971) to an in-depth controlled study of the teaching-learning process, done in conjunction with a major university (Bowers and Sour, 1961), many school districts are using group training techniques to improve their staff relations and their teachers' in-class communication skills. Perhaps the most interesting aspect to examine in reviewing these studies is the range of subjectivity in the evaluation techniques of the programs. The Buffalo (New York) Board of Education's final evaluation of their Human Relations Education Project (1970) will give the reader some insight into the unspecific nature of many of the evaluation reports (and perhaps the programs).
To begin with, the report offers no definition of human relations as implemented in the program (one is unsure whether the program involves racial relations, communication skills, or some other objective), and the reader is further confused by "the specifically stated objectives," which are, "Teachers will assert increased awareness of the importance of human relations in their own lives and the lives of their students." These types of program objectives are not uncommon, as the programs of both West Virginia (Forman, 1968) and Tennessee (Khana, 1969) have similar objectives which focus on the "awareness of the need for human relations." The unspecific nature of many of the programs may be a result of the infancy of the field. But if sensitivity education and human relations programs are to improve, formative evaluation must be undertaken, and it must be based on the goals of the programs (i.e., specific objectives).

Sensitivity training has been used in a variety of educational settings (e.g., junior college staff, Keile and Gallessieh, 1971), but the majority of the reported programs have involved principals and/or teachers. A number of these programs report no evaluation (Hendrickson, 1968 and Kimple, 1968, 1969, 1970) or a limited questionnaire evaluation (McElvaney, et al., 1967), but an interesting number of programs are including formal, albeit subjective, evaluation programs.

Before examining the evaluation programs, it may be important to make a more definitive statement concerning the subjectiveness of evaluation. The author uses three criteria in assessing the subjective nature of a study. The first criterion involves the subjective nature of the data: are the reported data personal opinion or fact based on systematic behavior observation?
A large portion of the studies already cited include opinion data (e.g., the teacher noticed that the students got along better) and are questionable because of the probable biased perception of the reporter (i.e., s/he sees what s/he expects and wants to see). The data must also be considered subjective whenever they are of a self-reported nature. This is particularly important with volunteer participants in sensitivity training, as they may have expected to gain from the experience and, therefore, perceived the gains they expected.

The second criterion in assessing studies is the subjective nature of the methodology. Campbell and Stanley (1963) refer to the experiment ". . . as the only means for settling disputes regarding educational practice . . ." (p. 2) and list various factors jeopardizing internal and external validity. Although it may be too critical at this time to judge sensitivity training research by all the standards proposed by Campbell and Stanley, some of the criteria are directly relevant to a great deal of the research being conducted in this area. Many studies of sensitivity training do not employ a control group (Harrison, 1971 and Diamond and Shapiro, 1973), thereby eliminating the possibility of any comparison (would the participants have changed if they had received no treatment or a placebo treatment?). A second area of concern is the differential selection of subjects or lack of random assignment to control and treatment groups. As noted previously, this is very important when using volunteer participants. The third methodological criterion is related to the measurement of the data: is the instrument being used reliable and valid? This is of particular concern when the instrument is made expressly for the study and no data on reliability or validity are reported.

The third criterion in assessing the subjectiveness of the evaluation relates to the type of research or evaluation being performed.
Dunkin and Biddle (1974) point out the four possible variables in educational research: presage variables (formative experiences, training experiences, and personality characteristics), context variables (conditions to which the teacher must adjust, i.e., environment), process variables (the actual activities of classroom teaching, what teachers and pupils do), and product variables (the outcomes of teaching, the changes that come about in pupils). These four variables can be combined to examine a number of cause-effect relationships. In relation to presage, process, and product variables, sensitivity training research has been lacking because researchers have not examined the behaviors being demonstrated in groups nor the behavioral outcomes of the groups. In training teachers in communication skills these three variables are closely related and can all be measured in the same way. The presage variable involves the teacher's training experiences and should be measured by examining the teacher's behavior in training groups. The process variable includes teachers' and students' in-class behavior and must be evaluated by measuring the communications being used in the classroom. The product variables are the changes (or lack of changes) in students' communication skills and must be behavioral measures of students' behavior. To compare these three variables effectively, a researcher must be able to measure in-class behavior, whether that class is in a college, in-service teacher training, or an elementary classroom. This is where most research on sensitivity groups is lacking: it does not examine in-class behaviors. Dunkin and Biddle (1974) point out the problem as related to teacher effectiveness research:

Perhaps the most significant shortcoming of these early studies is that they assiduously avoided looking at the actual processes of teaching in the classroom . . .
if teachers do vary in their effectiveness, then it must be because they vary in the behaviors they exhibit in the classroom. To shed light on this point, one must study classrooms--where the action actually is (p. 13).

This problem is as prevalent in sensitivity research as it was (is) in teacher effectiveness research. To use a presage variable (sensitivity training) as an independent variable (in many cases an undefined variable) and then expect a significant change in a product variable (student opinion or behavior) requires an extremely powerful treatment and an equally sensitive measurement instrument. When one considers that most sensitivity research is basically exploratory and most of the measurements are crude by almost any psychometric standard, non-significant results should be expected. The cause-effect relationship is strained by presage-product research, and without the experimental controls called for by Campbell and Stanley (1963) the research results begin to look quite subjective. In teacher effectiveness studies, presage-process research has not revealed training experiences to have as great an impact on teacher in-class behavior as might be expected, and process-product research, when it is undertaken, is equally discouraging (Dunkin and Biddle, 1974). But to make the jump from presage variables to product variables with very little experimental control is mostly a subjective leap of faith. This will be explored more fully in the section on research; for now the studies will reveal many of the shortcomings.

A doctoral dissertation by Bailey (1967) clearly points out the presage-product difficulty. He studied the effects of sensitivity training upon a high school faculty using student perceptions, as measured by the Student-Opinion Questionnaire, as the dependent variable.
The design of the study controlled for most sources of invalidity, as it followed Campbell and Stanley's (1963) "Nonequivalent Control Group Design" and included two post-tests, one approximately one month following the sensitivity training and a second three months after training. The main thesis of the study was, "If change in teachers is observable by and has an effect upon the students, then change should be recorded by students. The students are the product of the educational process and should be one of the eventual criteria for evaluating in-service programs" (p. 9). The Student-Opinion Questionnaire had ten objectively scored items, and the four hypotheses were based on the data from these items. None of the four hypotheses yielded significant results. All items on both administrations of the instrument were investigated for differences between the experimental group and the control group. Of the twenty comparisons, one significant difference was found, "ability of teacher to explain clearly," on the first post-test. In explaining the non-significance, the author stated four possible reasons: (1) there was no change as a result of sensitivity training, (2) the laboratory was not long enough to bring about change, (3) the students were unable to perceive change if it did occur, or (4) the instrument was not sensitive enough to the change if it was perceived by the students. As a final word the author said, "If one accepts other research that has demonstrated positive changes as a result of sensitivity training and the positive reactions by the teachers following this laboratory, it may be suggested that the explanation for the lack of significant differences may be related to the instrument" (p. 108). The questions raised by this study do not relate to the subjectivity of the data (the reliability and validity of the instrument are substantiated) or the methodology, but rather to the type of research involved.
Bailey offered four explanations for the non-significance of the results, but because of the type of research he chose it is not possible to say which explanation is the most plausible. The study is about behavior change in a faculty as a result of human relations training, but there is no measure of any behavioral changes. The in-class process is missing. We have no measure of the behavior of the individuals while in the sensitivity training group, no measure of those individuals' subsequent behavior in the classroom, and yet researchers want to know if those hypothesized changes affect students who are supposed to observe them. Sensitivity training research is in its infancy and must be measured one step at a time. More sensitive tools are important, but an equally important question is--sensitive to measure what?

Presage-product research is used frequently with in-service training programs. Some programs using sensitivity training report significant findings; others seem to rationalize their non-significant findings away. Schmuck (1967, 1968) found significant and positive changes in students' perceptions of classroom groups, their own status and influence, attitudes, and friendship patterns. Nelson (1969) found no significant results and points out the distance between training and the student product: "The tests of student anxiety, alienation and opinion surveys are perhaps not germane to an assessment of the kinds of changes human relations training can effect in a short term project" (p. 31).

In various types of sensitivity training, from business to education, a common dependent measure is ratings by peers, co-workers, or supervisors of perceived changes in behavior following a sensitivity training experience.
In two studies at NTL, Miles (1960, 1965) found perceived change, as reported by participants and job associates, in relation to "listens more," "communicates better," and "shares decisions" for elementary school principals. Krafft (1967) studied the changes in behavior, due to a human relations laboratory, of secondary school seminar instructors. He found no instrument to measure their behavior and chose instead to measure it by the perceptions of the participant himself, a co-worker, and the principal of each subject. He collected the data by interview but had difficulty because the subjects, co-workers, and principals knew the identity of the experimental and control groups. The experimental subjects knew what behavioral changes the interviewer was interested in, and the principals talked almost exclusively about the experimental subjects. This points out the difficulty with perceptual data: the subjects and those with whom they frequently come in contact are sensitized to the desired changes. They expect changes, and their perceptions may simply be revealing those expectations. Data which are based on behaviors and not perceptions will necessarily be more objective and valid.

Values and attitudes are another frequently used dependent variable for measuring changes in teachers due to a human relations experience. Benmen and Capelle (1971) found that high school teachers improved their self-actualization, attitudes toward educational process, and values of inclusion and affection as measured by the Personal Orientation Inventory (POI) (Shostrom, 1964), Educational Process Opinionnaire (EPO) (Wehling and Charters, 1969), and Fundamental Interpersonal Relations Orientation (FIRO-B) (Schutz, 1958), respectively.
Lee (1967), in a study of the effectiveness of a human relations training program for in-service teacher training, found that teachers' attitudes (towards pupils in interpersonal relations and teaching as a vocation), as measured by the Minnesota Teacher Attitude Inventory (MTAI), improved as a result of sensitivity training. In another in-service training program undertaken to examine teacher attitude change as a result of sensitivity training, Sweeney (1969) found elementary and secondary school teachers to score significantly higher on the MTAI after training than did a control group. Two statements by Sweeney (1969) point out the possible misapplication of research efforts in the area of sensitivity education:

Teacher-pupil attitudes are simply indicators of the teacher's classroom behavior and the mere introduction of better attitudes by instruction may not produce any change in behavior (p. 4).

But then a few pages later he seems to contradict himself:

The study focus was on teacher attitudes. What is needed, among many other possible approaches, is a focus on pupil perception of the teacher prior to T-group sensitivity training and then after the experience. It may be that the learning which the teacher experiences, the insight, the awareness, etc., may not always be brought out from the group experience to the classroom (p. 7).

The application to the classroom of learning which occurs in a sensitivity experience is the goal of in-service group experiences. If teachers cannot apply what they learn in an in-service workshop, the workshop has failed them. The learning should involve changes in behavior and, if the researcher wishes to examine the changes in teachers' in-class behavior, he would do well to measure that behavior, not the student's, co-worker's, or principal's perception of that behavior.
Contradictory statements such as those by Sweeney (1969) are disturbing; perhaps the absence of adequate instruments to measure behavior encourages researchers to examine non-process variables.

Pre-Service Teacher Training

The new trends in education have implications for present and future teachers, as well as teacher educators. Teachers must be trained or assisted to assume their new roles comfortably and effectively. They need to be oriented toward working more with smaller groups and individuals; they must be trained in the skills needed to function within this orientation. . . . (Crist, 1972, p. 73).

Since the first college program at Harvard (Mann, 1967) in the early sixties, the use of sensitivity training on the college campus has expanded enormously, particularly in colleges of teacher education. In this review alone the programs of research of approximately twenty institutions will be cited, and one would suspect that for every reported university or college program, numerous programs exist which have no published results. The number of programs is substantial and multiplying every year because of the intensified interest in their use for the personal demands of teachers and for the facilitation of elementary and secondary school programs which include human relations and communication skills. The new trends in education that Crist (1972) speaks of exist in every type of school environment and at every age level, but teachers do not have all the necessary skills to implement these programs. A school district can purchase a DUSO kit for every classroom, but, unless the teachers are prepared to use those materials, they will be wasted, either because the teacher passively resists using them because she has never been introduced to them (a quite common practice) or because she attempts to teach a subject matter (human relations) she is unfamiliar with and fails. "New math" was (is) not as successful as it could have been because teachers were not prepared to use it in their classrooms, despite the fact that almost every teacher had been instructed in some form of math education course. Human relations is not a new version of what teachers are already teaching; it is a new curriculum that some teachers have never heard of, much less taught. In-service training is one way of introducing the in-class teacher to human relations, but, if these programs are to be implemented successfully, future teachers must be instructed in the knowledge and techniques needed so they will be comfortable and effective in their own classrooms.

This section will begin with an overview of the programs in pre-service teacher training, noting some of the program evaluation being conducted. That will be followed by a more in-depth look at four particular programs: Minnesota, University of Massachusetts, Carkhuff's human relations training, and the program at Michigan State University.

In an article entitled "Sensitivity Training: Solution or Conspiracy?" Wiggins (1970) examines the benefits of sensitivity training and some of the deficiencies in school programs. He notes that NTL reports at least eight graduate programs which include sensitivity training and that institutions at every educational level are investing money and time in programs and training. However, many of these programs have encountered troubles: unclear or non-existent objectives, poorly trained personnel, and the lack of research and evaluation to establish the programs as beneficial to the participants. All these difficulties have brought sensitivity training to the point where Wiggins (1970) feels that the role of training in education must be reevaluated.
He suggests that the status of sensitivity training in schools would improve if: (1) the term sensitivity training were replaced with human relations training, (2) standards for trainers were developed and enforced, (3) "Human relations training were used only when clearly defined goals and behaviorally defined objectives are established," (4) "Research could be conducted to provide empirical evidence as guide posts to direct application of human relations training," and (5) "Evaluation models to assess the results of training programs could be developed" (p. 257). These suggestions can be used to examine some of the human relations programs that schools of teacher education offer.

Human relations training is essentially a subjective experience. Researchers of sensitivity training have consistently encountered difficulty in describing or having others describe such an experience (Lieberman, et al., 1973), and one seldom finds a group experience which will describe its goals any more clearly than the goals at NTL: (1) self insight, (2) better understanding of other persons and awareness of one's impact on them, and (3) better understanding of group process (Sweeney, 1969). Considering this type of subjectivity, it is not surprising to find that most human relations programs have a subjective goal such as increased awareness or no stated goal at all. The Syracuse University Model Elementary Teachers' Education Program (Benjamin, et al., 1968) had as its goal for a teacher the increased awareness of and sensitivity to him/herself as a: (1) person, (2) teacher of children, and (3) member of the educational system. This is at least as specific as a number of other programs in teacher education (Central College, Roelofs and Sears, 1971; Carnegie-Mellon University, Borke and Burstyn, 1970; Lehman College, O'Hare, 1968; and University of Maryland, Baltimore County, Calliotte, 1971).
The contrasting approach to a subjectively defined experience is proposed by Egan (1970), who suggests contract groups as a structured approach to encounter groups. Contract groups define for the members the expected outcomes of the group and a broad boundary for their behavior. He asks participants to engage in the following kinds of activities: giving support, disclosing themselves, expressing feelings, confronting others, and responding to confrontation (all of which are defined). His thesis revolves around the contract, which he sees as having research potential because it points out the behaviors of interest. He feels the contract defines categories that can be used in a scoring system for research and evaluation. In examining research, Egan (1970) feels that many of the non-significant findings in the sensitivity movement would better be labeled irrelevant because the measures have so little relationship to the expected outcomes of the groups. He further points out the relationship between clearly defined goals and research and evaluation:

Perhaps it is time to review the criteria we use to judge the success or failure of sensitivity-training experiences. If measurement is to have any meaning at all, it is necessary to delineate clearly the specific goals of any laboratory experience, to determine what means are associated with achieving these goals, and to devise measures to determine whether these goals have been reached or not. Perhaps the criteria we have used to measure success or failure have been too gross or have not reflected the real goals of the experience (p. 366).

The specification of the goals of an experience such as human relations training is difficult due to the complexity of the behaviors involved and the variance of the experience itself.
Movement has been made toward specifying human relations goals in behavioral terms in programs such as those at the Northwest Regional Educational Lab (Wallen, 1968), Indiana University at South Bend (Peterson, et al., 1973), the University of Illinois (Gross, et al., 1971), and others, but it should be noted that in the majority of the published reports of teacher education programs no mention is made of specific behaviors as outcomes of the programs. It should also be noted that the research and evaluation of these programs do not aid the reader to any great extent in evaluating the effectiveness of human relations programs, particularly in relation to teachers' in-class behavior. Although no correlation is necessarily established by this trend, it might be said that some credence is lent to the previous quote from Egan (1970): evaluation may depend on specifically defined outcome behaviors.

In the section to follow, four programs will be explained: a state program, two university programs, and a number of university programs related to the human relations training model developed by Carkhuff.

Minnesota Human Relations Requirement

In 1971 the State Board of Education of Minnesota adopted EDUC 521, a human relations component in all programs leading to certification in education. As is the case with many university programs, the goals of this state program are open ended (Hatfield, 1972), including: knowledge and understanding of racial and cultural differences, the ability to recognize one's own attitudes and feelings, the ability to create learning environments conducive to successful experiences, the ability to communicate effectively with all pupils, and the ability to express and encourage others to express honest emotions and to understand the effect of one's behavior on others. Since the adoption of EDUC 521 very little research has been reported.
Carl and Jones (1972) reported on a study to determine the effects of the program on teachers, but the extent of their evaluation was a questionnaire administered at the conclusion of the workshop. The questionnaire indicated that the participants felt the experience was helpful in understanding other people's feelings.

A study at the University of Minnesota (Thorman, 1971) examined the effectiveness of four methods of training pre-service teachers in interpersonal skills. The study involved a hundred education students randomly divided into four treatments: (1) control, (2) academic study of interpersonal relations, (3) T-group, and (4) work with school children. The dependent measures of the study were (1) MTAI, (2) Behavioral Inventory of Interpersonal Skills (part I, student rates him/herself; part II, a friend rates the student's interpersonal skills), and (3) FIRO-B. The results were not significant, although self-report questionnaires showed that students considered the T-group and child experiences, the direct (face-to-face) experiences with people, to be more valuable than academic experiences with the same objectives. The findings of the study led to clear recommendations by Thorman (1971): "(1) present programs for training prospective teachers in interpersonal relations should be subject to close scrutiny, and (2) efforts to construct instruments which are increasingly sensitive to the objectives of interpersonal skills training should continue . . . the results of the study confirmed the need for instruments specifically related to the situation being evaluated" (p. 22).
The EDUC 521 Human Relations Component is a requirement that may become a standard part of teacher education certification (Wisconsin has a similar plan). The more these programs are exposed to the public light, the more important close scrutiny will become. This will make Thorman's recommendation for instrument development all the more important, as teacher and educational accountability must include evaluation of all programs.

University of Massachusetts: A Behavioral Objective Curriculum in Human Relations

The Model Elementary Teacher Education Program (METEP) at Massachusetts (Allen and Cooper, 1967; Ivey and Rollins, 1970, 1972; Ivey, et al., 1970) is one of nine proposals for elementary teacher education funded by the Department of Health, Education and Welfare, Office of Education, in 1968. At least six of the nine proposals contain a component or module that is directed towards human relations, sensitivity training, or communication skills (Fattu, 1968). Of the nine funded proposals, the University of Massachusetts program is perhaps the most visible and places more focus upon human relations training. The program attempts to teach pre-service teachers the possible options in three areas: content knowledge, behavioral skills, and human relations skills. This review will focus on the human relations skills. The program is committed to teaching specific behaviors the teacher should be able to engage in, using specific behavioral objectives and performance criteria in evaluation.

The human relations portion of the METEP curriculum is called Human Interaction (HI). It is written from a behavioral frame of reference, building from traditional human relations, sensitivity training, and behavioral psychology. The teacher trainees participate in a "Do-Use-Teach" program in which they demonstrate (do) the skill, then practice (use) it in their lives, and finally teach the skill in the university laboratory school.
The program defines, with behavioral objectives, the skills of relaxation, listening (attending behaviors), and non-verbal communication.

The reported evidence of evaluation of the Human Interaction program consists of one experimental study (Ivey and Rollins, 1970). The design included random assignment to treatment and control with pre- and post-testing of both groups. The treatment consisted of the "Do-Use-Teach" program including four hierarchies: relaxation, non-verbal awareness, attending behaviors, and decision-making. Two instruments were selected for each hierarchy: one to measure attitudinal changes (a semantic differential), the second to measure changes in skill level as a result of the training. An additional instrument was used to examine the subject's discrepancy between his/her self-concept and his/her goal-self-concept. Each instrument used (all of which, except one, had been developed prior to the study) had a reliability of better than .80. Each instrument was used as a pre-test and post-test for each of the hierarchies. The data were collected in settings other than the Human Interaction groups, and it is not clear if the testing environments relate to groups or interpersonal interaction. The results indicated no change in self-concept discrepancy; but for the two measures of each of the hierarchies, significant changes in the treatment group's attitudes (all except attending behaviors) and skills (all except relaxation) were demonstrated. In the conclusion, the authors make the following recommendation: "The study ought to be seen as an observational study of a human relations program that was performance based. What needs to be done is a replication of this study in which more precise instrumentation is used. . . ." (p. 65).

This study begins to show the effectiveness of a human relations training program, particularly in relation to attitude changes. But the measurement of skill acquisition requires closer scrutiny.
The measure of relaxation was based on reading errors due to delayed auditory feedback (a secondary measure of anxiety), and the dependent measure of decision-making was a paper and pencil test. Both non-verbal awareness and attending behaviors were measured by an observation system designed specifically for measuring those skills, but the stimulus and environment in which those skills were demonstrated are not defined. More precise instrumentation is needed, but it is also imperative that the environment in which the data are collected be more precisely defined. It is also important that that environment closely approximate the environment in which the student is expected to display the acquired skill. This may mean that data should be collected on the pre-service teacher in groups and/or in the classroom.

Carkhuff's Systematic Human Relations Training Model

The most systematically designed and thoroughly researched teacher education program in human relations is the Systematic Human Relations Training Model (SHRT) at the University of Georgia (Gazda, et al., 1973). Based on the model developed by Carkhuff (1969) for lay and professional helpers, this program has at its foundation Rogers' (1957) therapeutic concepts: accurate empathy, non-possessive warmth, and genuineness. The human relations training classes are small groups of approximately ten students who meet with a facilitator for two hours, once a week for ten weeks. The course is quite structured and is theoretically divided into three phases (Gazda, et al., 1973a, 1973b) which introduce and require mastery of the following skills: phase 1--empathy, respect, and warmth; phase 2--concreteness, genuineness, and self-disclosure; phase 3--confrontation and immediacy. The entry level of the students is assessed by a modified version of Carkhuff's (1969) communication and discrimination indexes. The global rating of responses (Gazda, et al., 1973a, p.
96) is used to analyze and assign a rating to any helper response. Each of the eight dimensions (empathy, etc.) also has an individual rating scale, similar to the global scale, which is used in instruction to aid students in discriminating facilitative responses, and also a communication scale to rate their own responses as a helper in helper-helpee interaction.

The program is systematically designed to train teachers in the counseling skills which Carkhuff (1969) and others have shown in research to be effective in the helping relationship. The emphasis is on training. While many human relations programs focus on here-and-now feelings and personal awareness, this is not the goal of the Systematic Human Relations Training Model (SHRT). Rather, the goal is to have pre-service teachers leave the experience with a set of counseling-type skills which they may use in one-to-one teacher-student situations.

There is a great deal of research related to SHRT that has examined many aspects of education and related fields. The results have generally been very supportive of the program. Research by a number of authors in the 1960's showed a high correlation of empathy, warmth, and respect with various measures of teacher behavior and product outcomes. Those studies became the impetus for developing a program such as the SHRT. Dixon and Morse (1961) found teachers identified by pupils as "more open" to be significantly more empathetic, warm, and respectful. A number of authors (Cogan, 1958; Christianson, 1960; Solomon, et al., 1964) found teacher warmth related to general pupil achievement. In a number of related studies by Aspy (Aspy, 1965; Aspy, 1969; Aspy and Hadback, 1967), reading achievement in elementary students was found to be related to high levels of the facilitative dimension. Other studies have shown the facilitative
Other studies have shown the facilitative dimension related to students' in-class behavior in pre-school adjustment (Truax and Tatum, 1966) and for children with behavior and academic problems (Staffer, 1970). Other studies which show similar results for other student populations include Hefele (1971) with deaf children and Pierce and Schaubel (1970) with graduate student counselors.

Since 1970 a number of studies have evaluated the effects the SHRT had on pre-service and in-service teacher behavior. Berenson (1971) studied the effects of SHRT on student teachers' behavior using a number of dependent measures (Carkhuff's index of responding, a classroom supervisor rating form, the Teacher Situation Reaction Test (TSRT) and Amidon and Flanders Interaction Analysis). The experimental design included an experimental group which received SHRT, a training control group which received didactic instruction in human relations training, a "Hawthorne" effect control group, and the control group proper. The SHRT experimental group showed significant results in: (1) higher levels of helping as measured by the written index of responding, (2) the assessment by classroom and college supervisors, (3) solving problems as measured by the TSRT and (4) differing from the control group in classroom behavior as measured by an interaction analysis (more positive reinforcement, less criticism, less emphasis on subject matter). Other studies using the Index of Responding (Global Scale) have shown significant gains in discrimination and communication of the facilitative dimensions for pre-service teachers (Bixler, 1972; Balzer, 1973; Hornsby, 1973), in-service workshops (Taylor and Barnes, 1970) and at other universities (University of Maryland, Baltimore County, Calliotte, 1971, and Boston University, Marshall, 1970, and Hartzell, et al., 1973).
The SHRT model has been researched by educators for a number of populations examining the effects of group composition (Hornsby, 1973) and other training variables. A majority of these research studies use the Global Scale (Gazda, 1973) as at least one of their dependent measures. This measure, with modification, has been used with success since the late 1960's, but it has some shortcomings when used in an educational setting.

The SHRT model attempts to train teachers in specific skills, but the ultimate goal is that they use these skills in the classroom. The Global Scale, however, cannot be used in a natural environment. It is designed to measure single responses to a helpee stimulus, and the classroom environment is more complex than that. Classroom interaction includes statements which are uncodable when using the Global Scale. As was noted in reviewing the research on the Massachusetts program, it is important to examine the product outcomes of a program in terms of the teacher's in-class behavior. The Global Scale seems to be incapable of categorizing classroom behavior. It is designed for testing and perhaps with modification could be used in one-to-one counseling-type interactions, but classrooms and group interactions involve more complexity than that.

Another difficulty with the Global Scale is that it is a high-inference scale. High-inference scales are composed of codes which are not denotable or countable behaviors (Rosenshine and Furst, 1973). An examination of the scale (Appendix C) reveals the inferential nature of coding the categories and the footnote points out the possible subjectivity involved in coding (i.e., how is a coder to interpret "the rater must be guided by the level(s) of the condition(s) that are offered or withheld in the helper's response?"). The high-inference nature of the categories is shown in another light by the results of studies by a number of authors (e.g., Muehlberg, et al., 1969; Kiesler, et al., 1967).
In examining empathy, positive regard, and congruence, the studies challenged the independence of these scales. A global therapist quality or "good guy factor" was found which accounted for nearly 90% of the variance among empathy, regard, genuineness, concreteness, and self-disclosure. Two explanations could account for these high correlations: therapists high on one dimension are high on all dimensions, or the dimensions are not separate. The second explanation could relate to the high-inference nature of the scales. An examination of the scales for the eight skills (see Gazda, 1973) reveals a striking similarity.

In a study by Childers (1973) of the effects of the SHRT model on student teachers' in-class behavior, the need for a low-inference observation system for group environments is pointed out. Childers (1973) found practically no significant results and, in his recommendations for further research, states: "A more sensitive instrument should be developed that will more directly reflect differences in communication style" (p. 72).

Michigan State University--Interpersonal Process Laboratories

The human relations program at Michigan State University is part of an introductory educational psychology course, The Individual and the School (Educ 200), which focuses on socio-emotional education. The course is divided into three interrelated parts: the carrel portion which involves the cognitive tasks of teaching concepts (e.g., assessment techniques, respondent learning, etc.), the large group presentation which is a lecture presentation of relevant issues in education, and the Interpersonal Process Laboratory (IPL) which involves the presentation, demonstration, and practice of interpersonal communication skills. The IPL sections of the course consist of approximately fifteen students and one instructor. These sections meet for two hours, twice a week for the entire term (ten weeks).
In these sections the instructor presents and explains the seven objectives of the IPL (see Appendix D) to the students and discusses their value and implications for personal relationships in general and for classroom teaching.

The major purpose of the IPL section is the practice and demonstration of the seven objectives. That is, the instructor's responsibility is to facilitate and evaluate the students' mastery of the interpersonal communication skills. This is done through the use of strategies similar to those used in sensitivity groups (Lopis, 1973). Each instructor is free to use whatever strategy s/he wishes (or no structured strategy) to facilitate his/her students' mastering of the IPL objectives. The course is based on a mastery model and is graded on a pass/no credit basis. To receive a pass, each student is required to "master" each of the IPL objectives. The evaluation of the students rests with the IPL facilitator, who is required to prepare a "feedback sheet" for each student twice during the term. The feedback sheet is composed of various behavioral indicators for each of the objectives on which the facilitator rates the students' communication skills. The course is behaviorally oriented, emphasizing the pre-service teachers' understanding and demonstration of specific interpersonal communication skills to aid in-class teachers in communicating with those around them in both cognitive and affective domains.

A number of research reports have been written concerning the entire Ed 200 course, but very little of the research focuses solely on the IPL phase of the course. In a study of attitude changes as a result of the Ed 200 course, Stiggins (1972) found significant attitude changes using a semantic differential pre-test, post-test design.
Using the evaluation, potency, and leniency dimensions, Stiggins (1972) found the carrel concepts (e.g., shaping behavioral objectives) to change meaning more significantly than the IPL concepts (e.g., questioning and listening skills), although most concepts became more valuable, potent and lenient. A student questionnaire study by Schulman and Byers (1974) examines the entire Ed 200 course, but focuses on the laboratory experience. The questionnaire form was used to gather data on the IPL because of "the lack of adequate alternative means of gathering this data" (p. 1). Results showed over ninety percent of the students felt the course increased their ability to teach; over sixty percent felt the course increased their desire to teach; close to seventy-five percent responded that they would participate in an IPL even if it were not required; and seventy percent said they would like to participate in an advanced IPL.

Using a questionnaire sent by mail one year after their completion of Ed 200, Radke (1975) studied the possible benefits or harm to IPL participants. Using a random sample of twenty-eight respondents, he found two possible casualties (perceived harm, present and past) and fourteen students who perceived growth present and past as a result of participating in an IPL experience.

A study by Schulman (1974) examined facilitator grading and decision-making. She found facilitator grading decisions to vary widely, which confirmed a theory that a student's chances of passing vary depending upon the instructor that student was assigned. She theorized that this was a function of either (1) the instructional skills of the facilitator or (2) the varying criteria used by different facilitators. This presented a problem which could not be solved because (1) "there are no objective criteria for determining TA (facilitator) competency levels" (p.
12) and (2) "there are no objective measures of student performance against which the accuracy of TA criteria can be compared" (p. 12).

The human relations program at Michigan State University encounters the same evaluation difficulties as many other similar programs; one must use a questionnaire form (and accept biased perceptions of students), develop an observation system (and accept questionable reliability and validity), or engage in no evaluation and rely on positive comments by enthusiastic students to show the value of the program. These problems lead us to an evaluation of the research that may begin to assist the person charged with the evaluation of a human relations program.

Human Relations Research and Observation: Problems and Solutions

    And the infrequency with which change in teacher and pupil behavior has been the criterion in educational research seems notable, when change in behavior is the goal of education. Much of the available research has suffered from the lack of a planned and coherent design. Faith in laboratory training has sometimes depended on questionable data; measures of known validity and reliability have often been lacking; and reliance, sometimes of necessity, has been placed in questionable self-ratings, loosely and hurriedly constructed self-report inventories: ratings completed by individuals who have little or no opportunity to observe behavior adequately and hard-to-interpret unquantifiable projective devices (Bowers and Sears, 1961, p. 154).

This chapter has pointed out a number of human relations, sensitivity, and/or encounter group programs in education and some related fields. Most of these programs have reported serious obstacles in evaluating their effectiveness. This section will review some of the difficulties encountered in researching group work in education, looking particularly at two general barriers that the field must grapple with if it is to show the potential of these programs in the schools.
The quote by Bowers and Sears (1961) points out the two hurdles which must be cleared if evaluation and research in human relations is to progress: what should researchers measure and how shall they measure it? This section will begin with an examination of the need for goals and objectives in human relations programs. That will be followed by a brief review of the measurement problems associated with subjective data and secondary data. Types of behavioral measures will then be examined, including an introduction to the observational devices used to measure group participants' behavior.

As has been noted previously in this review, sensitivity training has been viewed primarily as a subjective experience. The majority of the research in the field has accepted that premise as a given, and this may have been the basic problem that undermined many studies. But this premise is no longer viewed as tenable in education or other fields. Campbell and Dunnette (1968), in a report on industrial T-group experiences, point out three major problems facing T-group research: (1) lack of theory which relates to change, "Presently, it is unclear what kinds of outcomes to expect from any specific T-group effort" (p. 79); (2) the difficulty in relating learning in training groups to organizational settings--what is transfer and how do you measure it?; and (3) the measurement problem is compounded by the slippery notion of "interpersonal awareness." In summary, Campbell and Dunnette (1968) state, "Research must devote more effort to specifying the behavioral outcomes they expect to observe as a result of T-group training" (p. 68).

The ambiguity in goals, training methods and evaluation causes confusion in the consumer since there are conflicting interpretations of the same research data. Proponents of sensitivity training will find that evidence is supportive of a hypothesis that training leads to behavioral change.
Critics will review the same results and find no indication of change (Barber, 1969). Many changes are needed to alleviate this problem, but two of the most basic are adequate specification of the independent and dependent variables (Diamond and Shapiro, 1973). "In light of the multitude of critical parameters then, the use of generic terms like 'sensitivity,' 'encounter,' and 'T-group' are inadequate as defining operations. At this stage, it becomes most important for researchers and theoreticians to isolate and specify exactly what goes on in their groups" (p. 2). In relation to dependent variables, it is equally important to employ dependent measures specifically consistent with the group goals.

Once the goals have been behaviorally defined and the nature of the training has been revealed, the next issue which must be examined is the measurement of those goals. Since the goals of the training will involve the behavior of the participants, one method of examining the appropriateness of the measurement tool will be to judge how removed the actual data is from the participants' behavior.

Previously discussed studies involving self-reported perceived change (Bunker, 1965; Miles, 1960 and 1965; and Sperber, 1972) are examples of subject bias contaminating the data. Peer-reported changes (Miles, 1960, 1965, and Kraft, 1967) or student perceptions (Bailey, 1967) suffer from the same problem of perceptual distortion. These approaches to the measurement of behavioral changes seem sound and, since the results are usually encouraging, their use will probably continue. But the effects of perceptual bias are so strong that no conclusions can be drawn from these studies except perhaps the verification of the participants' enthusiasm.

Personality tests and attitude and value inventories are often used to collect secondary data on group participants.
Some of the inventories most often used include: Minnesota Teacher Attitude Inventory (Lee, 1967; Sweeney, 1969; Thurmon, 1971), Fundamental Interpersonal Relations Orientation Behavior (Thorman, 1971; Banmen and Capelle, 1971; Solomon, 1970), Edwards' "Personality Preference Schedule" (Solomon, 1970), and the Personal Orientation Inventory (Banmen and Capelle, 1971). The results of these studies have generally been disappointing, although some significant results have been found. Nevertheless, the trouble in interpreting even significant data still exists; as Sweeney (1969) pointed out, teacher attitudes are only indicators of the teachers' classroom behavior and even the most drastic change in attitude may not produce any change in behavior.

Studies that examine changes in participants' behavior encompass a wide variety of dependent measures. A study by Schmuck (1967) of in-service teachers' innovative behavior used the number of innovative practices tried out by the teacher in his/her classroom (self-reported) as a dependent measure. He found significant change as a result of the group experience. Studies by Heck (1971) and Hunt, et al. (1969) use a task developed by Hunt (1965) to measure teachers' interpersonal sensitivity and flexibility. The task involved teaching a lesson in which a student acted as though s/he obviously did not understand the concept. The criterion used to evaluate the lesson was a measure of the teacher's ability to understand another person's perspective, to approach the teaching task from the child's understanding. No significant changes were observed as a result of participation in a sensitivity training experience.

A comparison of the Schmuck (1967) dependent measure with the task developed by Hunt (1965) points out an important aspect of behavioral measures. While the Schmuck measure may seem trite, it has an important attribute.
He was using sensitivity training to bring about changes in teachers' use of innovative experiences in the classroom. He found significant results and part of that significance must be attributed to the fact that he was measuring the behavior he was attempting to teach. A statement by Heck (1971) points out the dilemma of his study, "Another feature of this particular training project was that the sensitivity training program had one primary objective: that being a behaviorally defined skill labeled communication effectiveness, it was important to measure that skill by using a behavioral method" (p. 505). The non-significant results of this study may have been caused by the fact that although the dependent measure was a behavioral measure, it did not approximate the group goal. Measuring a skill using a behavioral method is important, but that dependent measure must be consistent with and a direct measure of the behavioral goal.

Direct measures of group outcomes vary in the specificity of their criteria. Meador (1971) reports on the Process Scale developed by Rogers and Rablen (1958) which tends to be very general. The scale measures self-disclosure and the definition of the highest stage will give the reader a feeling for the inferential nature of the scale, "Seventh stage. The individual lives comfortably in the flowing process of his experiencing. New feelings are experienced with richness and immediacy and this inner experiencing is a clear referent for behavior" (Meador, 1971, p. 72).

As was pointed out earlier in this chapter, the work of Carkhuff (1969b) and Gazda (1973a) suffers from some of the same subjective scale definitions as Rogers and Rablen's Process Scale. An article by Gormally and Hill (1974) examines the strengths and weaknesses of research on Carkhuff's training model. One of the most notable weaknesses of the Carkhuff rating scales is the difficulty in systematizing judgments of helper responses.
Carkhuff's scales have been a valuable contribution to measuring group effectiveness, but they also present problems, "For example, the scale points lack of operational specificity which makes it difficult to maintain objectivity and standardization of scale use in ratings," and this prompts the authors to suggest ". . . other measures should be related to rating scale changes. Conclusions based entirely on trainee movement on rating scales should be regarded tentatively" (Gormally and Hill, 1974, p. 542).

Another difficulty with the Carkhuff scale is the generalizability of data on skill acquisition to real life situational responding (Gormally and Hill, 1974). A training group's growth is normally reported through data collected by the discrimination and communication indexes. As stated previously, these scales involve multiple choice tests and written responses to client stimuli. But numerous difficulties have been discovered using written responses as a dependent measure. Researchers (Carkhuff, 1969c, and Greenburg, 1968) found evidence that only highly functioning therapists have high correlations among written, oral and live interview situations. Other studies have demonstrated that trainees can write stylistically correct responses but are unable to respond empathically in interviews (Butler and Hansen, 1973). Gormally and Hill (1974) point out the problem, "Learning to communicate empathically requires a different and more difficult level of skill than writing a response. . . . Although written responses are easy and economical to use, they lack generalization to real helping situations: this limits their utility in research" (p. 541).
A suggestion by Gormally and Hill (1974) anticipates the need for a category system based on frequency data and independent categories to be used with group recordings:

    An alternative to the use of rating scales is to listen to the entire interview and record frequency data, for example, number of responses that identify a feeling, number of nonverbal referents, etc. The responses can be identified for simple presence or absence, and the categories are fairly independent. Use of frequency data reduces the subjectivity involved in rating scale measures (p. 544).

Another measurement technique which suffers from the lack of generalizability is Kagen and Krathwohl's Affective Sensitivity Scale (A.S.S., 1967). This scale measures sensitivity by having the subjects identify (by multiple choice testing) the feelings of possible clients who are shown on video-taped vignettes. Several studies have used the A.S.S. to examine changes in individuals' sensitivity following a group experience (Danish, 1971 and Dendy, 1971) and a study by Danish and Kagen (1971) points out one of the difficulties. They found significant positive change in some groups but not all groups and the results left them unsure of the reason for the variance. This leaves two questions: what occurred in the various groups to account for the variance? (a process question) and what do significant results mean in terms of the subjects' "real world" behavior? (does success in identifying feelings on a multiple choice test relate to interpersonal empathy or communications?)

The studies reviewed so far suggest two important criteria in the evaluation of behavioral change in group participants. First, the measurement must relate to the goals of the experience.
There are two reasons for this: if the researcher is interested in significant results, specific measurement is more likely to produce them (you wouldn't use a test of multiplication following instruction in addition), and specific measures will also be more generalizable to real life settings (e.g., the classroom) if the goals were appropriate. The second criterion is that the technique should include or be capable of measuring the process variables of the group experience. We must open up the black box called group training so that the process variables can be related to the product variables. If participants change as a result of group experiences, what is it that happens in those groups that brings about those behavioral changes?

Most studies reviewed so far have used product outcomes as the dependent measure although some have been used to measure process variables. The discussion will now turn to two observational systems that can be used to measure process variables.

The Group Assessment of Interpersonal Traits (GAIT) is a report schedule to measure interpersonal skills (Goodman, 1969). The schedule is used in a structured small group situation in which the measurement technique resembles group training. The group is composed of about eight participants and three observers. The participants are asked to write on a card an interpersonal concern which they will voluntarily share during the group meeting. One person volunteers to start (s/he is the discloser) and proceeds to read his/her concern. Another participant may volunteer to engage the person in a five-minute dialogue (s/he is the understander). This continues until every participant has engaged in both roles. The participants and observers are then required to rate all the participants on a six-point Likert-like scale in relation to statements which reflect the following dimensions: empathic understanding, emotional honesty-openness, warmth-acceptance.
A study by D'Augelli (1973) reported difficulty in establishing high reliability with the GAIT. For empathic understanding, emotional honesty-openness, and warmth-acceptance, he noted the following reliabilities: observers - .78, .69, .64 and participants - .61, .48, .35 respectively. It should be noted that the subjective nature of the three rating categories may have led to the low reliabilities. These reliabilities along with the structured environment needed to use this technique are definite drawbacks to using the GAIT for group evaluation.

A more specific category observation system was developed by Whalen (1969) for measuring group verbal behavior. Her system has raters score all verbal responses into the following categories: (1) personal discussion, (a) personal self-disclosure, (b) immediate feelings, (c) personal questions; (2) feedback, (a) positive feedback, (b) negative feedback, (c) neutral feedback, (d) accepts feedback, (e) rejects feedback, (f) requests feedback; (3) impersonal discussion, (a) impersonal self-disclosure, (b) extra group process, (c) impersonal questions; (4) group process; (5) descriptive aspects of communicative speech; (6) unscoreable utterances. The reliability estimates were computed for each of the categories individually and the majority of the categories had reliabilities of about .90. The continuous coding by the raters was facilitated by an event recorder which collected frequency and duration data for each of the categories. The frequency and duration data were analyzed but no analysis was reported concerning the order of the events. Whalen (1969) notes the criteria for the selection of the dependent measure: "The classes (categories) were selected so as to include the behaviors modeled in the film as well as those which typically occur in newly formed groups" (p. 511).
Recently, the Whalen categories have been combined with the GAIT technique in research on counseling skills (Rappaport, et al., 1973 and D'Augelli and Chinsky, 1974).

Because Whalen's categories are appropriate for newly formed groups (strangers), they may not all be appropriate for evaluating the outcomes of groups composed of individuals who have met for longer periods of time. But the promise of a category procedure which uses the goals of the human relations programs as the categories needs to be explored. Such a system would be directed toward both specific measurement and process measurement. The development of such a technique will be examined in the next section.

Observation: Measurement of Communication Skills

    As with other fields of study having to do with interpersonal interaction, curiosity about issues outstrips methodological resources. Often the researcher is confronted with a choice between a well-established, tested instrument which has doubtful or tangential relevance to the laboratory situation, or a tailor-made but untested new instrument. There has been a tendency to utilize established, validated measures rather than to rely on homemade devices whose deficiencies may become apparent only after all the data have been collected. Yet . . . instruments must be developed specifically for the social context under study (Stock, 1964, p. 437).

In examining the research on Carkhuff's human relations model, Gormally and Hill (1974) noted the many drawbacks to using rating scales in measuring communication skills. They suggested using frequency data based on specifically defined categories to reduce the subjectivity involved in rating scale measurements. One study has been cited (Whalen, 1969) which collected frequency data using specific, well defined behaviors to delimit a set of categories. It was noted that the system developed by Whalen (1969) was used for research on the behavior of groups which were composed of strangers.
Because human relations training participants are not strangers (at least not strangers following the first few meetings), and because the goals for human relations training groups differ from the goals of Whalen's research on modeling and counseling, it would be appropriate to develop a category system similar to Whalen's but directed toward categories encompassing more of the goals and objectives of human relations training.

The categorization procedures used with Whalen's system allow for the analysis of frequency and duration. Another important consideration in evaluating small group behavior may be an analysis of patterns of interaction. A system which will allow the researcher to examine recurring patterns of interaction and what precipitates those patterns would be valuable in assisting the examination of a macro-view of groups. Frequency counts look at groups from a micro-view and oftentimes pick out otherwise unnoticed differences. Patterns of group interaction look at groups from a macro-view and could help the investigator understand the larger picture of the group.

This section will review two important considerations in developing a category system for human relations training. A procedure which simplifies the collection of frequency data and patterns of interaction will be reviewed, followed by a survey of the goals of human relations programs to be used as categories in a new category system.

Interaction Analysis

Classroom and group observation have been a topic of research interest since the early 1940's and many systems of observation have been developed to study classroom climate (Anderson and Brewer, 1945; Lewin, et al., 1939; Withall, 1949; Bales, 1950, and Medley and Mitzel, 1958). In looking at all the observation systems, one stands out as being the most influential.
Flanders' System of Interaction Analysis (FIAC or FSIA) (Flanders, 1960) has been used more often for classroom observation and has stimulated a wider variety of studies concerned with the classroom than any other observation system (Dunkin and Biddle, 1974). But perhaps more important in relation to the present review is the fact that the FSIA has spawned a number of other observation systems based on modification of the FSIA (e.g., Amidon and Hunter, 1967; Hough, 1967; and Ober, 1966) which can be used in a variety of settings to measure a variety of behaviors.

There has been an enormous amount of research derived from the FSIA. Many hypotheses related to teacher effectiveness have been studied using Flanders' system and some of those studies have given education and teachers a better way of looking at themselves. These studies are enlightening but in most instances do not pertain to the present study. Recently, however, the FSIA and other observational systems have received some criticism (Dunkin and Biddle, 1974 and Rosenshine and Furst, 1973). This section will review the advances made in the field of interaction analysis particularly as they relate to the development of an interaction analysis system for measuring communication skills in small group settings. This will include response to some of the recent criticism of observation systems.

Flanders' System of Interaction Analysis (FSIA) or Flanders' Interaction Analysis Categories (FIAC) was developed in the late 1950's to estimate the amount of interdependence between successively coded statements. When developed, the categories themselves were viewed as secondary in importance to the interaction analysis procedure which involved sequential time unit categorization and matrix display (Amidon and Hough, 1967). The impact of the FSIA during the last twenty years has had more to do with the procedures involved in coding with the system than with the categories themselves.
The FSIA is an observation system consisting of ten categories (see Appendix E): seven which refer to teacher behavior, two which refer to student behavior, and a category for silence or confusion. To use the FSIA (or any other interaction analysis system based on Flanders' procedure), an observer who has been trained in discriminating the various categories listens to a classroom interaction (either live, video-taped, or audio-taped), decides which category best represents each event, and writes down the code symbol of that category (Flanders, 1970). The observer codes at a steady coding tempo (from twenty-five to twelve symbols per minute, depending on the observer and the system used) and produces a long series of code symbols, one symbol to one event. This list of symbols can be analyzed for category frequencies, or a matrix can be generated that will allow the investigator to examine the sequences of events by comparing every event to the event immediately before and after it.

The time unit, sequential coding, and matrix generation have been the prime contribution of the FSIA, as they have allowed the investigator to examine the data in many ways. Simple frequency counts may at times be important, as the FSIA's time unit procedure allows the investigator to analyze not only the frequency of certain categories but also their duration. But more important, the matrix generation allows the investigator to examine recurring patterns of behavior within the matrix.

Flanders emphasizes that the ten categories of his original system may only be a starting point for many researchers interested in questions beyond the scope of the FIAC. This encouragement to other researchers to develop new categories for interaction analysis has not fallen on deaf ears. Simon and Boyer (1970), in a publication devoted solely to observation instruments, note that at least fourteen of the seventy-nine observation instruments they report on are derived from the FSIA.
Rosenshine and Furst (1973) report similar findings: of the twenty-five systems they report on that are expansions or modifications of other systems, twenty are at least partially derived from the FIAC. In Flanders' most complete treatise on interaction analysis (Flanders, 1970) he devotes two chapters to the development of alternative interaction analysis systems.

One method of modifying interaction analysis is the use of multiple coding with category clusters. Multiple coding with category clusters is based on the coder's ability to code more than one symbol for each event. A code would include more than one digit, each digit representing a category from a different cluster. Therefore, if an investigator were interested in who was talking and what kind of communication skill the speaker was using (as is the case in the present study), he would use two clusters, one to specify the speaker and the second to specify the communication skill. The number of possible categories would be the product of the number of categories in the first cluster and the number of categories in the second cluster.

Ober (Ober, et al., 1971) developed a multiple category system, based on the FSIA, in an attempt to overcome what he felt was an overemphasis on teacher talk. Ober felt that although research emphasized the teacher's behavior in the classroom, a great deal could be learned from the students' verbal behavior. His system, called the Reciprocal Category System (RCS) (see Appendix F), devotes equal attention to student talk and teacher talk. The RCS is a multiple category system which has nine categories (plus a category for silence or confusion). Each category can be coded as a single digit (e.g., 1, 5, 9) to represent teacher talk or as a two digit number (e.g., 11, 15, 19) to represent student talk.
Ober's equal emphasis on student and teacher talk is relevant to the present study because the interactions in human relations training groups are frequently dominated as much by student talk as by teacher talk. In developing an interaction analysis technique for human relations training groups it will be important to devote equal attention to teacher and student talk.

The cognitive-affective distinction has been incorporated in many observation systems (80% of the systems in Simon and Boyer (1970) report some emphasis on the affective domain), but analysis of the effects of the affective-cognitive interaction has received very little attention. This may be a function of classrooms and teachers, as emphasized by the fact that Flanders and others (e.g., Amidon and Hough, 1967) report an average of less than one percent of teacher talk being categorized as accepting feelings. This seems to have caused most observation systems to focus less on the affective dimension, although some systems do emphasize students' feelings (e.g., McRel Interaction Analysis System (Simon and Boyer, 1970, #58) and Hough System (Simon and Boyer, 1970, #9)). The affective domain cannot be overlooked in developing an observation system for human relations training, as a great deal of emphasis is placed on people's feelings in such groups. In fact, a great deal of data would be overlooked if the affective domain were not equally represented, as groups may often devote more time to the members' affect than to cognition. Just as student talk was noted to be as important as teacher talk, so too should affective messages be considered as important to categorize as cognitive messages.

These considerations suggest a cluster for a human relations observation schedule. Since all group interaction must be either student or teacher talk, and either cognitive or affective, the combination of these two dimensions (along with a code for silence) would make up a cluster to denote who is speaking.
This cluster would be made up of five categories: (1) silence, (2) teacher-cognitive talk, (3) teacher-affective talk, (4) student-cognitive talk, and (5) student-affective talk. Before turning to an examination of the various communication skills which could make up the second cluster, an examination of some of the criticisms of observation systems would be appropriate.

One of the most pressing problems with any observation system is the categories, their definition, and the extent to which they are mutually exclusive. For systems to be reliable and valid they must not include more than one category to code a single event. That is, there must be a one to one relationship between observed behavior and one possible code. For a system to be reliable, an observed behavior should be coded with only one category by any trained observer. Dunkin and Biddle (1974) are critical of many category systems which do not have mutually exclusive categories. They feel that systems whose categories are not mutually exclusive suffer in reliability, may show weak and contradictory findings, and make interpretation difficult because the researcher cannot be sure what the reported data means. This is certainly an important question to examine in developing a new instrument and not a simple problem to solve.

Simon and Boyer (1970) point out that optimally, observation systems should represent a set of mutually-exclusive, all-inclusive behavior but that in reality this is only a goal to reach for. For most systems many behaviors seem to fall into two or more categories, which requires observer training that still does not result in 100 percent reliability. Because observation systems are used to measure complex behaviors, their categories cannot be easily defined. One could use a system which had only two categories, "someone talking" and "no one talking."
This would result in high observer agreement and would also satisfy the other requirements set down by Dunkin and Biddle (1974), but it would yield results which, in most cases, would be useless. In developing an observation system, categories must be chosen with as little overlap as possible. These categories must also be defined as specifically as possible to avoid the confusion of having many categories applicable to one behavior.

Closely related to this problem is the question of the inference the observer must make in coding categories. The previous discussion would seem to indicate that observers should make no inferences at all in coding. But low-inference measures have shown less success in predicting student success when compared with high-inference measures such as enthusiasm and clarity. High-inference measures are often less reliable and leave the investigator somewhat in the dark about what the data means in terms of the behaviors the teacher demonstrated. Low-inference measures overcome these problems but have not proven successful as significant predictive measures. Because of this problem Dunkin and Biddle (1974) suggest pursuing the low-inference components of high-inference concepts, and Rosenshine and Furst (1973) state, "One way to combine the two observational procedures would be to use student questionnaires as the source of high-inference measures and tape recordings of the instructional period as the source of low-inference measures" (p. 166). The present study uses student questionnaires and tape recordings in the collection of data.

Dunkin and Biddle's (1974) most general concern for observation is that to some extent the instruments have become the focus of attention rather than the concepts they measure. This is the foundation of their concern that researchers are turning out more and more "new" category systems without being able to state why the categories are chosen or how the research categories are related to other research.
In developing new category systems it is important to state why certain categories are chosen and to relate those categories to the work that has already been done in the area. The purpose of the next section will be to show how the categories chosen for the present instrument relate to the work done in the field of human relations.

Human Relation Goals: Categories for Observation

In the last section note was made of the concern raised by Dunkin and Biddle (1974) about researchers specifying why they choose to use various categories in their observation systems. In the field of human relations training and sensitivity education there are numerous terms to denote a small group of behaviors which are viewed as helpful in interpersonal interactions. For an observation system to have the widest possible usefulness it should incorporate categories to measure each of these helpful behaviors. This section is devoted to specifying why the proposed interaction analysis system incorporates the categories it has, by showing generally how those categories relate to the communication skills training that is being performed in education.

As has been noted time and again during the review of educational programs, the majority of programs do not have specific behavioral objectives. Of those programs which do have specific goals, it is possible to categorize their objectives into three broad skills: self-disclosure, listening, and feedback. These three areas of skills will be examined as they have been defined in communication programs and as they relate to an observation system.

One of the most basic attributes of sensitivity training is self-disclosure. The merits of disclosing one's self to others have been expounded in almost every type of sensitivity group under a wide variety of names and rationales.
The term self-disclosure has been used by a number of programs (Barbour and Goldberg, 1974; Wallen, 1968; Johnson, 1972; Egan, 1970; and Lopis, 1975) and is probably the best-known term, deriving much of its exposure from the work of Jourard (1964, 1971). It includes such related terms as openness, leveling, or authenticity (Springport High School, 1967) and also the general area of expressing feelings (Egan, 1970 and Belvar, 1974). Johnson (1972) defines it as follows: "Self-disclosure may be defined as revealing how you are reacting to the present situation and giving any information about the past that is relevant to understanding how you are reacting to the present" (p. 10). It is generally what a layman might call a "personal discussion," although it need not relate intimate details of the speaker's past life. Whalen (1969) distinguishes this "personal discussion" from "impersonal discussion" in her observation system. She uses the term impersonal discussion to mean the offering of biographical information or other generally accessible information about the speaker. In developing an observational system it is important to distinguish self-disclosure from the offering of other information which does not reveal relevant (relatively non-accessible) data about the speaker. If sensitivity training helps people to level with others and be more open and authentic, this should be revealed in an increase in self-disclosure behavior. An observation system for evaluating a human relations program must include a category for measuring self-disclosure and another category to record the offering of other information, similar to Whalen's (1969) impersonal discussion.

Self-disclosure involves the speaker revealing himself to those around him, to others who are listening. The second skill incorporated into most sensitivity education experiences involves the listener.
For people to feel comfortable revealing their ideas or feelings they must feel that others are listening to them and accepting what they say and feel. Numerous behaviors are involved in helping others feel comfortable in self-disclosure, such as attending behaviors (eye contact, posture, etc., Ivey and Rollins, 1970), understanding and exploratory responses (Dendy, 1971), and paraphrasing and behaviors which reflect empathy (Gazda, et al., 1973). Generally these behavioral indicators, and other non-behaviorally defined skills, involve letting the speaker know that the receiver is listening with understanding in a nonjudgmental way. This type of listening is called active listening by Gordon (1970) and is described as listening with the purpose of understanding the complete message of another person (both the content and feelings of the message) and communicating to the speaker, through your behavior, that you are understanding. This involves paraphrasing, asking clarifying and exploratory questions, and showing through your behavior that you understand and empathize with the speaker. In human relations programs this has been referred to as reflecting and summarizing feelings (Perkins, 1973; Barbour and Goldberg, 1974), empathetic understanding or attention (Bervar, 1974; Barbour and Goldberg, 1974), or checking for understanding and summarizing (Springport High School, 1967). Behaviors such as these are important in any human relations program, and for an observation system to evaluate such programs it must have a category which allows for the measurement of active listening behaviors.

The third general skill area of interest to human relations programs is feedback. The following definition by Nylen, et al. (1967) approximates the definitions offered by other programs, whether they call the skill feedback, confrontation, or conflict resolution.
Feedback is "communication which gives back to another individual information about how he has affected us and how he stands with us in relation to his goal or intentions" (p. 75). Feedback can be either positive (showing support for the person's behavior) or negative (non-support for his behavior), and generally this skill is further defined as to the feedback's usefulness or constructiveness. Numerous criteria are used in various programs for evaluating how constructive or useful a feedback statement is, but generally the following criteria are included: the feedback must describe the specific behavior, the statement should be presented as tentative knowledge and not as fact, and the sender of feedback should include his own feelings about the behavior or alternative behaviors the receiver could follow (Bervar, 1974; Johnson, 1972; and Springport High School, 1967). Because feedback is an integral part of communication training, it is important to include in an observation schedule categories which measure positive feedback, negative feedback, and also some measure of the constructiveness of that feedback.

Conclusion

A portion of the literature related to sensitivity training and its evaluation has been presented here. The increased use of sensitivity training or some variation of human relations training in education, combined with the continued lack of reliable measurement instruments in the field, points out the need for the present research. Although there is an immediate need for the evaluation of many human relations programs, researchers such as Bowers and Sears (1961) and Stock (1964) have pointed up the more pressing need to develop reliable, valid instruments to use in evaluation. The need of the sixties has become the need of the seventies; very little has been done to solve the problem. It seems investigators are more concerned with showing the value of their program than with developing instruments capable of reliable, valid measures.
The present research is an attempt to begin to show ways to collect valid and reliable data about the communication skills of group participants in small group settings.

CHAPTER III

INSTRUMENT DESCRIPTION AND PILOT TEST

Introduction

The need for a method of evaluating intensive small group experiences is clear: many groups exist, and a recognition of the need for such groups is creating many new programs that incorporate these experiences. However, research and evaluation in this area are lacking, and instruments for measuring communication skills are practically nonexistent. The present study is an examination of this problem with an emphasis on one possible solution, an interaction analysis observation system. This chapter will introduce the reader to an interaction analysis system designed specifically for measuring communication skills in small group settings, which will be referred to as the Interpersonal Skills Interaction Analysis (ISIA). The initial section of this chapter will describe the procedures for using an interaction analysis system, specifically the coding and matrix generation, and also briefly describe the categories of the ISIA. The last section in this chapter will describe the ISIA Pilot Test and instrument modifications.

Interpersonal Skills Interaction Analysis

The ISIA follows a multiple coding category system (Flanders, 1970) derived in part from the observation schedules of Flanders (FIAC) (Flanders, 1960) and Ober (RCS) (Ober, et al., 1971). Being a classroom interaction analysis system, the ISIA seeks to abstract communication by ignoring the content characteristics (i.e., what is being talked about) and focusing on the process characteristics (i.e., the types of communication skills being used in the interactions). Interaction analysis systems are a process of encoding and decoding. Encoding is defined as assigning, by coded symbol, statements to previously defined categories.
This assignment is done by trained observers, and the recording of the data is done chronologically. Decoding is the reverse process. A trained analyst interprets the coded data, from which inferences can be made about the original interactions even though the decoder may not have been present when the original data was collected (Flanders, 1970). In this way the communication process can be examined and compared with other interactions apart from the specific content of the interaction. The purpose of observation schedules is descriptive rather than prescriptive, although the understanding to be gained from the data may facilitate future modification of the communication process.

In a multiple coding category system a single code consists of two or more numbers or letters which symbolize a single event. The ISIA, which uses two category clusters, requires a two-place symbol for each event. One place designates the category within the first cluster, and the second place designates the category in the second cluster. For example, a single event might be coded "46" to indicate "4," the fourth category in the first cluster, and "6," the sixth category in the second cluster. The first category cluster in the ISIA (cluster A) designates who is speaking (teacher, student, or silence) and in what domain (cognitive or affective) the speaker is talking. The second cluster (cluster B) indicates what communication skill the speaker is using. The following are the codes for the ISIA:

Cluster A                      Cluster B
0 - Silence or confusion       1. Positive Feedback
1 - Teacher - cognitive           a. responsible
2 - Teacher - affective           b. irresponsible
3 - Student - cognitive        2. Active Listening
4 - Student - affective        3. Elicits Information
                               4. Directs or Suggests
                               5. Offers Information
                               6. Self-Disclosure
                               7. Negative Feedback
                                  a. responsible
                                  b. irresponsible

The ISIA has three exceptions to the use of two numbers per code.
The code for silence or confusion (A:0)¹ is necessarily a single digit, 0, since it includes no particular communication skill.² The other exceptions are the codes for positive feedback (B:1) and negative feedback (B:7), which include a letter in addition to the two number codes. The letter is used to indicate the type of positive or negative feedback that is used, either responsible (a) or irresponsible (b). For example, the code B:1a indicates positive responsible feedback and B:1b indicates positive irresponsible feedback. Letters are used with negative feedback as well.

¹ The symbol "A:0" indicates the 0 code in cluster A, "B:2" indicates the 2 code in cluster B, and "32" indicates the third code in cluster A and the second code in cluster B.

² Although there is an appropriate and inappropriate time for silence, laughter, etc., this will not be examined by the ISIA.

Flanders (1970) describes interaction analysis as a label which "refers to any technique for studying the chain of classroom events in such a fashion that each event is taken into consideration" (p. 5). To extract the optimal amount of information from such a technique, two important conditions must be met. First, the events must be recorded in sequence, allowing the interpretation of the order of the events. Second, the observer must have a coding tempo which will allow the interpretation of the duration of specific events. When these two conditions are met, the coding data can be decoded and interpreted for total time spent in specific types of interactions, what types of interactions precede and follow specific communication skills, what patterns of communication skills exist in the classroom or group, and other questions of interest to the observer, teacher, or researcher.

The ISIA category system is a totally inclusive system (Flanders, 1970) which exhausts all the possibilities of any potential event.
That is, the five categories in the A cluster combined with the nine categories in the B cluster allow for the coding of any verbal statement. This makes possible continuous coding at a constant rate throughout the observation. Continuous coding at a constant rate (coding tempo) is an important consideration, as it allows the interpretation of the sequence of events in relation to their duration. This is essential whenever the observer (or researcher) wishes to investigate the proportion of time spent in any specific category and also in examining the sequencing of particular categories.

Redundancies and contradictions should be avoided in a multiple category system to make the system workable and complete. A code symbol that contains two digits is redundant to the extent that any other two digit symbol can be used to code the same behavior. A code symbol is contradictory when any two serial digits represent categories that are mutually exclusive and therefore produce a meaningless symbol (Flanders, 1970). The ISIA has no redundancies, i.e., one and only one code symbol can be used for any observed event. The ISIA has some contradictory categories which are meaningless and cannot be used. However, all possible interactions can be coded in the ISIA scheme. Hence the categories of the ISIA can be said to be jointly exhaustive.

Table 1 (and Appendix G) shows a listing of the thirty-three possible categories in the ISIA; it does not include the contradictory categories that are by definition meaningless. For example, the code 45 would signify student talk, affective, offering information, but by definition the offering of information must be cognitive; therefore, the code 45 is meaningless.
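The count of thirty-three meaningful categories can be checked with a short enumeration. The following Python sketch is illustrative only and is not part of the ISIA materials; the names `CLUSTER_A`, `CLUSTER_B`, `COGNITIVE_ONLY`, and `meaningful_codes` are hypothetical. It assumes, as the listing in Table 1 indicates, that the B:4 and B:5 categories do not occur with the affective A codes.

```python
# Illustrative sketch (hypothetical names): enumerate the meaningful ISIA codes.

# Cluster A codes other than silence: (speaker, domain).
CLUSTER_A = {
    "1": ("teacher", "cognitive"),
    "2": ("teacher", "affective"),
    "3": ("student", "cognitive"),
    "4": ("student", "affective"),
}

# The nine cluster B categories, including the a/b feedback letters.
CLUSTER_B = ["1a", "1b", "2", "3", "4", "5", "6", "7a", "7b"]

# Assumption read off Table 1: B:4 and B:5 never pair with an affective A code.
COGNITIVE_ONLY = {"4", "5"}


def meaningful_codes():
    """Return every code symbol that is not contradictory."""
    codes = ["0"]  # silence or confusion is a single digit with no B part
    for a_code, (_speaker, domain) in CLUSTER_A.items():
        for b_code in CLUSTER_B:
            if domain == "affective" and b_code in COGNITIVE_ONLY:
                continue  # contradictory, e.g. 45
            codes.append(a_code + b_code)
    return codes


codes = meaningful_codes()
print(len(codes))     # 33
print("45" in codes)  # False
```

Under these assumptions the enumeration yields one silence code, nine codes for each cognitive row, and seven for each affective row.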
Table 1
List of All ISIA Categories

                              Cluster B
Cluster A    1a    1b    2     3     4     5     6     7a    7b
    0        0
    1        11a   11b   12    13    14    15    16    17a   17b
    2        21a   21b   22    23                26    27a   27b
    3        31a   31b   32    33    34    35    36    37a   37b
    4        41a   41b   42    43                46    47a   47b

Procedure

The procedure for recording events in sequence involves having a trained observer sit in on a group, or listen to a tape recording of a group, and decide which category best represents the communication events just completed. This categorizing and recording of the codes in sequence is done as often as possible at a constant tempo. Pilot testing revealed that a reasonable coding rate for the ISIA is approximately one code every five seconds, although some variance between observers is expected. Variance between observers is not as critical as a varying tempo for each observer. Having a regular tempo is important because most conclusions depend on rate consistency. For example, the comparison of two categories during the same observation can only be done if a code in one category represents an equal amount of time in another category: twenty coded units of one category should equal approximately the same elapsed time as twenty coded units of another category.

Once the data is recorded in sequence on an observation sheet (see Appendix H), it can be transferred to a matrix which facilitates the interpretation of patterns within the data. One of the most significant contributions to the field of observation techniques has been the interaction matrix which Flanders (1960) introduced with the FSIA. Given any category system designed for classifying events at a constant rate, in sequence, the information obtained from the data will be increased (in fact, more than squared) by considering pairs of events as the unit to be tabulated rather than single events (Flanders, 1970).
This can be done efficiently by generating a matrix with n rows and n columns (n being the number of meaningful categories in the interaction analysis system) and using this matrix to show the sequential order of the observed events. This is done by using pairs of events whereby the first number of any given pair designates the row and the second number designates the column. The following example will help to illustrate the relationship:

Recorded sequence:  0   15   15   33   15   35   0

1st pair:  0 - 15
2nd pair:  15 - 15
3rd pair:  15 - 33
4th pair:  33 - 15
5th pair:  15 - 35
6th pair:  35 - 0

Note that in the above example each code symbol is used twice to form a pair (with the exception of the first and last codes, which are used once). Each code symbol is first used as the second number in a pair (except the initial number) and then used as the first number in the following pair. In this way, the n x n matrix facilitates the observer's investigation of the patterns of interaction, i.e., what precedes and follows certain communication skills. For example, the above illustration shows the third pair to consist of the code symbol 15 (teacher - cognitive - offers information) followed by the code symbol 33 (student - cognitive - elicits information). This pair has the address 15 - 33; it is located at the intersection of row 15 and column 33. By using this pairing system on the hundreds of coding symbols recorded in a thirty-minute observation, an observer can generate a matrix that yields a great deal more information than the individual codes themselves. Whenever observation code symbols are recorded in a fashion which preserves the original sequence at a stable coding tempo, a matrix can be tabulated which yields the added information needed to examine an interaction beyond a simple frequency count.
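The pairing procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the tabulation method used in the study; the function names `tabulate_pairs` and `build_matrix` are hypothetical, the matrix is restricted to the four codes that occur in the example rather than all meaningful ISIA categories, and the five-second unit is the approximate pilot-test coding rate.

```python
from collections import Counter

# Approximate coding tempo reported for the pilot test: one code every five seconds.
SECONDS_PER_CODE = 5


def tabulate_pairs(sequence):
    """Count each (first, second) pair of successive code symbols."""
    return Counter(zip(sequence, sequence[1:]))


def build_matrix(sequence, categories):
    """Build an n x n matrix: row = first code of a pair, column = second code."""
    index = {code: i for i, code in enumerate(categories)}
    n = len(categories)
    matrix = [[0] * n for _ in range(n)]
    for first, second in zip(sequence, sequence[1:]):
        matrix[index[first]][index[second]] += 1
    return matrix


# The recorded sequence from the example in the text.
sequence = ["0", "15", "15", "33", "15", "35", "0"]

# Simplification: only the codes seen in the example, not the full category list.
categories = ["0", "15", "33", "35"]

pairs = tabulate_pairs(sequence)
matrix = build_matrix(sequence, categories)

# Estimated duration per category, assuming a constant coding tempo.
estimated_seconds = {
    code: count * SECONDS_PER_CODE for code, count in Counter(sequence).items()
}

print(pairs[("15", "33")])      # 1, the third pair (row 15, column 33)
print(sum(pairs.values()))      # 6 pairs from 7 code symbols
print(estimated_seconds["15"])  # 15 seconds coded as 15
```

Reading across a row of `matrix` shows what followed a given code; reading down a column shows what preceded it.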
This type of matrix combines individual code symbols and short interaction patterns into one matrix, which may delineate larger, more frequent patterns of communication that could go unnoticed if one were examining individual interaction codes or the overall perception of the entire interaction. Primary and secondary communication patterns can be examined for frequency and duration. Individual rows and columns can be inspected to answer questions such as: "What response most frequently follows negative feedback?" or "What most frequently precedes student - affective self-description?"

ISIA Category Description

The ISIA uses two category clusters to discriminate types of interactions in small group settings. This section will briefly describe each category in those clusters. A more complete description of the categories, including examples, is presented in the ISIA Training Manual (Appendix A).

Cluster A

A:0--Silence or Confusion - This category includes pauses, short periods of silence, or periods of confusion in which the observer cannot understand the interaction clearly enough to code it (e.g., laughter).

A:1 and A:2--Teacher - Both of these categories refer to verbal statements of the teacher (classroom) or group leader (small group, process lab, etc.).

A:3 and A:4--Student - Both of these categories refer to verbal statements of the student (classroom) or group members (small group, process lab, etc.).

A:1 and A:3--Cognitive - Cognitive statements refer to verbal comments which have a factual or content input. Cognitive statements are related to knowledge, the process of knowing. Statements which are coded cognitive are the presentation of how the person thinks about something as opposed to how they feel about it.

A:2 and A:4--Affective - Affective descriptions are those which refer directly to feelings. These statements may refer to either the speaker's feelings or the feelings of other group members.
They often include words which refer to affective states such as love, hate, anger, frustration, shyness, etc. (see also Appendix E in Gazda, 1973). For statements to be coded affective they must label and/or refer directly to a feeling. That is, if someone is obviously angry but does not label that anger ("I'm really angry"), the message is coded cognitive. The coding is based on verbal communication, not on inferences about feeling states in the group members. This is a very conservative approach, but it eliminates the problem of false positives.

Cluster B

Cluster B of the ISIA is used to denote what communication skill is being used by the speaker. These skills are closely related to helper-helpee skills used in counseling and other communication skills programs (e.g., Carkhuff, 1969), although the words used to label the particular skill may be different. The nine categories are not all "communication skills" but rather are particular communication skills and other categories which make the ISIA a totally inclusive system.

B:1--Positive Feedback - Feedback is the response or reaction a person gets from or gives to others regarding one's personal being or actions. It is a verbal response of a sender (the person giving the feedback) to a receiver (the person to whom feedback is directed) which is focused on the receiver's being or actions (stimulus behavior). In the case of positive feedback, the sender's message (the positive feedback) shows support for the stimulus behavior of the receiver. It is, in effect, positive reinforcement for the stimulus behavior. Feedback can be either cognitive or affective. If it is directed toward the receiver's affective behavior (e.g., sharing of feelings) and/or includes the sender's affect associated with the stimulus behavior (e.g., the sender stating how the stimulus behavior makes him feel), it is coded affective (21 or 41).
Cognitive positive feedback (11 or 31) would be coded for any positive feedback that refers to a cognitive stimulus behavior and does not include the sender's affective reaction to that stimulus behavior.

B:1a--Responsible - For positive feedback to be responsible (B:1a), it must meet two criteria: it must be specific to the stimulus behavior and it must be potentially helpful. For feedback to be specific, it must describe to the receiver the stimulus behavior in specific rather than general terms. That is, the receiver must be aware of exactly what he is getting feedback about. The helpful quality refers to the nature of the stimulus behavior itself. That is, what the sender is approving of must be something that should be continued or increased. To meet the criterion of helpful, the sender must be giving positive feedback about a stimulus behavior that is potentially growth-producing to the receiver. Both of these conditions must be met for positive feedback to be responsible.

B:1b--Irresponsible - Positive feedback which does not meet the criteria of both helpful and specific is coded positive irresponsible feedback (B:1b).

B:2--Active Listening - Active listening is a sentence, word, or phrase which puts the focus of an interaction on the person who has previously been talking and encourages that person to elaborate further in the interaction. This may be accomplished by paraphrasing, reflection of feelings, or the asking of a clarifying or exploratory question. The important ingredient in active listening is that the listener communicates to the speaker that he has understood what the speaker said (or that he does not understand and wishes clarification) and also communicates to the speaker the listener's desire to hear and understand more of the speaker's ideas or feelings.
B:3--Elicits Information - This type of talk asks a question or requests information about the content, subject, or process of the group with the intent that another should answer (respond). The purpose of behaviors in this category is to elicit or secure information. It differs from active listening in that it is the initiation of an interaction and not the encouragement of an ongoing interaction. Eliciting behaviors may be cognitive or affective.

B:4--Directs or Suggests Solutions - This type of talk gives directions, instructions, orders, or assignments with which another is expected to comply. It differs from B:3 statements in that directions are given and compliance is indicated. Statements which are part of an interaction made up of active listening and self-disclosure but which direct the person to a specific solution are also coded B:4.

B:5--Offers Information - The code B:5 is used when a statement is the offering of facts or information concerning the content, subject, or procedures being considered. It is also used to code responses to questions or to information requested by others. This code is used for statements which are the presentation of facts outside of one's own experience, i.e., it relates what the speaker knows rather than what he has done. It is the presentation of cognitive information and can never be affective.

B:6--Self-Disclosure - Self-disclosure is the offering of information of a personal nature and includes the sharing of values, opinions, personal experiences, and feelings. The B:5 versus B:6 distinction depends on whether the information presented is fact outside of one's experience (B:5) or facts or feelings within one's experience (B:6).

B:7--Negative Feedback - Negative feedback differs from positive feedback in that the sender is stating non-support for the stimulus behavior. The predictable effect of the negative feedback is that it weakens the stimulus behavior.
The criteria for responsible feedback (either negative, B:7a, or positive, B:1a) are that the feedback must be specific and potentially helpful. If the feedback lacks either specificity or helpfulness, it is coded as irresponsible feedback.

ISIA Pilot Test and Instrument Modification

During the winter and spring terms, 1974, a pilot test was run in a sample of IPLs of ED 200. The pilot test was important for two reasons. First, because of the personal nature of intensive group experiences, it was felt that the affective reaction of the members, both facilitators and students, was an important variable to examine. Secondly, because the ISIA was a new instrument, it was important to collect some data on its feasibility for use in groups in relation to useability, reliability, and specificity.

During the winter and spring terms approximately ten IPL sessions were tape recorded. The nature of the research was explained to the members of the groups by the group facilitator and permission was requested to record the group session for that day. In all cases, the groups consented to the taping, although some initial hesitancy by some members was evidenced. In some cases, this hesitancy was discussed after the taping had begun and it is the author's judgment that this reluctance was quickly overcome. Although some facilitators stated an initial apprehension concerning the taping, the stated reactions following the tapings were all positive. The tapes were made available to the facilitators and this was seen by many of them to be an asset in their working with the groups. There were no stated negative reactions by either facilitators or students.

During the spring term, the opinionnaire (Appendix B) was given to two groups following the taping of their sessions. Brief instructions were given orally to the students. They were then instructed to read the directions, fill out the opinionnaire in class, and hand it in to the instructor.
The opinionnaire required less than five minutes to complete for the majority of the students. Although there was no negative affect associated with filling out the opinionnaires, some difficulties were discovered. These difficulties were all associated with the directions: some students were unsure of the difference between self-description and offering information, some students were confused over how to make the judgments they were asked to make (i.e., they were unsure of what to compare the session with), and some students did not take into account the phrase "first half of the session" (i.e., they used the entire class as the unit of analysis on the first twelve questions). Each of these difficulties was remedied by more specific directions, both oral and written.

From the results of the pilot test, the author concluded that the affective concerns were not the problem they initially were felt to be. With the new wave of concern over privacy invasion, it was felt many people would resent tapings, but this was not found to be the case. This is not to say that taping groups is unquestioningly accepted; rather, the author found that by explaining the nature of the research and the use to be made of the recordings, the members of the group were quite willing to allow the taping. The key to this success seems to be the honest communication of the objectives and procedures involved in the use of the recorded material.

A majority of the ten IPL sessions that were taped were coded by the author using the ISIA. These tapes were used to examine the feasibility of the instrument in terms of the following questions: How much time is involved in the training of an observer to use the ISIA? Is it possible to code group sessions using categories such as self-description, active listening, and feedback, or are these too vague? How reliable is the observer and instrument? What modifications need to be made in the instrument, manual, or procedures to conduct future research?
Although no observer was actually trained to use the ISIA (except the author himself), there is some indication as to the length of time it would take to train an observer. After choosing the categories, writing and refining the manual, and listening to some tapes, it took the author less than ten hours of training to reach a level where he had a stable coding tempo and a subjective feeling that he was coding with reasonable reliability. Taking into account the author's familiarity with the instrument and his experience in teaching the IPLs (three years), it seems reasonable to assume the following: (1) The training of an observer who has experience in facilitating IPLs (and therefore the IPL objectives) would require approximately ten to fifteen hours to reach an inter-rater reliability of .80 (using the author as the criterion). (2) It is difficult to judge whether knowledge of group dynamics or knowledge of the IPL objectives was helpful in learning to use the ISIA. It is possible that group facilitators (e.g., sensitivity groups, encounter groups) would be able to use the ISIA in the same period of time as an IPL facilitator. (3) It is felt that individuals with no such experience would take from fifteen to twenty hours of training to become proficient in the use of the ISIA, depending upon their understanding of interpersonal communication skills.

Because the ISIA was developed partially from the objectives of the IPLs, it would be assumed that the behaviors exhibited in the IPLs would reflect these objectives. The coding of the tapes from winter and spring terms reflected this. Although the communication skills were not equally distributed across all the possible categories, there was evidence that most of the categories were represented and that the category definitions and ground rules are specific enough to make the instrument useable and reliable.
For example, one of the sessions from winter term was coded twice (one week between the two codings) to examine the intra-rater reliability. In this particular twenty-five minute segment of one session, nineteen of the thirty-three categories were used. Of the nineteen categories used, some were used much more frequently than others (four categories accounted for 66% of the coded data: code 15, 20%; code 36, 19%; code 35, 14%; code 0, 13%), while the remaining coded data were more evenly divided among the remaining fifteen categories.

The reliability of an observation instrument is a difficult question to address as there is little agreement as to exactly what such a reliability should measure (Medley and Mitzel, 1963; Mitchell, 1970; and Rosenshine and Furst, 1973). This question will be addressed in depth in the section on Reliability, but for the pilot test, it was felt that an intra-rater reliability measure would give some indication of the "agreement coefficient" (Rosenshine and Furst, 1973) potential of the instrument. Scott's "pi" (Scott, 1955) was chosen to estimate the reliability as it is unaffected by low frequencies, can be adapted to percent figures, and takes into account the number of categories. The results showed an intra-rater reliability of no less than .70. Although only one reliability check was done (that being on one-half of one session), it is felt that this indicates sufficient specificity to warrant further investigation of the ISIA without major modifications of the instrument itself.

As with any observational instrument, it is important to be able to use the ISIA in more than one situation (e.g., IPLs). During the spring, 1974, the investigator had access to a fourth grade classroom in a local public school (Southridge Elementary School, Charlotte, Michigan) which was using the DUSO program (Developing Understanding of Self and Others, Dinkmeyer, 1970). Three classes were tape recorded while the class was involved in DUSO.
These were coded using the ISIA. The ISIA was found to be appropriate for this environment as the categories covered all the interactions and the distribution of codes covered a majority of the possible code categories.

Modifications

From the results of the pilot, a number of modifications were made and implemented in the present study. These modifications fall into three areas: the taping of IPL sessions, the opinionnaire, and the ISIA.

Recording of IPL sessions--The results of the pilot test showed that apprehensions about the tape recording among the group members could best be avoided or alleviated by the honest communication of the objectives and procedures involved in the use of the recorded material. Therefore, it was crucial that these objectives and procedures were made as clear as possible to the students and facilitators who were involved in the study. During the summer term, when a number of groups were taped for the final four weeks, the author personally described the study to each group, answered any questions the group members had, and allowed them to decide whether they would participate in the study. This seemed to cause the least interference in group functioning and also avoided the possible negative affect associated with required participation.

Opinionnaire

Several difficulties were discovered in the pilot test in relation to the opinionnaire. The students' difficulty with some of the terms indicated the need for a brief description of terms, particularly those terms which were not found in the objectives for the course (e.g., offering information). A glossary of the terms was included with each opinionnaire. The directions needed to be spelled out in more detail as students seemed to just glance over them. In light of the students rushing through, the directions (and the glossary) were put on a separate page, as the first page of the opinionnaire.
It was particularly important to emphasize two parts of the directions: (a) It must be clearly pointed out that the opinionnaire was asking the students to look at the class in two parts, the first twelve questions relating to the first half of the class, and the second twelve questions relating to the second half of the class. That is, question #1 is identical to #13 except for the words "first" in #1 (referring to the first half of the class) and "second" in #13 (referring to the second half of the class). (b) It must be clearly pointed out that students were to make judgments for the twenty-four questions in relation to other IPL sessions (i.e., individual class periods) they have experienced during the term. Students seemed to confuse "session" and "section," stating that they hadn't attended any other section. This was more clearly explained on the introductory page.

Generally, the observation instrument was acceptable in the form used during the winter and spring terms, 1974. Two minor modifications were seen as beneficial to the present study. The investigator's concern over fine discriminations and a personal communication with Ned Flanders brought about the collapsing of what were reflective statements, clarifying statements, and exploratory statements into one code, active listening. Convergent and divergent questions were also collapsed into what is now Elicits Information. The possible loss of reliability and Flanders' work, which has shown such fine discriminations to contribute little to the data, led to the elimination of those categories. The second modification of the ISIA involved the rewriting of the observation manual. The training of observers and the possible use of the ISIA in other settings by other investigators requires that the ISIA have an in-depth manual to instruct observers in coding procedures.
CHAPTER IV

METHODOLOGY

Introduction

Chapter III described the procedures, categories and the pilot test of one observation technique for measuring communication skills in small group settings, the Interpersonal Skills Interaction Analysis. This chapter will review the procedures involved in testing the suitability of the ISIA in one particular type of communication skills group, the Interpersonal Process Laboratory (IPL). The answers to the following questions, first posed in Chapter I, are the focus of the development of the ISIA:

1. Can the ISIA be shown to be a reliable observation instrument? What conditions influence that reliability?

2. Can the ISIA be shown to be a valid observation instrument for recording interpersonal communication skills? Using the available subjective criteria, is there any correlation between the ISIA and those criteria?

3. Is the ISIA capable of delineating particular types of communicative behavior in small group settings? Do the subjective reports of group effectiveness relate to the behaviors demonstrated in the groups as recorded by the ISIA?

The answers to these questions will be sought through the analyses described in this chapter. The first area of discussion will be the specific methods of data collection, including the population, tapings, observations and observers. The next area of discussion will be the reliability of the instrument, including the various types of reliability, the specific methods to be used in the present study, and a review of the procedures involved in training observers. A discussion of the validity of the instrument will include consideration of the types of validity measures, the specific instrument used, and the procedures involved in estimating the validity of the observation instrument.

Data Collection

The actual data in the present study were the recorded codes of the ISIA which represent the communication skills being used by group members in a small group setting.
Except for the data collected for the in-class versus taped reliability estimate, all data in the present study were taken from audio-tape cassettes. This section will include an examination of the population and sample, raw data, and observers.

Population and Sample

The population for this study will be those individuals involved in an introductory course in education, The Individual and the School, at Michigan State University during the summer term, 1974. The course is divided into three parts: the carrel portion, which involves the cognitive tasks of teaching concepts (e.g., assessment techniques, respondent learning, etc.); the large group presentation, which is a lecture presentation of relevant issues in education; and the Interpersonal Process Laboratory (IPL), which involves the presentation, demonstration, and practice of interpersonal communication skills. The latter portion of the course, the IPL, will be the area of interest for this study.

Raw Data

During the summer term, 1974, five IPL classes were tape recorded for the last three weeks of the term. During that term, fifteen IPL sections were offered to students. Because of a limited number of tape recorders, only eight sections could be taped. The researcher explained the study to the eight sections, outlining the objectives of the research, explaining how the tapes were to be used, and answering any questions the students had. They were then told to make their decision with their instructor regarding participation in the study after the researcher had left. Six of the eight groups decided to participate in the study. One of these groups was dropped from the sample because of missing data. The five remaining groups make up the sample. Each of the five IPL sections was recorded for the last three weeks of the term; two groups had six recorded sessions and three groups had five recorded sessions.
Each recorded session included a two-side cassette tape (forty-five minutes per side) and the opinionnaire data from students and instructor.

Observers

Four observers (those trained in coding the ISIA) were used to estimate the reliability of the ISIA: the researcher, an instructor in ED 200, a school teacher, and a student. It was felt to be important to estimate the reliability of the ISIA using a group of observers with a variety of experiences in interpersonal communications and educational background. The researcher was experienced in group dynamics, had worked with the objectives under study and had facilitated more than forty IPL groups. The instructor in ED 200 was experienced in group dynamics, had worked with the objectives under study for more than two years and had led approximately five IPL groups. The school teacher, the wife of the researcher, had an understanding of group dynamics but no formal instruction in the objectives of the ED 200 course. She had never participated in an IPL group but was experienced in group work through the facilitation of DUSO exercises in her classroom. The student was a pre-service teacher who had experienced two IPL sections. Except for the researcher, all the observers received the same training with the ISIA (see training manual, Appendix A).

Reliability

The definition of the reliability of an observation instrument involves a number of variables, and it would seem to vary according to the environment in which the observation instrument is being used. Medley and Mitzel (1963) define reliability as follows: "A measure is reliable to the extent that the average difference between two measurements independently obtained in the same classroom is smaller than the average difference between two measurements obtained in different classrooms" (p. 250).
This definition takes into account three variables: the amount of inter-rater agreement (what Medley and Mitzel call "coefficient of observer agreement"), the amount of within-class variability, and the amount of between-class variability.

The coefficient of observer agreement is defined as the correlation between scores based on observations made by different observers at the same time. This is the most common form of reliability when examining an observation instrument. This type of reliability can be estimated by a variety of reliability indexes, the most common being the percentage of judgments on which the coders agree. Unfortunately, a measure which only takes into account the percentage of agreement is biased in favor of systems with small numbers of categories. For example, a random assignment to a two-category system would yield a much higher reliability estimate than random assignment to a ten-category system. Therefore, a reliability estimate must take into account the number of possible categories and also the number of categories used. Otherwise, one would only need to add categories that could not possibly be used to increase the reliability.

When the data to be analyzed are on a nominal scale, as is the case with most observational instruments, one method of determining the reliability is by Scott's π (Scott, 1955). This method can be interpreted as the extent to which the coding reliability exceeds chance. It is calculated by the following formula:

$$\pi = \frac{P_o - P_e}{1 - P_e}$$

where $P_o$ (observed percent agreement) represents the percentage of judgments on which the two observers agree when coding the same data independently, and $P_e$ is the percent agreement to be expected on the basis of chance. This formula takes into account the number of categories used, the number of codes recorded, and the percentage of the agreement between the observers.
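The calculation can be illustrated with a minimal Python sketch. The function name `scotts_pi` and the example codes are hypothetical. The sketch also rests on an assumption the text does not spell out: because observers code at different tempos, observed agreement is taken here from category proportions (one minus the summed absolute differences between the two observers' proportions), and chance agreement is taken as the sum of the squared mean proportions. It should be read as one plausible reading of the procedure, not as the computation actually used in the study.

```python
from collections import Counter


def scotts_pi(codes_a, codes_b, categories):
    """Estimate Scott's pi from two observers' category totals.

    Assumption (not stated in the source): P_o is 1 minus the summed
    absolute differences between the observers' category proportions,
    and P_e is the sum of the squared mean category proportions.
    """
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    n_a, n_b = len(codes_a), len(codes_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("each observer needs at least one recorded code")

    p_o = 1.0  # observed agreement, reduced by each category's disagreement
    p_e = 0.0  # agreement expected by chance
    for cat in categories:
        prop_a = count_a[cat] / n_a
        prop_b = count_b[cat] / n_b
        p_o -= abs(prop_a - prop_b)
        p_e += ((prop_a + prop_b) / 2) ** 2

    if p_e == 1.0:
        raise ValueError("pi is undefined when only one category is ever used")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical tallies for three ISIA codes from two observers.
observer_a = ["15"] * 50 + ["36"] * 30 + ["35"] * 20
observer_b = ["15"] * 45 + ["36"] * 35 + ["35"] * 20
pi_estimate = scotts_pi(observer_a, observer_b, ["15", "36", "35"])
```

Because the sketch works from proportions rather than paired codes, the two observers need not have recorded the same number of codes, which matches the frequency-count units of analysis used in the present study.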
Scott's π has been used extensively by those involved in observation research (Amidon and Hough, 1967; Ober, et al., 1971) but it has received some criticism. Mitchell (1969) notes that methods of reliability such as Scott's take into account total events in each category rather than the reliability of individual codes by the observers. This could be a serious problem if the researcher were interested in using individual codes in his analysis. This is not the case in the present research. The units to be analyzed in the present research involve frequency count totals (column totals of the individual class matrices) and patterns within a matrix, not individual codes. A difficulty may also arise in relation to observers' coding tempos. Since it is unlikely that observers' coding tempos will be exactly the same, the only possible way to examine the reliability of individual codes would be to use transcripts of the tapes. This would be extremely inefficient for the present study and is probably not necessary in any case due to the use of a frequency count in the analysis of the data. Therefore, Scott's π is the preferred reliability index in estimating the coefficient of observer agreement.

The coefficient of observer agreement is the most basic and most essential step in showing the reliability of an observation technique. In the present study, Scott's π was used to estimate this coefficient, the reliability of inter-rater agreement or the correlation of trained observers coding the same group sessions.

The coefficient of observer agreement is a necessary condition for reliability but not always a sufficient condition. Unreliability can also arise from within-class variability and between-class variability. If the interactions that are observed and coded do not differ sufficiently between group sessions, even perfect inter-observer agreement will not result in acceptable reliability.
For example, if an instrument were developed to measure a trait which everyone demonstrated in exactly the same way, the observer agreement could be perfect, but the between-group variability would be zero and the instrument would be worthless. On the other hand, if the within-class variability were as great as the between-class variability, the trait or behavior being measured would be very unstable and even perfect observer agreement would result in a limited reliability because of what Cronbach (1972) labels the lack of generalizability of the results. If the within-class variability is as great as the between-class variability, whether that variability be very high or very low, the instrument cannot discriminate one class from another. This would diminish the usefulness of the instrument to the point where it could only be used in a descriptive manner.

In examining the reliability of an observation instrument, two separate factors come into play. The inter-rater reliability (or coefficient of observer agreement) relates to the instrument itself, but the stability of the trait or behavior being measured also influences the reliability. The within-class variability is therefore an important consideration in examining an observation schedule's reliability.

Medley and Mitzel (1963) refer to a stability coefficient as the correlation between scores based on observations made by the same observer at different times. They contend that any instability across occasions is due to random error in the environment or the persons. McGaw, et al. (1972) contend that this is not necessarily the case as it does not allow for lawful changes in behavior. In the present study this is an important consideration as there is little expected stability from one IPL session to the next due to the differing objectives of each session. That is, one session may have as its objective listening skills while the next session may have as its objective feedback skills.
This would constitute lawful changes in behavior from one session to the next and would naturally lower any stability estimates between sessions.

McGaw, et al. (1972) compared an observer agreement coefficient to the reliability coefficients associated with alternate forms of a test. If the psychometric analogy were extended, the stability coefficient could be examined by means of the split-half reliability coefficient. Because many small group experiences have lawful changes in the behavior of the group members due to alteration of the objectives of the group session, a stability coefficient comparing different sessions would be predictably low. If it were found that different sessions which had different behaviors were coded in a way that indicated stable behavior across sessions, this would indicate unreliability rather than reliability. An alternative way of estimating the stability coefficient is to apportion the group sessions into two- or three-minute divisions and then use an odd-even correlation to estimate the stability coefficient.

Using the split-half reliability coefficient, it is possible to begin to examine the ISIA in relation to the definition stated earlier by Medley and Mitzel (1963, p. 250). One judgment of the within-class versus between-class stability would be the extent to which a split-half correlation of one session is greater than the correlation of split-halves of two different sessions. This examines whether within-class variability is less than between-class variability, but it may leave some questions unanswered. Because of the flexible approach of instructors and the changing objectives between sessions, stability coefficients would be predictably low. Although low correlations are to be expected, within-class stability should show higher correlations than between-class stability.

One additional question of reliability is raised in relation to the influence of non-verbal behavior.
In interpersonal communication, part of any message is disclosed through non-verbal cues. Since this study uses data from audio-tape recordings, it is important to investigate the possible loss of information due to using audio-tape recordings as opposed to in-class observations. During the spring term, 1975, the researcher investigated this question by comparing the coded data collected during an in-class observation to data collected on the same session by means of a tape recording. The researcher observed and coded a group session while it was going on while tape recording the same session. Later, the tape recording was coded and then the in-class observation data were compared with the tape-recorded data using Scott's π to check the reliability of the two observations. A coefficient of observer agreement of .70 or better supports the researcher's contention that the information lost due to the use of audio-tape recordings is not significant enough to justify in-class observations.

Critical to the reliability of an observation instrument is the training of observers to use the instrument. In this case, the training of observers to use the ISIA was accomplished by the use of a training manual (Appendix A). The training manual is essentially a self-teaching guide which briefly explains the procedures of interaction analysis, defines the categories with exemplars and non-exemplars for each category, and finally leads the observer through some exercises which introduce him/her to first the basic distinctions and then gradually incorporate more of the categories until s/he uses all of the categories to code a short transcript. When the observer had mastery of all the categories on typed transcript examples, s/he was introduced to audio-tape observations. The observer was trained on audio-tapes until s/he reached a level where s/he felt comfortable in coding a forty-five minute tape.
For a more detailed description of the training procedures, see Appendix A, ISIA Training Manual.

Validity

With observation instruments, a great deal of confusion and debate centers on the question of reliability. The issues of validity for observation instruments seem to receive less debate in comparison. Herbert and Attridge (1975) point up the problem in their article "A Guide for Developers and Users of Observation Systems and Manuals":

"Though much time and space has been devoted to discussion of the reliability problem in observational research, precious little has been assigned to that of validity. Most of the instruments developed in the observation field have yet to prove the validity of their measures (Rosenshine and Furst, 1973, pp. 125-126). System designers and users frequently do not go far enough in the development of their tools to establish validity against measures of student growth or other pertinent criteria. Still the current progress towards proving the validity of the systems measures must be reported" (p. 15).

The validity problems encountered in previous observational studies are also experienced in the present research. A major part of the difficulty in estimating validity arises from the vagueness in the various definitions. In discussing observation instruments, Medley and Mitzel (1963) state: "A measure is valid to the extent that differences in scores yielded by it reflect actual differences in behavior--not differences in impressions made on different observers" (p. 250). They go on to say that a valid observational scale provides a record of the behaviors that actually occurred in such a way that the scores are reliable. Herbert and Attridge (1975) point out the lack of data on validity for observation instruments, but one cannot appreciate the neglect (either conscious or unconscious) of the topic until one goes through the literature.
Not only are validity data absent from the literature, the present researcher found the topic of validity for observational instruments mentioned in only three articles, and two of those were commenting on how rarely the topic was examined. The lack of validity measures for observational instruments can be more easily understood when one considers the accepted measurement definition of validity: validity is ". . . the degree to which it measures what it purports to measure, . . ." (Ebel, 1972, p. 567). Using this definition, an author of an observation instrument would seem to have reason to claim validity for his instrument if the instrument were shown to have a high degree of observer agreement. Meeting the requirement of observer agreement does, in fact, show evidence for both construct validity and face validity. Using these definitions of validity, the ISIA can be shown to be a valid instrument if it evidences a high degree of observer agreement.

Construct and face validity are both what psychometricians would call direct or primary validity (Ebel, 1972 and Thorndike and Hagan, 1955). But it is also important for an instrument to show derived or secondary validity. Derived validity depends on the extent to which a measurement correlates with a criterion score. There are two types of criterion-related validity, predictive and concurrent. Because the ISIA is a new instrument in a field which has had very little research (affective education), it is impractical to attempt to show predictive validity. But because the instrument is being developed to add objectivity to previously subjective reports of group effectiveness, the issue of concurrent criterion-related validity is an important consideration. The most effective way of demonstrating the criterion validity of the ISIA is by showing the relationship of the ISIA to the most objective standard now being used.
As was pointed out in Chapter II, very few evaluations use an instrument even approximating an objective measure, but it would seem that the subjective reports by observers and participants come closest to being an acceptable instrument. Herbert and Attridge (1975) refer to this procedure: "An appropriate, though somewhat primitive, procedure to determine concurrent criterion-related validity might be the comparison of the instrument's findings with the opinion of one or more observers assessing the same behavior" (p. 15).

Rosenshine and Furst (1973), in their discussion of the selection of variables for future observational studies, recommend the use of both high-inference and low-inference variables together. They advise using student questionnaires as the source of high-inference measures and tape recordings as the source of low-inference measures. In examining the secondary validity of the ISIA, high-inference measures (questionnaires or opinionnaires) are used as a criterion measure to compare with the ISIA data (low-inference measure).

The procedure for gathering criterion measures involved collecting opinionnaire data relevant to particular taped IPL class sessions. IPL class sessions were taped, and at the conclusion of each class session the participants immediately filled out an opinionnaire (Appendix B) on the group interaction. During the summer term, 1974, both students and instructors were asked to fill out the opinionnaires. Five classes were tape recorded the last three weeks of the term.¹ The twenty-seven recordings are, in effect, fifty-four observations since each tape is a two-sided cassette, forty-five minutes per side. The opinionnaire was designed so that the forty-five minute halves of the tapes could be examined independently: the first twelve questions of the opinionnaire refer to "the first half of the session" and the second twelve questions refer to "the second half of the session."
In the present study, the primary reason for collecting the criterion data is to contribute to the examination of the discrimination ability of the ISIA. The purpose of the ISIA is to add objectivity to the subjective reports of the participants or observers, and for this objectivity to be valid, it must relate to the subjective reports. One aspect of a discrimination index is the ability of a scale or test to distinguish accurately between extremes (Ebel, 1972). With a test, the discrimination index examines good and poor testees according to some standard; for observational instruments, the judgment rests with the instrument's ability to distinguish extreme examples of the interactions or behaviors under investigation according to a standard. In the present study, the standard is the subjective reports of the group participants.

¹A sixth class was recorded for two weeks but because of missing tapes and opinionnaires, it was excluded from the sample.

The opinionnaire data were used to choose extreme groups, with question #8 ("In comparison to other IPL sessions to date, the first half of the session was: (1) One of the best, (2) Above average, (3) Average, (4) Below average, (5) One of the worst") and question #20 (identical to #8 except reference is to "the second half of the session") as the criteria. The mean scores of the responses to questions #8 and #20 were rank ordered to choose the upper and lower 10% as the extreme groups. At first, the choice of the extreme groups was to be simply the top five and bottom five in the ranking. An examination of the ranking showed very small mean differences among the extremes, so an additional criterion was used in choosing the extreme groups. The use of a subjective opinionnaire leaves open the possibility of a certain halo effect in the ratings: one group might see every session as "one of the best," while another group might see every session as "one of the worst."
For this reason, it was decided to use IPL sections to choose the extreme groups, selecting the group sessions rated highest and lowest within each of the five IPL sections. This resulted in extreme groups which were very similar to the original upper and lower 10% rankings.

The data for the extreme groups consisted of one audio-tape recording of approximately forty-five minutes per session (with five sessions in each extreme group) and the opinionnaire data for each of the sessions (the number of opinionnaires collected per session varied from ten to sixteen). The effective group consisted of five sessions; two of the recordings were of the first half of the session, and three were of the second half. The ineffective group consisted of five sessions; all five ineffective recordings were of the first half of the session.² This difference is probably a function of the feeling that most group members have that it takes a certain period of time for "things to get warmed up."

A t-test was used to examine the opinionnaire data in reference to mean differences between the effective and ineffective groups. The results supported the hypothesized differences originally stated in the research proposal; the extreme groups were shown to be significantly different in relation to questions #8 and #20,³ the results being significant at the .001 level. This statistically verified the choices of the extreme groups. The group members did, in fact, perceive the groups to be different. The questions dealing with time spent in the affective domain (questions #1 and #13) showed a significant difference for extreme groups at the .01 level. Group members perceived the effective sessions to have spent more time in the affective domain than the ineffective sessions.
Although the questions related to active listening did not show significant statistical differences (the differences being at about the .1 level), they did point out a difference worthy of examination when comparing the ISIA data. Group members tended to perceive more active listening in the effective sessions than in the ineffective sessions. Group members also perceived the effective sessions to be more genuine (questions #9 and #21), relaxed (#10 and #22), constructive (#11 and #23), and involved (#12 and #24) than the ineffective sessions. These questions cannot be compared to the ISIA data for validity, but they do add descriptive data, lending support to the contention that the groups are, in fact, different. The original hypotheses related to self-disclosure (#2 and #14) and feedback (#3, #6 and #15, #18) were not supported.

²One of the sessions originally chosen as part of the ineffective sample had to be replaced because of taping difficulty.

³For data analysis, parallel question data on the opinionnaire (e.g., 8 and 20, 1 and 13, etc.) were combined in the analysis. The questions are identical except for the reference to the first or second half of the session.

These results were used as the criterion in validating the ISIA. Group members perceived more time being spent in the affective domain in the effective groups when compared to the ineffective groups. They also perceived more active listening in the effective groups compared to the ineffective groups. For the ISIA to be a valid instrument for evaluating communication skills, it must reflect these same differences.

Group members' opinionnaires are very subjective reports, and this problem is compounded by the fact that students are not experts in communication skills. For this reason, an additional criterion was used: the opinion of an expert in the field of communication skills was used to validate the group members' opinionnaire data.
The ten tapes of the extreme groups (five tapes of effective sessions, five tapes of ineffective sessions) were randomly ordered to be listened to by the expert. The expert was told that the ten tapes included five effective and five ineffective sessions as judged by the group members. No indications were given as to which tapes were effective or ineffective, and the expert was instructed to listen to the ten tapes in the order they were numbered. After listening to each tape, he was instructed to fill out the opinionnaire sheet for that tape, and after listening to all the tapes, he was to rank order the tapes from most effective to least effective.

The expert opinion data lend support to the validity of the opinionnaire data. Four of the five effective tapes were ranked effective by the expert and four of the five ineffective tapes were ranked ineffective by the expert. The two tapes which were judged differently by the group members when compared with the expert may be explained in light of the expert's written comment on one of those tapes: "This (tape) is hard to rate because there was some excellent data collected--I feel as though the potential for an exceptional group was present, but confrontation was needed." This tape, rated effective by the group members and ineffective by the expert, was of the first half of the session. Because the group members rated the session after experiencing the second half of the session (an experience the expert was not exposed to), their ratings could have been influenced by it; the "potential" may have been realized in the second half. The second tape which was rated differently by the expert was rated ineffective by the group members and fifth in effectiveness by the expert, very close to being ineffective.
The expert opinion data supported the group member opinionnaire data and also shed light on another important area of interest. Each small group session is different, and although it is possible and important to examine what behaviors constitute an effective group, it is also valuable to examine the behaviors demonstrated in each individual session within those samples labeled effective and ineffective. This is pointed out in the opinionnaire data where all but one of the effective sessions may rank high on one particular question. That one group is effective but for different reasons than the other four groups. The analysis must examine and describe that difference.

Data Analysis

Reliability

All reliability estimates will be analyzed using Scott's π (Scott, 1955). For each reliability estimate, the sum of the number of codes recorded in each category is the unit of analysis. Using Scott's π, these totals are used in comparing pairs of observations (observer to observer or live-class coding to taped coding) to estimate to what degree the agreement between the two observations exceeds chance.

To estimate the coefficient of observer agreement, each of the four observers (the researcher, an ED 200 instructor, a public school teacher, and a student) was required to code one forty-five minute tape. The reliability tape was randomly selected from the tapes not previously selected as effective or ineffective. Each observer was required to code the entire reliability tape, using earphones, without stopping. That is, each observer was instructed to code the tape from start to finish without stopping or going back even if they missed a statement or section. This ensured similar conditions for all observers.

In addition to the four observers listed above, it was decided that an observation with corrected codes should be generated. Given unlimited time and the opportunity to change his/her codes, an observer could generate what could be called a corrected code observation.
This was done in the present study by the researcher, who coded the reliability tape a second time. The second coding was corrected by listening to the same tape a number of additional times, stopping when he felt it was necessary, and using a stopwatch to ensure a coding tempo of one code every five seconds. This coding represents the ideal coding, or the coding the researcher felt was perfect given unlimited time and the option to make any and all changes. Using Scott's π (Scott, 1955), the corrected observation and the four observer observations were compared to produce a correlation matrix of reliabilities.

To estimate the loss of information due to using tape recordings as opposed to in-class observations, two observations were made. During the spring term, 1975, the researcher coded an IPL section while tape recording the same group; three months later the tape recording was used to code the group. During the summer term, 1975, the researcher coded a different IPL section while tape recording the group; seven months later the tape recording was used to code the second group. The conditions for the taped coding were the same as stated previously: the observer was not allowed to stop the tape or to go back and correct any codes. Scott's π was used to estimate the in-class versus taped reliability of the observer in relation to these two groups.

To estimate the within-class stability and compare that to the between-class stability, the codings of ten groups (5 pairs, validity tapes) were divided into two-minute segments. To estimate the within-group stability, the two-minute segments within each group were combined to form, for each category, the sum over the odd two-minute segments and the sum over the even two-minute segments. These split-half (odd-even) category sums were used to estimate the within-group stability using Scott's π.
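The odd-even split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the procedure actually used in the study: the function name is hypothetical, and a two-minute segment is assumed to contain 24 codes, which follows from the one-code-every-five-seconds tempo mentioned for the corrected coding.

```python
from collections import Counter


def odd_even_category_sums(codes, seconds_per_code=5, segment_seconds=120):
    """Return (odd_sums, even_sums) for one session's sequence of codes.

    Assumption: codes arrive at a steady tempo, so a two-minute segment
    is simply a run of segment_seconds // seconds_per_code codes.
    """
    codes_per_segment = segment_seconds // seconds_per_code
    odd_sums, even_sums = Counter(), Counter()
    for index, code in enumerate(codes):
        # Segments are numbered from 1, so segments 1, 3, 5, ... are "odd".
        segment_number = index // codes_per_segment + 1
        target = odd_sums if segment_number % 2 == 1 else even_sums
        target[code] += 1
    return odd_sums, even_sums


# Hypothetical five-minute session: two full segments and half of a third.
session = [0] * 24 + [36] * 24 + [0] * 12
odd, even = odd_even_category_sums(session)
```

The two sets of category sums returned here are the inputs that would then be compared with a reliability coefficient such as Scott's π.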
To estimate the between-group stability, the category sums of the odd two-minute segments for groups one to five were compared with the category sums of the odd two-minute segments for groups six to ten (by pair: 1-6, 2-7, etc.). The results of these computations were an odd-even stability coefficient for each of the ten groups and five between-class stability coefficients. These were examined and the results explained in terms of within-class versus between-class stability.

Validity

Before beginning a discussion of the validity analysis, it is important to examine the types of data to be analyzed. Unlike the reliability estimates, the ISIA data used in the validity analysis were not simply a frequency count of the categories used. The ISIA validity data were based on the ISIA codes recorded by the researcher on the five effective groups and the five ineffective groups. These codes were then used to generate a matrix, for each of the ten groups, which reflects the chronological nature of the data. The frequencies within the cells of the 25 x 25 matrix (see Appendix I) shall be referred to as category pairs. These pairs allow an investigator to state what categories precede and follow any other category. For example, in group 1 (see Appendix I) a 7 occurs in column 0, row 0; this indicates that category 0 is followed by category 0 seven times. Also in group 1 an 8 occurs in row 36, column 0; this indicates that category 36 is followed by category 0 eight times. Of a possible 625 category pairs (25 x 25 matrix), 81 category pairs had a frequency of at least three in at least one group. The frequencies of these 81 pairs, in each of the ten groups, were the unit of analysis for all the validity statistics.
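The tallying of category pairs can be illustrated with a minimal Python sketch. It assumes only what the text states: each pair is a code and the code that immediately follows it (row = preceding category, column = following category), and a pair is retained when its frequency reaches three in at least one group. The function names and the short code sequences are hypothetical.

```python
from collections import Counter


def category_pair_counts(codes):
    """Count each (preceding, following) pair of consecutive codes."""
    return Counter(zip(codes, codes[1:]))


def frequent_pairs(group_counts, minimum=3):
    """Return the pairs whose frequency reaches `minimum` in at least one group."""
    kept = set()
    for counts in group_counts:
        kept.update(pair for pair, n in counts.items() if n >= minimum)
    return sorted(kept)


# Two hypothetical (very short) coded sessions using category labels 0, 32, 36.
group_a = category_pair_counts([0, 0, 36, 0, 0, 36, 0, 0, 36, 0])
group_b = category_pair_counts([32, 36, 36, 0])
retained = frequent_pairs([group_a, group_b])
```

In `group_a` the pairs (0, 0), (0, 36) and (36, 0) each occur three times, so all three are retained; none of the pairs in `group_b` occurs often enough to add to the list.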
Since the comparisons to be made were between the effective and ineffective groups and because it could be assumed that groups are effective because they include the objectives to be demonstrated in the group, it could be stated that some categories (e.g., active listening, self-disclosure, feedback) were more preferred than others. Siegel (1956) notes this type of relationship in defining an ordinal scale: "It may happen that the objects in one category of a scale are not just different from objects in other categories of that scale, but that they stand in some kind of relation to them. Typical relations among classes are: higher, more preferred . . ." (p. 24). Therefore the data to be examined in the validity study are on an ordinal scale, and non-parametric statistics, particularly rank-order statistics, would be most appropriate.

The establishment of the validity of an observation schedule is a difficult and complex task. This is particularly true when one considers that a given system cannot be said to be valid or invalid; rather, only degrees of validity can be supported. Assuming that the ISIA showed a high degree of observer agreement, a high correlation between in-class and taped observations, and a stability coefficient which supports greater within-class stability than between-class stability, there would be support for face validity and construct validity. To further support the construct validity of the system and to demonstrate the concurrent criterion-related validity of the ISIA, a number of questions will be posed. The extent to which the researcher can answer and explain the following questions is the degree to which the system can be said to be valid.

Question 1: Is the ISIA capable of even the most basic distinctions? That is, can the ISIA distinguish differences between the effective and ineffective groups, regardless of what those differences are?
As previously noted, the participant opinionnaires and the expert opinionnaire showed a significant difference between the effective and ineffective groups. The most basic distinction the ISIA must be capable of making is to show a significant difference between the groups judged to be effective and the groups judged ineffective. This question was answered by using the non-parametric Wilcoxon matched-pairs signed-ranks test (Siegel, 1956). The Wilcoxon test was chosen to compare the effective versus ineffective groups because the comparison is between two related samples. To perform the analysis, the sum of the five frequencies (one from each group) within the effective group for each of the eighty-one category pairs became the effective group data. The same procedure was followed for the ineffective groups. This resulted in two related samples: 81 category-pair sums for the effective group and 81 category-pair sums for the ineffective group. The Wilcoxon test was used to test the null hypothesis that there was no significant difference in the frequencies of the various categories between the effective and ineffective groups. A two-tailed test of significance was appropriate as no inference could be made as to which group is "better." A significant result would give no indication as to what the differences were, only that there were significant differences in relation to the categories used.⁴

Question 2: If there were significant differences between the effective groups and ineffective groups, are those differences related to the category pairs which represent the objectives under study in those groups? That is, were the differences between the effective groups and ineffective groups due to category pairs which represent self-disclosure, active listening, feedback, and the affective domain?

To show that an instrument is reliable gives support to the instrument's primary validity.
But if it could be shown that the differences between groups were a result of the demonstration of the group's objectives in one group while in another group those objectives were not demonstrated, then that would accord greater support for the construct validity of the instrument. This is particularly true in the present study if it could be shown that the effective groups demonstrated the previously mentioned skills more frequently.

⁴All validity statistics were performed using Indiana University's Statistical Package for the Social Sciences (SPSS), particularly the nonparametric statistical package (Tuccy, 1974).

This question actually asks a number of questions that will be explored in this section. There are four areas that were examined. These four areas relate directly to the objectives for the group and can be phrased as questions.

1. Is there a significant difference between the groups in relation to self-disclosure?
2. Is there a significant difference between the groups in relation to active listening?
3. Is there a significant difference between the groups in relation to feedback?
4. Is there a significant difference between the groups in relation to the amount of interaction in the affective domain?

These four questions were examined in three ways. First, it was important to examine whether there was a significant difference among all ten groups; second, whether there was a difference among the five effective groups; and third, whether there was a difference among the five ineffective groups. These questions were examined by the non-parametric Friedman two-way analysis of variance (Siegel, 1956). In reference to the Friedman ANOVA, Siegel (1956) states, "When the data from k matched samples are in at least an ordinal scale, the Friedman two-way analysis of variance by ranks is useful for testing the null hypothesis that the k samples have been drawn from the same population" (p. 166).
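As an illustration of the rank-based test Siegel describes, the following Python sketch computes the Friedman statistic for a table in which each row is one matched set (here, one category pair) and each column is one of the k conditions (here, one group). It is a sketch only: the study's statistics were run in SPSS, the function names are hypothetical, tied values receive average ranks, and no tie correction or significance probability is computed.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for N rows (matched sets) by k columns (conditions)."""
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        # Ranking is done within each row, across the k conditions.
        for column, rank in enumerate(average_ranks(row)):
            rank_sums[column] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical frequencies: three category pairs (rows) in three groups (columns).
statistic = friedman_statistic([[1, 2, 3], [2, 4, 9], [0, 5, 7]])
```

A statistic near zero indicates that the groups are ranked about equally often high and low across the rows; larger values indicate that some groups rank consistently higher than others.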
In the present study the Friedman ANOVA was used to test whether each of the 81 matched samples (81 category pairs) is randomly ranked within the ten groups (null hypothesis) or whether some groups consistently rank higher (in comparison to other groups) on a chosen set of categories. Unlike the Wilcoxon analysis (which used all 81 category pairs), the analyses using the Friedman ANOVA used only those category pairs which related to the objective being examined. For example, in examining self-disclosure the category pairs 36-36, 0-36, 32-36, etc. were used. The Friedman ANOVA was used twelve times to examine the four objectives under three conditions in the following way: all groups for self-disclosure, active listening, feedback, and affective interaction; effective groups for self-disclosure, active listening, feedback, and affective interaction; ineffective groups for self-disclosure, active listening, feedback, and affective interaction. The sums of the category pairs for positive and negative feedback were combined to form feedback because the categories for feedback were recorded so seldom.

If differences were found in the direction of more of these skills being used in effective groups, this would support the construct validity of the ISIA. If differences were found between the effective and ineffective groups, the next question must compare those differences to the opinionnaire data.

Question 3: If differences are found between the effective and ineffective groups on the objectives, do those differences correspond to the opinionnaire data?

To demonstrate the concurrent criterion-related validity of the ISIA, the category-pair frequencies should relate to the ratings of the group participants and the expert. This caused something of a problem in choosing a statistic to correlate the ISIA codings to the subjective judgments of participants and expert opinion.
The most obvious problem centered around the fact that the participants' ratings were based on only the groups they participated in; that is, different participants were judging each group and they may not have been using the same criteria in their judgments. This problem did not arise in the expert's judgments, but his opinionnaire contained a large number of tied scores, which also created problems. A simple (although perhaps not entirely statistically sound) solution to this problem was to add the participants' ratings of each group on each skill (active listening, etc.) to the expert's opinionnaire ratings, creating a score for each group on each skill. These scores reflected a higher weighting of the expert opinion (he had observed all the groups) but alleviated the problem of ties in the expert's judgments (there were no ties in the participants' judgments). The opinionnaire data scores for each of the ten groups on active listening, self-disclosure, affective interaction, and feedback (positive plus negative) were then rank ordered. Data from the ISIA matrices were summed for each group on the category pairs for active listening, self-disclosure, affective interaction, and feedback, and these sums were rank ordered. The opinionnaire data rank orders and the ISIA category-pair rank orders were compared using the Spearman rank-order correlation (Bruning and Kintz, 1968) to examine the relationship between the opinionnaire data and the ISIA data.

Answers to the three questions posed above will indicate the degree to which the ISIA may be said to be valid. Another important addition to analyzing the communication patterns within groups may be matrix interpretation (Flanders, 1970). As previously stated, the categories of the ISIA may be used to generate a matrix (25 x 25) which can be examined for major and minor patterns of communication.
To demonstrate the usefulness of these matrices in analyzing groups, a matrix for each of the ten groups was generated and a flow chart (Flanders, 1970) drawn to graphically illustrate the major and minor patterns within each group.

Conclusion

The questions raised concerning the reliability and validity of the instrument were subjected to the tests described in this chapter. They represent all the questions posed earlier in this study. The answers to these questions will be discussed in relation to the usefulness of an observation schedule for evaluating and interpreting the communication skills used in small group settings. Before presenting this discussion, the results of the reliability and validity estimates will be presented.

CHAPTER V

RESULTS

Introduction

The preceding chapter made note of three questions, first discussed in the initial chapter of this study, which make reference to the two essential ingredients in developing an observation instrument: reliability and validity. The methodology chapter also outlined the specific procedures that were used to address the reliability and validity issues relevant to the development of the ISIA. This chapter addresses those questions and, more specifically, presents the results of the procedures used to answer them. The chapter is organized as was the preceding chapter, beginning with the results relevant to the reliability of the ISIA, then proceeding to the issue of validity, and concluding with a discussion of matrix and flow chart interpretation.

Reliability

The most basic question to be answered in the development of an observation schedule is the reliability of the instrument. Although many approaches can be taken in establishing the reliability of an observation instrument, three issues will be examined in the present study: the coefficient of observer agreement or inter-rater reliability, the live (in-class) versus taped reliability, and the stability coefficient.
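Because all three reliability estimates rest on Scott's π computed from category totals, a short Python sketch of one such computation may be helpful. It is an assumption, not a statement of the arithmetic used in the study: the text does not spell out the formula, so the sketch uses the general form π = (Po - Pe) / (1 - Pe), takes observed agreement Po as one minus the summed absolute differences between the two observers' category proportions, and takes chance agreement Pe as the sum of the squared mean proportions. The function name and the example totals are hypothetical.

```python
def scotts_pi(totals_a, totals_b):
    """Scott's pi for two observations, each given as a list of category totals.

    Assumed formulation (marginal totals only):
      observed = 1 - sum(|p_a - p_b|) over categories
      expected = sum(((p_a + p_b) / 2) ** 2) over categories
    """
    sum_a, sum_b = sum(totals_a), sum(totals_b)
    props_a = [t / sum_a for t in totals_a]
    props_b = [t / sum_b for t in totals_b]
    observed = 1.0 - sum(abs(a - b) for a, b in zip(props_a, props_b))
    expected = sum(((a + b) / 2) ** 2 for a, b in zip(props_a, props_b))
    if expected == 1.0:
        # Both observers used a single, identical category; pi is undefined.
        raise ValueError("chance agreement is 1; Scott's pi is undefined")
    return (observed - expected) / (1.0 - expected)


# Hypothetical totals for three categories from two observers.
coefficient = scotts_pi([50, 30, 20], [40, 40, 20])
```

With these hypothetical totals the observed agreement is .80 and the chance agreement is .365, giving a coefficient of about .69. Identical totals give a coefficient of 1.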
Coefficient of Observer Agreement

In the present study the coefficient of observer agreement was examined by having four observers, with a wide range of skills and experience in group work and education, code a single forty-five minute tape under identical conditions. The four observers were the researcher (hereafter referred to as R), an ED 200 instructor (F), a public school teacher (C), and an undergraduate student (L). In addition to the four observers, a corrected code (K) was generated by the researcher. Scott's π (Scott, 1955) was the statistic used to estimate the reliabilities. The results are presented in the form of a reliability matrix (Table 2).

Table 2
Reliability Matrix - Intercorrelations of the Five Observations Using Scott's π

       K      R      L      F      C
K     --
R    .88     --
L    .79    .80     --
F    .72    .75    .77     --
C    .78    .77    .72    .72     --

Table 2 presents inter-rater reliabilities for the five observations which range from approximately .70 to .90. The question now arises as to how reliable an observation schedule should be to be considered acceptable. That is not an easy question to answer because the answer may well depend on how the system will be used and what form of evaluation it will be replacing. For perhaps just this reason, very little mention is made in the literature of acceptable levels of reliability. Flanders (1967), in referring to the training of observers in the use of his schedule (FIAC), notes that a Scott's coefficient of .85 is a "reasonable level of performance." This gives some indication of what to compare the reliabilities in Table 2 with, but the .85 level of reliability cannot be used as the sole standard of comparison. Flanders' FIAC is a well-tested schedule with very low-inference categories. This is not true of the ISIA. The ISIA is a new schedule that requires a higher level of inference by the observer in some categories, and the ISIA would be replacing a subjective evaluation which has no known reliability.
Taking these points into consideration, the researcher feels that the ISIA has demonstrated a reasonable level of reliability, and although some modifications of the system will be explored (see Chapter VI), generally it may be stated that the ISIA has been shown to be a reliable measure of interpersonal communication skills. It may also be said that although there are differences in the degrees of reliability between observers, the ISIA may be used reliably by a variety of observers. It is obvious that the researcher has a higher degree of reliability with the corrected code (in a sense, intra-observer reliability) than any other observer. This may be a result of the intra-observer nature of that reliability, but the greater reliability may also be explained by the familiarity of the researcher with the system. Any other explanation for the differences in the levels of reliability would be speculative at best and will not be explored.

Live Versus Taped Reliability

The second reliability estimate that was examined related to the possible loss of information due to tape recordings. It was felt that because interpersonal communication relies so heavily upon non-verbal cues, tape-recorded observations might be much less reliable than live, in-class observations. Although it is impossible to decipher which is the more accurate, an in-class observation or a tape-recorded observation, Table 3 shows the coded observations to be very similar.

Table 3
In-Class Versus Taped Observations

Observation I     .72
Observation II    .79

Both reliability estimates are calculated by Scott's π and are based on an in-class observation followed much later by a taped observation. In Observation I there was a three-month interval between the in-class and taped observation; for Observation II there was a seven-month interval. The lengthy interval between in-class and taped observations guarded against the observer recalling the categories used to code particular interactions.
One additional point should be made in reference to Observation II. For six months prior to the taped observation (the last six of the seven months) the observer did not code a single group and did not refresh his memory concerning actual coding. This certainly gives support to the notion that, although the ISIA may be difficult to learn, it is not something that is easily forgotten (at least not for the researcher).

The reliabilities presented in Table 3, although not extremely high, do alleviate a great deal of the concern over tape-recorded observations. The two reliabilities (.72 and .79) fall within the range of the inter-rater reliabilities in Table 2 (.72 to .88). One question which does arise out of the design of the in-class versus taped reliability is: What if the in-class observer and taped observer had been different people? This question was considered in planning the design, but because of the inaccessibility of observers it had to be excluded. In future research this should be an important consideration in designing the study.

Stability Coefficient

The two previous sections have shown the ISIA to be a reasonably reliable instrument. But what about the behaviors being categorized: are there differences between groups, and are the behaviors stable within a group? To answer these questions a split-half reliability coefficient was proposed in Chapter IV. Although the split-half reliability coefficient proposed is not an established procedure, it was one way of addressing Medley and Mitzel's (1963) concern for a stability coefficient in estimating reliability, while still taking into consideration the lawful changes in behavior due to differing objectives between group sessions. Table 4 presents the stability coefficients using Scott's π to estimate the reliabilities.
Table 4
Within-Group and Between-Group Stability Coefficient Estimates Using Scott's π

Within-group (odd-even), rxx              Between-group (odd-odd), rxy
Group 1    .74      Group 6     .75       Group 1 - Group 6       .38
Group 2    .62      Group 7     .69       Group 2 - Group 7       .66
Group 3    .75      Group 8     .84       Group 3 - Group 8       .34
Group 4    .48      Group 9     .80       Group 4 - Group 9      -.21
Group 5    .67      Group 10    .73       Group 5 - Group 10      .12

Because the proposed stability coefficient is not an established procedure, there are no statistical methods to verify that there is greater within-group stability than between-group stability. But an examination of Table 4 certainly lends support to this hypothesis. In every pair of groups except one, the within-group coefficients are roughly twice the between-group coefficient or more (for Groups 1 and 6, .74 and .75 against .38). The one exception (Groups 2 and 7) may be explained by the similarity of the groups: the expert opinion rated Groups 2 and 7 the same on six of nine questions, and the flow charts of ISIA category-pairs (Appendix I) show the groups to be very similar.

The stability coefficient is an important yet difficult estimate to judge empirically. The split-half method in the present study lends support to the stability of the behaviors the ISIA is observing, but it has shortcomings. In future research the stability coefficient could be more effectively examined by using group objectives as one dependent variable. That is, a more important issue to examine would be the stability between groups who have the same objective (e.g., positive feedback) in comparison to the stability between groups who have different objectives.
This is not to take away from the information gained by using the split-half procedure; the ISIA has demonstrated within-group stability. But additional methods should be used in future research to explore this issue, particularly in light of the changing objectives between groups.

Validity

The second issue to be addressed in developing an observation instrument is the validity of the instrument. As was pointed out in Chapter IV, the face validity and construct validity of an observation instrument can be demonstrated by the reliability of the instrument. Now that the reliability of the ISIA has been shown, it is important to consider other methods of establishing the construct validity of the ISIA and to examine the concurrent criterion-related validity of the instrument. The approach to those issues was established in Chapter IV in the form of three questions. This section will answer those questions by presenting the results of the procedures recommended in the previous chapter and interpreting those results.

Question 1: Is the ISIA capable of even the most basic distinctions? That is, can the ISIA distinguish differences between the effective and ineffective groups, regardless of what those differences are?

Table 5 presents the results of the Wilcoxon matched-pairs test, which answers the most fundamental of the validity questions.

Table 5
Wilcoxon Matched-Pairs Signed-Ranks Test - Effective Versus Ineffective Groups

Cases = 81          Two-tailed probability = .0020

It is clear from the results of the Wilcoxon matched-pairs test that there is a significant difference between the summed category-pair totals for the effective versus ineffective groups. This result indicates that the distribution of the frequencies of the codes within the 81 category-pairs differs significantly between the effective and ineffective groups. This confirms that there are differences but sheds no light on what those differences are.
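For readers unfamiliar with the matched-pairs test reported in Table 5, the following is a minimal Python sketch of the Wilcoxon signed-ranks procedure using the large-sample normal approximation. It assumes each case is one category-pair with a summed frequency for the effective groups and one for the ineffective groups. The function name and the eight sample frequencies are hypothetical, and the sketch omits the tie correction to the variance, so it should not be read as the program actually used in the study.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (T, z, two-tailed p) using the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return t, z, p


# Hypothetical frequencies for eight category-pairs (the study used 81 cases).
effective = [10, 12, 9, 15, 8, 11, 14, 13]
ineffective = [7, 10, 9, 9, 9, 7, 9, 11]
t, z, p = wilcoxon_signed_rank(effective, ineffective)
print(f"T = {t}, z = {z:.2f}, two-tailed p = {p:.3f}")
```

With only a handful of cases the normal approximation is rough and exact tables would normally be consulted; with 81 cases, as in Table 5, the approximation is the usual choice.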
Question 2: If there were significant differences between the effective and ineffective groups, are those differences related to the category-pairs which represent the objectives under study in those groups? That is, were the differences between the effective groups and ineffective groups due to category-pairs which represent self-disclosure, active listening, feedback, and the affective domain?

Question 2 was first posed in Chapter IV along with the procedures directed toward answering the question. In this section the results of the Friedman two-way analysis of variance by ranks will be presented along with an interpretation of those results. The results will be presented in a fashion which allows an examination of, first, all ten groups, next only the effective groups, and finally the ineffective groups.

The reliability data already presented support to some degree the construct validity of the ISIA. The Wilcoxon test has shown the ISIA to be sensitive to differences between effective and ineffective groups. But the most important issue in supporting the construct validity of the ISIA is the system's ability to discriminate the objectives of the groups. To verify the ISIA's ability to discriminate these skills, a Friedman ANOVA was performed on all groups in relation to the category-pairs associated with self-disclosure, active listening, feedback, and affective interaction. The results are presented in Table 6.

The analysis involves the ranking of the ten groups on each of the category-pairs associated with the objective under study. For example, the analysis of self-disclosure involves 38 cases (category-pairs), and the ten groups are rank ordered (from 1 to 10) on each of the 38 cases. The sum of the ranks for each group indicates the degree to which each group (in comparison to the other groups) demonstrates self-disclosure.
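The ranking step just described can be sketched as follows. This is an illustration only: it assumes a table with one row per category-pair and one frequency per group, ranks the groups within each row (rank 1 for the smallest frequency, tied frequencies sharing their average rank), and sums the ranks for each group. The function names and the small frequency table are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sums(table):
    """table[i][j] is the frequency of category-pair i in group j."""
    n_groups = len(table[0])
    sums = [0.0] * n_groups
    for row in table:
        if len(row) != n_groups:
            raise ValueError("every row needs one frequency per group")
        for j, rank in enumerate(average_ranks(row)):
            sums[j] += rank
    return sums


# Hypothetical frequencies: three category-pairs (rows) by four groups (columns).
frequencies = [
    [5, 3, 8, 1],
    [2, 2, 7, 0],
    [4, 6, 6, 1],
]
print(rank_sums(frequencies))  # [7.5, 8.0, 11.5, 3.0]
```

In this illustration the third group has the largest rank sum and so shows the behavior most often relative to the others, while the fourth group, ranked last on every row, has the minimum possible sum of 3.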
Significant results are evidence of non-random rankings between the groups and indicate significant differences between the groups in relation to the amount and type of self-disclosure (different category-pairs being different types of self-disclosure, i.e., different in relation to what precedes or follows self-disclosure).

Table 6
Friedman Two-Way Analysis of Variance by Ranks--All Groups

All groups, all self-disclosure
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:     229.0     215.0     195.0     246.0     249.5
              Group 6   Group 7   Group 8   Group 9   Group 10
               197.5     237.5     188.5     151.5     180.5
Cases = 38    Chi-Square = 26.1947    D.F. = 9    Significance = .0019

All groups, all active listening
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:      91.5      94.0     121.0     115.5     145.5
              Group 6   Group 7   Group 8   Group 9   Group 10
                98.0     101.5      85.0      86.0     107.0
Cases = 19    Chi-Square = 17.9914    D.F. = 9    Significance = .0353

All groups, all feedback
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:      58.0      76.0      55.0      94.0      82.5
              Group 6   Group 7   Group 8   Group 9   Group 10
                55.0      65.0      55.0      55.0      64.5
Cases = 12    Chi-Square = 15.5227    D.F. = 9    Significance = .0775

All groups, all affective interactions
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:     111.5      91.0      78.5     129.5     126.5
              Group 6   Group 7   Group 8   Group 9   Group 10
               102.5      97.0      86.0      78.5      89.0
Cases = 18    Chi-Square = 18.3788    D.F. = 9    Significance = .0310

The results from Table 6 show a significant difference between the ten groups at the .05 level or better for self-disclosure, active listening, and affective interaction, and a difference at the .08 level for feedback. This confirms the differences between the groups on all the objectives.¹ An examination of the rank sums shows these differences not to be random among the ten groups; rather, the effective groups (Groups 1 to 5) generally have rank sums which exceed the ineffective groups' (Groups 6 to 10) rank sums. Summed over groups, the effective rank sums exceed the ineffective rank sums on every objective (for self-disclosure, 1134.5 against 955.5).
The Friedman ANOVA ranks every category-pair by assigning a rank of one to the smallest number, and therefore the smallest rank sum indicates the group with the least amount of self-disclosure, active listening, etc. The significance levels of the four analyses clearly point out the differences between the ten groups, and an examination of the rank sums shows the effective groups to be displaying each of the objectives more frequently than the ineffective groups. This distinctly confirms the construct validity of the ISIA.

¹ Although the analysis of feedback does not show significance at the .05 level, it will be assumed to be significant. This was not a formal hypothesis test but rather an exploratory study in which the researcher defines .08 as significant.

Taking the analysis one step further sheds even more light on the differences between the groups. Table 6 reports the results of the Friedman ANOVA on all ten groups, Table 7 reports the results of the same statistic performed on only the effective groups, and Table 8 reports the results for the ineffective groups. The results reported in Tables 7 and 8 afford the opportunity to examine more closely the differences between the effective and ineffective groups.

Table 7
Friedman Two-Way Analysis of Variance by Ranks--Effective Groups

Effective groups, all self-disclosure
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:     111.5     108.5      99.5     124.5     126.0
Cases = 38    Chi-Square = 5.2737     D.F. = 4    Significance = .2604

Effective groups, all active listening
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:      45.0      47.0      61.0      58.0      74.0
Cases = 19    Chi-Square = 11.5789    D.F. = 4    Significance = .0208

Effective groups, all feedback
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:      28.5      37.0      27.0      47.0      40.5
Cases = 12    Chi-Square = 9.3167     D.F. = 4    Significance = .0537

Effective groups, all affective interactions
              Group 1   Group 2   Group 3   Group 4   Group 5
Rank sums:      55.0      46.0      40.0      65.5      63.5
Cases = 18    Chi-Square = 10.7444    D.F. = 4    Significance = .0296

Table 8
Friedman Two-Way Analysis of Variance by Ranks--Ineffective Groups

Ineffective groups, all self-disclosure
              Group 6   Group 7   Group 8   Group 9   Group 10
Rank sums:     117.0     139.0     110.5      94.0     109.5
Cases = 38    Chi-Square = 11.2263    D.F. = 4    Significance = .0241

Ineffective groups, all active listening
              Group 6   Group 7   Group 8   Group 9   Group 10
Rank sums:      58.5      60.0      51.5      52.0      63.0
Cases = 19    Chi-Square = 2.1579     D.F. = 4    Significance = .7067

Ineffective groups, all feedback
              Group 6   Group 7   Group 8   Group 9   Group 10
Rank sums:      34.0      39.5      34.0      34.0      38.5
Cases = 12    Chi-Square = 1.0167     D.F. = 4    Significance = .9073

Ineffective groups, all affective interaction
              Group 6   Group 7   Group 8   Group 9   Group 10
Rank sums:      59.5      57.5      51.5      47.5      54.0
Cases = 18    Chi-Square = 2.0222     D.F. = 4    Significance = .7317

The results in Table 7 indicate significant differences between the effective groups on active listening, feedback, and affective interactions but non-significant differences for self-disclosure. The results in Table 8 indicate significant differences for ineffective groups on self-disclosure but non-significant results for active listening, feedback, and affective interaction. These results point out a possible difference between self-disclosure and the other skills. Self-disclosure is a skill which is demonstrated to some degree in all groups. An examination of the rank sums in Table 6 would indicate that self-disclosure occurs more frequently in effective than in ineffective groups. To verify this difference a Wilcoxon matched-pairs test was performed comparing the effective and ineffective groups on self-disclosure. Table 9 shows a significant difference between the effective and ineffective groups at the .0001 level, verifying that the effective groups do demonstrate self-disclosure more frequently than ineffective groups.
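The chi-square values in Tables 6 through 8 can be checked directly from the reported rank sums. The sketch below uses the usual Friedman formula without a tie correction, chi-square = 12 ΣR² / (n k (k + 1)) - 3 n (k + 1), where n is the number of cases, k the number of groups, and R the rank sum of each group. The function name is hypothetical; the rank sums are those reported in Tables 6 and 7. Significance levels are not computed here; they come from the chi-square distribution with k - 1 degrees of freedom.

```python
def friedman_chi_square(rank_sums, n_cases):
    """Friedman chi-square from the groups' rank sums (no tie correction)."""
    k = len(rank_sums)
    expected_total = n_cases * k * (k + 1) / 2
    if abs(sum(rank_sums) - expected_total) > 1e-9:
        raise ValueError("rank sums do not add up to n * k * (k + 1) / 2")
    sum_of_squares = sum(r * r for r in rank_sums)
    return 12.0 * sum_of_squares / (n_cases * k * (k + 1)) - 3.0 * n_cases * (k + 1)


# Self-disclosure rank sums for Groups 1-10 as reported in Table 6 (38 cases).
all_groups = [229.0, 215.0, 195.0, 246.0, 249.5,
              197.5, 237.5, 188.5, 151.5, 180.5]
print(round(friedman_chi_square(all_groups, 38), 4))  # 26.1947

# Self-disclosure rank sums for Groups 1-5 as reported in Table 7 (38 cases).
effective_groups = [111.5, 108.5, 99.5, 124.5, 126.0]
print(round(friedman_chi_square(effective_groups, 38), 4))  # 5.2737
```

Both results agree with the tabled values. The built-in check that the rank sums total n k (k + 1)/2 (2090 for ten groups and 38 cases) is also a convenient way to catch transcription errors in a table of rank sums.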
Table 9
Wilcoxon Matched-Pairs Signed-Ranks Test--Effective Versus Ineffective Groups, All Self-Disclosure

Cases = 38          Two-tailed probability = .0001

This result, along with the fact that effective groups have less variance on self-disclosure (more homogeneity, non-significant difference, Table 7) while ineffective groups have more variance on self-disclosure (more heterogeneity, significant difference, Table 8), tends to support the theory that there is a minimal amount (or mastery level) of self-disclosure that must occur in a group for it to be considered effective.

Beyond this minimal level there is not a great deal of difference between effective groups on self-disclosure; self-disclosure is a necessary but not sufficient condition for a group to be considered effective. But in ineffective groups the only significant difference between the groups is self-disclosure. It may be that the ineffective groups with the most self-disclosure are the least ineffective (this is supported by the fact that group 7 has the most self-disclosure of the five ineffective groups and was rated by the expert as the most effective of the ineffective groups). While self-disclosure is the skill which distinguishes between the ineffective groups (mostly because it is the only skill demonstrated to any extent), in the effective groups each of the other skills shows a significant difference between the groups. This supports a hypothesis that effective groups are not all effective for the same reason. Some have a great deal of active listening, others a lot of feedback, while others deal more in the affective domain.

These findings are significant and certainly support the construct validity of the ISIA, but they are to some extent speculative. These interpretations are construed from statistical analysis and the subjective observations of a group leader (the researcher) and seem to make a lot of practical sense.
But to state as a fact that the difference between effective and ineffective groups is self-disclosure or feedback is not possible until further research is conducted. This analysis supports the validity of an instrument to be used in such research. There are differences between effective and ineffective groups, and there is strong indication here that the difference involves self-disclosure, active listening, feedback, and affective interaction. Further research is needed. The ISIA is one tool to aid in conducting such research.

Question 3: If differences are found between the effective and ineffective groups on the objectives, do those differences correspond to the opinionnaire data?

In Chapter IV a method to establish the concurrent criterion-related validity was proposed which summed the expert opinionnaire ratings with the average participant ratings and correlated those sums with the ISIA matrix data. A Spearman rank order correlation coefficient established the relationship between opinionnaire data and ISIA data for self-disclosure, active listening, feedback, and affective interaction. The results are reported in Table 10.

Table 10
Spearman Rank Order Correlation Coefficient: Opinionnaire Data--ISIA Category Data

Self-Disclosure      rho = .70
Active Listening     rho = .32
Feedback             rho = .85
Affective            rho = .63

With the exception of the active listening correlation, the rank order correlations show a relatively high relationship between the opinionnaire data and the ISIA. These seem particularly impressive when one considers the subjective nature of the opinionnaire data. The opinionnaire data were not originally ranked by the expert or participants. Rather, the participants' rankings were generated from the opinionnaire data, which are based on raters (participants) who had not observed all the groups. The expert observed all the groups, but his ratings had many tied scores (a number of cases where four of the ten groups were rated the same).
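The coefficients in Table 10 are Spearman rank order correlations. As an illustration only, the following Python sketch computes rho as the Pearson correlation of average ranks, a form that accommodates tied ratings. The function names and sample values are hypothetical; they are not the study's opinionnaire or ISIA data, and the sketch is not presented as the procedure actually used.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two sets of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical summed opinionnaire ratings and category frequencies, five groups.
opinionnaire = [1, 2, 3, 4, 5]
category_data = [2, 1, 4, 3, 5]
print(round(spearman_rho(opinionnaire, category_data), 2))  # 0.8
```

When there are no ties this is equivalent to rho = 1 - 6 Σd² / (n (n² - 1)), where d is the difference between a group's two ranks. With ten groups the denominator is 990, so a single group whose two ranks differ by eight places lowers rho by 6(64)/990, or about .39, before any other disagreement is counted.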
The two problems just noted (participant raters who had not observed every group, and the expert's tied ratings) were alleviated to some extent by summing the expert and participant ratings, but this probably weakened the power of the rank order statistic.

The exception to the high correlation (active listening) may reflect a difference in definition between the expert and the researcher. In the rank ordering of the groups on active listening, the expert's ratings ranked group 3 in tenth position while the ISIA data ranked group 3 second. This one difference in rankings lowered the correlation from .71 to .32. The difference in the rankings of group 3 may be a function of the types of interactions in group 3. Group 3 was a very cognitive group with a high frequency of offering information and numerous questions related to the information offered. In the ISIA category system, continued questioning of information is coded as active listening. In questioning the expert it was found that he did not consider these questions active listening. This further supports the sensitivity of the ISIA. An expert who uses a different definition of even a part of one category will dramatically affect the results.

Generally it may be said that the ISIA has demonstrated a high level of reliability and validity. In Chapter IV a helpful aid in analyzing and interpreting the interactions in groups was mentioned. To close out this results chapter, a brief explanation of matrix and flow chart interpretation will be presented.

ISIA Matrix and Flow Chart Interpretation

Up to this point the ISIA has been used as a source of data for statistical analyses to verify the reliability and validity of the instrument. But the ISIA can be used in other ways to examine the processes of small groups. One of Flanders' (1970) greatest contributions to the field of observational studies was the interaction matrix. As described earlier, the matrix allows an investigator (group member, facilitator, evaluator, researcher, etc.)
to abstract the patterns of communication within a group from the content of the group. This is a valuable tool in comparing groups because an observer can be biased by the content of a group (i.e., what is talked about), and there are times when the process is more important than the content.

In lieu of an abstract discussion of the value of matrix and flow chart interpretation, this section will use a concrete example to demonstrate the kinds of interpretations that can be drawn from the comparison of two flow charts. The reader should be aware that every group is different and that the discussion to follow is only one interpretation of two very dissimilar groups. Other groups would raise different questions. Other investigators might even arrive at different interpretations of the same groups. But the generation of matrices and flow charts reduces what can be an overwhelming amount of data to a manageable picture which points out significant patterns of interaction. This is not to say that the role of the expert observer (albeit a subjective observer) will not be needed in the future. The ISIA matrix or flow chart quantifies the interactions but cannot interpret the quality of the skills used.

The interpretation made in this section will be based on the flow charts of an effective group (group 4, Figure 1) and an ineffective group (group 9, Figure 2). These groups are from the same IPL section (same group of participants) and represent the most effective (group 4) and most ineffective (group 9) sessions tape recorded during that term for that section. The flow charts were generated from the matrices for each group. Flow charts are best used to interpret the amount of time devoted to particular categories and the communication patterns (or directional flow) within a group. This interpretation is facilitated by three aspects of the flow chart: Each box within the flow chart depicts a stable cell of the interaction matrix.
A stable cell is a category-pair (e.g., 35-35, 22-22) which represents a category code followed by the same code. The size of each box in the flow chart depicts the frequency of the category-pair. The second aspect of interest in the flow chart is the arrows connecting the boxes. The arrows represent the transition cells within the matrix. A transition cell indicates the frequency of

[Figure 1. Flow chart for group 4 (effective); figure not reproduced.]
APPENDIX D

INTERPERSONAL PROCESS LAB OBJECTIVES

Goals and Objectives for the IPL

A. Goal: The student will become aware of the interpersonal skills of communication necessary for constructive social-emotional growth, assess the effectiveness of these skills, and describe the transfer of these skills and processes to teaching experiences.

Objectives:

1. To learn and exhibit self-description skills. The student will be able to share his own ideas, opinions, and feelings about himself as they relate to his perceived willingness and readiness to (1) teach and (2) explore, respect, and be responsible to himself and others.

2. To learn and exhibit listening skills. The student will be able not only to re-state what has been said, but also to relate the feelings and intended meaning of the speaker to the speaker's satisfaction.

3. To learn and exhibit questioning skills. The student will be able to seek further information or clarification for self and others without cueing a particular response.

4. To learn and exhibit observation skills. The student will be able to recognize and interpret, through description and explanation, diverse modes of non-verbal expression, i.e., hands, face, arms, etc.

5. To learn and exhibit responsible feedback skills. The student will be able to relate honestly his feelings about another person (to that person) in an effective manner.
   a. Discriminate between feedback that is responsible in intent and feedback that is irresponsible.
   b. Decide a course of action based on the feedback given and the evaluation of that feedback, i.e., to change self, to accept differences, etc.
APPENDIX E

FLANDERS' SYSTEM OF INTERACTION ANALYSIS (FSIA)

 1. Accepts Feelings
 2. Praises or Encourages
 3. Accepts or Uses Student Ideas
 4. Asks Questions
 5. Explaining or Informing
 6. Gives Directions
 7. Scolding/Reprimanding or Defending Authority
 8. Student Talk - Expected or Predictable Response
 9. Student Talk - Initiated Response
10. No Talk/All Talk

APPENDIX F

OBER'S RECIPROCAL CATEGORY SYSTEM (RCS)

Category Number                                                Category Number
Assigned to                                                    Assigned to
Teacher Talk     Category                                      Student Talk
      1          Warms (informalizes) the climate                   11
      2          Accepts                                            12
      3          Amplifies the Contribution of Another              13
      4          Elicits                                            14
      5          Responds                                           15
      6          Initiates                                          16
      7          Directs                                            17
      8          Corrects                                           18
      9          Cools (formalizes) the climate                     19
     10          Silence or Confusion                               10

APPENDIX G

ISIA CATEGORIES

33 Possible ISIA Categories

11a   11b   12   13   14   15   16   17a   17b
21a   21b   22   23   26   27a   27b
31a   31b   32   33   34   35   36   37a   37b
41a   41b   42   43   46   47a   47b

APPENDIX H

ISIA OBSERVATION SHEET

Section number ______   Instructor ______   Date of coding ______
Date ______   Observer ______   Page # ______

APPENDIX I

VALIDITY GROUP MATRICES AND FLOW CHARTS

Appendix I: 1b.--Group 1 (effective) Flow Chart.
Appendix I: 2b.--Group 2 (effective) Flow Chart.
Appendix I: 3b.--Group 3 (effective) Flow Chart.
Appendix I: 4b.--Group 4 (effective) Flow Chart.
Appendix I: 5b.--Group 5 (effective) Flow Chart.
Appendix I: 6b.--Group 6 (ineffective) Flow Chart.
Appendix I: 7b.--Group 7 (ineffective) Flow Chart.
Appendix I: 8b.--Group 8 (ineffective) Flow Chart.
Appendix I: 9b.--Group 9 (ineffective) Flow Chart.
Appendix I: 10b.--Group 10 (ineffective) Flow Chart.