Michigan State University

This is to certify that the dissertation entitled

EVALUATING MEDICAL ETHICS TEACHING

presented by

Kenneth Ross Howe

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Philosophy and Teacher Education.

Major professor: Robert E. Floden
Date: April 25, 1985

MSU is an Affirmative Action/Equal Opportunity Institution

EVALUATING MEDICAL ETHICS TEACHING

By

Kenneth Ross Howe

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Departments of Philosophy and Teacher Education

Kenneth R. Howe
1985

To Paul
--whose soccer matches were a welcomed diversion

ACKNOWLEDGEMENTS

I thank the members of my committee, Bob Floden, Martin Benjamin, Bruce Miller, and Bob Bridgham, for their many useful comments. Floden, the director, was especially dogged in holding this dissertation up to the light, and he no doubt devoted more time to reviewing the various drafts than duty required. I also thank my supervisors in the Medical Humanities Program, Bruce Miller, Howard Brody, and Andrew Hunt, for permitting me the flexibility needed to complete this task.

FOREWORD

The major problem surrounding the evaluation of medical ethics teaching has been too little interaction between ethics instructors and educational evaluators in working out appropriate strategies for evaluating instruction; they go their separate ways, each having little idea about the nature of the other's activity. To remedy this problem, this dissertation is aimed at both groups. Although this is a reasonable goal, anyone who has ever been involved in interdisciplinary activities knows they create problems of their own. In this dissertation, the attempt to speak to both audiences periodically engenders the difficulty that the same arguments which may appear too simplistic to one group may appear too technical to the other. Though by no means a total solution to the problem, I have tried to lessen it by a liberal use of notes to amplify some points and familiarize readers with others.
The arguments and findings of this dissertation have broader applicability than its title suggests. It is couched in terms of medical ethics teaching rather than applied ethics teaching in general for fortuitous reasons. Medical and nursing ethics are where most of the activity has been and are the areas in which I have done empirical research. Medical and nursing ethics, however, are just two instances of the more general field of applied and professional ethics. This connection is made explicit in the concluding chapter.

TABLE OF CONTENTS

CHAPTER I. INTRODUCTION
  The Setting
  The Task
  Structure of the Dissertation
  Development of the Arguments
  NOTES

CHAPTER II. EXPLICATING THE GOALS OF MEDICAL ETHICS TEACHING: LAYING THE FOUNDATION FOR CREDIBLE EVALUATION
  Specifying the Goals
    Competence in Medical Ethics: Wilson's Components of the "Morally Educated"
    Accommodating Moral and Practical Constraints: Distinguishing Direct and Indirect Goals
    The Wilson+1 Goals: Seven Goals for Medical Ethics Teaching Described
      Direct Goals
        Imparting Knowledge
        Improving Reasoning
        Instilling Appreciation
      Indirect Goals
        Stimulating Moral Regard
        Eliciting Empathy
        Reinforcing Interpersonal Skills
        Promoting Courage
    Representativeness of the Wilson+1 Goals
    Advantages of the Wilson+1 Goals
  Controversies and Misconceptions
    Controversies
      Moral Behavior as a Goal
        Eschewing a Behavioristic Criterion
        Moral Behavior is an Ultimate Goal but not a Proximate Goal
      Appreciation as a Direct Goal
      Medical Ethics Teaching and "Formalism"
      Medical Ethics Teaching and Ethical Theory
    Misconceptions
      Ethics is Subsumed by the Social Sciences
      Ethics is Just Interpersonal Skills
      Ethical Codes are Sufficient
      Legal Considerations Preclude Ethical Ones
      Ethics Teaching Conflicts with Religious Belief
      Ethics Teaching is Indoctrinating
      Formal Education has no Role to Play in Moral Development
  Conclusion
  NOTES

CHAPTER III. BEYOND SCIENTISTIC EVALUATION: DEFENDING A FUNCTIONAL APPROACH
  The Fact-Value Distinction
    The Positivistic Justification for the Fact-Value Distinction
    Avoiding Bias
    Conclusion
  The Quantitative-Qualitative Distinction
    Qualitative and Quantitative Data
    Scientific Inference
      Inference in Physical Science
      Inference in Social Research
    Conclusion
  An Illustrative Example
  The Presumed Superiority of the Criteria and Measures of Moral Psychology
    Behaviorism
      Fallibility of Attributing Intentions
      Intentions as Theoretical Constructs
      Conclusion
    Kohlberg's Theory
      Insensitivity to Instructional Effects
      Lack of Interpretability
      Implicit Moral Hegemony
  Conclusion
  NOTES

CHAPTER IV. EVALUATING MEDICAL ETHICS TEACHING
  A Basic Functional Model for the Evaluation of Medical Ethics Courses
    Methods
      Data Collection
        Tests
        Direct Observation
        Questionnaires
        Interviews
      Design
  Three Course Evaluations
    Example 1: Ethics in Nursing (1980)
      Course Description
      Evaluation Purposes
      Evaluation Design
        Data Collection
        The Design and its Logic
      Evaluation Results
        Knowledge and Reasoning
        Appreciation
      Evaluation Impact
      Conclusion
    Example 2: Focal Problems (1982)
      Course Description
      Evaluation Purposes
      Evaluation Design
      Evaluation Results
        Appreciation
        Knowledge and Reasoning
        Indirect Goals
      Evaluation Impact
      Conclusion
    Example 3: Focal Problems (1983)
      Course Description
      Evaluation Purposes
      Evaluation Design
      Evaluation Results
        Analysis of Pre-Post Testing
        Distinguishing Knowledge and Reasoning in Terms of Testing
        Inter-Rater Reliability
        Usefulness of Measures of Knowledge and Reasoning
      Evaluation Impact
      Conclusion
  Conclusion
  NOTES

CHAPTER V. BUTTRESSING AND EXTENDING THE FINDINGS AND ARGUMENTS
  Buttressing the Model
    Misconceptions and Controversies Revisited
      Controversies
        Moral Behavior as a Goal
        Appreciation as a Direct Goal
        Medical Ethics Teaching as Formal
        Medical Ethics Teaching Incorrectly Downplays Ethical Theory
      Misconceptions
    Reducing Fallibility Without Scientism
      The Fact-Value Distinction
        1. Cognitive Standards in Ethics
        2. The Behavior of Faculty and Students
      The Quantitative-Qualitative Distinction
      Measurement Grounded in Moral Psychology
        1. Sensitivity to Questions of Interest
        2. Interpretability
        3. Practicability
  Extending the Model
    Extending the Model Within Medical Education
    Extending the Model to Applied Ethics Generally
  Disclaimers
  Closing Remarks

REFERENCES

APPENDICES
  APPENDIX A. ETHICS IN NURSING INSTRUMENTS
  APPENDIX B. FOCAL PROBLEMS (1982) INSTRUMENTS
  APPENDIX C. FOCAL PROBLEMS (1983) INSTRUMENTS

LIST OF TABLES

Table 1. Selected Results From Ethics in Nursing Mailed Course Evaluation Form
Table 2. Comparison of Two Groups on Fixed-Response and Essay Tests
Table 3. Reliabilities of Preceptors-Pairs' Essay Grading
Table 4. Pre-Post Difficulties of Selected Items

LIST OF FIGURES

Figure 1. A Comparison of the Callahan and Wilson+1 Goals
Figure 2. A Comparison of the DeCamp Group and Wilson+1 Goals
Figure 3. The Relationships Among Data Collection Techniques and Constructs in the Basic Functional Model
Figure 4. Relationships Among Data Collection Techniques and Constructs for the Ethics in Nursing Evaluation
Figure 5. The Relationships Among Data Collection Techniques and Constructs for Focal Problems (1983)
Figure 6. The Relationships Among Data Collection Techniques and Constructs for Focal Problems (1983)

CHAPTER I

INTRODUCTION

The Setting

Moral problems have been part and parcel of health care since the time of Hippocrates. But due in large measure to the emergence of specialized hospital care and strides in medical knowledge and technology, the frequency and complexity of such problems have steadily increased over the past several decades (Hunt and Aras, 1977). The general phenomena of growing pluralism, litigiousness, and consumerism have no doubt also been instrumental (Toulmin, 1981; Starr, 1982).
Although one might join McIntyre (1980) in lamenting the need to "rediscover" ethics, a burgeoning interest has occurred over the last decade and a half and has resulted in a significant rise in the number of individuals and institutions formally engaged in teaching medical ethics [1] (Clouser, 1980). Related to this, the Association of American Medical Colleges has recently moved to alter the course medical education has taken since Flexner, calling for major shifts of emphases in medical school curricula (1984). One of the three broad areas of curricula it recognizes, "Personal Qualities, Values, and Attitudes", is related to ethics in an intimate way.

In light of these developments, it is unfortunate that no very satisfactory ways of evaluating medical ethics teaching have been devised. Few researchers publish empirical studies or develop instruments relevant to the evaluation of medical ethics teaching [2]. This paucity of research reinforces one source of resistance by administrators and students to medical ethics, namely, doubt about whether there are appropriate means for evaluating ethics instruction (Hastings Center, 1980). Skepticism about evaluation is in turn rooted in a set of beliefs about ethics itself. As enumerated by Callahan (1978), these beliefs include the notions that ethics is soft, subjective, unscientific, indoctrinating, pedantic hair-splitting, irrelevant to the real world, and unteachable. Resistance by ethics instructors to evaluation, it seems, can only strengthen the very views they spend much of their time trying to undermine.

Evaluation of innovative educational programs has become a fact of life [3]. If proponents of medical ethics teaching wish to hold their present ground and to make further inroads in already crowded medical and nursing school curricula, then they need to develop means of assessing their curricula and demonstrating their value.
Less prudentially, well-conceived evaluation can serve to enhance the quality of instruction. Because evaluation is essentially systematic and informed criticism, properly done, it can help locate successful and unsuccessful modes of instruction and prompt needed improvements.

The Task

The general aim of this dissertation is to develop a defensible strategy for evaluating medical ethics teaching. This will require satisfying demands and meeting criticisms from two quite different perspectives. On the one hand, the strategy must incorporate methods and criteria which ethics instructors can endorse. On the other hand, the methods and criteria must be anchored in a defensible position on applied social research. To date, meeting one set of demands has precluded meeting the other, rendering evaluation of medical ethics teaching trivial, impressionistic, or both.

Evaluation is a complex and controversial task under the best of circumstances. Evaluating medical ethics teaching is especially problematic because evaluators and medical ethics teachers clash over the proper means (or over the very possibility) of evaluating ethics teaching. Those engaged in medical ethics teaching have found evaluation strategies ill-conceived for a variety of reasons. Criticisms have been advanced both from those whose projects have been evaluated (Ruddick, 1981) and from others attempting to forestall the use of questionable criteria (e.g., Clouser, 1973 and 1980; Callahan, 1980; and Goodpastor, 1982). These critics share the belief that evaluation is often far removed from the specific concerns and goals of medical ethics teaching. They have urged that attempts to produce changes on measures borrowed from moral psychology can actually undermine ethics teaching and that attempts to produce changes in moral behavior can themselves be morally objectionable.
According to Caplan (1980), resistance to evaluation is pervasive among ethics instructors, and such resistance is at least partially justified given evaluation's track record. But part of the problem may be attributed to ethics instructors' sometimes strident rejection of the methods of evaluation. For example, Clouser quips, "An evaluation too simple and mundane to be of interest to the statistician, but of tremendous help to the teacher of the course, is the evaluation form filled out by the student at the conclusion of the course" (1980, p. 32). Clouser's remark is naive; student evaluations are taken seriously by educational evaluators ("statisticians") and a fair amount of research has been conducted in this area [4].

On the other hand, part of the problem may be attributed to evaluators. Many of them employ a pair of Procrustean distinctions--between facts and values and between qualitative and quantitative methods--that are obstructive when evaluating ethics teaching. Facts and quantitative methods are associated with objectivity and science; values and qualitative methods, with subjectivity and non-science. Ethics falls squarely within the subjective, non-scientific, qualitative domain of values. By implication, ethics instruction is evaluable only in a "soft" sense in which cataloging changes in values is the major focus. Ethics instructors rightly reject such a naive construal of the fact-value distinction and the evaluation approaches which grow out of it.

Although there is some evidence for the belief that evaluation research is hopelessly muddled and therefore bound to distort and trivialize the aims of ethics teaching, the conclusion that more rigorous and systematic evaluation is impossible is hasty at best. The shortcomings of past evaluations result from inappropriate methods, criteria, and instruments (criticisms by no means confined to ethics instruction as an object of evaluation), not from the attempt to be rigorous and systematic per se.
While the precise nature of ethics is an important consideration in developing a general evaluation approach, there is nothing inherent in ethics instruction that precludes systematic evaluation.

Structure of the Dissertation

The basic framework of the dissertation is as follows: (1) A set of goals for medical ethics teaching is developed to serve as the basis for evaluation. (2) A general methodological approach that emphasizes generating useful information (instead of emphasizing methodological rigor for its own sake) is defended against positivist-inspired criticisms. (3) A model for the evaluation of medical ethics courses is developed on the basis of the goals and the general methodological approach. (4) Three concrete examples of course evaluations are used to illustrate and provide a test of the model. (5) The model is further examined and suggestions are offered for extending it to medical education formats besides traditional courses and for extending it to applied and professional ethics more generally.

Goals for medical ethics teaching are addressed in Chapter II. A set of seven goals is developed largely on the basis of John Wilson's "components of the morally educated" (1967). The goals are distinguished into "direct" and "indirect" varieties, reflecting the respective moral and practical differences between educational aims such as imparting knowledge on the one hand and promoting courage on the other. The goals are then evaluated in terms of their ability to withstand criticism and in terms of the degree to which they are representative of the goals of ethics teaching which experts establish for themselves.

Chapter II emphasizes the importance of clarifying what is to be evaluated in medical ethics teaching. Chapter III addresses general issues of methodology associated with how to evaluate medical ethics teaching.
Because such fundamental issues are involved, the arguments in Chapter III at times lead far beyond the specific case of medical ethics teaching evaluation. In particular, ethics has, since the advent of positivism, been relegated to the "soft" side of the hard-soft dichotomy of knowledge. Because the subject itself is viewed as soft, many educational researchers believe means of evaluating it must also be soft. The general aim of Chapter III is to criticize the hard-soft dichotomy and the notion that social research might or should be value-free. Sound criticism of these related notions is required to rescue both ethics and appropriate evaluation methods from unwarranted and misguided attacks. The fact-value and quantitative-qualitative distinctions are considered in some detail. The advisability of appealing to the methods and concepts of moral psychology is also considered. The chapter shows that the drive to be rigorous and scientific (i.e., to disparage so-called soft knowledge) is based on untenable positivistic strictures and that following these strictures compromises evaluation's value. Achieving worthwhile purposes, rather than obeisance to a priori standards of methodological rigor, is the legitimate criterion for conducting and assessing evaluation.

These general methodological considerations pave the way for Chapter IV, which characterizes a "basic functional model" for the evaluation of medical ethics courses and then illustrates how the model may be used to interpret three course evaluations. The examples illustrate how the more general and more-or-less conceptual considerations of the preceding chapters embodied in the basic functional model can be used to inform evaluation practice. The impact of the findings of the three evaluations on teaching practice is also discussed, and is related to the credibility of the evaluation model employed.
The concluding chapter uses the three examples of Chapter IV to support general theoretical claims about the goals advanced in Chapter II and the methodology advanced in Chapter III. The concluding chapter also suggests how the general approach defended and illustrated in the previous chapters might be extended to medical ethics teaching beyond the limited context of courses considered in Chapter IV. Problems that can be anticipated and changes needed in the nature of medical education are broached. How the conclusions of the dissertation might be extended to other areas in applied and professional ethics is also discussed. Finally, three explicit disclaimers about the scope of the dissertation are made.

Development of the Arguments

The chronology of thought is not the neat one suggested by the outline above. The arguments about methodology in Chapter III were at best inchoate, and those about goals in Chapter II were non-existent, when the first concrete example of Chapter IV was carried out in 1980. By the time the second course evaluation was accomplished in 1982, progress had been made on these fronts, but it was slight. The thought reflected in Chapters II and III did not guide the evaluations until the study of 1983.

The development of thought was thus back and forth between practice and more general theoretical concerns. This has certain advantages. It provides a way of testing theoretical views against their practicability. This in turn leads to revisions in the general stance and a better approach on subsequent occasions. Implementation in practical settings also provides a means to detect interests, misunderstandings, and sources of resistance. This, too, informs the shape of the general theoretical stance.

On the other hand, this chronology of thought has a certain disadvantage related to the general structure of this dissertation.
In so far as the concrete examples illustrate and provide a test of adequacy of the general approach, at times they have to be shoehorned into the conceptual framework established. This problem is not insuperable, but is worthy of note. It should be kept in mind when reading Chapter IV, which by and large is a post hoc assessment of concrete evaluations in terms of a conceptual framework that did not dictate the ways in which they were designed and conducted.

NOTES

1. Unless the more specific meaning is made clear by the context, 'medical ethics' will be taken throughout to include nursing ethics.

2. A 1982 MEDLINE search failed to produce any useful instruments or methods. Stolman et al. (1982) and Siegler et al. (1982) report similar difficulties.

3. Passage of the Elementary and Secondary Education Act (ESEA) in 1965 is the generally agreed upon benchmark of the Federal Government's entry into public education (e.g., Worthen and Sanders, 1973). With ESEA came significant federal funding and the requirement for evaluation.

4. For a review of the research, see Aleamoni (1983).

CHAPTER II

EXPLICATING THE GOALS OF MEDICAL ETHICS TEACHING: LAYING THE FOUNDATION FOR CREDIBLE EVALUATION

We have to start with conceptual questions, then move on to the empirical theories...then on to educational practice in the classroom. If we do not get the concepts and categories clear in the first place, we shall not know what or what sorts of facts, theories and practice we ought to look at. (Wilson, 1983, p. 192)

This chapter aims to develop a set of goals to frame the evaluation of medical ethics teaching. As Wilson suggests, establishing a conceptual framework is the necessary first step: it provides the concepts and categories needed to organize and direct data collection and interpretation. In so far as it is the first step, the credibility of a framework of goals for medical ethics teaching is a crucial determinant of the credibility of evaluation.
The chapter is divided into two major sections that correspond to two ways in which the credibility of the framework will be established. First, goals need to be credible to ethics teachers. A set of seven goals will be specified and then compared to goals endorsed by leaders in the field of medical ethics. In light of the resistance of ethics instructors to evaluation discussed in the first chapter, it is important to acknowledge the goals medical ethics teachers adopt for themselves. Otherwise, medical ethics teachers will continue to dismiss evaluation as irrelevant or destructive. Second, goals need to be credible to a broader audience that includes educational researchers and health care professionals engaged in education. Critical discussions of "controversies" and "misconceptions" will be provided to help blunt the objection that evaluation in terms of the avowed goals of medical ethics teachers embodies a merely conventional, unexamined, and partisan perspective and set of interests.

Specifying the Goals

Developing a set of defensible and practicable educational goals requires adjusting means and ends--the end of roast pig, for example, looks silly if the means employed are burning down the barn (Dewey, 1939). The specification of the goals of medical ethics teaching will be accomplished in four steps. (1) Competence in medical ethics, the ideal end, will be described. (2) This ideal end will then be considered in terms of moral and practical constraints imposed by the educational context; "direct" and "indirect" goals will be distinguished as a way of responding to these constraints. (3) Seven goals will be advanced. (4) These seven will be compared to two current, representative views of the goals of medical ethics teaching.

John Wilson's work (1967, 1969, and 1973) provides the point of departure.
A philosopher, teacher, and prolific writer in the fields of moral philosophy and educational research methodology, Wilson spent 10 years directing the Farmington Trust project in moral education. The project included empirical research as a major facet, and Wilson had primary responsibility for developing philosophically informed criteria and research methods to guide the research. Although his concerns were more general and detailed than the ones here, his "components of the morally educated" are readily adaptable to medical ethics teaching.

Competence in Medical Ethics: Wilson's Components of the "Morally Educated"

Wilson develops a set of traits that he labels the "components of the morally educated". Each of the components captures a dimension of the "morally educated" which is (1) necessary, and (2) logically independent of the other components. These components (capitalized throughout to indicate the intended usages) are the following: Moral Regard, Empathy, Interpersonal Skills, Knowledge, Reasoning, and Courage [1]. The case described below will be used to illustrate Wilson's components.

Margaret Scanson, North Lake Community clinical nursing instructor, is presently supervising students at Portage City Memorial Hospital. Common practice in the hospital is for nurses not to tell patients whether or not they have cancer, because some doctors prefer that they not know. Margaret has suggested to her students that if a patient asks such a question, the student should ask the patient what the doctor has said, and if the patient wishes, should offer to speak with the doctor about the matter.

Margaret assigned a student, Marie Blanchard, to care for Mrs. Bullough, a woman in her early thirties who had a brain tumor. Marie, in the course of being with Mrs. Bullough prior to surgery, observing the surgery, and caring for her afterwards, learned that the tumor was malignant. When Marie arrived the following day to care for her, Mrs.
Bullough, who knew Marie had been with her in surgery and the recovery room, immediately asked, "Is it cancer?" Marie was at a loss because the stock answer, "What did your doctor say?" seemed such a denial of Mrs. Bullough's need for an answer, and since Mrs. Bullough had good reason to believe that Marie knew the answer, she couldn't blandly say she didn't know.

Marie excused herself from Mrs. Bullough and sought her instructor. After learning that the doctor normally shared a diagnosis of malignancy with his patients but that he would not be available until later in the day, Marie questioned the value of ignoring Mrs. Bullough's concerns. She requested that Margaret allow her to tell the patient or, if that was unacceptable due to her inexperience, that the head nurse or Margaret herself tell Mrs. Bullough that the tumor was malignant. Both Margaret and the head nurse knew the diagnosis and the planned treatment. In addition, both nurses had spent time with Mrs. Bullough during the diagnostic period prior to surgery and probably knew her as well or even better than the surgeon. What should Margaret do?

According to Wilson, Margaret would have to manifest the six components mentioned above to work her way through this problem. Moral Regard is the most fundamental and involves counting the interests and feelings of others as equal to one's own. Margaret would have to recognize that there are individuals whose interests might conflict--hers, the student's, the physician's and Mrs. Bullough's, to name the most salient ones--and that each of these individuals' interests carries the same initial weight. There simply would be no moral problem for Margaret otherwise.

Margaret would also have to be empathetic. She would have to be able to recognize others' feelings and be able to correctly describe them. For example, is Mrs.
Margaret could go wrong, for instance, if she described Mrs. Bullough as hysterical when she was instead understandably disconcerted. The third component that Margaret would have to exhibit is Interpersonal Skills. She would have to be able to correctly interpret Mrs. Bullough's facial expressions, tone of voice, and posture. She would have to ascertain Mrs. Bullough's values, sincerity, her understanding of the situation, and the appropriate language to use for effective communication. The manner in which she responded to Mrs. Bullough--that is, Margaret's own facial expressions, tone of voice, posture--would of course also have important consequences for the quality of the interaction. And the importance of Margaret's Interpersonal Skills would not be limited to her exchanges with Mrs. Bullough; it would also determine her effectiveness in dealing with other members of the health care team whose interests and feelings also would be relevant. Next, Margaret would have to possess Knowledge which would allow her to formulate reasonable strategies and to anticipate the consequences of her actions. Would telling Mrs. Bullough do more harm than good? Is there evidence supporting the claim that cancer patients would, despite their avowals, really prefer not to be told? Is there some especially effective way to tell them? What if, when told, Mrs. Bullough immediately requests information regarding her 16 treatment and prognosis that outstrips Margaret's expertise? And so on. Margaret would also have to be able to draw conclusions on the basis of the preceding components in order to derive some rule of conduct to apply in this case or to recognize the situation as an instance of one of her previously derived moral rules. In short, she would have to do some reasoning. 
For example, suppose she endorsed the following rule: Health care professionals are obligated to disclose information to patients that has significant impact on their life plans unless there are compelling reasons for not doing so. Suppose in addition that she is convinced that Mrs. Bullough is rational and sincere in her request. Other things equal, Margaret should be able to combine her assessment of Mrs. Bullough and her principle and draw the conclusion that Mrs. Bullough should be told.

The last component Margaret would have to exhibit is Courage. Believing that Mrs. Bullough should be told is one thing; actually taking steps to ensure this occurs is quite another. Pursuing the matter at all entails certain risks. Rocking the boat could result in alienating her co-workers, moral criticism, or even the loss of her job. If these hurdles are cleared, telling Mrs. Bullough entails an emotionally trying, perhaps painful, experience.

With Margaret's dilemma now in hand, Wilson's schema of components may be summarized as follows: Moral Regard is the fundamental presupposition of moral behavior and decision-making. Empathy and Interpersonal Skills are required to ferret out and articulate the interests and feelings that others have. The results of the operation of these components inform beliefs about given individuals and are combined with more general background knowledge and beliefs. Information from all these sources is then used to reason through to a principle of action. Finally, Courage is required to convert conclusions to actions where difficult situations are involved.

Accommodating Moral and Practical Constraints: Distinguishing Direct and Indirect Goals

Wilson's components are constitutive of "morally educated" behavior, but they may not thereby be straightforwardly converted into educational goals.
In order to develop goals based on Wilson's components that are consistent with constraints on formal education, it will be helpful to construe Knowledge and Reasoning as cognitive characteristics necessary for knowing how one ought to act and to construe Moral Regard, Empathy, Interpersonal Skills, and Courage as personality characteristics necessary for acting as one ought to act (e.g., Frankena, 1975). This two-way division of the components, though crude, will help in the task of specifying appropriate educational goals.

In an explicitly educational context (versus a therapeutic one, for instance), goals associated with cognitive characteristics differ in both practical and moral dimensions from goals associated with personality characteristics. The practical difference is that personality characteristics are affected in profound ways by exogenous and non-rational causes which are unknown to teachers and students and largely beyond their control--rearing, socio-economic status, health, and genetic make-up are a few examples. Personality characteristics are therefore significantly resistant to change, especially via brief educational experiences. All things equal, cognitive characteristics are amenable to change through standard educational techniques (e.g., lectures and discussions) and are under much greater control of educators.

The moral difference between cognitive and personality characteristics involves the relationship between means and ends in education. Put crudely, education should aim at the rationality of the student; it should consist in giving reasons in conjunction with rational arguments, not in things such as coercion and manipulation (e.g., Scheffler, 1978; Peters, 1967; and Wilson, 1967). Again, all things equal, ends associated with cognitive characteristics are well-suited to means such as lectures and discussions that aim at the rationality of students.
By contrast, ends associated with personality characteristics are less subject to influence through these rational means. Such ends are often more closely associated with techniques that are potentially objectionable within the educational context, such as behavior modification and the "therapeutic manipulation" which Michael Scriven (1975) associates with "affective" moral education.

Some (and perhaps many) ethics instructors conclude that these moral and practical differences between cognitive and personality characteristics entail that the latter are beyond the scope of ethics teaching. Such an extreme position is unwarranted. Presumably, no one thinks it is impossible for a course in ethics to improve students' ability to empathize. Furthermore, if two courses were otherwise the same, the one that improved Empathy would be judged superior; and the same could be said of Moral Regard, Interpersonal Skills, and Courage. It is therefore a mistake to altogether eliminate these personality characteristics from among the goals of ethics teaching.

A less extreme way of accommodating the constraints that the educational context imposes is to distinguish between "direct" and "indirect" goals. Goals associated with relatively unproblematic cognitive characteristics--Knowledge and Reasoning--may be considered "direct" goals. Goals associated with relatively problematic personality characteristics--Moral Regard, Empathy, Interpersonal Skills, and Courage--may be considered "indirect" goals. The distinction between direct and indirect goals is designed to acknowledge the practical problem presented by personality characteristics (i.e., they are difficult to influence) and thus signals the degree to which responsibility may be assigned for achieving goals associated with them.
Medical ethics instruction is uniquely responsible for a direct goal such as imparting the Knowledge peculiar to its domain; it is responsible for promoting an indirect goal like Courage only partially and only as one influence among many. The distinction between direct and indirect goals also signals a genuine moral difference between the legitimate means by which student behavior may be influenced; it separates those goals most easily associated with education from those which potentially involve paternalism and manipulation. Acknowledging this difference in means is especially important in medical and nursing education where students have the full range of rights and expectations of autonomous adults.

The Wilson+1 Goals: Seven Goals for Medical Ethics Teaching Described

Employing the direct-indirect goal distinction, this section advances seven goals for medical ethics teaching. "Appreciation", a goal not associated directly with Wilson's components, combined with "imparting Knowledge" and "improving Reasoning", forms a set of three direct goals. "Stimulating Moral Regard", "eliciting Empathy", "enhancing Interpersonal Skills", and "promoting Courage" form a set of four indirect goals.

Direct Goals

Imparting Knowledge. Imparting Knowledge is a familiar and straightforward educational goal. In medical ethics, it consists in imparting the facts, concepts and positions that are peculiarly medical-ethical and especially pertinent to moral problems in medicine.

Improving Reasoning. In contrast to Knowledge, Reasoning is an exceedingly complex concept, and moral philosophers vigorously resist reducing Reasoning to static principles or rules that can be precisely specified ahead of time. For instance, Goodpastor (1982) criticizes the Kohlbergian psychologist James Rest (1982) for suggesting that ethics teaching might be evaluated in terms of student progress relative to pre-established "stages" of moral judgment.
According to Goodpastor, pre-set criteria ignore moral reasoning's self-critical and evolving nature. Dewey makes a similar point when he describes philosophy (including moral philosophy) as "thinking which has become conscious of itself" (1944, p. 326). Finally, MacIntyre (1981) contends that the quality of moral argument must be judged in terms of evolving criteria "internal" to the "practice" itself, and not in terms of criteria which are static and "external".

Most moral philosophers would agree with Goodpastor, Dewey, and MacIntyre that pre-set, rigid standards of correct moral argumentation cannot exist. Despite the self-critical and evolving nature of moral reasoning, however, moral philosophers (and ethics instructors) are able to distinguish good from bad arguments and to articulate general rules of thumb. One such set of rules (designed for medical ethics in particular) is suggested below. Although it could probably be improved, it provides the reader with the flavor of what is meant by Reasoning.

1. recognizing ethical problems and formulating them in terms of the relevant issues involved
2. engaging in conceptual analysis by
   a. drawing necessary and relevant distinctions
   b. clarifying important concepts which are vague or ambiguous
3. distinguishing the following dimensions of medical-ethical decision-making:
   a. legal
   b. medical-technical
   c. resource allocation
4. formulating arguments that are
   a. clear
   b. consistent
   c. logically correct
   d. factually correct
   and that
   e. identify alternative or competing positions
   f. identify presuppositions in various positions
   g. anticipate and address objections

Instilling Appreciation. Wilson's components may well implicitly include some notion of appreciation. It is sufficiently important, however, to be a distinct goal. For the purpose of illustration, imagine medical students A, B, and C are enrolled in a medical ethics course and fit the following descriptions.
Student A does extremely well on the written exams and exhibits good verbal facility regarding the key concepts and arguments, but A judges medical ethics to be a waste of time and just another educational hoop to jump through. Student B is enthusiastic about the course, but, unlike A, has much difficulty with the exams and cognitive aspects of the course in general. Although B is favorably disposed, B is unable to clearly articulate what value medical ethics has. Student C blends A's cognitive skills with B's favorable attitude, and C claims that medical ethics teaching is valuable because it stimulates thinking and prompts an awareness of alternative views.

Given the intended sense of the term, student A does not Appreciate the course (for simplicity, Appreciation of the course is identified with Appreciation of ethical inquiry). Though successful with regard to cognitive skills, A denies the course has value. A is either unable to or refuses to see the point. B also does not Appreciate the course. Though possessing a positive attitude, B does not provide evidence of a clear enough understanding to infer that the positive attitude applies to the correct object. In this way B also fails to see the point. Only C can be said to Appreciate the course (i.e., to Appreciate ethical inquiry) in the intended sense of the term. Unlike B, C knows how to play the game according to its cognitive rules; unlike A, C values what the game is about.

Appreciation is important because ethics teaching that involves students merely in intellectualizing on the one hand or merely in emoting on the other is incomplete and distorted. Students need to value the importance of rational inquiry in ethics in a way that rivals the value placed on cognitive investigation in other aspects of clinical decision-making.
As Clouser puts it, it is essential to "convey that this enterprise [medical ethics] is not just a mind game, but deals with matters of profound significance to self, society, and profession" (1980, p. 18). Simply getting students to want, like, or enjoy medical ethics teaching--getting them to be like student B--is not sufficient. To Appreciate ethical inquiry, students must, like student C, know why ethical inquiry is valuable, which in turn requires that they be able to successfully engage in it. Appreciation, then, has both cognitive/skill and affective/attitudinal dimensions. The educational importance of Appreciation is that its presence increases the probability that students will act in accord with the intended lessons of medical ethics teaching.

Indirect Goals

Indirect goals are distinguished from direct goals in terms of the moral and practical constraints that apply more forcefully to the former. Because these constraints apply more forcefully to the indirect goals, medical ethics teaching should achieve results in terms of the direct goals, but is required only to draw students out--to reinforce or stimulate their development--in terms of the indirect goals. Viewed in this way, direct goals may be associated with final outcomes that medical ethics teaching should achieve, and indirect goals may be associated with process outcomes that medical ethics teaching should prompt.

To illustrate how the indirect goals--Moral Regard, Empathy, Interpersonal Skills, and Courage--may be considered goals of the process of medical ethics teaching, the remainder of this section provides brief discussions of each of the indirect goals to show how they might be pursued "indirectly" (i.e., as part of the process of medical ethics teaching). Before turning to these illustrations, the reader should note that the verbs used in the goal-statements themselves reflect the direct/indirect distinction.
The verbs 'imparting', 'improving', and 'instilling' are used in connection with the direct goals. Borrowing somewhat from Callahan (1980), the weaker verbs 'stimulating', 'eliciting', 'reinforcing' and 'promoting' are used with the indirect goals.

Stimulating Moral Regard. Persons altogether lacking in Moral Regard (sociopaths, for instance) have no business in medical and nursing schools; no educators can be expected to make much progress with such individuals. Short of such extremes, however, medical ethics instruction should bear some responsibility for stimulating the Moral Regard of those who possess it and for stimulating them to be reflective about their moral views.

Consider the common educational method of discussing ethically problematic medical cases. The mere undertaking of such discussions stimulates Moral Regard because ethical issues, by their very nature, begin with Moral Regard--i.e., without a perceived conflict of interests and the desire to adjudicate the conflict, no moral puzzlement exists. Because Moral Regard is so fundamental, then, simply engaging students in moral inquiry in general is bound to stimulate Moral Regard.

On the other hand, Moral Regard may be stimulated at more specific levels--and as a consequence of the process of pursuing the goals of Knowledge and Reasoning. For instance, clarifying for medical students the religious beliefs underlying the refusal of a blood transfusion by a Jehovah's Witness stimulates students' Moral Regard for the individual refusing treatment and for persons with unconventional views more generally. The clarification of the patient's beliefs per se is cognitive and educational, and stimulating students' Moral Regard is an outcome of this educational process. Stimulating Moral Regard by these "indirect" educational means may be distinguished from the causal means that would be involved, for instance, in a "values clarification" exercise designed to directly (and surreptitiously) bring about greater regard for others.
Eliciting Empathy. The ability to assume the viewpoints and feelings of others is essential in providing the insight necessary to intelligently work through ethical problems. The close connection between reasoning and feelings (feelings are a kind of "data") needs to be brought home to students, lest ethics teaching become incomplete and sterile. However, "touchy feely rap sessions", emotionally charged "encounter sessions", and other non-cognitive, causal means of affecting Empathy are out of place in the educational setting. Instead, conventional educational strategies such as role-playing, viewing video tapes, and simulated patient encounters should be employed to elicit Empathy. Simply discussing cases can serve the aim of eliciting Empathy, but probably less effectively.

Reinforcing Interpersonal Skills. Interpersonal skills are required in virtually all aspects of patient care. A physician or nurse who lacks such skills is handicapped, because the skills are required in the most rudimentary tasks, such as reducing anxiety, conveying information, or obtaining information. Although teaching Interpersonal Skills has become a legitimate direct goal of medical education (e.g., Kahn et al., 1979), it is unreasonable to require medical ethics teaching to adopt Interpersonal Skills as more than an indirect (or process) goal. The special charge of medical ethics instruction with respect to Interpersonal Skills is reinforcing students' ability to interact with both their patients and colleagues in connection with often difficult and emotionally charged moral issues. Role-playing, video tapes, simulated patient encounters, and discussion again come to mind as educationally acceptable methods to achieve this goal.

Promoting Courage. As is true of the other indirect goals, medical ethics instruction cannot be expected to shoulder the full responsibility for altering a trait acquired over many years and resulting from causes that are poorly understood.
Like Moral Regard, however, Courage (or the relationship between avowals and actions) is a traditional topic in ethics. In connection with medical ethics in particular, there is a strong tendency among students and professionals to look to conventional practice as the standard of appropriate behavior. The claim is sometimes made that arguments of ethicists are all fine and good but are not "realistic". Courage may be promoted by pointing out that being "realistic" may be just shorthand for being timid, or even egoistic, and that one is morally obligated to act in accordance with one's well-considered moral judgments. In a more general vein, working through ethical problems and developing rational justifications for ethical viewpoints--characteristics of educational methods--promote Courage by providing students with the confidence in their moral beliefs they need in order to convert such beliefs into actions.

Representativeness of the Wilson+1 Goals

Seven goals for medical ethics teaching have been developed based largely on Wilson's "components of the morally educated". This section compares these seven to goals endorsed by experts on medical ethics teaching.

Although the issue of goals for medical ethics teaching has generated little careful discussion, two proposals merit examination: Callahan's (1980) goals and the 1983 DeCamp conference goals. Callahan's five goals for ethics teaching were prominent in the Hastings Center's investigation of ethics in higher education. Caplan employed them in the volume which resulted, Ethics in Higher Education (1980), when discussing the evaluation of ethics teaching in general. Clouser (1980), in another Hastings Center volume, employed them in his discussion of evaluating medical ethics teaching in particular. The prominence in the field of medical ethics of the Hastings Center and of these individuals indicates that Callahan's goals are reasonably representative.
Callahan's five goals are: stimulating the moral imagination, recognizing ethical issues, eliciting a sense of moral obligation, developing analytical skills, and tolerating--and reducing--disagreement and ambiguity. Callahan also includes "context-dependent" goals, which allow for the special content of medical ethics. The mapping in Figure 1 is a suggested interpretation of Callahan's goals in terms of Wilson's+1.

Callahan's Goals                                          Wilson+1 Goals

stimulating the moral imagination                    ---> stimulating Moral Regard, eliciting Empathy
recognizing ethical issues                           ---> stimulating Moral Regard, eliciting Empathy, imparting Knowledge
eliciting a sense of moral obligation                ---> promoting Courage
developing analytical skills                         ---> improving Reasoning
tolerating--and reducing--ambiguity and disagreement ---> enhancing Interpersonal Skills, improving Reasoning
context dependent goals                              ---> imparting Knowledge

Figure 1. A Comparison of the Callahan and Wilson+1 Goals

The more recent DeCamp conference involved eleven leaders in the field of medical ethics.5 Four general goals resulted from their deliberations, and a fifth, paralleling Callahan's "context dependent" goals, specified the minimum of issues (i.e., content) which any medical ethics teaching program should address. The four general goals, A-D, are

A. clarification of central concepts (e.g., competence);
B. understanding of important decision-making procedures (e.g., when is it morally justified to treat an unwilling patient?);
C. ability to apply concepts and decision-making procedures to actual cases;
D. various interactional skills (e.g., discussing with a terminally ill patient her wishes about going on Do Not Resuscitate status).

Like Callahan's, these goals are consistent with Wilson's+1; Figure 2 compares the two sets.
DeCamp Group Goals                                                 Wilson+1 Goals

clarification of central concepts                             ---> imparting Knowledge, improving Reasoning
understanding of important decision-making procedures         ---> imparting Knowledge, improving Reasoning
ability to apply concepts and decision-making to actual cases ---> all
interactional skills                                          ---> stimulating Moral Regard, eliciting Empathy, enhancing Interpersonal Skills, promoting Courage

Figure 2. A Comparison of the DeCamp Group and Wilson+1 Goals

Callahan's goals and those of the DeCamp conference exemplify the goals of medical ethics teaching endorsed by those presently engaged in and knowledgeable about medical ethics teaching. The consistency between these and Wilson's+1 exemplified in the mappings of Figures 1 and 2 indicates that Wilson's+1 coincide with current thinking in the field.

Advantages of The Wilson+1 Goals

Having established the representativeness of the Wilson+1 goals, the four-step plan for specifying the goals of medical ethics teaching is complete. Before turning to a critical examination of the Wilson+1 goals, three reasons for preferring these goals over the DeCamp and Callahan alternatives are given.

Wilson developed his "components of the morally educated" with an eye toward empirical research, and the Wilson+1 goals have been developed with the same thing in mind. The aims of Callahan and the DeCamp group, by contrast, were to reach a consensus on goals that could be used to guide teaching and that could be effectively communicated to others. It should not be surprising that the Wilson+1 goals have several advantages over the DeCamp and Callahan goals for the purpose of framing the evaluation of medical ethics teaching. (1) The Wilson+1 goals are more discrete; this characteristic, reflected in the comparisons in Figures 1 and 2, facilitates focusing empirical research and aids the development of research methods and instruments. (2) The Wilson+1 goals explicitly distinguish direct and indirect goals.
This helps establish priorities and reasonable expectations for medical ethics teaching. (3) The Wilson+1 goals explicitly include Appreciation. Appreciation is important in its own right and is readily amenable to investigation by the use of student evaluations of teaching.

Critical Examination of the Wilson+1 Goals

As stated at the beginning of this chapter, the credibility of a given framework of goals for medical ethics teaching depends on (1) the degree to which the framework is representative of the views and practices of those engaged in medical ethics teaching, and (2) the degree to which the framework withstands critical scrutiny. Representativeness of the Wilson+1 goals was established in the preceding section. This section critically examines the Wilson+1 goals by entertaining "controversies" and "misconceptions".

As used here, "controversies" involve disagreement about the nature of medical ethics teaching and the direction it should take. Four such controversies will be discussed: moral behavior as a goal, Appreciation as a direct goal, medical ethics as "formal", and ethical theory's place in medical ethics teaching.

In contrast to the controversies, which take for granted the legitimacy of medical ethics teaching and disagree about the best approach, "misconceptions" involve the contention that medical ethics teaching is unnecessary or objectionable. Misconceptions are associated with popular and relatively unsophisticated views of medical ethics teaching, views that may be associated with groups such as students, health care professions faculty, and evaluators. Seven such misconceptions will be considered: ethics is subsumed by the social sciences; ethics is simply interpersonal skills; professional ethical codes are a sufficient means of dealing with ethical problems; legal considerations preclude ethical ones; ethics teaching conflicts with religious beliefs; ethics teaching is indoctrination; and formal education has no proper role to play in moral education.
Controversies

Moral Behavior as a Goal

The Wilson+1 goals (except for Appreciation) result from analyzing moral behavior into its "components", and thus the ultimate goal associated with the Wilson+1 framework is moral behavior. On the other hand, philosophers often eschew moral behavior as a goal of medical ethics teaching. Callahan (1980), for instance, claims that moral behavior is a "dubious" goal of medical ethics teaching, and his view is echoed by others (e.g., Macklin, 1980, Caplan, 1980, and Goodpastor, 1982).

This apparent disagreement between the view Callahan represents and the Wilson+1 goals can be removed by observing that rejecting moral behavior as a goal of medical ethics teaching is justified provided it is identified with the behavioristic sense of overt behavior, or provided that it is construed as the proximate goal of medical ethics teaching which ethics teachers are to be held directly accountable for. Following a brief discussion of why philosophers are correct to reject moral behavior in its behavioristic sense, an argument will be advanced to show that moral behavior is an appropriate goal of medical ethics teaching in the sense of an ultimate or ideal goal (a goal_u) but not appropriate in the sense of a proximate educational goal (a goal_e). Goals_e are typically identified with evaluative criteria, and they are presumed to be constitutive or predictive of goals_u. The argument aims to show that success in terms of the Wilson+1 goals (or some other appropriate set of proximate educational goals) is all that can reasonably be demanded, but that this is consistent with adopting moral behavior as the goal_u of medical ethics teaching. The argument should remove philosophers' misgivings about endorsing moral behavior as a goal and, in the process, show that the evaluation of medical ethics teaching can compare favorably with the evaluation of teaching generally.

Eschewing a Behavioristic Criterion.
Behaviorists define (i.e., purport to define) behavior as "overt" and intersubjectively observable; references to agents' reasons, intentions, or mental processes are excluded. The theoretical inadequacies of behaviorism will be discussed in detail in Chapter III. It is sufficient for present purposes to observe that such a definition of 'moral behavior' effectively precludes reasoned moral evaluation.

The behavioristic sense of "overt" behavior requires, in Callahan's words, "a preestablished blueprint of what will count as acceptable moral behavior" (1980, p. 70). Such a "blueprint" entails a list of the "overt" behaviors which teaching should produce, requiring a kind of unanimity on ethical issues that is difficult to imagine and inappropriate to demand. Closely related to this, the behavioristic blueprint ignores the importance of the intellectual dispositions of being critical, flexible, and sensitive to alternatives (i.e., the dispositions included in Reasoning). Such mental dispositions are central to moral behavior and moral evaluation. Unless some obvious and serious breach of morality is involved, the appropriateness of alternative courses of action may not be decided until the particulars are known. Even then there may be room for reasonable disagreement--consider abortion, for instance--in which case no clearly correct "overt" behavior exists.

Medical ethics teaching and its evaluation should reflect the complex and provisional nature of what counts as appropriate moral behavior, and leave room for differences among individuals on hard cases. Because the restrictive, behavioristic sense of 'moral behavior' (with its implicit requirement for a "blueprint") is inadequate in these respects, "overt" behavior in the behavioristic sense is an inappropriate goal of medical ethics teaching.

Moral Behavior as an Ultimate but not Proximate Goal.
When one moves beyond a behavioristic criterion, the issue is whether moral behavior (in its ordinary workaday sense) establishes an unreasonably demanding criterion of success for medical ethics teaching. This demand (implicit, for instance, in Sider and Clements, 1984) emerges as the conclusion of the following argument:

(1) The behavior of students in educational contexts (e.g., what they say they would do on a test, questionnaire, or in an interview) can be distinguished from what they do when faced with real situations.
(2) Behavior in the educational context (behavior_e) is contingently related to behavior in real life situations (behavior_r).
(3) Evidence for behavior_e is not necessarily evidence for behavior_r.
(4) Moral behavior has not been directly related to medical ethics teaching, nor has it been correlated with behavior_e.
(5) The test of an educational endeavor is effectiveness in producing the desired behavior_r.
(6) Therefore, medical ethics teaching (a) has not demonstrated its effectiveness, and (b) can do so only by empirically demonstrating its positive effect on moral behavior_r.

Making explicit the reasoning underlying the demand to adopt moral behavior (moral behavior_r) as a proximate educational goal (a goal_e) shows the demand is unreasonable. Premise (5), required to justify the demand contained in (6b), imposes a standard that few, if any, educational activities could meet. Precious little evidence exists to establish the relationship between criteria typically used to evaluate students' behavior_e and their professional performance, behavior_r, in any area. Empirically establishing a positive relationship between behavior_e and behavior_r is difficult, costly, and rarely done.
In the overwhelming majority of cases, a positive relationship between behaviors_e and behavior_r is presumed on the grounds that (1) the content of education is necessary for the behavior ultimately desired (e.g., knowing anatomy and being a good physician) or (2) the content of education is analogous to the behavior ultimately desired (e.g., taking histories from simulated patients and taking histories from real ones). Given the presumption of one of these two kinds of relationships, desired performance in terms of behaviors_e is accepted as adequate to establish the effectiveness of an educational endeavor. No arguments have been produced to show why ethics teaching should be required to meet stiffer standards.

Moral behavior_r may be associated with the goal_u of medical ethics teaching, and moral behaviors_e may be identified with goals_e, the Wilson+1 goals. Because Wilson develops his "components of the morally educated" by analyzing moral behavior_r into its constituent elements, the evidence provided by investigating the behaviors_e associated with the Wilson+1 goals sanctions inferences to moral behavior_r. (For example, findings about Knowledge fit presumption (1), and findings about Reasoning fit presumption (2).) Such inferences, though not unproblematic, are no more problematic than the analogous inferences made in educational research, for example, inferences about students' ability to interview real patients based on their interviews of simulated patients, or inferences about students' ability to solve real clinical problems based on their performance using paper cases.

This discussion of moral behavior as a goal ignores certain complications--most notably, the distinction between direct and indirect goals and the assumption of a relatively clean distinction between "educational" and "real" contexts unlikely to apply to much teaching in the clinical context.
Nonetheless, it does establish the important conclusion that medical ethics teaching is not unique regarding behavior as a goal of teaching. Unmasking the two senses of 'goal'--ultimate and proximate--leads to the conclusion that the demand to establish positive relationships between moral behavior (moral behavior_r) and the effects of teaching (moral behavior_e) should (1) take into account the nature and limitations of educational evaluation, and (2) be no more stringent than is customary for other educational pursuits (Archambault, 1975). Given the nature of the Wilson+1 goals--constituents of "morally educated" behavior--medical ethics teaching that achieves them compares well to many other educational endeavors on the question of the relationship between performance in educational contexts and performance in real ones.

In conclusion, moral behavior is an appropriate ultimate goal of medical ethics teaching but is not an appropriate proximate goal or evaluative criterion. Instead, proximate educational goals, namely, the Wilson+1 goals, function as evaluative criteria and sanction inferences to the ultimate goal of moral behavior.

A controversy related somewhat to moral behavior as a goal concerns Appreciation as a direct goal. A likely objection is that Appreciation should be at most an indirect goal, contrary to the way it was classified earlier. Three reasons might be advanced. First, like the indirect goals (e.g., Courage), Appreciation is a function of attitudes and other causal factors largely beyond the control of instructors. Second, the real aim of medical ethics teaching should be cognitive learning, not pleasing students. Third, students are not competent judges of what is valuable (i.e., worthy of appreciation) in their educations.

Appreciation can be defended against the first argument by invoking the "all things equal" clause presupposed in making the distinction between direct and indirect goals.
All things equal, medical ethics instructors are in a better position to instill Appreciation than an indirect goal such as Courage. Furthermore, Appreciation is a more customary and natural outcome of teaching than progress in terms of the indirect goals.

Two answers can be given to the claim that direct aims of teaching should be confined to cognitive goals. On the one hand, empirical evidence shows that Appreciation (as measured by student course evaluations) correlates with cognitive learning (Scriven, 1981). Thus, Appreciation provides one indicator of success at achieving cognitive aims. On the other hand, confining medical ethics teaching solely to cognitive aims is too restrictive. It would render ethics teaching sterile and of questionable relevance to health care professionals. In this vein, understanding and Appreciation are not as easily separated in ethics as they might be in certain other subjects. For example, an engineering student might fail to Appreciate (i.e., fail to see the point of) algebraic derivations, even detest doing them, and still understand them in the sense of being able to perform them when the need arises. Goods "external" to the practice of algebraic derivation (say, the satisfaction of designing a bridge and getting a fat paycheck) may motivate the engineering student. A parallel is lacking when it comes to the "practice" of ethical inquiry. Attempting to avoid malpractice suits is a reason (i.e., an "external" one) for telling patients the truth, but it is surely not a moral one. Students who do not see the point of ethical inquiry cannot be said to understand ethical inquiry, and will have no "external" motivation for engaging in it once out of the classroom. Thus, instilling Appreciation (getting students to see the point) is required in medical ethics teaching.
The claim that students are incompetent judges of what is valuable in their education is based on the paternalistic stance that students are not in a position to judge what their profession will demand and thus not in a position to judge what the content of their education should be. According to this argument, whether students Appreciate medical ethics when it is taught is therefore beside the point.

The response to this third criticism of Appreciation as a direct goal also has two aspects. On the one hand, research suggests that Appreciation (again, as measured by student course evaluations) is stable over time. That is, there is a strong relationship between evaluations of courses at the time they are taught and later retrospective evaluations (Aleamoni, 1981). More significantly, if Appreciation is not cultivated at the time ethics teaching is done, when medical and nursing students are developing their professional identities, it would seem unlikely that it will be developed later on.

In summary, although Appreciation resembles the indirect goals more than the direct goals in some respects (e.g., it is not taught directly and is not a criterion of students' performance), all things equal, Appreciation is under reasonable control of ethics instructors, is a customary educational aim, and is associated with the value of the cognitive goals of ethics teaching. Thus, Appreciation may be legitimately construed as a direct goal.

A third controversy involves a recent and relatively fundamental criticism of mainstream medical ethics. Nobel (1982) has criticized medical ethics for being "acontextual"; Caplan (1983) has criticized it for employing the "engineering model"; and Clements and Sider (1983) have criticized it for being "formal". The general complaint expressed by each is that medical ethics, as presently practiced, is far removed from real ethical concerns and amounts to little more than a pointless philosophical exercise.
Although the primary target is the literature of medical ethics, the criticism applies to teaching as well since discussions from the literature of medical ethics often comprise the methods and content of teaching.

If there are such "formalists", who attempt to apply the tools of philosophical ethics with no regard for psychological, social, and historical contingencies, then they deserve to be criticized. But such individuals are rare, or at least the tide has turned against them. Influential philosophers such as John Rawls (1971), Alasdair MacIntyre (1981), Richard Rorty (1979 and 1982b), Hilary Putnam (1983), and Sissela Bok (1978) explicitly reject the "formal" approach. In one way or another, they all agree with Putnam's claim that, when applied to actual moral problems, "traditional" ethical theories (he excludes Rawls) "prove too much" and therefore prove nothing, or with Bok's claim that

    A system of moral philosophy put to such uses is like a magician's hat--almost anything can be pulled out of it, wafted about, let fly. No one can be quite sure it was not in the hat all along. And the philosopher is often in the end his own most amazed spectator. He may not know how he did it--but the doves are aloft, the silk scarves in his hands! (p. 57)

Singer (1982), Wikler (1982),4 and Beauchamp (1982) speak directly to the critics on the issue of how, if not by the application of ethical theories, philosophy and philosophers have any constructive contribution to make. Their answer is that philosophers contribute, not by the application of some specialized philosophical knowledge, but by consistently demanding things which characterize the discipline of philosophy, such as high standards of argument and conceptual clarity. They do not contend that philosophers have a corner on even these, but that "applied ethicists" (most of whom are philosophers) simply spend a good deal of their time working through the problems of interest.
A procedure of the kind described by Nobel--in which "moral problems must be abstracted from their social settings to appear purely moral" in order that philosophers may apply their "methods or moral reasoning" (1982, p.8)--is clearly not endorsed, at least not by the present spokespersons in the field. More to the point, present medical ethics teaching--embodied in the Wilson+1, DeCamp, and Callahan goals--is not merely formal.

The final controversy involves the content and teaching methods naturally associated with the Wilson+1 goals. Two criticisms have been advanced by Troyer (1982). The first (something like the flip-side of the charge of "formalism") is that medical ethics teaching pays inadequate attention to ethical theory. In particular, he criticizes Benjamin and Curtis (1981) for relying too heavily on discussions of cases at the expense of ethical theory. Benjamin and Curtis do consider ethical theory and adopt "wide reflective equilibrium" (see note 4) as a general theoretical stance. But they by no means stress ethical theory. Indeed, the chapter of their text Ethics in Nursing devoted to the issue is entitled "Unavoidable Topics in Ethical Theory", much to Troyer's dismay.

Troyer's concern seems misplaced. Although some exposure to issues in ethical theory is appropriate--naive utilitarian reasoning and the problem of justice, for instance5--ethical theory should take a backseat to working through specific moral problems that are endemic to the health professions and that impinge on the concerns and interests of students in professional training. Ethical theories, after all, have historically been concerned with investigating the epistemological foundations of moral knowledge, not with application. Indeed, the test of such theories is often how well they square with moral judgments that are clear on independent, intuitive grounds.
When this observation is combined with the fact that any reasonable candidate for acceptance is going to have to square with almost all intuitive judgments, it is evident why Bok would wonder whether the solution to a moral problem wasn't there in the "hat all along" and independent of the moral theory being employed. Moreover, ethical theories break down at just those places where intuitive moral judgments do. The point of medical ethics teaching ought to be to get students to make good moral judgments in the first place. For those with an interest, ethical theory can come later.

Perhaps Troyer's demand is related to his second criticism of Benjamin and Curtis: they fail to press students to come up with the "correct" position. Now, one doesn't have to view ethics as all just a matter of subjective opinion or anything of that sort to agree with Benjamin and Curtis. In the first place, there simply may be no good solution at hand. Perhaps a problem is truly insoluble, or perhaps more time for thought and information gathering is needed. In the second place, our pluralistic society allows persons a range of freedom in their moral beliefs. Would Troyer fail a student for insisting that abortion is permissible? Would he fail one for insisting that it is not? (How students are evaluated carries a powerful message.)

More to the point of ethical theory, it is by no means clear how studying it could help with the problem of "correct" answers. On the contrary, disagreements between competing ethical theories have proven intractable, and the practice of emphasizing these disagreements in ethics teaching has led MacIntyre to remark, "It is no wonder that the teaching of ethics is often destructive and skeptical in its effects upon the minds of those who are taught" (1981, p. 112).
It is not uncommon for students to declare their allegiance to an ethical theory and then proceed to turn the crank (or play the instructor's game), as if the declaration insulated them from glaring difficulties in the position they wish to defend. When the details of ethical theory are made secondary to working through real-life issues of concern, the likelihood is greater that "correct" positions will emerge because it is less likely that ethical inquiry will be viewed as an arcane intellectual exercise.

Misconceptions

The discussion now turns from "controversies" to "misconceptions". As stated previously, the label 'misconception' applies to criticisms of a kind likely to be voiced by students, medical school faculty, and others unfamiliar with the nature and goals of ethics teaching. While the controversies raised doubts about the Wilson+1 goals in particular, the misconceptions question medical ethics teaching generally.

One misconception about medical ethics teaching is exemplified by the tendency of medical schools to lump ethics with other disciplines, primarily the behavioral sciences, under the portion of the curriculum designated as "psychosocial." The mistake involved in identifying ethics with the social sciences is related to the charge of "formalism" in that both muddy (or deny) a distinction between factual knowledge on the one hand and the use of such knowledge in making moral evaluations on the other. For the Wilson+1 goals, social scientific knowledge is important for only one of the six characteristics (i.e., Knowledge) required of moral problem-solving. Although the findings of the social sciences are relevant to intelligently working through moral problems, the role of such findings is limited. The social sciences, after all, only tell what people in fact do, not what they ought to do.
For example, it does not follow that because physicians tend not to communicate the costs of alternative treatment plans to patients in obtaining informed consent that they ought not to.6 In short, teaching the findings of social science is not a substitute for teaching ethics. Nor does teaching the findings of social science permit ethics to take care of itself.

A second misconception fairly prominent in medical education is that ethics can be identified with interpersonal skills. Training in interpersonal skills has become commonplace in medical education in recent years (Kahn et al., 1979), and this development is desirable except to the extent that it is believed to constitute all of medical ethics teaching. Having good Interpersonal Skills (a good "bedside manner") is important in its own right, but, like knowledge in the social sciences, Interpersonal Skills are only one element of moral problem-solving. As Margaret's dilemma illustrates, Interpersonal Skills leave a lot of territory uncovered.

A third misconception is that professional ethical codes provide a sufficient means for resolving ethical problems. Ethical codes are keyed to the specific problems which are endemic to various professions; they highlight these and establish ideals to which professionals should aspire. But they are inadequate to the task of giving answers to many actual moral problems. For example, the medical profession has a long history, beginning with the Hippocratic Oath, of protecting patient confidentiality. Section 9 of the AMA code reads as follows:

    A physician may not reveal the confidences entrusted to him in the course of medical attendance, or the deficiencies he may observe in the character of patients, unless he is required to do so by law or unless it becomes necessary in order to protect the welfare of the individual or of the community. (Veatch, 1977, p.
355)

The problem with this section of the code as a means by which to resolve problems involving patient confidentiality is that it is silent on just what such problems normally turn on: balancing the welfare of the individual against the welfare of the community. What should a physician do when faced with an alcoholic bus driver or a psychiatric patient threatening to kill his girlfriend?7 The code is of little help.

This limitation of ethical codes is not confined to the example in question. In general, ethical codes are at once too broad and too narrow. They are too broad because, if they are to carry any weight at all, they have to be sufficiently open-ended to be acceptable to a wide variety of viewpoints (Benjamin and Curtis, 1981). They are too narrow because they cannot be formulated to anticipate all of the relevant contingencies and all of the different moral problems that might arise.

Another misconception about medical ethics teaching, not uncommon among physicians, is that legal considerations preclude ethical considerations. Such a view is sometimes used to urge the futility of teaching medical ethics, and it involves several confusions. First, although individuals have a prima facie obligation to obey the law, this clearly can be overridden, and when this might be appropriate is itself a moral issue.8 Second, there is significant overlap between law and morality. The slogan, "You can't legislate morality" is clearly false if it is taken to mean that the law does not embody moral principles. It is thus quite misleading to claim that legal considerations preclude ethical ones. Finally, like ethical codes, the law remains both too broad and too narrow. The law requires interpretation, and moral considerations frequently figure in.9

Religious training invariably includes moral training.
Thus there is an understandable tendency to identify religion and morality and to view secular moral education as an illicit encroachment on religious belief. Much of the force of this view is based on the fact of moral disagreement. For example, the religiously-based view that a fetus is a person, fully endowed with the right to life, cannot be reconciled with secular views which deny this. Accordingly, so the argument goes, religious and secular moral judgments are clearly at odds: one depends on the dictates of wholly human judgment and the other on the dictates of God.10

Two observations count against this interpretation of the above disagreement about abortion. First, the belief that fetuses have a right to life is not necessarily religiously based. John Noonan (1977), for instance, assumes that newborns have a right to life and, without invoking any religious premises, argues that newborns cannot be distinguished from fetuses in any morally relevant ways. Second, and more generally, different moral beliefs, or "rules", do not entail different methods of moral problem solving or different basic moral "principles" (Singer, 1970). For instance, in the abortion example the moral principle 'Preserve innocent human life' is not at issue; the disagreement is rooted instead in competing views over the status of fetuses. A position on the status of fetuses in turn leads to a position on whether they should be granted the right to life. This disagreement between a secular and religious moral position, an apparently intractable one, entails neither differences about the nature of moral judgment nor that the goals for medical ethics teaching outlined are biased against religious ethics. The disagreement instead turns on opposing metaphysical premises that may or may not be religiously based.

The appeal to religion does not provide a means to put ethical inquiry on "automatic pilot" (Benjamin and Curtis, 1981).
Religious moral codes, like legal and professional codes, cannot be devised so as to anticipate all or even most eventualities. This is especially true with medical-ethical issues engendered by the advance of technology. For example, religious authorities currently grapple with concepts such as "death with dignity", "quality of life", "ordinary" versus "extraordinary" care, "proportionate" versus "disproportionate" care, and so on, in order to come to grips with current problems surrounding withdrawing treatment from the hopelessly ill. There seems no escape from giving careful attention to individual moral problems as they arise. After all, substantial disagreement over moral issues occurs within religions as well as between them; there is nothing even approaching a united religious front.

Social cooperation in general would be impossible without considerable agreement on most issues. General agreement does exist, even among cultures, and this fact is difficult to account for if there are not shared moral standards that cut across religious boundaries (Nowell-Smith, 1967). Where specific disagreement does exist in our own culture, it is in the interests of persons in general to use secular moral argument to win assent to their views on controversial issues. Anyone who endorses freedom of religion would be hard pressed for a viable alternative.

Another general misconception about ethics teaching is that training in ethics entails indoctrination. The conditions under which an educational endeavor amounts to indoctrination raise some rather thorny philosophical issues that cannot be considered here. (See Macklin, 1980, for a recent review.) In its simplest terms, the issue boils down to whether any reasonably "neutral" approach is possible. Provided that teaching is balanced in the views that it presents, the fear of indoctrination is unjustified.

The issue of indoctrination can be turned on its head.
That is, the consequences of not engaging students in ethical inquiry should be considered. Such a "hands off" approach only reinforces the all too prominent belief that ethics is simply a matter of personal preference. One must also wonder whether this approach does not actually increase the likelihood of indoctrination by failing to provide students with the skills necessary for fending off would-be indoctrinators.

The final misconception is that moral development is not a proper function of formal education. The misconception finds its roots in subjectivism--a view of ethics that conflates rational justifications for moral beliefs and the causes of such beliefs, and which maintains a strict dividing line between facts and values. Subjectivists see no roles for rationality and formal education in connection with ethics. Rationality and formal education apply only to knowledge and science, and exclude the preferences and feelings which subjectivists believe exhaust ethics. Given this view, the most that can be done by way of medical ethics teaching is the inculcation of the conventional standards of the profession.

If MacIntyre (1981) is correct, subjectivism ("emotivism") characterizes our culture in general and pervades social research in particular. If unchecked, this poses a fundamental obstacle to the evaluation of medical ethics teaching as well as teaching ethics per se. Removing this obstacle will be one of the central aims of the next chapter.

Conclusion

The following seven goals have been described, defended, and proposed as a framework for the evaluation of medical ethics teaching:

Direct Goals:
1. imparting Knowledge
2. improving Reasoning
3. instilling Appreciation

Indirect Goals:
4. stimulating Moral Regard
5. eliciting Empathy
6. reinforcing Interpersonal Skills
7. promoting Courage

Key issues in their development were representativeness, defensibility against criticisms, and the distinction between direct and indirect goals.
The central aim of this chapter was to lay a foundation for the credibility of the evaluation of medical ethics teaching by establishing the credibility of a framework of goals. This in turn required demonstrating the representativeness of the goals and their defensibility against criticisms. Representativeness of the goals, required for them to be received as relevant by medical ethics teachers, is demonstrated by their consistency with those advanced by experts in medical ethics, namely, the Hastings Center and DeCamp groups. Defensibility of the goals, required for them to be received as other than self-serving by interested groups in general, is demonstrated by their ability to withstand criticism from both inside and outside the field of medical ethics. The distinction between direct and indirect goals is based on the moral and practical differences between the sets when viewed in the context of professional education.

Moral behavior is the ultimate aim of medical ethics teaching. Its complex nature, reasonable expectations regarding the impact of education, and general educational evaluation practices, however, preclude adopting it as a proximate goal. Instead, surrogate constituents of moral behavior, i.e., the Wilson+1 goals, comprise appropriate proximate goals and hence appropriate evaluative criteria.

Given this brief summary, three further clarifications about goals are in order. First, although the Wilson+1 goals are representative, they are nevertheless prescriptive. That is, they are goals to be urged and used in addition to or in the place of goals which may be avowed in particular contexts (though justifiable adaptations and improvements are always appropriate). Second, the seven goals comprise a minimal set for medical ethics programs. That is, an adequate program ideally would have to give some attention to each. Individual activities (e.g., courses or lectures), however, might legitimately focus on some goals to the exclusion of others.
In this connection, relatively small programs or new and inchoate ones might legitimately give little attention to indirect goals in response to the need to order priorities. Additionally, the general milieu of professional education creates an obstacle for any educational goals that may be viewed as promoting moral virtues (like Courage). Although there are demands to demonstrate effects in terms of moral behavior, there are also marked countervailing demands to avoid advocating or judging the shape such behavior should take. Finally, the Wilson+1 goals are relatively silent on the issue of curricular "content" (i.e., the issues and concepts which should be taught). Although this is an important issue for evaluation (Callahan and the DeCamp group include it in their statements), it is beyond the scope of this study.

NOTES

1. The terms Moral Regard, Empathy, Interpersonal Skills, Knowledge, Reasoning, and Courage are mine, not Wilson's. He uses PHIL, EMP, GIG , GIG , DIK, and KRAT respectively. He also includes a component omitted by me, namely, PHRON, which corresponds roughly to personal prudence. In later writings (1969 and 1973), Wilson refines and further explicates his components. These later versions are unwieldy, at least for my purposes.

2. From Benjamin and Curtis (1981, p. 162).

3. The conferees were Charles Culver, Dan Clouser, Bernard Gert, Howard Brody, John Fletcher, Al Jonsen, Loretta Kopelman, Joanne Lynne, Mark Siegler, and Dan Wikler. The results of this conference are now available in published form (see Culver et al., 1985).

4. Wikler challenges Nobel for a positive account of moral epistemology, and makes reference to Daniels' (1979) discussion of the Rawlsian notion of "wide reflective equilibrium" as one alternative to her position (or non-position) on this matter.
Given Daniels' account--one which squares with present day anti-foundationalist (or anti-formalist) views--Nobel's charges of "acontextualness" and "abstracting the purely moral" do not apply. The account makes explicit reference to the import of empirical knowledge as well as common moral intuitions.

5. Naive utilitarianism is the unqualified view that an action is right if and only if it promotes the greatest good for the greatest number (the principle employed by Raskolnikov to justify robbing and murdering the pawnbroker). The conflict between the utilitarian principle and justice is glaring. For example, it sanctions using the indigent (the most vulnerable group) in medical research with no regard for informed consent provided only that the good which will be derived outweighs whatever harm is done to the experimental subjects. Naive utilitarianism is insidious because the more vulnerable the group being unjustly treated the less likely there is to be an outcry and the easier it is to achieve a positive balance of good over harm.

6. Some 38% of physicians do not believe it is necessary to inform patients of the costs of therapies and procedures, whereas 70% of patients believe such information is relevant to deliberations (The President's Commission, 1982). This is evidence of a "performance deficit" (a failure to perform appropriately) on the part of physicians, not a guide to moral behavior.

7. Tarasoff v. Regents of the University of California (Beauchamp and Walters, 1978, pp. 176-185) is a landmark case which established that the confidentiality of the doctor-patient relationship is not protected absolutely. The court ruled that physicians, particularly psychiatrists, have a "duty to warn" individuals whose lives may be in danger, even if the information is obtained in contexts believed to be confidential by patients who pose the threat.

8.
Acting contrary to the law for moral reasons is the basis of civil disobedience, which is generally viewed as laudable. In the context of the practice of medicine, one can imagine a physician who periodically failed to comply with reporting requirements for child abuse and neglect laws on the grounds that reports result in greater harm than benefits to children in certain instances.

9. See Hart (1961) for his discussions of the "open texture" of the law (the feature which makes a continuing need for interpretation unavoidable) and of the relationship between law and morality.

10. The notion that morality is and must be wholly grounded in commands issued from God (or the gods) was first criticized by Plato in the Euthyphro. Wilson exhibits a view similar to Plato's in the following:

    Do they [certain religious persons] really want to maintain that any religious or non religious authority can in itself act as an acceptable basis for morality: that commandments must be obeyed just because they are commandments? Do they rather not believe that God (or the Church, or the State, or whatever authority it may be) ought to be obeyed because he says what is right or good, rather than just because he is God? And, if so, will not what the authority says about morality turn out to be justifiable in terms of our own criteria [i.e., the components], so that they need have nothing to fear from their application? Do they not think they are justified in accepting the authority--that they accept it for some reason rather than as a completely wild leap in the dark, a leap which (unless backed by some kind of good reason), would be logically indistinguishable from leaps made by Fascists, Baal-worshippers, or lunatics? And if all this is so, can we not find common ground, at least so far as morality is concerned, on the basis of those reasons for morality which we all accept as primary? (1969, p.
180)

CHAPTER III

BEYOND SCIENTISTIC EVALUATION: DEFENDING A FUNCTIONAL APPROACH

    Education...is not an enterprise which we know just how to handle and can reduce to a series of techniques, like vine-pruning or cutting corn or building bridges. It is a much more general enterprise--like marriage, religious counseling, childrearing, and psychotherapy. Such enterprises are shot through and through with conceptual confusion and uncertainty, and the necessity of making value judgments (Wilson, 1983, p.192).

Evaluation is not "an enterprise which we know just how to handle and can reduce to a series of techniques" either--it, too, is "shot through and through with conceptual confusion and uncertainty and the necessity of making value judgments." The preceding chapter clarified what to look at; this chapter will clarify how to look at it. This will be a complex task because many evaluators in fact endorse a "vine-pruning" approach: "Designs too often reflect a narrow operationalism based on obeisance to the controlled experiment as the design ideal" (Patton, 1983, p. 29).

An attenuated form of positivism predominates in educational evaluation, inherited from behavioristic educational and social science research more generally. A rigid fact-value distinction, a related distinction between quantitative and qualitative methods, and a presumption that measurement must be grounded in psychological theory are central features of what shall henceforth be referred to as "scientistic"1 evaluation.

Scientistic evaluation is flawed in general but is especially obstructive to the evaluation of medical ethics teaching. Given a fact-value distinction that places values beyond rational scrutiny, evaluating teaching in the realm of moral values becomes an incoherent activity.
If morality is simply a matter of tastes, feelings, and upbringing, and all positions are equally rational, then there are no criteria to support evaluative judgments about the performance of students.2 Given a quantitative-qualitative distinction, which labels qualitative methods unscientific and impressionistic, the evaluation of ethics teaching is bound to be labeled "soft". All that is relevant to the evaluation of ethics teaching cannot be reduced to a collection of operationalized characteristics, measured with paper-pencil tests and checklists of overt behavior, and analyzed statistically. Qualitative methods, such as interviews and direct observation, are indispensable. Finally, the presumed superiority of criteria and measures grounded in psychology leads to an emphasis on reliable measures at the expense of valid ones3 that Messick (1981) has dubbed the "tyranny of reliability". In connection with the evaluation of ethics teaching, evaluators search the terrain for some proven instrument--proven in the sense that it measures reliably--with little or no attention paid to what is being measured. As a result, measures of attitudes, overt behavior, and Kohlberg's stages of moral development have been frequently advocated or used in the evaluation of medical ethics teaching with far too little attention paid to the validity of such measures for evaluating teaching. This chapter criticizes each of the three features of scientistic evaluation--its construal of the fact-value and quantitative-qualitative distinctions and its presumption in favor of criteria and measures grounded in psychological theory--with an eye toward vindicating an alternative "functional" evaluation approach.
The functional alternative holds that evaluation should emphasize adjusting methods and measures to fit different objects and circumstances of evaluation (Cronbach, 1982; Cronbach and Associates, 1980; and Patton, 1980); it holds that the relevance and usefulness of information, not a priori judgments based on abstracted criteria of methodological rigor, determine the quality of evaluation.

The Fact-Value Distinction

Persons who take the evaluation of ethics teaching seriously too frequently encounter the following general question: How can you evaluate ethics teaching? Seemingly innocent, this question poses a fundamental challenge to the evaluation of ethics teaching because it is so often motivated by ethical subjectivism, i.e., it is so often motivated by the belief that there is nothing to be "right" about in ethics and thus nothing to evaluate. Ethical subjectivism is an especially serious obstacle when it is exhibited in groups such as teachers and those responsible for evaluation, and subjectivism is indeed prevalent among these groups. In addition to MacIntyre's observation about the pervasiveness of ethical subjectivism in social research mentioned earlier, Scriven claims that to deny that value judgments are essentially subjective and undecidable "even in these days of the decline of positivism requires considerable intestinal fortitude and intellectual competence" (1979, p. 13). In response to current thinking among evaluators and the "value-free doctrine", he remarks,

...attacking the value-free doctrine accurately and effectively is important because there are so many spurious reasons for adopting it and so much value-phobic pressure to accept it...if the doctrine is not rendered completely absurd by complete exposure, it will simply continue to rise from the ashes (1983, p. 81).

One doesn't have to look too far to find instances of Scriven's "value-phobia".
Anderson and Ball (1978) devote three chapters of The Profession and Practice of Program Evaluation, a standard text, to ethics and values in evaluation. The following sample of passages is eye-opening:

[1] Given the effect of ideology on evaluation, what is the evaluator's professional obligation? What should the evaluator do to inform or, perhaps, protect the program director and the other audiences for the evaluation? ...We should emphasize two principles--the evaluator cannot become a Spock-like emotionless Vulcan, and it is worthwhile for the evaluator to make explicit, in as honest and open a way as possible, the values he holds (p. 115).

[2] The evaluator had seen what the evaluator had wanted to see. (Or maybe the evaluator was correct in his perceptions, and our perceptions were biased by our distaste for such a subjective procedure) (p. 118).

[3] Messick...discusses the impact of these professional values in an excellent paper. He points out that values pervade not only our decisions on where to look but also our conclusions about what we have seen. And he presents an illustration that is better left in his words than paraphrased in ours (lest our values distort his meaning!) (p. 118).

[4] Evaluators with different basic social, moral, and economic values, different predispositions, and different preferences will perform different evaluations. Unless the audiences for the evaluations are informed about these underlying influences misconceptions are likely to arise (p. 124).

[5] To the extent possible and without creating intolerable confusion, the evaluator should inform audiences for the evaluation results how...results are based upon a particular evaluation approach...The evaluator will generally have to depend on a simple accounting of what decisions were made during the evaluation, why those decisions were made, and what the major alternatives were. This is the honest and open approach. Given our values, we recommend it! (p.
125)

Viewed in isolation from one another, the first four of these passages may appear reasonable and, indeed, true. But together they paint a rather clear picture which the last passage (the closing one of the chapter) brings into sharp focus. The message is that there is an obligation to get values out on the table, but no suggestion is made about what ought to be done with these values once they are made explicit.4 What is worse, it looks as if value claims are always up for grabs and are to be made only with great trepidation. The qualifier in the last passage, "given our values", brings home the value phobia--as if one had to hedge on the demand for an honest disclosure of relevant information and had no justification for writing off autocrats. The value-phobic fact-value distinction has at least two sources: positivistic epistemology and the attempt to avoid bias. Donald Campbell appeals to both reasons, and at a level of philosophical sophistication unusually high among educational researchers. His views are well-suited to frame a general examination of the issues. The conclusions of this examination of Campbell's views may then be related to the evaluation of medical ethics teaching in particular.

Positivistic Epistemology

Campbell follows the logical positivists on the fact-value distinction by his own admission; his basic epistemological stance is contained in the following:

The tools of descriptive science and formal logic can help us implement values which we already accept or have chosen, but they are not constitutive of those values. Ultimate values are accepted but not justified (1982, p. 123).

The claim that "ultimate values are accepted but not justified" is true in the sense that not all values can be up for grabs at once, but the contrast Campbell presumes between values and the "tools of descriptive science and formal logic" does not exist. Campbell himself elsewhere (1974) argues for the "presumptive" nature of scientific knowledge.
That is, he shares the general Kuhnian view that knowledge is based on theory-laden observations and beliefs. The consequence of construing knowledge as presumptive, which Campbell fails to appreciate, is that there can be no "ultimate" justification for beliefs of any kind, including basic scientific ones. As Wittgenstein observes, "Justification...comes to an end. If it did not it would not be justification" (1958, p. 136e). Wittgenstein's observation applies as well to scientific claims as to value claims. Thus, Campbell's way of distinguishing factual and value claims--on the basis of whether they must be "accepted but not justified"--fails. The positivistic fact-value distinction is, after all, based on positivism's central (and contra-Kuhnian) notion that purely observational, atheoretical knowledge can be isolated and used as the basis for theory. To qualify as a legitimate knowledge claim, to be "cognitively significant", a sentence had to be testable in one of two ways: either in terms of direct observation or in terms of formal logic. (Notice the similarity to Campbell's "tools of descriptive science and formal logic".) Since value claims could not be verified (or falsified) in either of these ways, they were judged to be devoid of cognitive content, to be mere expressions of emotions. The positivists more or less backed into "emotivism"--a view that equates 'Abortion is wrong' with evincing an emotion, e.g., 'Boo abortion!'--as a result of their more general epistemological position. The often overlooked but crucially important implication of the abandonment of positivism is this: If the positivistic attempt to ground all knowledge in some sort of atheoretical reality is untenable (which few, including Campbell, nowadays deny), then the justification for the rigid distinction between facts and values is equally untenable.
The positivistic construal of the fact-value distinction is merely a corollary of the more general observation-theory distinction. Because the positivistic fact-value distinction is baseless, it is illicit to separate value judgments from the conduct of research, especially social research, on the grounds that values are merely emotive and non-cognitive. In his discussion of post-positivistic social science, Scriven puts the matter as follows:

There is no "ultimate observation language..." Analogously, there is no ultimate factual language. And the more interesting side of this coin is that many statements which in one context clearly would be evaluational are, in another, clearly factual. Obvious examples include judgments of intelligence and of the merit of performances such as those of runners of the Olympic Games (1969, p. 199).

Scriven concludes, "...there is no possibility that the social sciences can be free either of value claims in general or of moral value claims in particular..." (p. 201). Richard Rorty echoes Scriven:

...there is no way to prevent anybody using any term "evaluatively." If you ask somebody whether he is using "repression" or "primitive" or "working class" normatively or descriptively, he might be able to answer in the case of a given statement, made on a given occasion. But if you ask him whether he uses the term only when he is describing, only when he is engaging in moral reflection, or both, the answer is almost always going to be "both". Further--and this is the crucial point--unless the answer is "both," it is not the sort of term that will do us much good in social science (1982, pp. 195-96).

Scriven's and Rorty's conclusions follow from the refutation of the central tenets of positivism. These tenets will be explored further in the subsequent section on the quantitative-qualitative distinction.
It is sufficient to observe at this point that ultimate, theory-free, factual knowledge cannot exist, and that the corollary fact-value distinction is untenable. As a consequence, setting values to one side because they don't have to do with matters of fact is indefensible. Value judgments are subject to rational criticism, and no researcher can avoid value commitments (whether such commitments are acknowledged or not). It is absurd, for example, to suggest that the participants in the Manhattan Project were engaged in a value-free enterprise, i.e., one not subject to moral evaluation. Value commitments are even more fundamental to the conduct of social research, because, as Scriven and Rorty observe, the very concepts that social researchers employ are evaluative of human behavior.

Avoiding Bias

An epistemological justification for the fact-value distinction went the way of positivism, but the distinction persists--rises from the ashes a la Scriven--as the result of misguided efforts to safeguard pluralism and avoid bias. For example, consider the motives Campbell (1982) exhibits in the following:

An established power structure with the ability to employ applied social scientists, the machinery of social science, and control over the means of dissemination produces an unfair status quo bias in the mass production of belief assertions from the applied social sciences...This state of affairs is one which...I deplore, but I find myself best able to express my disapproval through retaining the old-fashioned construct of truth, warnings against individually and clique selfish distortions, and a vigorously exhorted fact-value distinction...(p.
125)

[The] effort to make us aware of biased-paradigm co-optation is again one best done by retaining a traditional fact-value distinction; it is a matter of becoming self-critically aware of our profoundly relativistic epistemologic predicament and using this awareness in the service of a more competent effort to achieve objectivity, rather than employing it to justify giving up the goal of truth (p. 126).

Campbell clearly opposes the manipulation of social scientific knowledge in a way that serves the interests of powerful groups--a laudable position. Rather than avoiding bias, however, by waving the banner of value-free social science, Campbell is more likely contributing to bias by obscuring tacit value commitments. To illustrate this danger, consider the concept of intelligence and whether it is possible to strictly separate truth, facts, and values. (The possibility of such a strict separation is an implication of Campbell's view and has recently been defended explicitly by Jensen, 1984.) If research on intelligence involves the "goal of truth" solely, it should be possible to divest 'intelligence' of evaluative meaning. 'Intelligence', however, is one of Scriven's "obvious examples" of a concept with both descriptive and evaluative uses. This two-edged feature of 'intelligence' is not a problem per se: social science concepts must have this feature to be useful for guiding practice. That is, unlike 'velocity' qua physical science concept, 'intelligence' qua social science concept would be worthless if it did not have an evaluative use. The quest, then, for the unvarnished, value-free "truth" about intelligence is misguided.
This quest introduces the potential for bias, and not merely the kind of bias involved in the outrage of administering intelligence tests to those not fluent in the language of the test.5 Intelligence tests measure characteristics that are implicitly viewed as valuable or good (which, again, is what makes 'intelligence' a useful social science concept) and therefore introduce the possibility of a more fundamental kind of bias. It is not too hard to imagine a society in which "I.Q. tests" would measure the ability to construct a bark canoe. Closer to home, just as the label "intelligent" entails roughly "having something good", the label "mentally retarded" entails roughly "lacking something good." The fate of some famous "Baby Does"--anomalous newborns allowed to die for their "own good" and the good of others--is dramatic testimony to the evaluative meaning of intelligence and its consequences.6 Less dramatic but much more common are the well-known effects of labeling school children on the criterion of intelligence (and on numerous other criteria as well). The problem with Campbell's "vigorously exhorted fact-value distinction" is the implication that issues of value can (and should) be bracketed and set to one side while researchers go about the task of collecting purely descriptive data. At an epistemological level, because the positivistic fact-value distinction is untenable and social science concepts are inherently evaluative, it is impossible to sharply distinguish factual from value claims. At the level of practice, the attempt to bracket values in the name of truth and science, to protect pluralism, and to avoid bias, only results in a more insidious bias. If one insists that value judgments are irrevocably non-cognitive, or biased, or to be "accepted but not justified", one is left with the conclusion that social research is flawed in the same way.
The way to avoid this conclusion is to accept that value judgments are part of the fabric of social research and to recognize that they must be defended and may be criticized like any other kind of judgment.

Value judgments, like factual judgments and theoretical analyses, are of two kinds--the well-supported and the poorly supported. No scientist can avoid making them, although it is certainly possible to avoid making good ones (Scriven, 1983, p. 1)

Conclusion

As stated earlier, these general conclusions about the need for, and unavoidability of, using value-laden concepts and making value judgments in social research may be made specific to the evaluation of medical ethics teaching. In the preceding chapter, seven goals were proposed as a framework for evaluation. Although these goals may be open to criticism of various kinds, a criticism which does not apply is the global one that the goals are value-laden and therefore objectionable. Moral Regard, Empathy, Interpersonal Skills, and so on, are construed as good things, but they are also descriptions. In this, they are just like other educational goals and criteria such as achievement, cognitive skills, positive attitudes, and so on. The fact-value distinction thus cannot render the evaluation of ethics teaching suspect or untenable unless it reduces all educational evaluation (indeed all of social science) to the same position. At the most general level, the question, "How can you evaluate ethics?" may be answered, "The same way you evaluate anything else".

The Quantitative-Qualitative Distinction

The identification of quantitative methods with something epistemologically respectable and qualitative methods with something epistemologically suspect is the second feature of scientistic evaluation this chapter set out to criticize.
The quantitative-qualitative distinction7, although less directly related to the evaluation of ethics teaching than the fact-value distinction, requires careful consideration because those who espouse it as marking some deep-rooted epistemological distinction between the scientific method and some other non-scientific one are likely to scoff at ethics as an unscientific, second-rate pursuit. The same positivist view of knowledge which undergirds a rigid fact-value distinction undergirds a quantitative-qualitative distinction based on the purported difference between science and non-science. Thus, the fact-value and quantitative-qualitative distinctions go hand in hand. In so far as qualitative methods are indispensable for evaluating ethics teaching, they need to be freed from their status as the "soft" and subjective handmaiden of quantitative methods, lest the evaluation of ethics teaching suffer from guilt by association. A combination of quantitative and qualitative methods will be advanced as the appropriate methodological stance. Although advocating such a combination is not new, the practice has been questioned on the grounds that it is an ad hoc expedient and, accordingly, is epistemologically incoherent. The positivists can claim as much responsibility for the rigid quantitative-qualitative distinction as for the rigid fact-value distinction. The advent of positivism prompted a debate over whether social research should employ the physical science model portrayed and advocated by positivism, or should employ some alternative "interpretative" model of its own.
This positivist-inspired forced choice has set the terms of the contemporary debate about quantitative versus qualitative methods and--where the positivistic physical science model is identified with quantitative methods and the interpretive model with qualitative methods--limits the positions to two: one may divorce the issue of research methods from more abstract epistemological issues and employ whatever method or combination of methods seems to make sense (e.g., Reichardt and Cook, 1979), or one may hold that abstract epistemological issues dictate methods and seek to reconcile the competing positivistic and interpretive views (e.g., Smith, 1983a and 1983b). This creates a dilemma. Reichardt and Cook offer good arguments in support of combining quantitative and qualitative methods, but their general suggestion that the two "paradigms" of research (roughly, positivistic and interpretive) are logically independent from the means of obtaining knowledge is a heavy price to pay (and they should be suspicious of "paradigms" that are independent of methods). On the other hand, Smith is correct to require a logical connection between epistemology and research methods; but, by tracing out the implications of the positivist versus interpretive frameworks, he winds up with the unwelcome conclusion that qualitative and quantitative methods "do not seem compatible given our present state of thinking" (1983a, p. 12). The plan of this section is to escape both horns of the dilemma by criticizing the underlying positivistic presuppositions that generate it. The result will be the best of both worlds: a free hand to use whatever method or combination makes sense and a legitimate epistemological foundation for doing so.

Qualitative and Quantitative Data

The most frequent positivist-inspired charge against qualitative data is that it is "subjective".
Scriven (1972) responds that 'subjective' is ambiguous and that trading on this ambiguity leads to erroneous conclusions about the merit of qualitative methods. He distinguishes between "quantitative" and "qualitative" subjectivity. To say that a claim is "quantitatively" subjective roughly means that it is based on the observations and arguments of a few; to say it is "qualitatively" subjective roughly means that it is highly contestable. Scriven's crucial point is that a claim that is subjective in one of these senses is not necessarily subjective in the other sense. For example, at one time the claim "The earth is round" was quantitatively subjective, but it proved not qualitatively subjective because it was backed by convincing evidence and reasoning. On the other hand, "Chocolate ice cream is better than rocky road" is qualitatively but not quantitatively subjective. Although many would assent to this claim, it is inappropriate (and unimportant) to try to establish whether it is correct. Scriven's distinction provides some help in clarifying the sense in which data ought not be subjective and the sense in which whether data is subjective is beside the point. The best course, however, is to dispense with the subjective-objective distinction in discussions of research methodology precisely because of the ambiguity Scriven identifies. The real issue for epistemology and for research (and I think this is essentially the point Scriven is making) is fallibility: to disparage qualitative data as subjective is to accuse it of having high fallibility (H-fallibility); to laud the objectivity of quantitative data is to construe it as having low fallibility (L-fallibility). For the positivists, the limit of L-fallibility for empirical claims (an attempt at infallibility) was atheoretical, "protocol" sentences.
Concerted efforts, however, to produce a satisfactory explication of the relationship between such "purely" observational protocol sentences and scientific theories (which, if successful, would have met the positivists' goal of reducing theory to a logical concatenation of observation sentences) met with failure. The concept of "cognitive significance" (the verifiability criterion)8 grew ever more complex and unwieldy and was eventually abandoned. As Phillips observes (1983, p. 7): "The principle of verifiability suffered the same fate as the 'Elephant Man'--it became a contorted monstrosity that choked under its own weight". (One might say, alternatively, that the poor creature was choking and Quine administered euthanasia.) The best post-positivistic approximation to the protocol sentence (the limit of L-fallibility for empirical claims) is Quine's "observation sentence". Although it does not measure up to the demands of the positivists, it "accords with the traditional role of the observation sentence as the court of appeal of scientific theories" (1969a, p. 87). His characterization is as follows:

An observation sentence is one on which all speakers of the language give the same verdict when given the same concurrent stimulation. To put the point negatively, an observation sentence is one that is not sensitive to differences in past experiences within the speech community (1969a, pp. 86-87).

Two things should be noted about this definition. (1) Given the positive formulation, observations are based on the criterion of intersubjective agreement among observers and, accordingly, always retain some degree of fallibility (no matter how small) since it is possible for virtually everyone to be mistaken. (2) Given the negative formulation, observation sentences are objective (or unbiased) in that they do not depend on irrelevant idiosyncrasies of observers.
As observations move away from the limit of Quine's observation sentences the issue of fallibility becomes complicated, and this is the crux of the distinction between quantitative and qualitative data. At first glance, quantitative data might appear to be uniformly superior. For example, counting the number of students in a classroom generates quantitative data. The claim 'There are x students' is an instance of Quine's limiting case of an observation sentence; the data is markedly L-fallible. By contrast, observing the workings of a classroom in terms of the group dynamics results in qualitative data far removed from the limiting case and markedly H-fallible. Given this example, quantitative data is much better than qualitative data on the criterion of fallibility, and such examples no doubt account for the tremendous faith placed in quantitative data. The essential point is that this example cannot be generalized; just the opposite ordering of fallibility between quantitative and qualitative data is possible and indeed common. Consider pilot testing an attitudinal instrument. The ultimate aim of developing the instrument is to gather quantitative data. Experience has taught that attitudinal measures frequently suffer from difficulties in interpretation, which may render data of questionable validity. In other words, in the absence of concerted development efforts, attitudinal measures tend to be H-fallible relative to the questions of interest. What is the solution? It is to get some subjects together, administer provisional versions of the instrument, and request their opinions about interpretation. Since their opinions constitute qualitative data, and since this data is being used to reduce the fallibility of the quantitative data to be ultimately collected, the qualitative data is presumed to be less fallible than potential quantitative data.
Judgments about validity have this characteristic in general, i.e., they are undergirded by qualitative data and judgment. To conclude the discussion of quantitative versus qualitative data, the nature of concepts used in educational research--concepts like intelligence, reasoning, achievement, and attitudes--is such that ultimate dependence on qualitative judgments and data always lies at the bottom of minimizing the fallibility of quantitative instruments. So long as educational research remains couched in terms of such concepts (and it must to have a bearing on practice) quantified data gathering will have to remain faithful to and parasitic on qualitative judgments and terms; the latter cannot be eliminated. This section does not aim to throw out the baby with the bathwater, i.e., the aim is not to disparage quantification. The section has been devoted to a defense of the use of qualitative data because the use of quantitative data has not been required to endure the same amount of unjustified criticism. Quantitative data is valuable because, following Campbell (1974 and 1979), it goes beyond and provides a check on qualitative data. Provided relatively L-fallible (reliable and valid) instruments are employed and (not unrelated to this) suitable research questions are addressed, quantified data has distinct advantages over qualitative data. Generally speaking, quantified instruments allow attention to be focused on variables of interest, they reduce distractions or "noise", and they permit finer discriminations. Moreover, the introduction of mathematical symbols permits an economical summary of data that in turn facilitates analysis. (Imagine trying to do arithmetic in English, with no mathematical symbols.) Comparisons on the variables of interest become manageable, and the magnitudes of differences can be investigated. Finally, once the instrument development stage is completed, quantitative data can be much more efficient.
(Imagine sending out an army of observers to obtain the information available from standardized achievement testing.)

Inference

Data, whether quantitative or qualitative, is used to support inferences, and the way the positivists construed scientific inference in particular contributes markedly to engendering the forced choice between the physical science and interpretive models of social research. The positivists' general view of scientific inference had two important features: (1) scientific inference consists of confirming (disconfirming) quantitative theories and laws by appeal to their logically inferred observational consequences, and (2) the logic of scientific inference is the same for social science as for physical science. If the positivistic construal is correct, then quantitative and qualitative methods are indeed incompatible, for the use of quantitative methods in social research would be tantamount to the attempt to build quantitative laws. Both of the positivists' claims about scientific inference were wrong. Showing how they were wrong will remove the final vestige of the rigid distinction between quantitative and qualitative methods.

Inference in Physical Science. Post-positivistic philosophy of science rejects the positivistic notion that the relationship between empirical evidence and corresponding laws and theories is a precise one, explicable in terms of formal logic. The post-positivistic (Quinean-Kuhnian) view is roughly as follows: one begins with a hypothesis which is tested against evidence deemed appropriate. The evidence will either provisionally confirm the hypothesis or prove inconsistent with it. In the latter case, the evidence may either be discounted (attributed to a poor reading of the results, inaccurate measurement, or simply viewed as anomalous), or it may be accepted as falsifying.
If it is accepted as falsifying, then matters become complicated because the empirical test does not apply to the hypothesis in isolation, but to a constellation of beliefs (a "conceptual scheme") in which the hypothesis is embedded. In effect, a conjunction of beliefs, including the hypothesis of interest, is put to the test rather than--as positivism would have it--some deductive consequent that may be directly confirmed or falsified. When evidence is accepted as falsifying, some further choice must be made regarding which belief included in the conjunction is affected, and this decision cannot be read off, as it were, from the evidence provided by the empirical test. In addition to the evidence from given empirical tests, other empirical beliefs, metaphysical beliefs, and general guiding principles--simplicity, scope, and familiarity--come into play in deciding how the evidence should affect the shape of the revised conceptual scheme (Quine, 1970). In the physical sciences, quantified laws are employed that significantly circumscribe the area of interest and dictate what is to count as confirming and disconfirming evidence. Nonetheless, the demise of positivism entails that quantitative evidence, even in the physical sciences, can never be interpreted independent of extra-observational and extra-theoretical (qualitative) considerations that help define both the theory in question and the broader conceptual scheme in which the theory is embedded; quantification does not eliminate qualitative judgments and therefore is not an alternative to them.

Inference in Social Research. The purpose of social research is to improve human practices, and this counts against the second feature of the positivistic construal of scientific inference, namely, that the same characterization of inference applies to social research that applies to physical science.
If social research is to inform constructive change, it has to employ the kind of two-edged concepts described previously. Consequently--and this is where it importantly differs from physics--the concepts used in social research must be validated in terms of human interests and practices.9 Because quantitative data must be grounded in such two-edged concepts, only modest networks of quantitative laws are possible--concepts like 'reasoning', 'achievement' and 'attitude' do not readily lend themselves to relationships like f = ma.10 Even this modest level is rare and controversial, and nothing like the physical sciences. Inferences based on quantitative social science data are therefore much more piecemeal and disjointed than inferences in physical science, which is to say they require extra-theoretical (or qualitative) judgment to a dramatically higher degree.11

Although the fit between hypotheses, the empirical tests associated with them, and resultant data is much looser in social research than in physics, there is no incoherence in employing quantitative methods to investigate non-nomological issues. Even in the physical sciences, where the aim is quantitative law-building, quantitative findings do not dictate all of the scientific judgments that have to be made. Again, various assumptions and beliefs within both a theory itself and the broader conceptual scheme in which the theory is embedded invariably come into play.

To conclude the discussion of the quantitative-qualitative distinction, criticisms were advanced at two levels. At the level of data, qualitative data are not a priori highly fallible. Indeed, quantified measures always presuppose qualitative data and judgments. At the level of inference, the belief system in question always incorporates substantive qualitative beliefs that play an ineliminable role in drawing conclusions. The consequence is that quantitative and qualitative methods are not incompatible.
On the contrary, the methods are inextricably interwoven, and all who advocate combining quantitative and qualitative methods are thus on solid epistemological ground.

Champions of scientistic evaluation identify quantitative methods with objective data and scientific inference and identify qualitative methods with subjective data and non-scientific inference. They disparage functional evaluation because it permits practical research aims to determine research methods and thereby compromises methodological rigor. The scientistic view is undermined by observing that fallibility reduction (not "objectivity" or "the scientific method" per se) is the interesting epistemological issue. Quantitative data and quantitatively-based inferences are not a priori less fallible than qualitative data and qualitatively-based inferences. The aim of evaluation is reducing fallibility about questions of interest, and the questions of interest have to do with improving human practice. Meeting this aim entails employing a two-edged (evaluative-descriptive) vocabulary that renders inference in the social sciences very unlike that in the physical sciences, straining the scientistic view to the breaking point.

The discussion of the scientistic construal of the qualitative-quantitative distinction may be concluded by expanding on an analogy attributed to Hilary Putnam: "If you want to know why a square peg does not fit into a round hole you had better not describe the peg in terms of the positions of its constituent elementary particles" (Rorty, 1982b, p. 201). The "functional" alternative to scientistic evaluation responds to a general, and one would think uncontroversial, definition of rationality12: "Maximization of utility when the labor and costs of calculations and thinking are taken into account" (Good, 1983, p. 23). One simply does what makes sense, or "what works".
As Scriven puts it: "The [evaluator's] task is...not to use some experimental design but the best one that is feasible--and that only if it is good enough to establish a worthwhile conclusion" (1983, pp. 76-77).

An Illustrative Example

A number of issues have been considered to this point at a relatively abstract level. Although the next chapter will consider three examples in some detail, a concrete example at this point will help tie the discussions of the quantitative-qualitative and fact-value distinctions together and suggest how educational evaluation can bridge the gap between quantitative and qualitative methods on the one hand and facts and values on the other.

Example. An effort is undertaken to investigate the effects of a medical ethics course required of second-year medical students. The format of the course is 1 hour of lecture per week combined with 2 hours per week of small group discussion. Students meet jointly for the lectures and divide into 6 groups of approximately 10 students each for discussion. Each discussion group is led by a distinct set of preceptors consisting of one physician and one Ph.D. from the humanities or behavioral sciences. Assume that data on learning is collected in 3 ways: direct observations of the discussion groups, interviews with the preceptors and students, and pre-post cognitive testing. Assume the following set of results:

(1) Direct observations of the small group discussions and testimony from interviews support the hypotheses that students improved in Knowledge, Reasoning, and the Interpersonal Skills associated with collegial give-and-take.

(2) Preceptors, students, and an independent expert question the validity of the cognitive test. A student informant reports that many students, knowing the posttest did not count toward their grade in the course, did not take it seriously and some intentionally did poorly.
(3) An analysis of the pre-post testing in terms of a paired t-test yields a statistically significant negative change.

This study combined quantitative and qualitative methods in two ways. First, quantitative methods (pre-post testing) and qualitative methods (direct observation and interviews) were combined disjunctively to investigate distinct issues: pre-post testing was keyed to cognitive learning, and observation and interviews were keyed to interpersonal skills. Second, quantitative and qualitative methods were combined conjunctively (i.e., as multiple indicators) to investigate the same issue: pre-post testing, observations, and interviews were all used to investigate cognitive learning.

When quantitative and qualitative methods are combined disjunctively, the interpretation of results is relatively straightforward--one simply draws distinct conclusions based on distinct evidence. The more interesting case is when quantitative and qualitative methods are used conjunctively. The conjunctive combination of methods is typically the most puzzling and is the combination that leads thinkers (e.g., Smith, 1983a and 1983b) to question whether quantitative and qualitative methods can be coherently combined.

To reiterate, the contention that quantitative and qualitative methods are incompatible is an upshot of accepting the positivistic notion that scientific inference consists in building quantitative laws in a mechanistic fashion. Consider the positivistic portrayal of inference in light of the reasoning that is warranted on the basis of the evidence described in the medical ethics example. If quantitative methods have the capacity to yield straightforward mechanical conclusions, then the interpretation in the example is clear: the course had a negative impact on students, and the fact that this conclusion is extremely difficult to accept (that it is H-fallible) should be irrelevant to the inference.
But quantitative methods have no such capacity precisely because other evidence based on considered (qualitative but L-fallible) judgment may overrule quantitative findings. Contrary to positivism, it is not possible to deduce isolated observation consequences from hypotheses or theories. Not even the results of experiments in the physical sciences have the capacity to coerce conclusions because individual beliefs or hypotheses are never tested directly. An anomalous finding surely requires some adjustment in the belief system, but just where is not something that follows mechanically from quantitative results.

The data from the medical ethics course suggests one especially attractive alternative to the negative impact interpretation, namely, that the cognitive test, the testing procedure, or both, were invalid. The evidence from the observations and interviews supports this interpretation, as does the testimony that the test lacked validity from students, preceptors, and an outside expert. One obviously does not have good grounds to conclude the course caused cognitive learning. Because of the qualitative findings, however, a decision to dismiss the quantitative findings and thus the hypothesis that the course had a negative impact is warranted.

This conclusion is straightforward and one I trust virtually any researcher would draw, but its obviousness should not obscure two important points. First, the medical ethics study employed quantitative methods, but nothing remotely resembling quantitative law building was involved. Second, qualitative evidence was relevant to the same issue as quantitative evidence, namely, cognitive learning. The two kinds of evidence checked one another, reducing the confidence that could be placed in either alone.

The example shows that quantitative and qualitative methods may be combined both disjunctively and conjunctively.
Without falling back on dogmatic methodological assertions, it is difficult to see how either kind of combination of methods is epistemologically suspect.

The combination of quantitative and qualitative methods comprised the descriptive element of the medical ethics study, and this element was inextricably linked to the evaluational element. The medical ethics course was studied in terms of cognitive learning and interpersonal skills. These concepts were used to describe the effects and events of the course (the facts), but they also embodied value judgments. It would have been redundant in the context of the study to say, "Students learned a great deal and this is good." 'Cognitive learning' and 'interpersonal skills' were not just convenient descriptors, but valued and defensible ends of the course, worth gathering the facts about.

The use of quantitative data in the study does not undermine these value commitments or the ability to make claims about whether desired ends are achieved. Whether a concept is quantified is not directly linked to whether it is evaluative. (Consider judging a diving contest or assigning grades to students.) The data on cognitive learning from the medical ethics study does not mysteriously shift back and forth between being value-free and value-laden, depending on whether the quantitative or qualitative data are at issue.

In summary, the medical ethics course evaluation was driven by judgments about what educational ends are worthwhile. Quantitative and qualitative methods were combined to determine whether these ends were being achieved. There is (or should be) little difference between the general principles that guided this study and those that guide educational evaluation more generally. Again, the question, "How can you evaluate ethics?" may be answered, "The same way you evaluate anything else."
The third general feature of scientistic evaluation is a preoccupation with being scientific that leads scientistic evaluators to borrow the measures and mimic the methods of the social sciences, especially psychology. Research on moral education is no exception. The previous section on the quantitative-qualitative distinction addressed research methods; this section will bracket that issue and focus on general approaches to measuring the characteristics of interest. This marks a partial return to the specification of goals and evaluative criteria--the question is whether moral psychology can and should provide the evaluation of medical ethics with such criteria.

Moral psychology is customarily divided into three schools: psychoanalytic, behaviorist, and cognitive moral development (e.g., Hall and Davis, 1975; Wren, 1982). Psychoanalytic theory is rarely considered in connection with moral education because its relationship to education as normally conceived is tenuous and because instructors cannot be expected to be (nor should they be) therapists. Accordingly, the focus will be on behaviorism (including its near cousin, social learning theory) and Kohlberg's cognitive development theory. The aim is to show that the strategy of borrowing from these two schools of moral psychology is ill-advised in the present context; that whatever their merit as psychological theories, they have little to contribute to the evaluation of medical ethics teaching.

Behaviorism

Behaviorism is directly linked to an issue previously discussed. The inappropriateness of behaviorist-inspired evaluative criteria was mentioned in connection with Callahan's criticisms of moral behavior as a goal of medical ethics teaching. It was observed that Callahan's rejection of the criterion of moral behavior is well-taken provided that 'moral behavior' is construed in the behaviorist's sense. Further support for that observation is provided here.
Crudely put, behaviorism was the attempt to render psychology scientific by eliminating reference to non-observables like intentions. Strongly influenced by positivism, behaviorists sought to mimic the methodology of physics (which ironically the positivists had gotten all wrong). Not surprisingly, the behaviorists' project failed, both practically and theoretically. On the one hand, the attempt at "thin" description, as it were, proved impracticable (Mackenzie, 1977). On the other hand, behaviorism's positivistic methodological constraint on what language was to count as permissible involved a fundamental philosophical flaw.

The untenability of the observation-theory distinction was especially striking in social science. The elimination of all reference to the unobservable and mental--the elimination of things such as reasons, motives and volitions--renders it impossible to distinguish between what Melden (1966) terms "actions" on the one hand and mere "bodily movements" on the other. To use Melden's own example, the elimination of intentions makes it impossible to distinguish a person's arm simply rising (by reflex, for instance) from a person purposively raising their arm (to signal a turn, for instance). Especially relevant to moral behavior, it also becomes impossible to distinguish behavior on the basis of different intentions within the class of "actions". For example, a physician may tell a patient the truth either to avoid malpractice suits or to show respect for the patient, depending on whether the motive is self-regard or other-regard. At the level of overt behavior (i.e., the movement of telling patients the truth), which motive is involved is beside the point. And this is the problem for behaviorism, because there is a clear and fundamental moral difference between acting out of regard for self and acting out of regard for others--a difference that 'overt behavior' cannot possibly capture.
Social learning theorists who eschew intentions in favor of empirically based "theoretical constructs" do not provide an advance over behaviorists. Consider how Rushton defines the moral concept 'altruism': "Social behavior carried out to achieve positive outcomes for another rather than for the self" (1982a, p. 429). This definition is supposed to exclude intentions, to be stated in "objective, behavioral terms", but as Krebs points out, in order to avoid reference to intentions, the following kind of definition is required: "Social behavior that achieves positive outcomes, that is, any behavior that produces a positive consequence" (1982, p. 449). The problem with such a definition is manifest: "It is difficult to imagine anyone arguing that a person who intended to kill another person but in the process shot a malignant tumor out of the victim's stomach (and therefore, in effect, helped the person) was behaving altruistically" (Krebs, 1982, p. 449). Rushton avoids this absurd result only by doing what he claims not to be doing, namely, making reference to intentions--the expression "carried out to achieve positive results" smuggles them in.

It is ironic that some of the most telling criticisms of behavioristic research methods are to be found in the works of so-called behavioristic philosophers. Gilbert Ryle, usually classified as a behaviorist, contends that behavioristic psychology takes one of two forms: mechanical and para-mechanical (1949). The mechanical form discards the mental altogether (e.g., radical behaviorism); the para-mechanical form allows mental concepts but requires that they be inferred from observation of overt behavior (e.g., social learning theory). According to Ryle, both the mechanical and the para-mechanical forms are based on an illicit bifurcation between overt behavior and the intellect.
On the one hand, identifying a piece of behavior as intelligent requires more than a description of mere movements; some intention must be attributed to the agent as well as the disposition to behave similarly in similar circumstances. This eliminates the mechanical model. On the other hand, intentions and dispositions are not inferred on the basis of independently described overt behavior. That is, the description of behavior itself requires attributing intentions and dispositions (if the behavior is intelligent) or withholding such attributions (if the behavior is a movement). This eliminates the para-mechanical model.

As Taylor (1964) observes, behaviorists and near behaviorists face a dilemma of either being saddled with an intentionless language that is too descriptively impoverished to capture anything interesting about human behavior or a language that surreptitiously incorporates intentions. One wonders why the attempt to avoid intentions is so persistent in light of these difficulties, and there appear to be two primary reasons. Avoiding intentions is believed to contribute to the goal of rendering psychology scientific by (1) reducing fallibility (i.e., increasing objectivity) and (2) eliminating unsupported theoretical constructs (e.g., Rushton, 1982b). Neither of these reasons is convincing in light of criticisms of positivism.

Fallibility of Attributing Intentions. The belief that the attribution of intentions is subjective and unscientific is on a par with (indeed it is an instance of) the criticisms advanced against qualitative data. Drawing on the earlier discussion of the quantitative-qualitative distinction, it can be dismissed for the same kinds of reasons.

Attributing intentions often depends on testimony, and much is made of this. Rushton (1982b) gives the example of a defendant's testimony in a criminal trial as evidence for the tenuousness of testimony about intentions.
Ironically, the doubt in such cases about one intention depends on confidence about another. In the case of the criminal defendant, doubt about the veracity of testimony is based on confidence in the intention to avoid punishment. (Here one is reminded of how unconvincing John Ehrlichman's claim was that only he could know his intentions concerning the Watergate scandal.)

Intentions have explanatory and predictive force. If, for instance, it turns out that a piece of overt behavior is self defense and not murder, then, aside from the moral difference in these two actions, one can predict that releasing the accused will not lead to harm to innocent members of society. Likewise, knowing that physicians tell patients the truth out of respect for them as persons (rather than to avoid lawsuits) is a basis for predicting that patients will be treated well in general.

The notion that attributions of intentions are inherently H-fallible is a bit of positivistic dogma, not an obvious truth. Examples drawn from the criminal law are extreme and are by no means paradigmatic. There is often little reason to doubt the inference from agents' testimony to their intentions. As Wilson observes,

I can be certain that Churchill did not intend to give into Hitler and I can be certain that the reason why my wife went to town yesterday was to buy a hat--not because of any scientific procedure, but (briefly) because they said so and there is no reason to suppose them insincere; moreover, their behaviour gives me supporting reasons for what their intentions were (196L.p.£NO).

Even granting that attributions of intentions are less reliable than identifications of overt behavior, it does not follow that intentions should be eschewed--this would amount to succumbing to the "tyranny of reliability" (Messick, 1981). Increased reliability alone does not entail reduced fallibility regarding the question of interest.
Suppose the interest were in the percentage of physicians who tell patients the alternatives to radical mastectomy in the treatment of breast cancer in order to determine physicians' respect for patients' rights. Although it is altogether possible to get a reliable measure of the degree to which physicians communicate treatment alternatives, the possibility of systematic error is real. Without knowledge of why physicians tell patients about treatment alternatives, inferences about whether physicians respect the rights of patients (even assuming the measurements are reliable) are precluded. Some might indeed have respect for the rights of patients, but others could be trying to avoid lawsuits, trying to please colleagues, or simply trying to keep their patients happy. Investigating only the overt action of communicating alternatives overlooks these distinctions, reducing the accuracy of generalizations to other situations where the intentions would be pivotal.

Behavioristic theorists remain wedded to a positivistic construal of the distinction between observation and theory: intentions are admissible only if they may be reduced to observation. Rushton (1982b, p. 461) claims that using intentions begs the question by requiring "definitions of behavioral phenomena to incorporate a favored explanatory concept"; he contends:

Intentions, like motivations, or needs, or stages of development, or attitudes, or beliefs, or moral principles, or any other intangible hypothetical construct, are themselves inferred from regularities of behavior (1982b, p. 460).

The view exemplified presupposes that intentions can be clearly distinguished from and set over against descriptions of actions in the same way the positivists believed that theoretical constructs could be set over and against observations more generally. This view may be criticized in two ways. First, it presupposes the untenable positivistic distinction between theory and observation.
Observations (i.e., descriptions) of behavior are, in a manner of speaking, "intention-laden". For example, 'murder', 'voluntary manslaughter', 'involuntary manslaughter', and 'justifiable homicide' all name different actions. To attribute one intention rather than another is to identify one action rather than another. The relationship between intentions and actions is thus not contingent. It therefore would make no sense to investigate whether the intention to help others regularly accompanied altruistic actions because the intention is definitive of the behavior. The charge that to use intentions in describing actions "begs the question" is thus incoherent. Inferring intentions from "regularities of behavior" is also incoherent, since identifying a regularity presupposes classifying a set of actions which in turn presupposes attributing intentions.

Second, it is a distortion of major proportions to lump such things as intentions, moral principles, and the like, under the rubric "hypothetical constructs"--a rubric which would include such things as the id, quarks, and anti-matter. Intentions and moral principles are not posited in order to explain behavior causally; they have instead to do with describing and guiding it respectively. Social scientists are not free to use these (or operationalize them) in any way they see fit, for such concepts have entrenched ordinary meanings. Distortion, irrelevance, and insidious bias are bound to result when operationalizations trade on, but are not faithful to, ordinary meanings of action descriptions.

Conclusion. The discussion of behaviorism may now be brought to bear on the evaluation of medical ethics teaching. First, the claim that behavioristic approaches are the only means by which to obtain credible empirical knowledge about moral behavior is unconvincing. The claim is rooted in the tenets of positivism and subject to the standard criticisms advanced against that now moribund construal of science.
Second, the evaluation of ethics teaching is concerned with the kind of ordinary language meaning attached to descriptions of behavior like Moral Regard, Empathy, Reasoning, and so on, not with theoretical, intentionless concepts. Even if there is something to be gained in psychological theorizing by developing a set of "theoretical constructs"--and, arguably, there is not--such a vocabulary has little usefulness in the context of evaluating ethics teaching.

Third, a primary aim of ethics instruction is the development of "discursive moral competence" (Ruddick, 1981). Ethics instructors seek to enhance the ability of their students to discourse about and provide reasons for their ethical views and behavior. Extending Ryle's position, Wren (1982) argues that because behavioristic approaches take either mechanical or para-mechanical forms, they are incapable of capturing the prescriptive and self-regulating nature of moral behavior. For behavior to have a moral dimension, reference has to be made to some prescribed norm. Praise (blame) follows from regulating (failing to regulate) one's self in terms of the norm. Mechanical and para-mechanical explanations depend only on which desire wins out; they leave completely out of the picture how reasons might serve to regulate behavior and in the process overrule one's strongest desire.

Kohlberg's theory of moral development13 has been pummeled with criticisms,14 but the criticisms differ from those lodged against behaviorism. Unlike behaviorism, his research seeks to investigate cognitive moral judgment: ethics instruction's primary focus. He explicitly acknowledges the importance of moral philosophy, and, according to Puka, "Cognitive-developmentalism is no less than an attempt to empiricize ideal models of moral rationalism" (1982, p. 471).
Despite the promise of Kohlberg's theory as an alternative to behavioristic moral psychology, there are three compelling reasons for viewing it as ill-suited for evaluating medical ethics teaching: insensitivity to instructional effects, lack of interpretability, and implicit moral hegemony.

Insensitivity to Instructional Effects. Medical ethics instructors stimulate students to be more autonomous and less conventional in their moral thinking. In Kohlberg's terms, the aim is to move students from "level 2" to "level 3", so the theory captures one important aspect of ethics teaching. This aspect, however, is too general to have practical significance. Although pre-post gains in ethics courses have been reported using Kohlbergian measures (Rest, 1979), this is exceptional. Kohlbergian measures typically fail to demonstrate instructional effects (Lickona, 1980 and Rest, 1982). A medical ethics instructor would likely view getting students to understand informed consent--the medical ethics literature addressing it and its reasoned justification--as considerable progress, yet such progress would be undetectable in terms of Kohlberg's levels and stages.

The explanation might be that ethics teaching simply doesn't work, i.e., that there are no effects to be measured. Although plausible, this conclusion is hasty at best. Given the formal, i.e., content-independent, cognitive "structures" Kohlberg's theory posits, it is not surprising that measurable progress would not follow from the relatively brief exposure typically afforded by ethics instruction.

A more reasonable conclusion is that Kohlberg's structures are too abstract to detect progress at more specific levels. For instance, in the present version of Kohlberg's theory (Kohlberg, 1982) the highest stage is the post-conventional and the next highest is the conventional, "law and order" stage. Progress would indeed have to be dramatic to move from the lower to the higher of these stages.
One would also expect much variation within the post-conventional stage (e.g., two individuals, both at the post-conventional stage, would disagree about whether cancer patients should be told their diagnosis if they disagreed about whether such disclosures are harmful). Furthermore, it is doubtful that critical reasoning in general is as formal as Kohlberg's theory takes moral judgment to be (McPeck, 1980), and empirical studies indicate that medical decision-making (Elstein et al., 1978) and moral reasoning (Iozzi and Paradise-Maul, 1980) are not.

Lack of Interpretability. Even if progress were detected using Kohlbergian measures, it is not clear what to make of it. What, for example, does a one-quarter stage gain on James Rest's DIT mean?15 If presented with these results, the ethics instructor is likely to press for a clarification--Does that mean that the students now see the reasons for the presumption in favor of telling their patients the truth? The evaluator will be at a loss.

Although the problem of test interpretation cannot be altogether eliminated--What, for example, does a 5 point gain on a 50 item test mean?--interpretation is more straightforward when familiar reference points are available. A gain of 5 points on a 50 item test is equal to a gain of 10%, for instance, and may be the difference between a "B" and an "A-". In addition, when tests possess "face validity" (i.e., when the relationship between tests and the objectives of instruction is clear by inspection), interpretability is enhanced. For example, if students do poorly on items having to do with the concept of patient autonomy, then it may be inferred that the teaching of the concept is in need of improvement. Similar uses for Kohlbergian-type tests are unavailable.

Implicit Moral Hegemony. Kohlberg's theory is biased against women (Gilligan, 1977) and against the religious (Dykstra, 1981 and Flanagan, 1982).
This by itself raises a serious moral question about the use of Kohlberg's theory as the basis for evaluating students or programs; it is also symptomatic of a more fundamental problem. By imposing a pre-set sequence of stages of development, where "higher" equals better, the stages become authoritative and effectively cut off discussion: judging the quality of a moral position becomes a matter of determining what stage it exemplifies. Although it is perfectly acceptable to describe and classify moral judgments in the research context, such descriptions and classifications cannot be straightforwardly converted into prescriptions for judging either individual moral positions or ethics teaching. As argued in the previous chapter, moral reasoning is evolving and self-critical, and "pre-established blueprints" are ruled out of court.

Conclusion

This chapter endorsed functional evaluation and considered scientistic tenets that imply a functional approach is somehow epistemologically suspect or unscientific. Three obstacles were identified which are especially relevant to evaluating ethics teaching: a rigid fact-value distinction, a rigid quantitative-qualitative distinction, and a presumption in favor of criteria and measures grounded in moral psychology.

A rigid fact-value distinction underlies the demand that social research be descriptive only. If the distinction and demand are legitimate, the consequences for evaluating medical ethics teaching are disastrous. Ethics teaching is pre-eminently concerned with examining and defending moral judgments; a prime objective of such teaching is thus a species of just what is declared illicit--value judgments. But the grounds for a fact-value distinction that renders value judgments non-cognitive and undecidable went by the boards with positivism. There are no good reasons to view value judgments in general and moral judgments in particular as essentially non-cognitive.
The demand that social research be wholly descriptive went by the boards as well. The concepts employed in educational research in general--achievement, intelligence, and the like--have evaluative interpretations. There is thus nothing particularly objectionable or epistemologically suspect about using clearly value-laden concepts, such as the Wilson+1 goals, in ethics teaching and its evaluation.

For scientistic evaluators, quantitative methods are identified with science and objectivity, and qualitative methods are identified with non-science and subjectivity. This way of distinguishing the methods presupposes the untenable positivistic distinction between theory and observation. There is no a priori epistemological difference between quantitative and qualitative methods that warrants judging one or the other uniformly superior. Reducing the fallibility of beliefs about issues deemed important is the only defensible criterion for assessing methods. Given this criterion, whether quantitative methods, qualitative methods, or some combination is best (rational, scientific, and so on) cannot be answered in the abstract; it depends on the nature of an investigation and the constraints under which it must be conducted. The evaluation of ethics teaching thus cannot be criticized a priori for employing or failing to employ this or that type of research method.

Finally, the demand to use instruments based on moral psychology is not defensible. The behavioristic criterion of overt behavior is fundamentally flawed, inimical to the teaching of ethics, and subject to the same general criticisms that apply to positivism. Kohlberg's cognitive developmental theory is beset with practical problems when proposed as a basis for evaluating medical ethics teaching.
Because formal instruction in medical ethics is relatively new and involves complex and poorly understood aims, a flexible approach to evaluation is indicated--an approach which begins with the nature of medical ethics teaching and works its way outward. Chapter II explicated the nature of medical ethics teaching; this chapter has overcome obstacles to working outward from that nature. The general criterion of the merit of evaluation is reducing fallibility about questions of interest. There are no good reasons to avoid value judgments, qualitative methods, or non-scientific concepts in the evaluation of ethics teaching (or any other kind). On the contrary, each of these is indispensable.

NOTES

1. "Scientistic" is borrowed from Cronbach (1982). He is not committed to my use of the term.

2. It is puzzling that many who would endorse this claim go on to derive the position, considered toward the end of Chapter II, that we ought not be teaching ethics at all. They seem oblivious to the fact that they are precluded from making any ought-statements whatsoever. Their problem is similar to that of the determinist who, after arguing that all actions are determined and therefore beyond our control, advises us not to cry over spilt milk.

3. For those who are unfamiliar with the term, 'reliability' roughly means consistency. For example, if a test renders dramatically different scores for an individual on two administrations within a reasonably short period of time, its test-retest reliability is low; if two individuals assign markedly different scores to the same test, its inter-rater reliability is low (a common problem with essay exams). The term 'validity' roughly means fidelity. It is commonly held that some degree of demonstrated reliability is a necessary condition of validity.

4. Lipman et al. (1977) criticize "values-clarification" for having this characteristic. The hidden message is that values have no cognitive, rational dimension amenable to critical scrutiny.

5.
Mercer (1971), for instance, documents the effects of intelligence testing on educational placement of groups such as Hispanics and blacks.

6. Refusal by parents to consent to surgery for Down's infants with correctable life-threatening anomalies is based in large measure on the expected intellectual functioning of such infants. Whether such refusals are morally justified has been a topic in the medical ethics literature for at least a decade (e.g., Shaw, 1977 and Rachels, 1975).

7. The distinction between qualitative and quantitative is by no means clear, if for no other reason than the sheer number of descriptors used to characterize it. For example, below is a partial list adapted from Reichardt and Cook (1979, p. 10).

QUALITATIVE:
   naturalistic and uncontrolled observation
   subjective
   close to the data; the "insider perspective"
   grounded, discovery-oriented, exploratory, expansionist, descriptive, and inductive
   process-oriented
   valid; "real", "rich", and "deep" data

QUANTITATIVE:
   obtrusive and controlled measurement
   objective
   removed from the data; the "outsider perspective"
   ungrounded, verificationist-oriented, confirmatory, reductionist, inferential, and hypothetico-deductive
   outcome-oriented
   reliable; "hard" and replicable data

8. The "analytic-synthetic" distinction was crucial to logical positivism's concepts of "cognitive significance" and "verifiability". Analytic statements were defined as ones whose truth-value could be determined solely by appeal to logic and meanings (e.g., All bachelors are unmarried). Synthetic statements were defined as ones whose truth-value could be determined solely by appeal to observation (e.g., Grass is green). Any statement which was to count as "cognitively significant" (i.e., legitimate in science) had to be verifiable either in terms of observation (if it were synthetic) or in terms of logic and meanings (if it were analytic). All other statements were barred.
The class of illegitimate statements included "metaphysical" ones, e.g., 'God exists', and "emotive" ones, e.g., 'Abortion is morally wrong'. Quine (1962) attacked the analytic-synthetic distinction directly and convincingly. In the process, he paved the way for the kind of post-positivistic interpretation of science made prominent by Kuhn, in which all scientific knowledge is seen to be theory-laden and not neatly divisible into the purely observational (or synthetic) and the purely theoretical (or analytic).

9. The much lamented gap between theory and practice is partially attributable to the vocabularies social researchers choose. Rorty (1982b) sets down two requirements for the vocabulary of social science: (1) It should contain descriptions of situations which facilitate their prediction and control. (2) It should contain descriptions which help one decide what to do (p. 197). Service to the first requirement in the name of science detracts from attention to the second, and this reverses priorities. Given that the aim is to improve practice, the first order of business is to ensure that the concepts employed are useful toward this end. That such concepts might be mundane, unscientific, and value-laden is an objection only if the question of purposes is begged.

10. Terms such as 'knowledge' and 'belief' are "opaque". Unlike "transparent" terms, e.g., 'is greater than', opaque terms are not "well-behaved" in formal logical and mathematical languages. The use of opaque terms requires judgments of relevance, "similarity judgments", whereas transparent terms do not (Quine, 1969b). Related to this, Toulmin (1960) argues that physicists are free to invent vocabularies and a well-behaved mathematical system to suit their aim of investigating "the form of given regularities".
By contrast, "natural historians" (which presumably includes social scientists) are saddled with a less technical and public vocabulary fitted to their aim of investigating "regularities of given forms" (p. 53).

11. Notably, this is consistent with Kuhn's remarks about social research (1961). Despite the manner in which his notion of a "paradigm" has caught fire among educational researchers and been used to contrast quantitative and qualitative methods, he excluded social science (let alone individual methods) on the grounds that it has never had an accepted core of theory required to constitute a "scientific paradigm" or to make for "scientific revolutions".

12. More specifically, Good calls this "Type II Rationality" or the "Principle of Non-Dogmatism". I don't claim that the principle is easily applied.

13. Kohlberg's theory construes moral development in terms of progression through successively more adequate "levels" (with two "stages" each) of cognitive moral judgment. The progression through levels and stages is held to exist across cultures. Although few individuals reach the highest stage, progress is putatively irreversible. See Lickona (1980, p. 106) for a description of the stages and levels in Kohlberg's theory. Also see Kohlberg (1982) for the latest version of "level 3" which now has only one stage.

14.
Lickona enumerates the following litany of criticisms, all of which he claims have merit:

   [Kohlberg] has been taken to task for centering too much on justice in defining the moral; for being culturally biased in his definition of morality; for being sex-biased in studying an exclusively male sample and for defining the stages in ways that emphasize "masculine" themes of rights and justice and neglect "feminine" themes of responsibility and love; for going from a description of what moral development is to a prescription of what it ought to be; for devaluing conventional morality; for overestimating the role of reasoning in moral functioning, and underestimating the role of other factors, such as affect, personality, habit, and expectations of consequences; for failing to take into account adequately the impact of the particular nature of the moral dilemma on the stage of moral reasoning elicited from a subject; for lack of sufficient validity and reliability of his research methodology; for having insufficient evidence for his two highest stages; and for failing to respond to his critics (1980, pp. 107-08).

15. The DIT (Defining Issues Test) is a paper-pencil version of several (6) of the moral dilemmas used by Kohlberg in his research. It was developed by one of Kohlberg's students, James Rest. See Rest (1979) for a description and thorough review of the research.

CHAPTER IV

EVALUATING MEDICAL ETHICS TEACHING

The general goal of educational evaluation is reducing fallibility about questions of interest. Chapter II specified the questions of interest peculiar to medical ethics teaching, and Chapter III suggested considerations that go into reducing fallibility. This chapter employs three concrete examples of course evaluations to demonstrate how the abstract considerations of these earlier chapters may shape the actual practice of evaluation.
The chapter aims to show that valid and credible evaluation of medical ethics teaching is possible and can serve both of the customary functions of evaluation--improving practice and measuring achievement of goals.

The chapter is divided into two major sections. The first specifies a "basic functional model" that is keyed to the special nature of courses (versus other modes of instruction). The model is grounded in the Wilson+1 goals and combines qualitative with quantitative research methods. The second section uses variants of the model to frame discussions of three course evaluations: a nursing ethics course, and two successive offerings of a medical ethics course.

A Basic Functional Model

Goals Revisited

In Chapter II the goals of medical ethics teaching were developed by means of a four-step procedure: the ends of medical ethics teaching were described in the abstract, modified to accord with the typical constraints on educational practice, compared to expert opinion, and then assessed in terms of "controversies" and "misconceptions". Further refinement regarding practical constraints is required when courses are the particular object of evaluation. Given the typical context for courses--one in which students meet in groups for discussions or lectures and do not encounter patients--the direct goals, Knowledge, Reasoning, and Appreciation, are the focus; the indirect goals, Moral Regard, Empathy, Interpersonal Skills, and Courage, fade into the background.

There are several reasons for this emphasis. Within courses, instructors (or evaluators) are unable to make very careful observations of the behavior of individual students and are altogether precluded from determining how students respond to patients. Accordingly, the indirect goals apply only in so far as they relate to the issue of how effectively and sincerely students engage in classroom activities.
Although the indirect goals remain important in this special sense (since collegial discussion, for example, is an important means by which ethical problems are addressed in the practical setting of hospital ethics conferences), the aim of enhancing students' ability to interact with and respond to patients in morally appropriate ways is, as a practical matter, secondary.

On the other hand, the practical disadvantages of courses regarding indirect goals are counterbalanced by advantages regarding direct goals. Courses provide an especially suitable context to emphasize the cognitive nature of medical ethics and instill an appreciation for it. They provide an economical and effective means to provide the necessary cognitive foundation that may be built upon and reinforced in other, less controlled contexts, which involve numerous other educational aims.1

In light of these considerations, the model and the examples of this chapter will heavily emphasize the direct goals. Only minor attention will be paid to the indirect goals and only in one of the three examples.

Methods

The general methodological problems are to (1) fit data collection techniques to the relevant goals and (2) develop research designs that have a reasonable chance of sufficiently reducing fallibility. Four means of data collection (tests, direct observation, student evaluation forms, and interviews), coupled with a "multiple indicators" design strategy (explained later), make up a suitable model for most contexts.

Data Collection

1. Tests. It is customary to measure cognitive achievement with paper-pencil tests. Although the practice is not without problems (test wiseness, bias, teaching to the test, and invalidity, for instance, defeat the purposes of testing), using tests is often indicated both on intuitive grounds and because suitable alternatives do not exist. Testing in ethics is not fundamentally different from testing in anything else.
It is an appropriate means of evaluating the cognitive aims of medical ethics teaching provided it avoids the customary problems and is appropriately keyed to the content and skills of interest (i.e., Knowledge and Reasoning respectively).

2. Direct observation. Unstructured direct observation (structured observation is distinguished below) is a second method of data collection and, like tests, it may be used to trace progress relative to the goals of Knowledge and Reasoning (but in the context of discussions of ethical issues as opposed to performance on examinations). Unstructured direct observation is also useful for examining Appreciation and the indirect goals (such as collegial discussion as a variant of Interpersonal Skills). The primary advantage of direct observation is its capacity to detect what is unanticipated and therefore not amenable to pre-designed measures; its primary disadvantage is that it permits the idiosyncrasies of observers to determine the focus of data collection and to influence interpretation.

In general, the more unstructured observation is, the more economical, unrestrictive, and limited in generalizability it will be; the more structured observation is, the more expensive, restrictive, and generalizable it will be. Trade-offs must be made which balance these considerations against the nature of the question and its fallibility. Structured observation (e.g., checklists of overt behavior) is indicated where there is a desire to narrow the focus in order to perform comparisons across large groups; unstructured observation (e.g., simply observing activities and taking notes) is indicated where the focus is broad or undetermined, and where the scope of the study is small. Relatively unstructured observation is usually indicated when medical ethics courses are the object of evaluation.

3. Student evaluation forms. A third method of data collection is student evaluation forms.
Used during or, more typically, at the end of a course, student evaluation forms are well suited for measuring attitudes. Strictly speaking, Appreciation cannot be measured directly by attitudinal measures because it has a cognitive dimension in addition to an attitudinal one. Student evaluation forms, however, provide the most convenient and direct information about Appreciation in practical educational contexts. Students are accustomed to filling them out and they are almost universally employed in higher education to evaluate teaching. Although frequently criticized for measuring wants rather than needs, student evaluation forms provide a valuable source of information in light of the special commitment to "internal goods" that moral competence requires. Where such student evaluation forms provide for free responses (e.g., "liked best", "liked least"), the wants versus needs issue can be addressed directly because the reasons given for wanting or not wanting, liking or not liking, inform interpretation. For example, "The course made me think" is a kind of reason that would allow going beyond mere wants to infer needs; "The course was entertaining" would not sanction such an inference.

4. Interviews. A fourth method of data collection is interviews. These may range from highly structured protocols, bordering on questionnaires, to informal chats, perhaps prompted by something observed. Interviews, like direct observation, may be used to measure Knowledge, Reasoning, and attitudes. The same considerations regarding the degree of structure that apply to direct observation apply to interviews as well.

Figure 3 keys the four data collection techniques discussed in this section to the effects they are intended to measure.

                     Knowledge   Reasoning   Attitudes
tests                    x           x
observation              x           x           x
evaluation forms                                 x
interviews               x           x           x

Figure 3.
The Relationships Among Data Collection Techniques and Constructs in the Basic Functional Model

Design

Of the four data collection techniques described, only tests and student evaluation forms are easily amenable to quantitative analysis. Thus, the suggested model combines quantitative and qualitative methods. Given the arguments of Chapter III, there is nothing suspect or incoherent about such a combination; the only proviso is that the various means of data collection work together to provide useful information that enjoys reasonable support.

This combination of data collection methods makes implicit use of the strategy of "multiple indicators"--applying various kinds of data to the same research question(s). Intuitively, the use of multiple indicators helps protect against the peculiarities of given instruments or observers, and it helps strengthen inference (reduce fallibility) in the same way a second opinion or an additional laboratory test increases (or reduces) confidence in a medical diagnosis. The use of multiple indicators is also supported on technical statistical grounds (e.g., Cronbach, 1982).

Figure 3 illustrates a multiple indicators approach. Notice that both the Reasoning x tests and Reasoning x observation cells are occupied. The two kinds of data cross-check one another and bolster inferences that may be drawn. For example, if students exhibit improved Reasoning on the basis of both testing and observation, or on the basis of neither, then the inference is stronger than it would be if only one kind of data were available. If students show improvement on one but not the other (e.g., the Chapter III example), the interpretation would be less clear, but the conclusions reached would be more trustworthy than they would be if only one source of data were available.

In addition to strengthening inferences, the use of multiple indicators also enhances the flexibility of evaluation.
It makes little sense to persist in a pre-conceived design if, on the basis of information from other data sources, it seems likely that the design in question is headed down a blind alley, or if it is suspected that persisting for the sake of the design compromises the education of students.

Heading down a blind alley occurs when an instrument, once in use, is judged invalid. An example of this kind was discussed previously. A pre-post design was employed to investigate cognitive effects in a medical ethics course. Information derived from alternative data sources, namely, the testimony of students and faculty discussion leaders, indicated that the aims of the testing would likely be thwarted because the test lacked face validity. In this situation, it would have been wise to have seriously considered aborting the testing; this would have freed time and resources to pursue more promising avenues.

The second situation, compromising the education of students, occurs when the instrumentation is acceptable but the treatment (i.e., the course) needs changing in a way that would invalidate the pre-measures. An example of this kind arose in connection with an ethics write-up assignment for third-year medical students doing their clinical rotation in medicine. On the basis of feedback from students, it was evident that the written instructions for the assignment were inadequate and the kinds of changes that could be made to improve them were straightforward. The supervisor of the write-up assignment was reluctant to make the needed changes on the grounds that this would compromise comparisons between the performance of initial cohorts of students who had had no training in ethics in their first two years of medical school and later cohorts who would have had such training.
Now, changing the exercise did preclude the desired pre-post comparisons, but the decision to proceed with the changes and compromise the design was justified on the grounds that one's first obligation is to provide students with the best education possible. Because the exercise was questionable, the information that could have been obtained from the original design would have been nearly worthless in any case. Preserving the design probably would have reduced fallibility, but not about a question of interest.

In summary, the methods section described four means of data collection--tests, direct observation, student evaluation forms, and interviews--and how these four methods can be combined in a multiple indicators evaluation approach. A basic functional model for medical ethics courses was described in terms of how the four data collection techniques may be keyed to Knowledge, Reasoning, and attitudes. More specific research methods and designs will be discussed in terms of the three examples in the next section.

Three Course Evaluations

This section applies the basic functional model to three course evaluations, with the modest aim of showing that medical ethics teaching can be evaluated in a way that meets the reasonable demands of evaluation researchers and ethics instructors.

The discussion of each example is divided into six sections: (1) course description, (2) evaluation purposes, (3) evaluation design, (4) evaluation results, (5) evaluation impact, and (6) conclusion. The "course description" sections are straightforward: they describe the content, pedagogical methods, and settings of the examples. The "evaluation purposes" sections further set the stage of the examples, and establish one criterion for the success of the evaluations, namely, achieving substantive evaluation purposes.
The "evaluation design" and "evaluation results" sections emphasize meta evaluation issues, i.e., issues about the effectiveness of evaluation methods (versus substantive evaluation questions, such as the relative merit of the courses themselves). These two sections are the heart of each example, and are most directly linked to the functional evaluation approach that grows out of Chapters II and III. The "impact" sections report the utilization of results and relate substantive evaluation purposes to evaluation effectiveness. The concluding sections summarize what each example shows.

Before turning to the examples themselves, two clarifications about the use of the basic functional model are in order. First, as stated in Chapter I, the perspectives on goals and research methodology defended in Chapters II and III did not dictate the conduct of the three course evaluations, especially not the first two. Instead, the goals and evaluation methods were developed concomitant with, and largely in response to, the concrete evaluations. The examples have been recast in a somewhat ahistorical way to facilitate exposition on the one hand and linkages to Chapters II and III on the other. The "evaluation design" sections in particular incorporate post hoc data analysis decisions that constituted reasonable uses of the data collected, but that were not always envisaged when the evaluations were in fact designed.

Second, the discussion of the examples does not aim to provide a blueprint for the evaluation of medical ethics teaching or to establish that some alternative (perhaps superior) ways of evaluating medical ethics teaching are out of the question. The basic functional model presupposes that various practical constraints are operative (e.g., that resources are not available to cross-check direct observation among multiple observers, to analyze verbatim transcripts, to record and analyze video tapes, and so forth).
The discussions of the three course evaluations are designed to illustrate only how more specific limitations and contextual features than those already contained in the basic functional model can be accommodated by adapting the model.

Example 1: Ethics in Nursing (1980)

Course Description

"Ethics in Nursing" is an elective course offered to junior and senior nursing students by Michigan State's Philosophy Department. It is a two-credit course and spans a ten-week term, meeting once a week for two hours. Development of the course was funded by a National Endowment for the Humanities Grant.2 The evaluation described here was conducted in 1980, the second time the course was offered.

The written material for the course was Ethics in Nursing (Benjamin and Curtis, 1981). Most of the instruction focused on Socratic discussions of cases from the textbook; mini-lectures were used when appropriate. The course was team taught by a philosopher and a nurse. Students were evaluated on the basis of their performance on short essay responses to cases and on their contributions to class discussions.

Evaluation Purposes

The evaluation had two primary motivations. First, an evaluation was required by the funding agent. Although the university provided such services upon request, one of the instructors had had previous experience with evaluators provided by the university and found their methods simplistic, trivializing, and atomistic (i.e., he viewed expert evaluation with the same general skepticism that is pervasive among ethics instructors; Caplan, 1980). In the instructor's estimation, the available evaluation experts had virtually no understanding of the nature and aims of ethics teaching. Thus, the instructors welcomed aid from a former philosophy graduate student, re-tooling in evaluation research.

Second, the instructors desired to demonstrate the evaluability of this course (and similar ones).
Despite their skepticism about the methods typically used in the evaluation of ethics teaching, the instructors earnestly believed that measurable effects, consistent with the course objectives, would result from the course; and they were willing to put this belief to an empirical test. They simply required that the evaluation fit the nature and aims of the course.

Related to the instructors' desire to demonstrate the evaluability of their course, the design of the Ethics in Nursing evaluation (described in the next section) was motivated by three meta evaluation questions: (1) Can measures of Knowledge, Reasoning, and Appreciation be developed? (2) Can the basic functional model for medical ethics courses (depicted in Figure 3) yield warranted conclusions? and (3) Can the empirical results obtained by employing the model affect the practice of medical ethics teaching?

These three questions are related to the remainder of the Ethics in Nursing example in the following way. At an epistemological level, sensitivity of the design to instructional effects (in terms of the desired goals) is relevant both to whether Knowledge, Reasoning, and Appreciation can be measured and to the value of the basic functional model. At the practical level, the impact of evaluation findings is relevant to the usefulness of the model as a means of gathering information for criticizing and improving medical ethics teaching.

Evaluation Design

Data Collection. Data for the Ethics in Nursing course were collected by the four means listed in the model of Figure 3, namely, observation, student evaluation forms, interviews, and tests. The course was observed eight of the ten times it met, usually in the second of the two hours making up each meeting. Two student evaluation forms were used: the University's "Student Instructor Rating Form" (SIR), a two-part instrument with fixed-response items and a comments section, and a mailed fixed-response instrument designed especially for the course.
Informal interviews of the instructors were periodically conducted. Finally, students were required to write an ungraded essay in the first and last meetings (the essay was an analysis of Margaret's dilemma, the example of Chapter II).

Interviews of the instructors did not yield information pertinent to the present discussion. Figure 4 depicts the relationships among the constructs and the other three methods of data collection.

                             Knowledge   Reasoning   Attitudes
essay tests                      x           x
observation                      x           x           x
student evaluation forms
   SIR                                                   x
   mailed                                                x

Figure 4. Relationships Among Data Collection Techniques and Constructs For Ethics in Nursing Evaluation

The Design and its Logic. For cognitive effects, the basic research design was a case-study, Observation-Treatment-Observation (O X O). Resources were not available to arrange a comparison group, and even if there had been, the self-selected nature of the group who elected the Ethics in Nursing course would render a comparison group suspect. Moreover, the course was sufficiently distinctive in the curriculum to make competing explanations for observed results implausible.

For student attitudes, the basic research design was contingent on the means of data collection: Time-Series (O X O X O) for direct observation and After-Only (X O) for student evaluation forms. Reasoning similar to that used to justify the design for cognitive effects justified foregoing control groups in investigating students' attitudes--given the nature and setting of the course, competing explanations for observed effects would be implausible. Feasibility (primarily resource constraints) was also a consideration in the designs.

These individual designs combined with the three kinds of data (testing, observation, and student evaluation forms) to form the following logical framework for interpreting the data. Data from essay testing and direct observation focused on measuring Knowledge and Reasoning, and created the foundation for a three-step analysis.
(1) Statistical comparison of the pre-post essays would provide evidence for or against cognitive effects in terms of Knowledge and Reasoning and, at the same time, evidence for or against essay tests as a means of measuring Knowledge and Reasoning. (2) Depending on the results of the statistical analysis, close inspection of the essay tests in terms of specific criteria associated with Knowledge and Reasoning would lend interpretability to the statistical analysis. If, for example, the statistical test was significant in the desired direction, documenting that the difference was explicable in terms of criteria associated with Knowledge and Reasoning would help ensure that the essays were valid measures. If the statistical test was not significant, close inspection of the essays might reveal that the statistical test was insensitive to effects that can be documented by alternative means. (3) Depending on the results of the two kinds of analysis of the essay testing, pre-post comparisons based on direct observations of student performance in discussions (also keyed to Knowledge and Reasoning) would support or count against the findings from testing.

Direct observations of discussions and student evaluation forms were used to measure students' attitudes. As explained previously, these data collection techniques cannot directly measure the real target, Appreciation, because Appreciation includes both cognitive and attitudinal components. Thus, positive attitudes alone (not accompanied by evidence for cognitive effects) cannot be distinguished from mere wants. Nonetheless, because the logic of the Ethics in Nursing design incorporated independent means of investigating cognitive effects, accommodating Appreciation within the overall design was largely a matter of adding the measurement of student attitudes to the measurement of cognitive effects.
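
For readers unfamiliar with the statistical comparison named in step (1), the following is a minimal sketch of a dependent (paired) t statistic computed on pre-post essay grades. It is offered only as an illustration, not as the analysis actually run: the function name `dependent_t` and the six pairs of scores are hypothetical, invented here to show the arithmetic, and are not data from the Ethics in Nursing evaluation.

```python
import math
import statistics


def dependent_t(pre, post):
    """Return (t, df) for a paired comparison of pre and post scores.

    Each position in `pre` and `post` must belong to the same student.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need equal-length score lists with at least two students")
    # Work with each student's gain rather than with the two group means.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all gains are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical essay grades on a four-point scale with half-point gradations.
pre_scores = [1.0, 1.5, 2.0, 1.0, 2.5, 2.0]
post_scores = [3.0, 2.5, 2.5, 2.0, 3.0, 3.5]

t_stat, df = dependent_t(pre_scores, post_scores)
print(f"t = {t_stat:.2f}, df = {df}")
```

The sketch stops at the t statistic and its degrees of freedom (the number of paired essays minus one). Converting t to a p-value requires the t distribution; a reader with SciPy installed could obtain both at once from `scipy.stats.ttest_rel`. As step (2) stresses, a significant t by itself says only that the essays were sensitive to something, which is why the statistical result is followed by close inspection of the essays themselves.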
Evaluation Results

As stated previously, this section is keyed to the related questions of whether Knowledge, Reasoning, and Appreciation can be successfully measured and whether the basic functional model can yield warranted conclusions. These questions will be addressed first with respect to Knowledge and Reasoning and then with respect to Appreciation.

Knowledge and Reasoning. Primarily two kinds of data were used to explore Knowledge and Reasoning: essays and direct observations of discussions. Following the logical framework introduced earlier, these data were analyzed in a step-wise fashion.8

First, the essays were shuffled and given to an advanced philosophy graduate student for blind grading, and a dependent t-test was used to investigate pre-post differences. The statistical test was significant (p < .001, df = 28). This finding provided evidence that the essays were indeed sensitive to something.

Second, to help ensure that this something was Knowledge and Reasoning, one student's pre- and post-tests (rated 1.0 and 3.0 respectively, on a four-point scale with half-point gradations) were selected for further analysis. Regarding Knowledge, the student identified fewer and less pertinent issues in the pre-test than in the post-test. In addition, virtually none of the vocabulary of ethics was employed in the pre-test, whereas, in the post-test, notions such as "parentalism", "autonomy", and "rights-based framework" were used to ferret out the issues and organize the discussion. Regarding Reasoning, the pre-test consisted of a string of assertions and a statement of a position. What little argument was present amounted to an uncritical appeal to conventional behavior. By contrast, in the post-test the student listed several alternatives (including the position taken in the pre-test), critically evaluated the alternatives, rejected them, and then defended an additional alternative against objections she anticipated.
(See Howe, 1982, for the actual tests and a more thorough discussion of the differences between them.) In conjunction with the statistical test, the analysis of the paired pre-post essays supported the existence of the constructs Knowledge and Reasoning, as well as the use of essays as a viable means of measurement. It also demonstrated the virtues of multiple indicators, since the statistical test by itself was not very rich in meaning with respect to the explicit course goals (a point made by the instructors that helped motivate the more fine-grained analysis) and was subject to multiple interpretations.

Third, confirming (or disconfirming) evidence for Knowledge and Reasoning was sought in the observational data, and evidence supporting the constructs Knowledge and Reasoning (paralleling the evidence for these constructs in the pre-post essays) was detectable. The sensitivity of direct observation to Knowledge and Reasoning is demonstrated below by a comparison of the first and last meetings of the Ethics in Nursing course, in which evidence for Knowledge and Reasoning was absent and present, respectively. The existence of detectable differences between the two class meetings implies perforce that Knowledge and Reasoning are measurable via direct observation and, because the observational findings may be added to the evidence from essays, that the basic functional model yields warranted conclusions.

When confronted with the pre-test (i.e., Margaret's dilemma) in the first session, students asked the following kinds of questions to "clarify" the assignment: "Isn't that in the physician's medical category?" "Isn't it allowable to say, 'You're seriously ill'?" "I've never been in a hospital [qua nurse]. What is done?" Compare these three questions to excerpts from a discussion which occurred in the eighth meeting concerning the case "Working Extra Hours" (Benjamin and Curtis, 1981, p. 122).
The case involves the issue of the degree to which nurses and nursing students should volunteer their professional services. One nurse in particular has been volunteering a good deal of her time to the hospital in which she works, and this is resented by her co-workers. (Codes are indicated in brackets to facilitate subsequent discussion of this example: {1} = challenging proffered views, {2} = offering counter-examples, {3} = requesting clarifications, and {4} = defending one's position. Codings follow the remark to which they apply.)

Student 1: Do you think that others would care if you took more hours at a college situation? It seems to me that it's kind of different than a work situation. In what ways are they alike? {3}

Nurse: Well, she is working these extra hours, and saying it's to learn these things so that she can be a better nurse. And I think she should be applauded for her efforts and encouraged in her efforts so that she can be... But I paid my dues. Why should I have to continue to do more? Would you give it away free to the hospital? I think that's the issue...

Student 2: I don't want to be assigned to a job that's going to require certain hours of me. I just want to volunteer my time to something else. Does that mean that I am being less than professional? It's free--You are getting my professional services free. {2}

Philosopher: What possible difference is there? Let's say you volunteer to the Hospice, say, here in town. Everyone in the Hospice group here in town is a volunteer. One of the things that leads to the friction is that she does this stuff as a volunteer and every one else is getting paid. That's the thing.

Nurse: And when she leaves, then unless somebody else comes along and volunteers, the work isn't going to be done without a lot of specially trained people being pulled into this, giving of themselves for free. In some sense it becomes part of their job.
Student 2: Could it be more justified then if she did go and ask for pay for this work? {3}

Nurse: If she were paid for the job, then the other students would feel it was--if the job had money attached to it. A woman who had other obligations and could only work 40 hours a week could just as easily work those hours doing that kind of job. Once it becomes paid and part of the job description for the nurses on that unit, it can be held by more than just the person who is able to volunteer. So then it would be a problem among the administration and staff.

Philosopher: Part of the problem here is what the codes say and all that. What are the limitations of your job or role as a nurse?

Student 3: This reminds me--do you remember last week? Someone asked what would you do at a cocktail party if someone you know comes up to you and wants advice on something. She [the nurse] said, "Well, if somebody approached me looking at me as a nurse, then I would refer them to somebody else. I wouldn't give out free information." You [referring to the nurse] were basing it on economic growth and you wanted nurses to be recognized as professionals and that your advice is valuable, so you weren't willing to give that out for nothing. And I thought about that for awhile ... It all comes down to how everybody perceives their role as a nurse differently, and I thought about what was said--and you are perceiving it as an 8 to 5 job. And I do not accept that. So for me, you know, for myself, I don't see it as an 8 to 5 job, and so therefore I don't see anything wrong with it if you want to take extra courses or work extra hours ... {4} With your view [the nurse's] you are restricted there if you want to obtain more knowledge.
{1}

Philosopher: The question is this: I want to know whether doing the things that you are talking about are things that you are permitted to do but don't have to, or, on your conception of nursing, these are things such that if you could do them and didn't, you would be neglecting your responsibility. Do you see what I mean? Nursing, after all, is doing things for people. Using your specialized knowledge and abilities without getting paid or anything like that--is that a gift or is it a responsibility?

Student 3: I'm not in nursing for the money, so I can't base anything on economic reasons. I am not pushing for better pay... The thing too is that me being a nurse--that's part of you. You don't turn around and stop being a nurse. {4} I just got an awful lot of free advice from my doctor too, and, you know, it's very hard to see anything wrong with that. {2}

Nurse: But he's already your provider. That relationship was already established. He didn't go across a crowded room and...

Student 3: No, but on the other hand, while he is sitting there eating his hamburg he really didn't have to talk to me about my medical problems. {4}

Philosopher: He already has that relationship with you. He is your physician. I really think you have to think about, you know, just where you're drawing lines on what. If you drive by and see an accident, are you going to stop? When people come up and identify you as nurses, when do you give it away? I mean there's a difference between emergencies and non-emergencies. Likewise, there is a difference between strangers and those with whom you already have a relationship.

Student 4: Your position [the nurse's] to me seems to contradict what you said about self-determination and autonomy. {1} Maybe you can explain to me how this could be self-determination. {3} I think it's important to have self-determination...

The differences between the discussion in the initial meeting and the later one mirror the differences in the pre- and post-tests.
The vocabulary of the course is employed by the students, e.g., "autonomy" and "self-determination", in the later discussion, and there are also detectable changes in the way students go about the task of ethical inquiry. Rather than pumping the instructors for the answer, the students take an active role in a genuine Socratic exchange: they challenge proffered views, offer counter-examples, request clarifications, and defend (rather than assert) their positions (see codings accompanying the excerpt from the later meeting for specific illustrations). The use of appropriate vocabulary may be taken as a measure of Knowledge, and participation in the Socratic method may be taken as a measure of Reasoning (see Chapter II for the list of criteria associated with Reasoning).

One may conclude that direct observation is sensitive to Knowledge and Reasoning. This finding may be combined with the findings from the two kinds of analyses of essays. Taken together, the results of the three kinds of analyses strongly support the claim that the constructs Knowledge and Reasoning can be measured and, perforce, that they exist. The results of the analyses also support the value of the basic functional model. In short, the three kinds of analyses combine to support the arguments of Chapter II that Knowledge and Reasoning are appropriate goals and evaluative criteria for medical ethics teaching, and the arguments of Chapter III in defense of a functional evaluation approach.

Appreciation. With good evidence for cognitive effects in hand, a start had already been made on the question of Appreciation. That is, because Appreciation has both cognitive and attitudinal dimensions and the former had already been established, positive attitudes about the course would support the existence of the construct Appreciation and its measurability. On the other hand, Appreciation is a more useful construct if it can be inferred in the absence of independent evidence for cognitive effects.
One cannot be assured ahead of time that cognitive effects will be demonstrated. Investigations of cognitive effects typically demand significantly more resources than measurements of attitudes and may be otherwise unworkable in many contexts. If evidence for Appreciation is relatively self-standing, then the case for Appreciation (versus wants or likes) is that much stronger when such self-standing evidence is combined with independent evidence for cognitive effects.

In the Ethics in Nursing study, the measurement of Appreciation was distinguished as much as possible from the measurement of cognitive effects. As described in Chapter II, Appreciation of medical ethics involves a positive attitude about the right objects (e.g., the value of cognitive ethical inquiry). Consistent with this focus, the mailed student evaluation form was constructed to emphasize issues of particular relevance to medical ethics teaching. The analysis of free responses on the SIR forms also emphasized issues of particular relevance to medical ethics. Both kinds of student evaluation data were augmented by data from direct observations.

Selected results from the mailed evaluation form are displayed in Table 1. (The response options for the items were: strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree, and the values 5, 4, 3, 2, and 1 respectively were assigned to these responses.)

Table 1. Selected Results From Ethics in Nursing Course Mailed Evaluation Form (Response rate: 23 of 32 or 72%)

Statement                                                      Mean   Range

1. The course improved my ability to see the complexity         4.7    4-5
   of moral problems which nurses face.
2. The course helped me develop a framework or basis            4.0    3-5
   for my moral positions.
3. The course helped me see the importance of giving            4.5    4-5
   reasons and careful arguments for my moral beliefs
   and decisions.
4. The topics discussed and the written assignments             4.7    4-5
   were relevant to nursing.
5. The cases discussed were realistic enough to bring           4.6    3-5
   out the emotional side of moral problems.
6. I feel more confident about recognizing and dealing          4.3    3-5
   with moral problems because I have had this course.
7. All medical professionals should study moral problems        4.7    3-5
   in medicine in a similar way.
8. Multiple-choice or true-false exams would have been a        1.4    1-3
   better way of grading than essays.

Items 1-3 are most relevant to the cognitive dimension of Appreciation; items 4-7 are most relevant to Appreciation's attitudinal dimension. The relatively high means on items 1-7 support Appreciation as a measurable effect of the course. The responses support Appreciation rather than mere wants or likes because the items are keyed to the "right objects" (i.e., the value of cognitive ethical inquiry and its relevance). Although the halo effect may well be at work regarding items 1-7, the mean and range for item 8 eliminate the hypothesis that students settled on the upper end of the response scale and, having made this decision, unthinkingly responded to the remainder of the items.

The free responses on the SIR forms provide collateral evidence for Appreciation as a measurable construct (response rate: 32 of 32 or 100%). In this case, students zeroed in on the "right object" with no prompting from pre-formulated items. For example, 15 of 32 respondents (47%) specifically mentioned the relevance of the course. Their comments included:

   What I liked most about this course was that it was in direct relation to my field...

   This class was most interesting as we as nurses are going to encounter these dilemmas.

   I have really gained a heightened sense of awareness of ethical considerations of situations I face daily at work.

Paralleling these general comments, 13 (41%) praised the topics. Two examples of these are:

   One thing I liked most was that many topics were brought up that are not brought up in other classes.
   This has a more realistic approach to it--these are things that can and do happen...

   I felt the topics reached into many different facets of the profession. This exposure has helped me to formulate ideas that I never knew about.

Finally, 6 (19%) volunteered that the course was important enough to be required of all nursing students. One student wrote, "This class was just great and I wish so much that it was required. It's insane that it's not."

The third kind of data supporting Appreciation as a measurable effect was direct observation of discussions. Like measurement of cognitive effects, direct observation of Appreciation is an independent criterion in the sense that it does not depend on the perceptions of students. Direct observation of the discussions indicated that student interest in the discussions was generally high. In the half-session in which "Working Extra Hours" was discussed, for instance, 28 of the 32 students enrolled were present and, of these, 22 participated directly in discussion. Other indications of Appreciation were observed attentiveness and interest during discussion and the tendency of the students to talk among themselves and with the instructors before and after the formal meeting time and during the breaks between the half-sessions.

The three kinds of data--the mailed evaluation forms, SIR forms, and direct observation--provide independent evidence for Appreciation as a measurable outcome of the Ethics in Nursing course. Independent evidence for cognitive effects from the essays and observations further supports the existence of the construct Appreciation and its measurability.
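The summary figures reported for the evaluation forms (item means and ranges on the 5-to-1 agreement scale, and counts expressed as percentages of respondents) involve only simple arithmetic, sketched below. The function names `summarize_item` and `percent` and the five sample responses are hypothetical; only the counts 23, 15, 13, and 6 out of 32 are taken from the text.

```python
from statistics import mean

# Scale values assigned to the response options, as described for Table 1.
LIKERT = {
    "strongly agree": 5,
    "agree": 4,
    "neither agree nor disagree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def summarize_item(responses):
    """Return (mean rounded to one decimal, (lowest, highest)) for one item."""
    scores = [LIKERT[r] for r in responses]
    return round(mean(scores), 1), (min(scores), max(scores))


def percent(count, total):
    """Express a count as a whole-number percentage of the total."""
    return round(100 * count / total)


# Hypothetical responses to a single item (NOT the study's raw data).
sample = ["strongly agree"] * 3 + ["agree"] * 2
print(summarize_item(sample))

# Counts reported in the text, checked against the stated percentages.
print(percent(23, 32))  # mailed form response rate
print(percent(15, 32))  # mentioned relevance of the course
print(percent(13, 32))  # praised the topics
print(percent(6, 32))   # said the course should be required
```

Under these assumptions the four percentages come out as 72, 47, 41, and 19, in agreement with the figures given in the text.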
Evaluation Impact

The previous section on results provided affirmative answers (amplified below in the concluding section) to two of the meta evaluation issues used to frame the discussion of the Ethics in Nursing example: (1) Knowledge, Reasoning, and Appreciation can be measured and (2) can work together within a functional evaluation approach to yield warranted conclusions. This section briefly addresses the third meta evaluation question, namely, whether empirical results obtained from the application of the functional evaluation approach can affect practice.

Evaluation research in general is plagued by laments about poor utilization of results. Chapter III argued that two of the causes of this problem are the use of inappropriate vocabulary (i.e., the vocabulary of moral psychology) and an associated over-emphasis on methodological rigor (i.e., "scientistic" evaluation) that cause the "questions of interest" for medical ethics teaching to be lost. Hence, it is germane that the evaluation methods employed in the nursing ethics course generated results that were interesting and useful to relevant audiences.

The most immediate impact of the evaluation was that it convinced the instructors of the nursing ethics course that an effective course had been developed; the course was taught in subsequent years in virtually the same way.

The evaluation also had a more diffuse impact. Its results were communicated to broader audiences of individuals interested in ethics teaching. One version was directed at an audience of philosophy teachers (Howe, 1982), and was designed to show that ethics teaching is evaluable in terms of methods and criteria acceptable and familiar to ethics instructors. Other versions were presented in conference and workshop formats to a more limited audience of those interested in nursing ethics in particular. These versions were designed to promote the design and content of the course and to suggest means of initiating and evaluating similar courses.
Conclusion

The Ethics in Nursing example entertained three primary meta evaluation questions: (1) Can the goals of Knowledge, Reasoning, and Appreciation be measured? (2) Can the basic functional model for medical ethics courses yield warranted conclusions? and (3) Can the empirical results obtained by employing the model affect the practice of medical ethics teaching? The answer to all three questions is "yes".

(1) Knowledge and Reasoning were measured using essays and observation. The statistically significant difference between the pre- and post-test scores provided one piece of evidence that Knowledge and Reasoning are valid and are measurable. Against the scientistic fact-value distinction, the grader distinguished pre- from post-tests blindly, in terms of cognitive criteria such as the use of appropriate vocabulary and quality of argumentation. This speaks for objective cognitive standards by itself and was further supported by multiple indicators: the analysis of one pair of essays in terms of Knowledge and Reasoning and the analysis of observational data supported the same conclusion. One may conclude that (contrary to scientism) recognizable (objective) standards for detecting ethical Knowledge and for distinguishing better from poorer ethical Reasoning exist and are measurable.

Appreciation was measured using direct observation and student evaluation forms. The observational data provided evidence for the construct Appreciation in terms of criteria like interest, participation, and regular attendance. The student evaluation forms provided more focused evidence for Appreciation, and further support for Appreciation (versus mere wanting or liking) followed from the independent essay testing evidence that indicated cognitive effects concomitant with the positive attitudes exhibited in the student evaluation forms.
(2) The very same evidence and arguments that support the measurability of Knowledge, Reasoning, and Appreciation support the warrant of the conclusions drawn from the Ethics in Nursing Example. That is, evidence that the various means of data collection--essays, observations, and student evaluation forms--are sensitive to Knowledge, Reasoning, and Appreciation is also evidence that those three things exist to be measured.

(3) Against the scientistic demand for psychologically grounded criteria and measures, the customary reason for this demand--that it is the only way to generate objective, L-fallible results--had no force. In addition to the problems discussed in Chapter II (that scientistic criteria hypostatize Reasoning) and Chapter III (that scientistic criteria and methods are invalid, impracticable, or both), a method that employed non-scientistic criteria (Knowledge, Reasoning, and Appreciation) and measures (essay tests, questionnaires, and observations) generated results sufficiently L-fallible to be credible to the relevant audiences.

Example 2: Focal Problems (1982)

Course Description

Focal Problems is a three-course sequence required of all second-year medical students enrolled in the Track I3 curricular option at Michigan State's College of Human Medicine. In fall 1981, support from the National Fund for Medical Education and the National Endowment for the Humanities made it possible to revise the courses to incorporate decision analysis and ethics.

Each of the courses meets for a ten-week term. The format combines one hour of lecture per week and two hours of small group discussion. The discussion groups consist of from 8 to 12 students and are each staffed by two preceptors, one physician and one faculty member from the behavioral sciences or the humanities. Discussion in the groups focuses on applying the material presented in lecture to paper-cases designed to approximate medical problem-solving situations.
The first two terms emphasize decision analysis (see Howe et al., 1984, for a fuller description); the third emphasizes ethics and is the concern here.

Four cases, chosen to coincide with students' course work in basic science and to create a logical progression of issues, comprised the ethics course: (1) a four-year old rendered brain dead by smoke inhalation, (2) Karen Quinlan, who though not legally dead, exists in a "persistent vegetative state", (3) an apparently competent burn victim who wishes to be allowed to die, and (4) an ill, possibly demented elderly woman who is a "social admission" to a hospital. The bio-ethical issues raised in the cases included the following: the distinctions between brain death, poor prognosis, and persistent vegetative state; how courts impinge on bio-ethical issues; refusal of medical treatment by competent versus incompetent patients; the conflict between physician paternalism and patient autonomy; the conditions under which patients may be declared incompetent and the relationship between medical criteria and moral considerations involved in establishing such conditions; and access to medical care for the indigent in light of costs and aims of medical care.

Lectures keyed to the cases were given by philosophers from Michigan State University's Medical Humanities Program (MHP). Except for one MHP philosopher who volunteered, preceptors for the small groups were assigned solely on the basis of availability. Students were evaluated on the basis of two midterm examinations and a final. Each exam was composed exclusively of fixed-response items, the customary examination format in the college.

Faculty development consisted of two two-hour seminars composed of a general introduction to the nature and aims of medical ethics teaching and practice in discussing the cases. Preceptors were also provided with "leader's guides" to assist them in conducting the discussion groups.
E J l' E

The evaluation of this course, like the evaluation of the ethics in nursing course, aimed to generate findings useful for informing improvements. Various features of the Focal Problems course, however, limited the degree to which the methods and findings of the ethics in nursing evaluation were applicable. It was required versus elective, it was part of a medical school versus an undergraduate curriculum, and it was staffed in large part by inexperienced conscripts versus enthusiastic devotees.

Added to this, the course was part of a larger project, "Ethics in the Core Curriculum", involving Michigan State University's two colleges of medicine and its college of nursing. The project's aim was to integrate ethical issues into existing educational activities wherever feasible, employing philosophers as curriculum developers but regular faculty as teachers. The Focal Problems course approximated this general approach and thus provided a relatively controlled and circumscribed context to test the integration model, from which results could be generated quickly and applied as appropriate to the more diffuse and inchoate activities of the project.

Evaluation was also intended to assess the view that integration using regular clinical faculty is the way to demonstrate to health professions students the relevance and value of serious study of medical ethics. Given the pivotal position of the clinical faculty--linchpins between the material and the students--the evaluation of the Focal Problems course was two-tiered. One substantive purpose of the evaluation was assessing the teaching performance of the non-philosopher faculty, especially physicians, since their acumen and support (Knowledge, Reasoning, and Appreciation) were relevant to the warrant of the integration model and constituted in large measure the execution of the course.
In the process of investigating substantive questions about the Focal Problems course and the integration model, three meta evaluation questions were also pursued: (1) Can fixed-response tests supplant essay tests as a measure of Knowledge and Reasoning? (2) Can the basic functional model defended in this chapter detect evidence for indirect goals? and (3) Can the model be adapted to the medical school context and yield useful results? Consistent with the emphasis in this chapter on meta evaluation issues, the remainder of the discussion of the Focal Problems example will focus on these three meta evaluation questions. The subsequent "evaluation design" and "evaluation results" sections focus on framing and answering the first two questions; the "evaluation impact" section answers the third. The concluding section reviews the major meta evaluation lessons this example provides.

Evaluation Design

Data were collected with two evaluation forms, semi-structured interviews with the preceptors, direct observation of lectures and discussion groups, and a fixed-response cognitive performance test. These data collection techniques were keyed to Knowledge, Reasoning, and Attitudes in ways that should be familiar to the reader by this time. Observation and interviews were applied to the "indirect goals", in addition to their uses in the previous example. Also, interviews of preceptors were used to assess students' Knowledge and Reasoning. Figure 5 summarizes the various purposes of the data collection methods used in the Focal Problems evaluation design.

                         Knowledge   Reasoning   Attitudes   Indirect Goals
fixed-response tests         x           x
observation                  x           x           x             x
evaluation forms
   mid-term                                          x
   end-of-term                                       x
interviews                   x           x           x             x

Figure 5. The Relationships Among Data Collection Techniques and Constructs for Focal Problems (1982)

Observations had both students and preceptors as targets.
As intimated above, the preceptors themselves were one group at which the curriculum was aimed, i.e., they had to be brought up to speed in Knowledge, Reasoning, and Appreciation before they could be expected to impart these to students. Resources did not permit earlier and later observations of the small group discussions; an investigation of progress within a discussion group (on the model of the nursing course) was thus not possible. Each small group was observed only once.

An O X O design was again employed for the pre-post cognitive testing. Unlike the nursing course, however, a fixed-response examination was used rather than short essays. Several considerations led to this decision. First, medical students are tested exclusively by this means in the remainder of the curriculum, and the course designers desired that the ethics course be as similar as possible in format to other course work. In addition to serving as a measure of entering skill and knowledge, the cognitive test also introduced students to the nature of the testing that would be employed in the course. Although fixed-response examination formats were the norm, the examinations designed for the Focal Problems ethics course required a good deal more by way of inferences (versus recognition or recall) than students were accustomed to because they were designed to be sensitive to Reasoning as well as Knowledge. Second, given the multiple small group format (6 groups in all), the logistics of grading essays posed problems. Third, it was doubted that preceptors had the skills necessary to grade essays reliably. Unreliable grading creates serious problems by itself, and in the context of medical school, where ethics is often viewed as "soft" and "a matter of opinion", unreliable grading is taken as evidence that these claims are true.

Interviews of instructors loomed large by contrast to the ethics in nursing course.
The interviews were designed to assess the performance and receptiveness of preceptors, gather formative information based on their perspective, and obtain their estimation of students' Knowledge, Reasoning, and Appreciation.

Evaluation Results

The discussion of results is divided into sections on Appreciation, Knowledge and Reasoning, and indirect goals. The section on Appreciation is not directly applied to the three meta evaluation questions that frame this example. Instead, the discussion of Appreciation reinforces the argument of the nursing ethics example that Appreciation is measurable and illustrates how Appreciation may be extended to medical school faculty. These two issues are discussed with an eye toward the subsequent "impact" section of this example (Focal Problems, 1982) and an eye toward the evaluation design in the next example (Focal Problems, 1983). The section on Knowledge and Reasoning emphasizes the meta evaluation question of whether fixed-response tests can supplant essay tests as a measure of Knowledge and Reasoning. The section also illustrates the value of combining other indicators of Knowledge and Reasoning, such as observations and interviews, with tests. Knowledge and Reasoning, like Appreciation, are also extended to medical school faculty. The section on indirect goals is keyed to the meta evaluation question of whether the variant of the basic functional model employed in Focal Problems can be sensitive to indirect goals.

Appreciation. The course was positively received by the preceptors. Of the 13 who participated, 11 were interviewed (one was unavailable and the other was an MHP philosopher). Each of the respondents believed the course was highly valuable and that such experiences are important for an adequate medical education. Although two had some reservations about the course being the best approach, all expressed an interest in teaching it again. These findings are germane to Appreciation.
Below are two excerpted comments that illustrate how preceptors' praise focuses on the "right object" (i.e., the value of cognitive ethical inquiry).

Psychiatrist: It's kind of reassuring to know these things are being discussed versus merely assimilated as they were when I was in medical school. Not that we practiced unethically, but we freelanced and didn't think it through and discuss it as much as we should have.

Internist: Medicine is not certain. There's an open gate. I've got this 25-year-old girl with an uncertain diagnosis but a suspected malignancy. What do you tell her? Ethics is no different from any other area of medicine. It isn't always clear what to do. You have to use clinical judgment, the circumstances, literature, precedent, etc.

The materials for the course were rated good, very good, or excellent by 10 of the 11 preceptors; 1 rated them average. The "leaders' guides" were judged extremely helpful and complete, but a number of preceptors urged that the practice of distributing them to the students be changed (this urging is relevant to the later discussion of preceptors' Knowledge and Reasoning).

Results of an evaluation form distributed at the end of the term showed that the students also received the course well. Seventy percent of the students said that it is "quite important" for physicians to be skilled in dealing with ethical problems, and all the rest called it "important". Over three quarters of the respondents believed the course would help them deal with such problems in the future. These positive attitudes apply to the "right objects"--namely, the relevance and value of careful study of medical ethics--and thus support Appreciation as opposed to mere wanting or liking.

On the negative side, some made the familiar claim that ethics is "unteachable". A significant minority (6) criticized what they believed were biased materials, lectures, and tests.
A few (3) believed themselves no match for the lecturers and called for readings and positions representing conservative and religiously grounded viewpoints. Testing received the greatest criticism. Nearly half of the students thought the exams were too difficult, and 17% believed they were otherwise inappropriate or irrelevant. An evaluation form distributed near the middle of the term foreshadowed these results: the students generally viewed the course as worthwhile but grumbled about the tests. Moreover, preceptors testified to this state of affairs in the interviews, and it was also evident from observations. Despite the acrimony over testing, however, students were genuinely interested in and engaged by the course, both in discussion and lecture.

In summary, at the level of meta evaluation, the construct Appreciation could be applied to both students and preceptors. At the substantive level of evaluation findings, preceptors exhibited a somewhat higher level of Appreciation (or at least approval) than students.

Knowledge and Reasoning. The Focal Problems course showed that interview data could be applied to the constructs of Knowledge and Reasoning. The reasons that were given by preceptors for withholding the "leaders' guides" from students are relevant to preceptors' levels of Knowledge and Reasoning; they gave two reasons: (1) Providing students with the guides encourages them to "look up the answers" and thus inhibits careful thought and reflection. (2) It disarms preceptors--one remarked, "Brody and Miller [the curriculum designers] have exhausted the issues at our level of sophistication in the leader's guides". These concerns suggest that the preceptors lacked the necessary skills (loosely, Knowledge and Reasoning) to perform optimally. The philosopher-preceptor had no similar misgivings about distributing the "leaders' guides" to students.
Other remarks made within the interviews added further evidence about preceptors' ability to perform within the discussion groups. Most preceptors reported that they sometimes groped in discussion group and ran out of things to say--problems they believed philosophers would not encounter. One reported he sometimes didn't know the right "philosophers' moves"; another, that the discussions were sometimes a bit too "touchy feely"; and, a third, that "the discussions were sometimes hard to keep on track". Direct observation confirmed that these difficulties did arise in all of the discussion groups except the one staffed with an MHP philosopher. This group was also rated higher than the others on student evaluation forms, as were the lectures given by philosophers.

Students' Knowledge and Reasoning were examined in three ways: a pre-post fixed-response cognitive test, observations of discussions, and preceptors' opinions elicited in interviews. Evidence regarding students' Knowledge and Reasoning was unclear. On the one hand, testimony from the preceptors and observations indicated that students did gain in terms of use of the appropriate vocabulary, use of distinctions, consideration of alternatives and objections, and so forth, much as the nursing students had done in the previous example. On the other hand, and unlike the nursing example, progress was not evidenced on the basis of the pre-post testing. On the contrary, a dependent t-test was significant in the wrong direction (p < .01, t = -2.49, df = 56). Rather than mutually supporting Knowledge and Reasoning, as the multiple indicators (observation and essay testing) did in the nursing evaluation, the multiple indicators (observation, interviews, and fixed-response testing) conflicted in the Focal Problems evaluation.
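For readers unfamiliar with the dependent t-test mentioned above, the following is a minimal sketch of how such a statistic is computed from paired pre-test and post-test scores. The function name `paired_t` and the five score pairs are hypothetical and serve only as illustration; they are not the Focal Problems data. A negative t indicates that post-test scores were lower on average than pre-test scores, which is the sense in which a result can be "significant in the wrong direction". The reported df of 56 corresponds to 57 paired scores, since df = n - 1.

```python
import math


def paired_t(pre, post):
    """Return (t, df) for a dependent (paired) t-test on post - pre.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the paired differences
    and sd uses the n - 1 (sample) denominator; df = n - 1.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical scores for five students (illustration only).
pre_scores = [25, 27, 24, 26, 23]
post_scores = [24, 25, 24, 23, 22]
t_stat, df = paired_t(pre_scores, post_scores)
print(f"t = {t_stat:.3f}, df = {df}")
```

The p-value would then be read from a t distribution with the given degrees of freedom; that step is omitted here to keep the sketch free of external libraries.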
It is highly unlikely that (as the statistical test suggests) students actually became worse in terms of Knowledge and Reasoning: such a conclusion is implausible on its face and is contradicted by the evidence from observations and interviews.

These contradictory findings about Knowledge and Reasoning illustrate the value of multiple indicators. Without the data from observation and interviews, the testing results would be less easily dismissed. As one consultant to the "Ethics in the Core Curriculum" project (namely, Michael Scriven) remarked, "The multiple indicators saved [the evaluation]".

The mixed results about students' Knowledge and Reasoning led to a negative conclusion regarding the meta evaluation question of whether Knowledge and Reasoning should be measured with fixed-response tests. The attempt to measure Knowledge and Reasoning with a fixed-response test in the Focal Problems evaluation failed, and, although the test had virtues (described below), these were outweighed by several disadvantages that probably cannot be overcome in most educational contexts.

The exam consisted of 33 items and required less than a half hour to complete, which rendered its .785 reliability (KR-20) quite respectable. It was designed to be more sensitive to particular content and experience than measures such as James Rest's DIT, for instance, and data suggested it possessed the desired characteristic (i.e., construct validity). When given to groups of philosophy undergraduates, medical students, preceptors, and philosophy professors, it ranked them in the predicted order: philosophy professors scored highest, next came preceptors, then medical students, and finally undergraduates (means equalled 29.75, 24.85, 24.08, and 21.6 respectively). Finally, the negligible difference between medical students and preceptors (24.08 versus 24.85) is consistent with the difficulty preceptors experienced in leading the students in discussion.
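The KR-20 reliability coefficient cited above can be illustrated with a short sketch. It assumes items scored 0 or 1 and uses the population (divide by n) variance of total scores; some texts use the n - 1 denominator instead, which gives slightly different values. The function name `kr20` and the tiny response matrix are hypothetical, not data from the 33-item instrument.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: one row per examinee, one 0/1 entry per item.
    KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / var_total),
    where p_j is the proportion answering item j correctly, q_j = 1 - p_j,
    and var_total is the (population) variance of total scores.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least 2 examinees")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least 2 items and equal-length rows")
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical 4 examinees x 3 items (illustration only).
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(demo), 3))
```

Because the coefficient tends to rise with the number of items, a value near .785 on a test that takes under half an hour is the kind of result the text describes as respectable.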
(See Jones and Howe, 1983, for a more thorough discussion of the instrument.)

Despite these virtues, the test proved inappropriate for the circumstances. Indeed, it suffered from some of the same shortcomings considered in connection with Kohlbergian measures in Chapter III. It lacked face validity: students described it variously as "brutal", an "IQ test", a "logic test", and a "reasoning test". The exams used throughout the term and approximating the pre-post instrument were similarly described and were, accordingly, ineffective for feedback. In addition, the descriptions suggest that the exams were too formal and hence unlikely to detect effects which could be anticipated to result from a ten-week course. This judgment was reinforced by the observations of some of the preceptors as well as an outside consultant.

The testing experience in Focal Problems provides some important practical lessons that count for functional and against scientistic evaluation. It is conceivable (indeed, quite likely) that the exclusive fixed-response testing strategy could have been preserved in the name of rigorous evaluation. Three important practical considerations, however, argue against doing this. First, fixed-response exams require considerable time and technical skill to develop, especially when they are designed to measure reasoning skills. Because they require so much time and effort, it is wise to keep such exams "secure" (i.e., to not return them to students), a practice that compromises effective feedback to students on their performance. Second, the results that are generated from such fixed-response exams are of questionable value. Because such tests lack face validity, just what to make of results, even if positive, is unclear to those who are interested but who question the relevance of such tests. Third, such exams can be obstructive.
Using them in the name of rigorous evaluation findings can compromise the effectiveness of the program itself (as it did in the Focal Problems course by prompting considerable criticism and ill-will).

Indirect Goals. The second meta evaluation question entertained in this example is whether the design employed can be sensitive to the indirect goals of medical ethics teaching. The observations of the small groups and interviews of preceptors suggested that behaviors associated with indirect goals--goals of the "process" of medical ethics teaching--were evident within the Focal Problems discussion groups. Although the evaluation was in fact conducted in terms of criteria such as emotional involvement, interest, seriousness of purpose, and participation, the results may be recast in terms of Moral Regard, Empathy, Interpersonal Skills, and Courage. In general, observations and reports from the preceptors within interviews indicated that students were stimulated to care (show Moral Regard) and to identify (Empathize) with the patients discussed in the cases. Students also showed the ability to interact with one another about the issues raised by the cases (Interpersonal Skills) and were willing to express their personal views, even if unpopular (Courage).

Two specific examples support these general impressions about indirect goals. The first involves "enhancing Interpersonal Skills" and "promoting Courage", and is based on interview data. One preceptor observed that the importance of the issues to students as future physicians plus the difficulty of resolving them (students couldn't "look up the answers") required students to confront one another about disagreements--something they were not accustomed to doing. Initially, they became confused, frustrated, and, at times, quarrelsome. The preceptor remarked,

Maybe this is not fun for the students. They're going to have to face this.
It's frightening for them...if they don't get it, they become so confused and upset they don't even know what to ask for after a while...Sometimes the discussions generated helplessness, depression, and anger and stimulated some unfair ways of arguing. Sometimes they were blocked by students' emotional makeups--for example, students with a fundamentalist background. Sometimes civility just went down the tubes.

The preceptor in question went on to claim that students became better at "civility" as the term progressed. Now, these data may be related to the goals of "enhancing Interpersonal Skills" and "promoting Courage" in the following way. Courage is required to express an unpopular view. In so far as the discussion groups created the context in which unpopular views are brought to the surface, and in so far as the context encourages the expression of such views, Courage is "promoted". At the same time, because agreement (or compromise) must be hammered out in this context of competing views, Interpersonal Skills are "enhanced".

The second example involves the goals of "stimulating Moral Regard" and "eliciting Empathy", and results from observation and an informal interview with a preceptor. The setting was the viewing of a videotape that provides a particularly graphic depiction of the disfigurement and painfulness of therapy for severe burn patients. The primary ethical question at issue was whether the patient's request (expressed in the videotape) to forego treatment and to be allowed to die should be honored. Following the showing of the videotape, students spent about half an hour discussing therapy for burns. One student, who had worked for several years as an orderly in a burn center, reported to the group about how therapeutic techniques had progressed since the videotape was made (1975), how inept the attendants were, how outdated the facilities were, and so on.
Finally, one student asked whether the prognosis for severely burned patients was really much better at present and whether therapy was any less painful. This question, to which a satisfactory answer was not proffered, turned the discussion toward the difficult cognitive issues raised by the case, for example, the conditions under which patients may be judged incompetent to make treatment decisions and the conditions under which medical paternalism is justified.

The portion of the discussion most relevant to Moral Regard and Empathy was the first half-hour. The outside observer interpreted the discussion of the present state of burn therapy as a lack of interest by the students in the ethical issues, and a preference for the technical, purely medical side of the problem. By contrast, one of the preceptors, a psychiatrist, provided an interpretation that fit the data better. He argued that the students did indeed care about the patient (and future patients) and identified with his ordeal (i.e., they exhibited Moral Regard and Empathy). According to him, the first half-hour of the discussion did not indicate a lack of interest, but instead indicated students' hope that improved technology and therapeutic techniques may have eliminated the problems the videotape raised. He pointed out that once students realized that similar problems still must be faced, and once they had "cooled off" from the emotions prompted by the videotape (and had overcome "avoidance"), they turned to a pointed discussion of the issues raised.

The above two examples illustrate how the variant of the basic functional model used in the Focal Problems course can be sensitive to indirect goals. In hindsight, a much more systematic examination of the indirect goals would have been possible.
It is noteworthy, however, that the flexible and open-ended natures of unstructured observation and interviews (and thus the evaluation design as a whole) enabled the data to be analyzed and conclusions drawn post hoc. It was not necessary to plan the analyses in advance.

Evaluation Impact

The third meta evaluation question of this example is whether the basic functional model can be adapted to the medical school context and yield useful information. An important consideration in answering this question is the impact of the substantive evaluation findings.

As stated before, the Focal Problems course was a central proving ground in the initial stages of the "Ethics in the Core Curriculum" project; the evaluation was primarily formative. Despite the problems identified, problems potentially more damaging to the course and the project as a whole did not arise. Specifically, neither faculty nor students viewed ethics as mere window dressing. On the contrary, both groups were engaged by the issues and impressed with their importance and relevance. Moreover, the problem of subjectivism was not significant; it arose only obliquely in the claim made by some students that they should not be tested whatsoever in ethics. The evaluation of the Focal Problems course thus provided evidence early on that the project could proceed on the assumption that ethics would be well-received and not woefully misunderstood.

The evaluation also provided information useful for improving the Focal Problems course itself. Two general problems had been identified: testing and skill of preceptors in leading discussions. These findings motivated the project staff to make several changes. Five rather than two faculty development seminars were planned for the following year and were to span the preceding term as well as the term in which the course was taught. The aim was to increase the amount of training given to the preceptors and to monitor their progress.
Also, the participation of preceptors who gained experience the first time the course was taught was solicited. Other changes included matching experienced preceptors with inexperienced ones, withholding the leaders' guides from students, and making some revisions in the case materials. Changes in testing methods were also made, and these will be considered in the next example.

Conclusion

Three meta evaluation questions were the focus of the Focal Problems evaluation: (1) Can fixed-response tests supplant essay tests as a measure of Knowledge and Reasoning? (2) Can the variant of the basic functional model used to evaluate the Focal Problems course detect evidence for indirect goals? and (3) Can the model be adapted to the medical school context and yield useful results? The answers to each of these questions may now be summarized.

(1) Testing exclusively with fixed-response examinations is ill-advised. Although construct valid fixed-response tests of Knowledge and Reasoning can probably be devised, other considerations argue against their use. Like Kohlbergian measures, such exams are rarely useful for providing feedback to students and thus have little pedagogical value. Also, because they lack face validity, such exams are uninterpretable to students and to faculty responsible for teaching, and can even engender hostility. Finally, the resources required to develop defensible fixed-response measures of Reasoning are significant. Overall, the possible advantages of fixed-response tests of Reasoning for reducing fallibility are outweighed by the practical disadvantages of such tests.

(2) Unstructured observations and interviews provided evidence regarding the indirect goals. Although the evidence was somewhat thin and the analysis somewhat unsystematic, two provisional conclusions may be drawn.
First, the basic functional model, in virtue of incorporating observation and interviews, can detect evidence of the indirect goals at the same time it detects evidence of the direct goals. This renders the model both functional and efficient. Second, goals like "stimulating Moral Regard", "eliciting Empathy", "enhancing Interpersonal Skills", and "promoting Courage" are (or can be) a part of medical ethics teaching, and this provides additional support for the arguments in Chapter II about the appropriateness and role of the indirect goals.

(3) The variant of the basic functional model used in the Focal Problems course produced findings regarding Knowledge, Reasoning, and Appreciation that were less positive and more uncertain than those of the ethics in nursing course. The testing results in particular were disappointing--the failure to demonstrate cognitive effects left the door open for the scientistic charge that ethics teaching and its evaluation are "soft", and that the latter consists merely in collecting "happy data".

On the other hand, the general design of the Focal Problems evaluation generated results (in terms of its "soft" qualitative indicators) that were credible to the "Ethics in the Core Curriculum" program staff and to external consultants. The results of the evaluation (a) warranted the conclusion (in light of the difficulties experienced by preceptors) that ethics teaching was more difficult and specialized than originally believed; (b) stimulated changes in the Focal Problems course itself (described above); (c) combined with other attitudinal data collected regarding the "Ethics in the Core Curriculum" project to reduce apprehension about possible resistance to the introduction of ethics into nursing and medical education; and (d) provided the first data regarding responses to (versus predispositions toward) required ethics curricula in these contexts.
In short, the results of the Focal Problems evaluation had a significant formative impact on the conduct of the "Ethics in the Core Curriculum" project. The functional evaluation approach employed in the Focal Problems course thus achieved its major substantive purpose.

Example 3: Focal Problems (1983)

The 1983 Focal Problems course had the same basic format as the 1982 version: one hour of lecture per week in joint meetings combined with two-hour small group discussions led by preceptor pairs. The 1983 course differed from the 1982 one in terms of the previously mentioned changes in faculty development and staffing motivated by the 1982 evaluation. Changes in course materials were also motivated by the 1982 evaluation results: the senile dementia case was deleted, the Karen Quinlan case was pared back, and two new cases were added. The two new cases involved a suicidal multiple sclerosis patient and a terminally ill leukemia patient under consideration for a research protocol. These cases were less well integrated with the remainder of the curriculum than the one deleted and placed less stress on biomedical knowledge; both features were also justified on the basis of preceptor and student opinion from the previous year.

The evaluation of the 1983 course continued with the substantive purpose of improvement which had been central in 1982. Regarding the Focal Problems course, it was necessary to investigate the success of the changes which had been instituted (such as increased faculty development), the change in cases, and the feasibility of using clinical faculty to grade essays (the implementation of essay tests is described below). More broadly, results from 1982 Focal Problems, coupled with evaluation results from other activities of the "Ethics in the Core Curriculum" project (well into its second year by this time), dictated that evaluation should focus on an investigation of cognitive effects.
In light of the mixed results on cognitive effects from the Focal Problems (1982) evaluation, the "Ethics in the Core Curriculum" project needed to refute the scientistic notion that ethics teaching and its evaluation are inherently "soft". Student and faculty Appreciation (or at least endorsement) seemed widespread, and the Focal Problems course still presented the best opportunity to investigate cognitive effects because it remained the most circumscribed context and involved the most intensive exposure.

Because the substantive evaluation aim was by-and-large limited to investigating cognitive effects, the Focal Problems (1983) evaluation focused more explicitly on meta evaluation issues than the previous two examples. Three meta evaluation questions were of particular concern: (1) Can Knowledge be fruitfully distinguished from Reasoning in terms of testing (where Knowledge is identified with what is measured by fixed-response tests and Reasoning is identified with essay tests)? (2) Can essay tests on ethical issues be scored with a reasonable degree of inter-rater reliability? and (3) Can a testing strategy that combines fixed-response measures of Knowledge with essay measures of Reasoning avoid the practical difficulties that attend more formal and abstract measures (such as Kohlbergian measures and the pre-post test used in the Focal Problems, 1982, evaluation)?

As in the previous examples, the discussion of Focal Problems (1983) focuses on meta evaluation: the remainder of the discussion will concentrate on answering the above three questions. Substantive evaluation findings are discussed when appropriate.

Scientistic evaluation tends to employ one-shot quantitative studies, feigning ignorance of (or disparaging) relevant (often "qualitative") background knowledge. A functional approach holds that individual evaluations should not "stand alone", i.e., relevant knowledge (quantitative and qualitative) should be put to use in devising methods.
The design of the Focal Problems (1983) evaluation made liberal use of pre-existing knowledge.

In the 1983 version of Focal Problems, exams that combined fixed-response items with short essays replaced the 1982 strategy of exclusive reliance on fixed-response tests. In addition to the Focal Problems experience from the previous year and the successful use of essay examinations in the nursing ethics evaluation, two additional sources of data suggested that essay tests might be more workable than previously believed. First, the difficulty in training non-philosophers to grade essays may have been overestimated. A structured list of criteria was devised to aid a clinician in grading ethics case write-ups in 3rd year clerkships, and it met with reasonable success. Second, student resistance to writing may also have been overestimated. Students gave the following as their first choices of evaluation on the 1982 Focal Problems end-of-term course evaluations: 28% for fixed-response; 23% for short answer; 12% for in-class essays; and 35% for short take-home papers (3-5 pages).

The distinction between Knowledge and Reasoning crystallized in terms of instruments. In the Ethics in Nursing and Focal Problems (1982) evaluations, single tests (essay and fixed-response respectively) were used to measure both Knowledge and Reasoning. In the Focal Problems (1983) evaluation, Knowledge of content was identified with what would be measured by fixed-response tests and Reasoning with what would be measured by short essays regarding novel medical-ethical problems. It was reasoned that fixed-response tests could adequately measure recall and recognition of reading and lecture content, but were unwieldy and beset with practical difficulties in connection with Reasoning. The findings from the Ethics in Nursing evaluation suggested that essay tests could be used to measure Reasoning.

Employing this strategy, two measures were developed, test forms A and B.
Each form had 12 objective items based on course content (readings and lectures) and a short essay on a medical-ethical problem not discussed in the course. The essays required students to indicate whether they agreed with the position taken in the case description provided and, independent of this, whether the support provided for the position was satisfactory. Each form of the exam required approximately 35 minutes, and was administered in the first small group meetings and again as a portion of the final examination for the course.

A more sophisticated design was employed than the O X O type of the previous two examples. The students were divided into two groups. The first group consisted of 28 students who met in small groups on Wednesdays; the second group consisted of 32 students who met in small groups on Thursdays. The group which took form A of the test as a pre-test took form B as a post-test and vice versa. This crossed design improved confidence about conclusions over the O X O design by controlling for testing effects and providing two replications at once. Although an O X O design would have been adequate, the greater reduction in fallibility afforded by the more sophisticated design was consistent with a functional approach because the greater reduction in fallibility could be purchased at an acceptable cost: two measures had to be devised rather than one, and the data analysis was somewhat more demanding.

This investigation of cognitive effects addressed the substantive question of the impact of the Focal Problems course regarding learning. It was also bound up with the investigation of several ancillary meta evaluation questions. First, an examination of correlations between scores on the fixed-response and essay portions of the exams would shed light on the practical difference between Knowledge and Reasoning.
Second, study of the correlations (or lack thereof) between the positions taken on the essays and the ratings received would constitute evidence for (or against) the claim that essay grading is biased. Third, if successful, the strategy of combining fixed-response and essay tests would constitute a face valid, practicable alternative to other means of evaluating cognitive performance such as those based on Kohlberg's theory.

The design of the 1983 Focal Problems evaluation emphasized cognitive effects but was not confined to these. Though observation of the small groups and formal interviews of preceptors were not employed as they had been in 1982, the progress of the preceptors was monitored by informal discussions within the expanded faculty development sessions. Again, evaluations should not stand alone. Foregoing direct observations and formal interviews was justified in light of the resources required for the more elaborate effort of 1982; the reduced likelihood that much new would be learned as a result of the similar findings from the Ethics in Nursing and Focal Problems (1982) evaluations; and the changed make-up of the 1983 cohort of preceptors such that each group had an experienced preceptor, one from the staff of the MHP, or both.

The 1983 design included student evaluations for the same purposes as before. The design also provided for an investigation of the inter-rater reliability of essay grading, an issue which was introduced with the change in the testing format.

Figure 6 summarizes the relationships between data collection methods and constructs measured. (Because of the emphasis on meta evaluation in the design of the evaluation, the figure is less informative as a guide to the discussion of this example than the previous figures were. It is included for consistency and as a characterization of the substantive evaluation.)
                          Knowledge   Reasoning   Attitudes
fixed-response tests          x
essay tests                               x
evaluation forms                                      x
interviews (informal)         x           x           x

Figure 6. The Relationships Among Data Collection Techniques and Constructs for Focal Problems (1983)

Evaluation Results

Following the pattern of the two previous examples, the report of results is keyed to meta evaluation questions. To reiterate, the Focal Problems (1983) study involved three such questions: (1) Can Knowledge be fruitfully distinguished from Reasoning in terms of testing (where Knowledge is identified with what is measured by fixed-response tests and Reasoning is identified with essay tests)? (2) Can essay tests on ethical issues be scored with a reasonable degree of inter-rater reliability? and (3) Can a testing strategy that combines fixed-response measures of Knowledge with essay measures of Reasoning avoid the practical difficulties that attend more formal and abstract measures (such as Kohlbergian measures and the pre-post test used in the Focal Problems, 1982, evaluation)?

The subsequent "evaluation impact" and "conclusion" sections also follow the previous pattern. The "evaluation impact" section argues that the variant of the basic functional model used in the Focal Problems (1983) evaluation established worthwhile and credible conclusions; the concluding section summarizes the major meta evaluation conclusions of this example.

Analysis of Pre-Post Testing. The crossed design and the nature of the data (especially the essay ratings) suggested the Mann-Whitney U as the appropriate means by which to analyze the pre-post test data. Essays were graded using the same blind procedure used in the nursing example. Four comparisons were made; results are reported in Table 2.

Table 2. Comparison of Two Groups on Fixed-Response and Essay Tests

                    Pre-test*                    Post-test*
            Test      Mean               Test      Mean
            Form**    Score    S.D.      Form**    Score    S.D.
Group 1     Af        7.82     1.93      Bf        8.93     1.36
(n=28)      Ae        2.13      .88      Be        3.31      .58
Group 2     Bf        7.50     1.98      Af        8.72     1.59
(n=32)      Be        2.66      .87      Ae        2.80      .89

*Pre-post comparisons for Af, Bf, and Be were significant at p < .002; for Ae, p < .012.
**A = test A; B = test B; f = fixed-response test, possible range 0-12; e = essay test, possible range 0-4.

In contrast to the 1982 Focal Problems testing, these results provided evidence based on testing that the course did produce cognitive gains. Although this was not buttressed by observational evidence, preceptors provided informal testimony within the faculty development sessions and it was reasonable to assume that observations of the groups would have provided the same kind of evidence as they had in the 1982 evaluation. The results of the pre-post testing, then, provided rather solid evidence for the substantive conclusion that cognitive learning had occurred, especially when these results were combined with the previous findings from Focal Problems and the ethics in nursing course.

These findings also helped in the inference from positive student evaluations of the course to Appreciation. Unlike the 1982 Focal Problems course, students' favorable evaluations of the course could not plausibly be attributed to merely wanting an easy course. Demonstrated cognitive effects (coupled with evidence from multiple indicators in relevantly similar circumstances) rendered this explanation untenable.

Answers to the three meta evaluation questions of focus in this example were obtained by further analysis of the testing data.

Distinguishing Knowledge and Reasoning in Testing. Results on the Mann-Whitney U tests indicate that the tests employed were sensitive to Knowledge and Reasoning and therefore are valid measures of these two goals. The apparent circularity of this argument (namely, the instruments measured something, therefore they measured what was of interest) may be mitigated by making several observations.
First, although such an analysis was not performed, one could expect the same differences to be apparent in the essays that were documented in the ethics in nursing study. This is supported by the fact that, when debriefed, the grader of the pre-post essays in Focal Problems reported that he detected differences in the essays in terms of things such as listing alternatives and responding to objections. Second, observational evidence from the nursing study and the 1982 Focal Problems course provides independent evidence that learning of the desired kind occurs in similar courses. Third, the fixed-response exams were judged to be face valid measures of Knowledge and the essay exams were judged to be face valid measures of Reasoning by the MHP staff, as well as by preceptors and students. This latter characteristic, as well as sensitivity to effects, is possessed neither by the type of instrument used in the 1982 Focal Problems study nor by Kohlbergian-type measures.

Certain findings from the testing add to the appeal of the distinction between Knowledge and Reasoning. If these are indeed distinct, one would expect correlations between tests of them to be low. Goodman and Kruskal's gamma was used to investigate the relationship between the fixed-response and essay portions of the two test forms, A and B. Only one of these correlations was significant, namely, that between the portions of B as a pre-test (gamma = .348, p < .013). Further support for the distinction derives from considerations of face validity mentioned above, and the fact that the ratings on the essays did not correlate significantly with the positions defended in the essays (tau = .215 for the correlation between position and rating on the form A essay and .132 on the form B essay). This supports the contention that the essays measured (and were rated on the basis of) Reasoning (or quality of argument) and not on the basis of substantive claims. In connection with this, the essays involved cases not pursued in Focal Problems.
Students were required to determine for themselves what content from Focal Problems was relevant and how it might be applied.

Inter-Rater Reliability. The second meta evaluation question in the Focal Problems (1983) study concerned inter-rater reliability. A significant shortcoming of essay tests is lack of consistency among graders. Many believe (scientistic evaluators among them) that this shortcoming is especially acute in ethics because there are no "right answers". The 1983 cohort of preceptors was required to grade two sets of essays. They were provided with general grading criteria and broad outlines of anticipated arguments. Four pairs followed the instruction to grade independently, and inter-rater agreement (calculated using the Spearman rank-order correlation coefficient) was respectable. A z-transformation (using r to approximate rs) yielded a weighted average of .80; the individual coefficients are reported in Table 3.

Table 3. Reliabilities of Preceptor-Pairs' Essay Grading

Preceptor pair    1st essay              2nd essay**
A*                rs = .68 (n = 12)      rs = .70 (n = 11)
B                 rs = .82 (n = 12)      rs = .73 (n = 12)
C*                rs = .83 (n = 9)       rs = .89 (n = 9)
D*                rs = .90 (n = 10)      rs = .76 (n = 10)

*One member of the pair was a philosopher.
**Instructions were changed for the second essays. Both were graded on a 0-10 scale, but 7 was explicitly defined as minimally adequate on the second set to reflect the customary 70% pass level in the college. This had the effect of restricting the practical range of scores on the second set as compared to the first, accounting for the slight decrease in overall agreement.

In addition to these results, which testify to the feasibility of using non-philosophers to grade essays on ethical problems, the preceptors also claimed that the essays were useful for providing feedback. Students confirmed this on the evaluation forms and expressed much less dissatisfaction with testing in general than the 1982 cohort.
Furthermore, few students claimed that the grading was subjective or biased toward the favored views of the graders.

Usefulness of Measures of Knowledge and Reasoning. The third meta evaluation question was whether using fixed-response and essay tests to measure Knowledge and Reasoning respectively can avoid the practical problems associated with more abstract measures.

In the Focal Problems (1983) evaluation, keying fixed-response tests to Knowledge and essay tests to Reasoning provided formative as well as summative information. Table 4 contains three items from the fixed-response portion of form A of the pre-post test, accompanied by correct responses (indicated by *) and pre-post difficulties (percentage of students responding incorrectly).

Table 4. Pre-Post Difficulties of Selected Items

                                                        Difficulties
Item                                               Pre (n = 28)   Post (n = 32)

1. The right of a research subject to                   9              0
   informed consent requires
   a)  informing subjects of all alternatives
   b)  informing subjects of the risks and
       benefits of the research procedures
   c)  informing subjects that they may
       withdraw at any time
   d)* all of the above

2. The right to autonomy (self-determination)          47             16
   is included in the constitutional right
   to privacy
   a)* true
   b)  false

3. The right to life implies that life saving          53             72
   medical treatment may never be withheld
   a)  true
   b)* false

These three items may be interpreted in the following way. Item 1 is indicative of progress regarding the knowledge in question. Because the item was so easy to begin with (only 9% answered it incorrectly), however, it is more useful for diagnosis than assessment of learning. That is, students apparently had the desired knowledge entering the course, and this would warrant the decision to give little attention to the issue in future offerings of the course. Item 2 shows a substantial gain in knowledge about the relationship between law and morality, and is useful as a straightforward gauge of knowledge acquisition.
Item 3 shows a loss, and a reasonable conclusion is that the instruction confused the students. For instance, perhaps the students' respect for the rights of patients increased but was not accompanied by a sufficiently sophisticated understanding of rights (e.g., the right to life does not entail the right to an artificial heart).

Essay exams augment the fixed-response tests and may be used for similar purposes (i.e., diagnosis and assessment of gains) with respect to Reasoning. Although interpretation of essay test performance is rarely as straightforward as interpretation of fixed-response test performance, essays as pre-tests provide an indication of whether students can put together coherent positions and, accordingly, help determine how much energy should be put in this direction. They also may be used to trace progress, as was done in the nursing and Focal Problems (1983) evaluations.

Two points made earlier in different contexts may be expanded and applied to the question of the usefulness of the combined fixed-response/essay testing strategy. First, fixed-response and essay exams of the type used in the Focal Problems (1983) course are face valid measures of Knowledge and Reasoning respectively. Because the measures had face validity, they were useful for feedback and were interpretable to the "Ethics in the Core Curriculum" project staff, the preceptors, and the students. Lack of face validity is a major shortcoming of Kohlbergian-type measures and the abstract kind of fixed-response test used in the Focal Problems (1982) evaluation. Second, and with respect to essay tests in particular, essay tests on ethical issues can be scored with an acceptable degree of inter-rater reliability and do not automatically prompt charges of subjectivity and bias from students. Furthermore, essay tests have a distinct advantage over fixed-response measures of Reasoning in terms of the time, effort, and expertise required for development.
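
The pooled inter-rater coefficient reported with Table 3 can be approximated by a short sketch. The text states only that a z-transformation was applied and a weighted average taken; the choice of weights (n - 3, the usual inverse-variance weight for Fisher's z) and the function name `pooled_correlation` are assumptions made here for illustration, not a record of the original computation.

```python
import math

# Spearman coefficients and sample sizes as reported in Table 3
# (first essay, then second essay, for preceptor pairs A through D).
TABLE_3 = [
    (0.68, 12), (0.70, 11),  # pair A
    (0.82, 12), (0.73, 12),  # pair B
    (0.83, 9), (0.89, 9),    # pair C
    (0.90, 10), (0.76, 10),  # pair D
]


def pooled_correlation(pairs):
    """Average correlations on Fisher's z scale, then back-transform.

    Each coefficient is weighted by n - 3. This weighting is an
    assumption; the source does not say which weights were used.
    """
    weighted_z = 0.0
    total_weight = 0.0
    for r, n in pairs:
        z = math.atanh(r)  # Fisher z-transformation of r
        weight = n - 3     # assumed inverse-variance weight
        weighted_z += weight * z
        total_weight += weight
    # Convert the mean z back to the correlation scale.
    return math.tanh(weighted_z / total_weight)


pooled = pooled_correlation(TABLE_3)
print(round(pooled, 2))
```

Under these assumptions the result should fall close to the .80 given in the text; a different weighting scheme would shift it slightly. Averaging on the z scale rather than averaging the raw coefficients avoids the downward bias that arises because the sampling distribution of a correlation is skewed when the coefficient is large.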

Evaluation Impact

The 1983 evaluation results demonstrated the value of the changes prompted by the 1982 evaluation results. In addition to the solid findings regarding cognitive effects (and the related meta evaluation findings), students' evaluations of preceptors' performance improved from the previous year. The advantage in ratings enjoyed in 1982 by the lectures in general and by the small group involving a philosopher preceptor diminished. In 1983, 3 of the 6 groups had philosopher preceptors and differences between these groups and the others were negligible, as were differences between the ratings of lectures and small group discussion. In addition to data from student evaluations, the preceptors reported in the faculty development sessions that things went well, and those with experience reported greater confidence the second time around. Notably absent were reports of discussions stalling out, not knowing the right "moves", and so on.

The results from the 1983 evaluation of Focal Problems (like the results from the Ethics in Nursing course) were interpreted to mean that a viable course had been developed. As in both of the previous examples, the functional evaluation approach again produced credible results. The relative success of the course and the strategy of distinguishing and devising separate tests for Knowledge (content) and Reasoning (problem-solving) were disseminated in the form of journal articles (Howe et al., 1984; Howe and Jones, 1984). The written materials for the course have been widely distributed, and the evaluation findings have been used in arguments presented to the college's curriculum committee.

Conclusion

This example emphasized the study of cognitive effects of medical ethics teaching, and both substantive and meta issues were addressed.
The substantive findings of gains in terms of the cognitive goals of medical ethics teaching corroborate the strong findings of the Ethics in Nursing example (based on testing and observation) and the more tenuous findings from the Focal Problems (1982) example (based on observation and interviews but contradicted by testing). The pre-post testing gains also reinforce a point made in the nursing example: the gains count against the subjectivist view of ethics implicit in the scientistic fact-value distinction. Students gained on something, and doubt about whether these gains were cognitive ones is silly in light of the evidence which has been adduced from the two previous examples in addition to this one. Moreover, the inter-rater agreement exhibited by the preceptors is prima facie evidence for the existence of agreed upon (i.e., objective) standards for judging the quality of ethical arguments that (contra Nobel, 1982) extend beyond the interests of a specialized group of philosophers.

The three meta evaluation questions of this example were: (1) whether the constructs Knowledge and Reasoning could be fruitfully distinguished in a way that identifies Knowledge with fixed-response tests that measure course content and identifies Reasoning with essay tests that measure skills; (2) whether essay tests on ethical issues can be scored with a reasonable degree of inter-rater reliability; and (3) whether the combined fixed-response/essay testing strategy avoids the practical problems that attend abstract measures. The combined fixed-response/essay testing strategy did well on each question.

(1) The low correlations between the fixed-response and essay tests support the contention that Knowledge and Reasoning are separable constructs that can be independently measured.
Although one might argue that the correlations are "in reality" higher (e.g., because of the restricted range of the scales), the evidence militates against the conclusion that Knowledge and Reasoning amount to the same thing. The theoretical conclusion is that the distinction made between Knowledge and Reasoning in Chapter II is warranted; the practical conclusion is that fixed-response tests of Knowledge and essay tests of Reasoning are not redundant, and thus it is useful to employ both kinds of tests.

(2) The finding that the inter-rater reliability of the essay test scoring was reasonably high (.80) supports the feasibility of employing essay tests of ethical reasoning. This finding helps remove one of the most serious obstacles to the use of essay tests, clearing the way for measures that are more manageable and face valid than fixed-response measures of Reasoning. The observed inter-rater agreement also helps remove the obstacle posed by the scientistic fact-value distinction.

(3) The combined fixed-response/essay testing strategy avoids the practical problems that plague the use of abstract measures of the cognitive goals of ethics teaching. The fixed-response/essay strategy may be easily keyed to the particular content and skills objectives of individual courses, rendering such tests sensitive to specific (though by no means trivial) cognitive effects. Related to this, such tests are face valid and are thus interpretable to instructors and students and are useful for feedback on student performance. Finally, fixed-response and essay tests on the model of the ones employed in the Focal Problems example are easily developed and are flexible to changes in curricular content.

The stated purpose of this chapter was to show that a functional evaluation approach, grounded in the Wilson+1 goals, can produce credible empirical results that achieve the customary evaluation purposes of improving practice and warranting judgments about success.
To accomplish this purpose, a basic functional model for the evaluation of medical ethics courses was described and then variants of this model were used to explicate three concrete course evaluations. Each concrete example was cast in terms of two major themes: meta evaluation and impact. The discussions of meta evaluation issues pertained to the credibility (in the sense of rational warrant) of the conclusions resulting from the variants of the basic functional model used in each evaluation. The discussions of evaluation impact pertained to the variants' capacities to produce results that could influence the practice of medical ethics teaching (i.e., the discussions of evaluation impact pertained to credibility in a psychological, or rhetorical, sense).

The Ethics in Nursing example provided evidence that essay tests, direct observation, and student evaluation forms can be combined to measure Knowledge, Reasoning, and Appreciation. The multiple indicators established the credibility (rational warrant) of conclusions about the measurability of these three constructs, and the impact of the findings--their influence on the instructors and their disseminability--established persuasiveness. The rational warrant and persuasiveness taken together entail that the model employed was effective. The Ethics in Nursing course was judged successful, and continues to be taught in the same way that it was in 1980 when the evaluation was performed.

The Focal Problems (1982) example showed that the basic functional model could be adapted to the medical school context and to the special multiple discussion group Focal Problems format, and that the constructs Knowledge, Reasoning, and Appreciation could be extended to faculty. In addition, the example provided suggestive evidence that direct observation (an indicator included in the basic functional model) can be sensitive to the indirect goals of medical ethics teaching.
Because the fixed-response pre-post test yielded negative results, the Focal Problems (1982) evaluation did not substantiate cognitive effects. However, other indicators--observations and interviews--helped render implausible the conclusion that students actually lost in Knowledge and Reasoning. Moreover, the unwelcome results of the quantitative analysis of the testing did not affect the overall credibility of the evaluation. The Focal Problems (1982) evaluation identified problems with the course and prompted constructive changes.

The Focal Problems (1983) evaluation demonstrated that Knowledge and Reasoning can be fruitfully distinguished and can be measured respectively with locally constructed fixed-response and essay tests. Such a testing strategy overcomes the practical problems that beset Kohlbergian-type and other abstract measures. Like the Ethics in Nursing evaluation, the Focal Problems (1983) evaluation demonstrated that medical ethics teaching can yield cognitive gains among students. Also like the Ethics in Nursing example, the results of the Focal Problems (1983) evaluation were received as credible and as warranting the belief that a successful course (one not in need of revisions) had been developed.

The three evaluations establish that a functional evaluation approach, grounded in the Wilson+1 goals, can be effectively employed to evaluate medical ethics teaching. The examples can also be used to directly criticize the scientistic alternative. In this chapter, for instance, pre-post gains on cognitive tests and inter-rater reliability of essay ratings were used as evidence against the scientistic fact-value distinction, which implies that cognitive standards in ethics do not exist. This argument, and others that can be based on the three concrete evaluations, will be elaborated in the next and concluding chapter.

NOTES

1.
Survey results from the "Ethics in the Core Curriculum" project (described in the Focal Problems, 1982, example) indicate that a majority of students and faculty believe that self-standing courses in ethics perform such a ground-laying function.

2. National Endowment for the Humanities grant number EP-32926-78-1231.

3. Michigan State University's College of Human Medicine has two pre-clinical curricular options. "Track I" consists largely of lectures in the basic sciences, augmented by one three-course sequence of problem-based "Focal Problems" courses. The "Track II" curricular option is composed almost exclusively of "Focal Problems" courses.

4. National Endowment for the Humanities grant number ED-20020-81-0509.

CHAPTER V

BUTTRESSING AND EXTENDING THE FINDINGS AND ARGUMENTS

This chapter is divided into three sections. The first relates the practical examples from Chapter IV to the more theoretical and abstract themes of Chapters II and III. Whereas Chapter IV illustrated how theory is converted to and exemplified in practice, the aim in this chapter is to illustrate how practice (i.e., the three evaluations) supports theory. The discussion in the first section rounds out the central task of this dissertation: developing a defensible means of evaluating medical ethics teaching that meets the legitimate demands of both evaluation researchers and medical ethics instructors.

The second section suggests two ways of extending the evaluation model employed in Chapter IV: how it might be extended within medical and nursing education to include more than just courses, and how it might be extended to "applied" ethics teaching more generally.

The third section makes three explicit disclaimers of this dissertation with respect to both evaluating and teaching medical ethics. These disclaimers help define the scope and limits of the dissertation and, related to this, help eliminate potential misunderstandings.

Buttressing the Model

The "controversies" and "misconceptions" discussed in Chapter II and the tenets of scientism discussed in Chapter III are highly conceptual issues, but are not altogether disconnected from empirical considerations. The three examples--the evaluation findings and the behavior of students and faculty--provide relevant empirical evidence (though by no means direct empirical tests) that may be added to the previous conceptual arguments.

Misconceptions and Controversies Revisited

Controversies

Moral Behavior as a Goal. The ethics in nursing and Focal Problems (1983) examples demonstrated cognitive effects in terms of Knowledge and Reasoning; the Focal Problems (1982) example provided suggestive evidence for the indirect goals. As argued in Chapter II, Knowledge, Reasoning, and the indirect goals make up desired behaviors for the educational context (i.e., behaviors_e), and are constitutive of moral behavior in a full-blown sense (i.e., behavior_r). The findings from the three examples sanction inferences to the ultimate goal of moral behavior_r, and thus buttress the claim of Chapter II that it is unnecessary and misleading to deny, without qualification, that moral behavior is a goal of medical ethics courses.

Appreciation was implicitly adopted in each course as a proximate (i.e., direct) goal. The course designers took student evaluations of their courses very seriously, but measured student evaluations against cognitive goals. The positive attitudes of students accompanied by evidence for cognitive effects in the nursing and Focal Problems (1983) examples were taken to mean that adequate courses had been developed. By contrast, the positive student attitudes not accompanied by good evidence for cognitive effects in the Focal Problems (1982) example prompted changes in the testing and preparation of preceptors.
In short, the designers of each of the courses adopted Appreciation as a direct goal for which they were responsible, and construed Appreciation in the way defined in Chapter II, namely, as combining attitudinal and cognitive elements.

The design and content of the three courses belie the charge that medical ethics teaching is too "formal" (i.e., abstract, irrelevant, and of interest only to philosophers). Each course emphasized problems health care professionals inevitably face in the execution of their day-to-day duties, and students (especially the nursing students) and faculty (the Focal Problems preceptors) testified to the relevance and value of the content and approach employed in the three courses. Although medical ethics teaching could be (and perhaps sometimes is) too formal, it does not have to be, simply in virtue of employing a philosophical approach.

Because ethical theory is abstract and primarily of interest to philosophers, medical ethics teaching has turned away from stressing ethical theory--teaching ethical theory is teaching that is too "formal". The three course evaluations provide no evidence that greater stress should have been placed on ethical theory. In particular, neither students nor faculty requested such an emphasis. On the contrary, the concentration on concrete cases was praised. Furthermore, there was ample evidence that the courses achieved their desired goals.

Misconceptions

The "misconceptions" discussed in Chapter II were the following: ethics is subsumed by the social sciences; ethics is simply interpersonal skills; professional ethical codes are a sufficient means of dealing with ethical problems; legal considerations preclude ethical ones; ethics teaching conflicts with religious beliefs; ethics teaching is indoctrination; and formal education has no proper role to play in moral education.
If these views in fact apply to medical ethics teaching, one would expect them to be voiced by students and faculty actually exposed to medical ethics teaching. Very little support for these views is to be found on the basis of the three course evaluations. A few students claimed that the teaching conflicted with religious belief, a few that it was indoctrinating, and a few that testing in ethics is inappropriate because ethics is solely a matter of personal opinion (all from the Focal Problems 1982 example). By and large, however, "misconceptions" (in the form of expressed criticisms of the courses) were uncommon.

The received wisdom is that "misconceptions" about ethics teaching are pervasive. Notwithstanding, students and faculty who had first hand experience with the courses discussed in the preceding chapter did not exhibit "misconceptions". Perhaps "misconceptions" are not as pervasive as they are believed to be, or perhaps the courses in question made major strides in removing them.

Reducing Fallibility Without Scientism

The Fact-Value Distinction

Chapter III argued that defeating the scientistic (positivistic) construal of the fact-value distinction is a pre-condition for making sense of the evaluation of medical ethics teaching--and of educational evaluation in general. The findings from the three examples provide empirical evidence that may be used to further criticize the scientistic fact-value distinction.

If the scientistic construal of the fact-value distinction is correct--if some version of subjectivism is correct in which value judgments have to do only with personal, non-cognitive preferences--then two things should follow: (1) there should be no recognizable cognitive standards by which to distinguish the warrant of ethical judgments, and (2) subjectivism should be exemplified in the behavior of faculty and students. On the other hand, if these two expectations are contradicted, then the bases for "value phobia" are further undermined.

1.
Cognitive Standards. Evidence for cognitive effects from testing, observation, interviews, or some combination was present in each of the three examples. In the nursing and the Focal Problems (1983) examples, advanced philosophy graduate students rated post-tests higher than pre-tests, and did so in terms of accepted standards of argumentation (e.g., specifying a position, defending it with reasons, and responding to counter-arguments). This evidence was buttressed by pre-post observations in the nursing example, and by observations and interviews in the Focal Problems (1982) example.

Although in light of these findings one would be hard pressed to deny that some standards were employed, it might be objected that these findings only show that students were brought around to the biases implicit in the content of the courses. In response to this objection, the standards--specifying a position, defending it with reasons, and responding to counter-arguments--speak for themselves as cognitive standards; they are "biases" only in some very curious sense. Moreover, the inter-rater agreement exhibited by the preceptors grading essays in the Focal Problems (1983) example indicates that they were able to recognize and apply the standards in question. It is unlikely that faculty could so easily be converted to accept the "biases" of the course designers.

2. The Behavior of Faculty and Students. Students and faculty consistently behaved as if they believed ethical subjectivism is false. That is, data from the three examples regarding discussions and examinations indicate they were engaged in activities that seemed to be substantially cognitive in nature and to involve much more than the mere expressions of emotions and personal preferences. Moreover, as already noted, "misconceptions" in general were rare in the three examples, including the one that formal training in ethics is inappropriate because ethics is essentially non-cognitive.
Ethical subjectivism, it seems, is avowed but not lived: the evidence from the three examples supports the contention of Chapter III that ethical subjectivism is not appealing in its own right but follows from the tenets of positivism.

The Quantitative-Qualitative Distinction

The general theoretical points made in Chapter III about the quantitative-qualitative distinction were (1) that the scientistic epistemology which elevates quantitative methods to a distinct and superior position is virtually identical with positivism and is identically flawed, and (2) that educational evaluation unavoidably makes use of both kinds of data to justify inferences.

The model and undergirding epistemology that guided the three evaluations are supported by the credibility and utilization that the examples enjoyed. In each case qualitative and quantitative methods were combined in a multiple indicators approach which responded to prevailing conditions and purposes, and in each case the findings had sufficient credibility to be utilized. Results of the ethics in nursing and Focal Problems (1983) studies provided evidence that the courses in question were achieving desired and defensible aims, and the results were disseminable to the relevant audiences. Results of the Focal Problems (1982) evaluation raised questions about success and located the source of problems, leading to needed improvements.

At the practical level of the three examples, the scientistic criticism that the results are suspect or incoherent because qualitative methods were combined with quantitative methods begs the question. The scientistic quantitative-qualitative distinction flows top-down from positivism: in light of the utilization of the results from the three examples, the epistemological standards employed by those interested in information about medical and nursing ethics teaching are not scientistic ones.
Saving the scientistic qualitative-quantitative distinction, in addition to meeting the criticisms advanced in Chapter III, thus requires accusing this audience of being ignorant of correct epistemology (i.e., it requires accusing them of not knowing a good argument when they see one).

Chapter III argued that behavioristic criteria and measures are fundamentally inappropriate for the evaluation of medical ethics teaching and that Kohlbergian criteria and measures are hopelessly flawed from a practical point of view. The relative success of the use of alternative, more intuitive criteria and measures in the three examples reinforces these contentions.

1. Sensitivity. The combination of interviews, observations, tests, and questionnaires was sensitive to the goals of Reasoning, Knowledge, and Appreciation. Sensitivity to Reasoning demonstrates the superiority of the strategy of the model employed in the three examples over behaviorism and Kohlberg's theory. Behaviorism fails to even broach the important question of Reasoning (i.e., cognitive moral judgment) and Kohlberg's theory provides an overly formal (if not incorrect and objectionable) construal of it. By contrast, the observational and testing strategies employed in the three examples were able to detect the desired effects. Also, the Focal Problems (1983) example measured Knowledge and Reasoning separately, supporting a distinction in the cognitive aims of ethics teaching that Kohlberg's theory fails to account for.

2. Interpretability. The superior sensitivity of the methods of the model employed in the three examples applies perforce to interpretability of results: obtaining no observable effects or observable effects in terms of the wrong criteria is uninformative or misleading.
In the nursing and Focal Problems (1983) examples, desired effects were detected in terms of the three major questions of interest, Reasoning, Knowledge, and Appreciation, and the results led the designers to be satisfied with the courses as they stood. In the Focal Problems (1982) example, sources of problems were identified that prompted needed improvements. In none of the examples were there questions in the minds of the designers about what the results of the evaluations meant.

3. Practicability. An approach grounded in psychologically based measurement requires the efforts of specialists in psychology, who are unlikely to be permanently available to monitor efforts and who employ measures which match their own interests and expertise. Considerable effort is likely to be expended on a one-shot research project that will soon be forgotten (and perhaps never understood or endorsed) by those responsible for teaching. In contrast to this approach, each of the three evaluations discussed in the preceding chapter was accomplished by one staff evaluator working no more than one-quarter time. This time commitment becomes considerably less after strategies and instruments have been developed. Once developed, the kind of measurement strategy employed can become a permanent fixture and be turned over to instructors themselves, requiring only periodic monitoring from experts in evaluation (who are usually available in medical schools). The kind of testing employed can serve the dual function of program evaluation and the more common function of evaluating students to assign grades.

Extending the Model

For the reasons advanced in Chapter IV, traditional courses are not well-suited for teaching Moral Regard, Interpersonal Skills, Empathy, and Courage, i.e., the indirect goals.
However, where direct patient contact is possible, more actively pursuing the indirect goals is both feasible and indicated.

One context for this is the teaching of skills required in the doctor-patient relationship. Southern Illinois University (SIU), for instance, has incorporated a teaching format termed "multiple stations" in which students rotate through several simulated patient encounters. Many of these encounters incorporate common ethical problems like truth telling and informed consent. Another setting is clinical rotations. Hospitals and physicians' offices have the disadvantage of providing less controlled conditions for evaluation than simulated patients. They have the advantages, however, of a richer variety of encounters and the involvement of actual patients.

It is feasible to incorporate ratings on the indirect goals into evaluations in both contexts. Student performance in the SIU simulated patient encounters is rated in two ways: by the simulated patients themselves and by unobtrusive observers. Incorporating indirect goals into these evaluations, students might be rated on their abilities to "read" patients (Empathy), to follow through in difficult situations (Courage), to demonstrate concern (Moral Regard), and to effectively communicate in a way sensitive to the demands of the situation (Interpersonal Skills). Similar kinds of ratings could be incorporated into actual clinical contexts. Working out the details of evaluation in either context is largely a practical matter. Such evaluations could be accomplished by direct observation resulting in narratives, observational checklists, debriefing interviews, or some combination.

Although feasible, obstacles to this more comprehensive brand of evaluation of medical ethics surely exist.
In addition to the practical and moral problems associated with the indirect goals discussed in Chapter II, the simulated patient, physicians' office, and hospital settings introduce imposing practical methodological problems. Moreover, marked resistance exists to what may be perceived as passing judgment on the moral virtue of students. Although, on the one hand, there is a demand to affect moral behavior, on the other hand, there is a strong presumption against influencing or judging it.

The issue of feasibility, then, cannot be divorced from the general milieu of medical education. Given the recalcitrant nature of the traits and dispositions that correspond to the indirect goals, medical schools need to incorporate these traits and dispositions as explicit criteria in admissions standards. This would both alleviate some of the potential pedagogical problems and clarify the commitment to moral behavior as essential to the practice of medicine. An explicit statement and commitment is also required within curricula. Only then will it be reasonable to demand that medical ethics teaching promote the desired aims and be accountable for them. And only then will evaluation of such aims be appropriate and not be met with considerable resistance from all sides.

The Wilson+1 goals are specific to medical ethics only with regard to particular ethical issues of interest and the relationship between such ethical issues and appropriate domains of knowledge. With suitable adjustments in the specific issues addressed, the methods of evaluating medical ethics teaching advanced in this dissertation may be straightforwardly implemented in other areas of applied ethics.

There are roughly two major types of educational programs in which applied ethics plays (or should play) a role. One kind is the standard undergraduate program, consisting of a variety of courses leading to an undergraduate degree.
Engineering, business, and journalism are examples of such programs. Applied ethics teaching in these areas would seem limited to self-standing courses, and the model for evaluating medical ethics courses of Chapter IV is readily adaptable to the differing content and issues involved.

A second kind of program is one that adds apprenticeship field experience to standard course work. In addition to medicine and nursing, veterinary medicine, dentistry, and primary and secondary teaching are examples of this kind of program. The nature of these educational programs, as well as of the professions themselves, entails direct interpersonal contact with individual persons. This kind of training program is amenable to the broad model which includes both direct and indirect goals and is subject to the same limitations and problems mentioned in connection with medical education.

Disclaimers

This dissertation does not have the presumptuous aim of providing the last word on how medical ethics ought to be evaluated, much less on how it ought to be taught. In order to avoid possible misunderstandings on these two points, this section makes three explicit disclaimers.

Disclaimer 1. To borrow from Clouser (1980), evaluation should not be the tail that wags the dog. If certain pedagogical practices that impinge on evaluation are defensible on independent grounds (for instance, requiring students to write papers), such practices should not be avoided in deference to the aims and preferred methods of evaluation research. To be sure, evaluators have something to say about legitimate aims and means of achieving them, but aims and means should not be compromised merely to simplify evaluation or to yield more methodologically defensible evaluation results. As stated previously, evaluation should work its way out from the nature of the object under investigation; this is a fundamental tenet of functional evaluation.

Disclaimer 2.
This dissertation aimed to show that medical ethics teaching can be evaluated in a defensible way. This is a much more modest aim than showing the only or best way to do evaluation. The basic functional model used in Chapter IV was dictated by resources and practices that prevailed at a given institution at a given point in time, and one could easily imagine both being different. For example, literature might be incorporated into a medical ethics course to, among other things, improve (versus elicit) students' ability to empathize. Incorporating literature in this way would press the classification of "eliciting Empathy" as an indirect (or process) goal. If resources were available to carefully research the impact of literature (e.g., students might be given passages from a novel pre-post and asked on each occasion to describe the feelings of characters), the incorporation of literature would also render the basic functional model of Chapter IV methodologically inadequate. To take another example, one defensible aim of medical ethics teaching is helping future physicians balance their own interests against the interests of others (e.g., Pellegrino et al., 1985). Given this aim of medical ethics teaching, it could be suggested that the Wilson+1 goals are incomplete.

The criticisms of the preceding paragraph would undermine this dissertation if the aim were to provide a blueprint for doing evaluation, but the aim was merely to show that methodologically defensible, yet fruitful, evaluation is possible. For the reasons given in the introductory chapter, this is something that needed showing.

Disclaimer 3. Just as this dissertation aimed to show only that medical ethics teaching can be evaluated, it aimed, in the process, to show only that the three courses used as examples achieved desirable goals. Showing that these courses achieved desirable goals falls considerably short of showing any of them is the best way to teach medical ethics.
Determining what methods of teaching medical ethics are best would require careful comparative studies that go far beyond the limitations of this dissertation.

Closing Remarks

Ethics is frequently viewed by educational evaluators as different from other academic subjects. The perceived difference finds its roots in the great divide between what is purportedly objective, scientific, factual, quantitative, and cognitive on the one hand and what is subjective, unscientific, value-laden, qualitative, and affective on the other. Ethics is believed by many to fall squarely within the second set of descriptions, which gives rise to "value phobia" and the notion that evaluating ethics in terms of cognitive standards is out of place. Even those who stop short of going all the way with this view demand that cognitive standards have some objective basis in science, most notably, in theories of cognitive moral development.

For their part, ethics instructors also have a tendency to view ethics as radically different from other academic subjects, but for different reasons than educational evaluators. They believe ethics is not amenable to reductivist social research methods and is something that must be valued for its own sake. They identify all evaluation as reductivist and draw the conclusion that their teaching is not subject to the kind of methods that experts in evaluation employ.

This dissertation has shown that evaluators and medical ethics instructors who hold the above views are mistaken. Educational evaluation incorporates values by its very nature, and ethics differs from other objects of evaluation only as a matter of degree. Ethics teaching thus does not defy evaluation because it is inherently subjective, unscientific, or anything of that sort. The other side of this coin is that ethics teaching does not lie above the fray when it comes to demonstrations of worth and effectiveness.

REFERENCES

Aleamoni, L. (1981). Student ratings of instruction. In J.
Millman (Ed.), Handbook of Teacher Evaluation (pp. 110-145). Beverly Hills, California: Sage Publications.

American Association of American Medical Colleges (1984). Physicians for the Twenty-First Century. Washington, D.C.

Anderson, S. and Ball, S. (1978). The Profession and Practice of Program Evaluation. San Francisco: Jossey-Bass.

Archambault, A. (1975). Criteria for success in moral education. In I. Chazan and J. Soltis (Eds.), Moral Education (pp. 159-69). New York: Teachers College Press.

Beauchamp, T. (1982). What philosophers can offer. Hastings Center Report.

Beauchamp, T. and Walters, L. (1978). Contemporary Issues in Bioethics. Belmont, California: Wadsworth Publishing Co.

Benjamin, M. and Curtis, J. (1981). Ethics in Nursing. New York: Oxford University Press.

Bok, S. (1978). Lying: Moral Choice in Public and Private Life. New York: Pantheon Books.

Callahan, D. (1980). Goals in teaching ethics. In D. Callahan and S. Bok (Eds.), Ethics Teaching in Higher Education (pp. 61-74). New York: The Hastings Center.

Callahan, D. (1978). The rebirth of ethics. National Forum, 58(2), 9-12.

Campbell, D. (1982). Experiments as arguments. In E. House (Ed.). Beverly Hills, California: Sage Publications.

Campbell, D. (1979). "Degrees of freedom" and the case study. In T. Cook and C. Reichardt (Eds.), Qualitative and Quantitative Methods in Evaluation Research (pp. 49-67). Beverly Hills, California: Sage Publications.

Campbell, D. (1974). Qualitative knowing in action research. Kurt Lewin Award Address, Society for the Psychological Study of Social Issues, meeting with the American Psychological Association, New Orleans, September.

Caplan, A. (1983). Ethical engineers need not apply: the state of applied ethics today. In S. Gorovitz et al. (Eds.), Moral Problems in Medicine (2nd ed.), (pp. 38-43). Englewood Cliffs, New Jersey: Prentice-Hall.

Caplan, A. (1980). Evaluation and the teaching of ethics. In D. Callahan and S. Bok (Eds.), Ethics Teaching in Higher Education (pp. 133-50). New York: The Hastings Center.

Clements, C. and Sider, R. (1983). Medical ethics assault on medical values. Journal of the American Medical Association, 250(15), 2011-15.

Clouser, K. D.
(1980). Teaching Bioethics: Strategies, Problems and Resources. New York: The Institute of Society, Ethics and Life Sciences.

Clouser, K. D. (1973). Medical ethics: some realistic expectations. Journal of Medical Education, 48, 373-74.

Cronbach, L. (1982). Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass.

Cronbach, L. and Associates (1980). Toward Reform of Program Evaluation. San Francisco: Jossey-Bass.

Culver, C. et al. (1985). Basic curricular goals in medical ethics. New England Journal of Medicine, 312, 253-56.

Daniels, N. (1979). Wide reflective equilibrium and theory acceptance in ethics. Journal of Philosophy, May, 256-82.

Dewey, J. (1944). Democracy and Education. New York: The Free Press.

Dewey, J. (1939). Theory of Valuation. Chicago: University of Chicago Press.

Dykstra, K. (1981). Vision and Character. New York: Paulist Press.

Elstein, A., Sprafka, S. and Shulman, L. (1978). Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, Massachusetts: Harvard University Press.

Flanagan, O. (1982). Some philosophical reflections on the moral psychology debate. Ethics, 92(3), 499-512.

Frankena, W. (1975). Toward a philosophy of moral education. In I. Chazan and J. Soltis (Eds.), Moral Education (pp. 148-58). New York: Teachers College Press.

Gilligan, C. (1977). In a different voice: women's conception of the self and of morality. Harvard Educational Review, 47, 481-517.

Good, I. J. (1983). The Bayesian influence, or how to sweep subjectivism under the carpet. In Good Thinking (pp. 22-58). Minneapolis: The University of Minnesota Press.

Goodpastor, K. (1982). Is teaching ethics 'making' or 'doing'? Hastings Center Report, 12(1), 37-39.

Gorovitz, S. (1984). Preparing for the perils of practice. Hastings Center Report, 13(6), 38-41.

Hall, R. and Davis, J. (1975). Moral Education in Theory and Practice. Buffalo, New York: Prometheus Books.

Hart, H. L. A. (1961). The Concept of Law. London: Oxford University Press.

Hastings Center Staff (1980). The teaching of ethics in American higher education: an empirical synopsis. In D. Callahan and S.
Bok (Eds.), Ethics Teaching in Higher Education (pp. 153-70). New York: Plenum Press.

Howe, K. and Jones, M. (1984). Techniques for evaluating student performance in a preclinical medical ethics course. Journal of Medical Education, 59, 350-52.

Howe, K., Holmes, M. and Elstein, A. (1984). Teaching clinical decision making. Journal of Medicine and Philosophy, 9(2), 215-228.

Howe, K. (1982). Evaluating philosophy teaching: assessing student mastery of philosophical objectives in nursing ethics. Teaching Philosophy, 5(1), 11-22.

Hunt, R. and Aras, J. (Eds.), (1977). Ethical Issues in Modern Medicine. Palo Alto, California: Mayfield Publishing Company.

Iozzi, L. and Paradise-Maul, J. (1980). Issues at the interface of science, technology, and society. In L. Kuhmerker et al. (Eds.), Evaluating Moral Development and Evaluating Educational Programs That Have a Value Dimension. Schenectady, New York: Character Research Press.

Jensen, A. (1984). Political ideologies and educational research. Phi Delta Kappan, 65(7), 460-63.

Jones, M. and Howe, K. (1983). Measuring analytic moral reasoning in medical students. Unpublished.

Kahn, G., Cohen, B. and Jason, H. (1979). The teaching of interpersonal skills in U.S. medical schools. Journal of Medical Education, 54(1), 29-35.

Kohlberg, L. (1982). A reply to Owen Flanagan and some comments on the Puka-Goodpastor exchange. Ethics, 92(3).

Krebs, D. (1982). Psychological approaches to altruism: an evaluation. Ethics, 92(3), 447-58.

Kuhn, T. (1961). The function of measurement in modern physical science. In H. Wolf (Ed.), Quantification (pp. 31-63). New York: Bobbs-Merrill Company.

Lickona, T. (1980). What does moral psychology have to say to the teacher of ethics? In S. Bok and D. Callahan (Eds.), Ethics Teaching in Higher Education (pp. 103-32). New York: Plenum Press.

Lipman, M., Sharp, A. and Oscanyan, F. (1977). Philosophy in the Classroom. West Caldwell, New Jersey: Universal Diversified Services, Inc.

MacIntyre, A. (1981). After Virtue. Notre Dame, Indiana: Notre Dame University Press.

MacIntyre, A. (1980).
A crisis in moral philosophy: why is the search for the foundations of ethics so frustrating? In H. T. Engelhardt and D. Callahan (Eds.), Knowing and Valuing (pp. 18-35). New York: Institute of Society, Ethics and Life Sciences.

Mackenzie, B. (1977). Behaviorism and the Limits of Scientific Method. Atlantic Highlands, New Jersey: Humanities Press Inc.

Macklin, R. (1980). Problems in the teaching of ethics: pluralism and indoctrination. In S. Bok and D. Callahan (Eds.), Ethics Teaching in Higher Education (pp. 81-102). New York: Plenum Press.

McPeck, J. (1980). Critical Thinking and Education. New York: St. Martin's Press.

Melden, A. I. (1966). Free actions. In B. Berofsky (Ed.), Free Will and Determinism (pp. 198-220). New York: Harper and Row Publishing Company.

Mercer, J. (1971). Institutionalized anglocentrism: labeling mental retardates in the public schools. In P. Orleans and W. Ellis (Eds.), Race, Change, and Urban Society (pp. 311-38). Beverly Hills, California: Sage Publications.

Messick, S. (1981). Evidence and ethics in the evaluation of tests. Educational Researcher, 10(9), 9-20.

Nobel, C. (1982). Ethics and experts. Hastings Center Report, 12(3), 7-9.

Noonan, J. (1977). An almost absolute value in history. In R. Hunt and J. Aras (Eds.), Ethical Issues in Modern Medicine. Palo Alto, California: The Mayfield Publishing Company.

Nowell-Smith, P. H. (1967). Religion and morality. In Paul Edwards (Ed.), The Encyclopedia of Philosophy, Vol. 7, (pp. 150-58). New York: Macmillan Free Press.

Patton, M. (1983). Designing Evaluations of Educational and Social Programs, by L. Cronbach. Evaluation News, 5(2), 29-33.

Patton, M. (1980). Qualitative Evaluation Methods. Beverly Hills, California: Sage Publications Inc.

Pellegrino, E., Hart, R., Henderson, S., Loeb, S., and Edwards, G. (1985). Relevance and utility of courses in medical ethics. Journal of the American Medical Association, 253, 49-53.

Peters, R. S. (1967). Ethics and Education. Atlanta, Georgia: Scott, Foresman and Company.

Phillips, D. C. (1983). After the wake: postpositivistic educational thought.
Educational Researcher, 12(5), 4-12.

President's Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research (1982). Making Health Care Decisions. Washington, D.C.: U.S. Government Printing Office.

Puka, B. (1982). An interdisciplinary treatment of Kohlberg. Ethics, 92(3), 468-90.

Putnam, H. (1983). How not to solve ethical problems. Lindley Lecture. Lawrence, Kansas: University of Kansas.

Quine, W.V.O. (1970). The basis of conceptual schemes. In C. Landesman (Ed.), The Foundations of Knowledge (pp. 160-72). Englewood Cliffs, New Jersey: Prentice Hall, Inc.

Quine, W.V.O. (1969a). Epistemology naturalized. In Ontological Relativity and Other Essays (pp. 69-90). New York: Columbia University Press.

Quine, W.V.O. (1969b). Natural kinds. In Ontological Relativity and Other Essays (pp. 114-38). New York: Columbia University Press.

Quine, W.V.O. (1962). Two dogmas of empiricism. In From a Logical Point of View (2nd ed.). Cambridge, Mass.: Harvard University Press.

Rachels, J. (1975). Active and passive euthanasia. New England Journal of Medicine, 292, 78-80.

Rawls, J. (1971). A Theory of Justice. Cambridge, Mass.: The Belknap Press of Harvard University Press.

Reichardt, C. and Cook, T. (1979). Beyond qualitative versus quantitative methods. In T. Cook and C. Reichardt (Eds.), Qualitative and Quantitative Methods in Evaluation Research (pp. 7-32). Beverly Hills, California: Sage Publications.

Rest, J. (1982). A psychologist looks at the teaching of ethics. Hastings Center Report, 12(1), 29-36.

Rest, J. (1979). Development in Judging Moral Issues. Minneapolis, Minnesota: University of Minnesota Press.

Rorty, R. (1982a). Method, social science and social hope. In Consequences of Pragmatism (pp. 191-210). Minneapolis: University of Minnesota Press.

Rorty, R. (1982b). Pragmatism, relativism and irrationalism. In Consequences of Pragmatism (pp. 160-75). Minneapolis: University of Minnesota Press.

Rorty, R. (1979). Philosophy and the Mirror of Nature. Princeton, New Jersey: Princeton University Press.

Ruddick, W. (1981). Can doctors and philosophers work together? Hastings Center Report, 11(2), 12-17.

Rushton, P. (1982a).
Altruism and society: a social learning perspective. Ethics, 92(3), 425-46.

Rushton, P. (1982b). Moral cognition, behaviorism and social learning theory. Ethics, 92(3), 459-67.

Ryle, G. (1949). The Concept of Mind. London: Hutchinson and Company, Ltd.

Scheffler, I. (1978). The Language of Education. Springfield, Illinois: Charles C. Thomas.

Scriven, M. (1983). The evaluation taboo. In E. House (Ed.), Philosophy of Evaluation (pp. 75-82). San Francisco: Jossey-Bass.

Scriven, M. (1981). Summative teacher evaluation. In J. Millman (Ed.), Handbook of Teacher Evaluation (pp. 244-71). Beverly Hills, California: Sage Publications.

Scriven, M. (1979). Clinical judgment. In H. T. Engelhardt, S. Spicker, and B. Towers (Eds.), Clinical Judgment: A Critical Appraisal (pp. 3-16). Boston: Reidel Publishing Company.

Scriven, M. (1975). Cognitive moral education. Phi Delta Kappan, 56, 680-94.

Scriven, M. (1973). The methodology of evaluation. In B. Worthen and J. Sanders (Eds.), Educational Evaluation: Theory and Practice (pp. 60-103). Belmont, California: Wadsworth Publishing Company.

Scriven, M. (1972b). Objectivity and subjectivity in educational research. In L. Thomas and H. Richey (Eds.), Philosophical Redirection of Educational Research, The Seventy-first Yearbook of the National Society for the Study of Education (pp. 94-142). Chicago: University of Chicago Press.

Scriven, M. (1969). Logical positivism and the behavioral sciences. In P. Achinstein and S. Barker (Eds.), The Legacy of Logical Positivism (pp. 195-210). Baltimore: The Johns Hopkins Press.

Shaw, A. (1977). Dilemmas of "informed consent" in children. In R. Hunt and J. Aras (Eds.), Ethical Issues in Modern Medicine (pp. 182-95). Palo Alto, California: Mayfield Publishing Company.

Sider, R. and Clements, C. (1984). Medical ethics assault on medical values. Journal of the American Medical Association, 251(21), 2791-94.

Siegler, M., Rezler, A., and Connell, K. (1982). Using simulated case studies to evaluate a clinical ethics course for junior students. Journal of Medical Education, 57, 380-85.

Singer, M. (1970). Generalization in ethics. In W.
Sellars and J. Hospers (Eds.), Readings in Ethical Theory (pp. 529-47). New York: Appleton-Century-Crofts, Educational Division, Meredith Corporation.

Singer, P. (1982). How do we decide? Hastings Center Report, 12(3), 9-11.

Smith, J. K. (1983a). Quantitative versus qualitative research: an attempt to clarify the issue. Educational Researcher, 12(3), 6-13.

Smith, J. K. (1983b). Quantitative versus interpretive: the problem of conducting social inquiry. In E. House (Ed.), Philosophy of Evaluation (pp. 5-27). San Francisco: Jossey-Bass.

Stolman, C. and Doran, R. (1982). Development and validation of a test instrument for assessing value preferences in medical ethics. Journal of Medical Education, 57, 170-79.

Starr, P. (1982). The Social Transformation of American Medicine. New York: Basic Books.

Taylor, C. (1964). The Explanation of Behaviour. New York: The Humanities Press.

Toulmin, S. (1981). The tyranny of principles. The Hastings Center Report, 11(6), 31-39.

Toulmin, S. (1960). The Philosophy of Science. New York: Harper and Row Publishers.

Troyer, J. (1982). Ethics in nursing, by Martin Benjamin and Joy Curtis. Journal of Medicine and Philosophy, 7(4), 382-84.

Veatch, R. (1977). Case Studies in Medical Ethics. Cambridge, Massachusetts: Harvard University Press.

Wikler, D. (1982). Ethicists, critics and expertise. The Hastings Center Report, 12(3), 12-13.

Wilson, J. (1983). A letter from Oxford. Harvard Educational Review, 53(2), 190-94.

Wilson, J. (1973). A Teacher's Guide to Moral Education. London: Geoffrey Chapman.

Wilson, J. (1969). Moral Education and the Curriculum. Oxford: Pergamon Press.

Wilson, J. (1967). What is moral education? Part 1. In J. Wilson, N. Williams, and B. Sugarman, Introduction to Moral Education (pp. 39-226). Baltimore: Penguin Books.

Wittgenstein, L. (1958). Philosophical Investigations. New York: The Macmillan Company.

Worthen, B. and Sanders, J. (1973). Educational Evaluation: Theory and Practice. Belmont, California: Wadsworth Publishing Company.

Wren, T. (1982).
Social learning theory, self-regulation,

APPENDICES

APPENDIX A

ETHICS IN NURSING INSTRUMENTS

Nursing Student Questionnaire

Key: SA: Strongly agree; A: Agree; N: Neither agree nor disagree; SD: Strongly disagree

The course improved my ability to see the complexity of moral problems which face nurses.
The course helped me develop a framework or basis for my moral positions.
The course helped me see the importance of giving reasons and careful arguments for my moral beliefs and decisions.
The topics discussed and the written assignments were relevant to nursing.
The cases discussed were realistic enough to bring out the emotional side of moral problems.
Discussing cases is a better way to learn about moral problems in nursing than lectures.
I will be a better nurse because I have had this course.
My patients will receive better care because I have had this course.
I feel more confident about recognizing and dealing with moral problems because I have had the course.
All medical professionals should study moral problems in medicine in a similar way.
Multiple choice or true-false exams would have been a better way of grading than essays.
Writing the essays contributed to what I learned in the course.
The instructors' comments on my written assignments helped me learn.
The course improved my general abilities to discuss, reason and write about issues, not just my abilities with respect to moral problems in nursing.

APPENDIX B

FOCAL PROBLEMS (1982) INSTRUMENTS

WINTER '82 FOCAL PROBLEMS STUDENT QUESTIONNAIRE

Preceptors' Names

How important is it for physicians to be skilled in dealing with ethical problems?
quite important   important   not very important

Will your experience in this course help you deal with such problems in your practice?
yes   no   don't know
Comments:

Were the aims of the course initially unclear?
Yes   No

Did they become clearer as the term progressed?
Yes   No

Did you get a "feel" for the issues and how to deal with them?
Yes   No
If not, what would have helped you?

Please rate the following course areas. Scale: low, average, high (1 to 5).
usefulness of lectures
interest in lectures
usefulness of discussion groups
interest in discussion groups
quality of readings
quality of discussion guides
skill of preceptors

Was the number of cases appropriate?
no, should be smaller   yes   no, should be greater

Was the amount of biomedical content appropriate?
no, should be more   yes   no, should be less

Please rank the cases in order of how stimulating you found them.
Case I (Anoxic Encephalopathy)
Case II (Karen Quinlan)
Case III (Donald C.)
Case IV (Hospital Admission for Dementia)

Was the difficulty of the exams appropriate?
no, too difficult   yes   no, too easy

Rank the following means of evaluation in order of your preference for this course.
objective tests
short answer
in-class essay exams
short take-home papers (3-5 pages each)

Any additional comments to help improve this course (use reverse side if necessary).

FOCAL PROBLEMS - POST TEST

Student #

Instructions:

1. Detach the page headed VIEWPOINTS, DEFINITIONS AND ETHICAL THEORIES. This will be referred to during the test, and will be collected with the answers after you have finished.

2. The test is composed of 55 true-false questions. Circle the T or F provided to the left of each question to indicate your response. Later transfer those responses to a machine-scored answer sheet using A for True and B for False.

EXAMPLE: A stethoscope is a device for
T F 1 listening to the heartbeat
T F 2 listening to the lungs
T F 3 looking in the ears

NOTE: In this example there are three separate questions beginning with, "A stethoscope is a device for". Each way of completing the sentence, 1-3, is a separate question requiring a separate response.

3. Read Scenario I until you come to STOP. Respond to the first set of questions. Read Scenario II until you come to a second STOP.
Respond to the second set of questions. Resume reading Scenario II until you come to a third STOP. Respond to the third set of questions. In responding to the 3 sets of questions, PAY CAREFUL ATTENTION TO THE DIFFERENT INFORMATION CORRESPONDING TO EACH SET.

SCENARIO I

Henry Black, a 56 yr old white male, was brought to the emergency room at a San Francisco hospital at 3:00 P.M. one Saturday afternoon, apparently the victim of a massive myocardial infarction (heart attack). His wife, who rode with him in the ambulance to the emergency room, reported that she had returned home from shopping and found him lying on the floor next to a table and a lamp he had upset. In her words, "He was not breathing and was blue." She immediately called an ambulance and began administering mouth to mouth resuscitation. When the ambulance arrived the attendants discovered Mr. Black's heart was beating and immediately placed him on a respirator.

The diagnosis of a massive myocardial infarction was confirmed 36 hours later in the cardiac care unit. Apparently Mr. Black had been without oxygen for a significant length of time; testing showed that he satisfied all the Harvard Criteria for brain death including a flat EEG. The tests had been applied 24 hours apart; the last administration was 48 hours after Mr. Black's admission.

A discussion among the staff ensued. Three viewpoints were represented.

STOP

Refer to the page headed VIEWPOINTS, DEFINITIONS, and ETHICAL THEORIES as necessary. RESPOND TO QUESTIONS 1-36.

If you accept VIEWPOINT I, then you must accept
T F 1. DEFINITION A
T F 2. ETHICAL THEORY 1

If you accept VIEWPOINT II, then you must accept
T F 3. DEFINITION B
T F 4. ETHICAL THEORY 2

If you accept VIEWPOINT III, then you must accept
T F 5. DEFINITION C
T F 6. ETHICAL THEORY 3

If you accept DEFINITION A, then you must accept
T F 7. VIEWPOINT I
T F 8. ETHICAL THEORY 1

If you accept DEFINITION B, then you must accept
T F 9. VIEWPOINT II
T F 10.
ETHICAL THEORY 2

If you accept DEFINITION C, then you must accept
T F 11. VIEWPOINT III
T F 12. ETHICAL THEORY 3

Assuming you accept DEFINITION A, deciding whether the patient is dead is
T F 13. an ethical issue
T F 14. a factual (medical/technical) issue
T F 15. an issue of how to distribute limited medical resources
T F 16. a legal issue

The claim that the quality of life is more important than the quantity of life in this situation is an assumption of
T F 17. VIEWPOINT I
T F 18. VIEWPOINT II
T F 19. VIEWPOINT III

A distinction between an oxygenated human organism and a living person is an assumption of
T F 20. VIEWPOINT I
T F 21. VIEWPOINT II
T F 22. VIEWPOINT III

The claim, "The cost of maintaining the patient in his present condition relative to other potential uses of the medical resources expended is something which should be taken into account", must be denied by a consistent supporter of
T F 23. VIEWPOINT I
T F 24. VIEWPOINT II
T F 25. VIEWPOINT III

The question, "Would you bury the patient RIGHT NOW?" is a way of objecting to
T F 26. DEFINITION A
T F 27. DEFINITION B
T F 28. DEFINITION C

The claim that DEFINITION A is satisfied in this situation could reasonably be denied by a supporter of
T F 29. VIEWPOINT II
T F 30. VIEWPOINT III

The remark, "Another patient in need of kidneys would probably benefit more from Mr. Black's kidneys than he can, but we can't go around making decisions solely or primarily on those grounds," is a way of objecting to
T F 31. ETHICAL THEORY 1
T F 32. ETHICAL THEORY 2
T F 33. ETHICAL THEORY 3

A person who remarks in this situation, "What's being considered here is euthanasia and I want no part of it," must reject
T F 34. DEFINITION A
T F 35. DEFINITION B
T F 36. DEFINITION C

SCENARIO II

Exactly the same sequence of events occurred as described in SCENARIO I up to the point of the second administration of the tests for the Harvard Criteria. In SCENARIO II Mr.
Black began breathing spontaneously before the tests were applied a second time. His condition was otherwise the same. That is, except for the requirement of no spontaneous breathing, he satisfied all of the Harvard Criteria including a flat EEG.

STOP

Refer to the page headed VIEWPOINTS, DEFINITIONS and ETHICAL THEORIES as necessary. RESPOND TO QUESTIONS 38-43.

The facts by themselves in SCENARIO II rule out
T F 38. VIEWPOINT I
T F 39. VIEWPOINT II
T F 40. VIEWPOINT III

Suppose someone wants to defend VIEWPOINT II for SCENARIO II. If so, they must reject
T F 41. DEFINITION A
T F 42. DEFINITION B
T F 43. DEFINITION C

RESUME SCENARIO II

Mr. Black could possibly be maintained for months provided appropriate measures were undertaken, for instance, tube-feeding or preventive antibiotic therapy. But, as it turned out, Mr. Black had executed a "living will". (This is a document which communicates the wishes of a person pertaining to medical treatment in the event they are rendered incompetent and unable to express their wishes. Such documents are legally binding in California where the case of Mr. Black took place.) There were two wishes expressed by Mr. Black in this document which are especially relevant to the present situation:

(1) "In the event that I become incapable of living a genuine human existence and am unable to express my wishes, I DEMAND that no extraordinary medical measures be used to maintain me."

(2) "In the event that the conditions set out in the first clause of (1) are satisfied, I give my permission to the appropriate medical authorities to immediately remove any and all of my bodily organs for transplant or research purposes."

STOP

Refer to the page headed VIEWPOINTS, DEFINITIONS and ETHICAL THEORIES as necessary. RESPOND TO QUESTIONS 44-55.

In order to remove reasonable doubt about how to carry out Mr. Black's wishes as expressed in his "living will", you would have to clarify the meaning of which of these expressions?
T F 44. genuine human existence
T F 45. unable to express my wishes
T F 46. I DEMAND
T F 47. extraordinary medical measures
T F 48. immediately remove
T F 49. transplant or research purposes

Mr. Black's "living will" provides him with a means of
T F 50. exercising his right to refuse medical treatment
T F 51. preserving limited medical resources
T F 52. exercising his right of informed consent

Assuming Mr. Black's "living will" should be interpreted to mean that he would not want to be treated under the circumstances described, a person who sought to treat him anyway would probably accept
T F 53. VIEWPOINT III
T F 54. DEFINITION C
T F 55. ETHICAL THEORY 2

VIEWPOINTS, DEFINITIONS and ETHICAL THEORIES

This page is provided for reference purposes. The arrangement of the VIEWPOINTS, DEFINITIONS and ETHICAL THEORIES has no significance. VIEWPOINT I is not necessarily associated with DEFINITION A and ETHICAL THEORY 1; VIEWPOINT II is not necessarily associated with DEFINITION B and ETHICAL THEORY 2; nor is VIEWPOINT III necessarily associated with DEFINITION C and ETHICAL THEORY 3.

VIEWPOINTS

I. This patient is dead since he meets DEFINITION A (THE HARVARD CRITERIA). Therefore, we should go ahead and remove his kidneys. There is not a moment to spare if they are to be maintained in good condition for transplant.

II. The patient is not dead because he is breathing and his heart is beating. True, he is in an irreversible coma and will never regain consciousness. In fact, he has reached a point where he should be allowed to die with dignity. After we allow him to die, we can take his kidneys.

III. The patient is not dead. Moreover, we must do whatever is medically possible to keep him alive. Only if he dies despite our utmost efforts can we take his kidneys.

DEFINITIONS

A. Persons are dead when they are: unreceptive and unresponsive, exhibit no voluntary movement, no spontaneous breathing, no reflexes, and have a flat EEG.
(THE HARVARD CRITERIA)

B. Persons are dead when respiration and heartbeat cease despite efforts, including mechanical, to maintain them.

C. Persons are dead when they no longer respond to stimulation, are incapable of voluntary movement, exhibit no signs of cognitive activity, and are unable to undertake plans regarding how to conduct their life.

ETHICAL THEORIES

1. Actions are morally correct which result in maximizing the general good, that is, which result in creating the greatest benefit for the greatest number of people. The rights of individuals and their duties derive from and are secondary to maximizing the general good.

2. Actions are morally correct which respect the rights of all individuals concerned. Duties derive from and are secondary to rights. Maximizing the general good is allowable to the extent that it does not violate the rights of individuals.

3. Actions are morally correct which are in accordance with duties. Rights derive from and are secondary to duties. Maximizing the general good is allowable to the extent that it does not involve the violation of duties.

INTERVIEW PROTOCOL (PRECEPTORS)

I. Quality of Materials
   A. Cases
      1. #
      2. quality
   B. Study questions
   C. Leader's guide
   D. Suggestions
II. Quality of faculty development and suggestions
III. Students in Discussion
   A. Performance and understanding of aims
   B. Interest and enthusiasm
   C. Problem areas
   D. Suggestions
IV. Yourself in discussion: initial - later reactions
   A. Clarity of aims
   B. Groping at times?
   C. Will you improve with experience?
   D. Interest in precepting in future
V. Your view of medical ethics
   A. Importance
   B. "Teachability"
   C. What should it include?
   D. Evaluation of students
VI. Any additional comments
   Lecture scheduling

APPENDIX C

FOCAL PROBLEMS (1983) INSTRUMENTS

Focal Problems Track I Term 3 Pretest FORM A

I Multiple Choice

1. Which of the following is least relevant where treatment decisions involving "newly" incompetent adults (e.g., Karen Quinlan) are concerned?
   a. the patient's prognosis
   b. what the patient would want if competent
   c. the values of the physicians involved
   d. the law

2. In a random clinical trial of two drugs the treatment the patient undergoes is chosen by
   a. the patient
   b. the physician
   c. the patient and the physician jointly
   d. none of the above

3. The right of a research subject to informed consent requires
   a. informing subjects of all alternatives
   b. informing subjects of the risks and benefits of the research procedures
   c. informing subjects that they may withdraw at any time
   d. all of the above

4. Which of the following is the least relevant to a decision of whether to continue a respirator for a terminally ill patient?
   a. quality of life
   b. what the patient wants
   c. what the family wants
   d. the right to life

5. A patient's right to accept or refuse life-saving medical treatment
   a. depends on the patient's prognosis
   b. is equal to the physician's right to make treatment decisions
   c. imposes a check on what physicians can do
   d. all of the above

II True-False

T F 6. In cases where the prognosis is very poor (e.g., Karen Quinlan) a good way to avoid becoming ensnarled in a messy legal situation is not to begin treatment in the first place.
T F 7. A patient's right to refuse treatment does not imply that every voluntary refusal of life-saving treatment should be honored.
T F 8. The right to autonomy (self-determination) is included in the constitutional right to privacy.
T F 9. The distinction between ordinary and extraordinary treatment could justify stopping the use of a respirator, but could not justify stopping the use of antibiotics.
T F 10. Enrolling a patient in a research protocol is sufficiently justified if the physician sincerely believes it is in the patient's best interests.
T F 11. The right to life implies that life-saving medical treatment may never be withheld.
T F 12. If it is justifiable to start a patient on a respirator, it is not justifiable to subsequently stop the respirator (unless the patient is or becomes legally dead).

III CASE DESCRIPTION

On December 14, 1982, Charlie Brooks Jr. was the first U.S. prisoner ever executed by lethal injection. In order for the execution to take place, a physician, Ralph Gray, inspected the arm of Brooks to insure that a catheter could be inserted into the veins, instructed a technician on where and how to make the insertion, and stood by as sodium thiopental, pancuronium bromide and potassium chloride flowed into Brooks' arm in sequence. He died within minutes. As a result of this execution, the question of whether physicians can be involved in these kinds of executions without violating their professional obligation to preserve life has been raised. Below are several claims made in support of Dr. Gray.

   Dr. Gray loaded the pistol, but he did not pull the trigger.
   There was no doctor involved in the actual process of the execution.
   Looking for veins doesn't count.

   From Time, Dec. 20, 1982

1. Dr. Gray _____ should _____ should not have participated in the execution. Check off and defend your answer in a paragraph or two.

2. The claims made in support of Dr. Gray _____ were _____ were not satisfactory. Check off and defend your answer in a paragraph or two.

Focal Problems Track I Term 3 Pretest FORM B
Student # _____
Preceptors _____

I Multiple Choice

1. Which of the following is not necessary to justify a research protocol?
   a. informed consent of the subjects
   b. benefit to the subjects
   c. benefit to future patients
   d. a favorable overall risk/benefit ratio

2. Which of the following is least important to adhering to a patient's refusal of life-saving treatment?
   a. patient competence
   b. an informed patient
   c. physician agreement
   d. a-c are equally important

3. In a random clinical trial of two drugs, the treatment the patient undergoes is chosen by
   a. the patient
   b. the physician
   c. the patient and the physician jointly
   d. none of the above

4. Which of the following is least relevant where treatment decisions involving "newly" incompetent patients (e.g., Karen Quinlan) are concerned?
   a. the patient's prognosis
   b. what the patient would want if competent
   c. the values of the physicians involved
   d. the law

5. Which of the following is the strongest justification for honoring a refusal of treatment?
   a. the agreement of the family
   b. the patient's rights
   c. the agreement of the physician
   d. all of the above are equally strong

II True-False

T F 6. If the life of persons is infinitely valuable, then withholding life-saving medical treatment is never justified.
T F 7. Patient autonomy and "death with dignity" require that terminally ill patients never be treated against their expressed wishes.
T F 8. The decision of whether to start a patient on a respirator should not be based on whether there may be reasons to stop it later.
T F 9. Active euthanasia (e.g., giving a lethal injection) is never ethical and passive euthanasia (e.g., "no codes") is sometimes ethical because killing is always morally worse than allowing to die.
T F 10. The use of an experimental treatment is sufficiently justified if there is a possible benefit to future patients and risks to the subjects are minimal.
T F 11. A person's right to life excludes the use of quality of life considerations in a decision to withhold or withdraw treatment.
T F 12. The decision of the New Jersey Supreme Court in the case of Karen Quinlan illustrates the tendency of the courts to leave the standards for practice of medicine to the medical profession.

III CASE DESCRIPTION

On December 17, 1976, a baby boy was born 15½ weeks prematurely and weighing 514 grams.
He suffered from numerous afflictions, many of which were iatrogenic. On December 24 he was placed on a respirator. (One of the reasons given by a doctor for respirator dependence was that it "hurts like hell every time he breathes." The infant had developed rickets resulting in numerous broken ribs.) The prognosis was bleak indeed; survival was unprecedented. The parents had objected to the decision of Dr. Farrell, the attending physician, to ventilate the infant. They preferred to allow him to die. They described Dr. Farrell's response to their objection as follows:

   When we objected to the decision Dr. Farrell accused us of wanting to "play God" and to "go back to the law of the jungle"..."I would not presume," he told us, "to tell my auto mechanic how to fix my car."

   From the Atlantic, July 1979

1. The infant _____ should _____ should not have been ventilated. Check off and defend your answer in a paragraph or two.

2. Dr. Farrell's responses to the parents' objections _____ were _____ were not satisfactory. Check off and defend your answer in a paragraph or two.

HM 214 STUDENT QUESTIONNAIRE WINTER 1983

1. Preceptors' Names

2. How much will your experience in this course help you deal with ethical problems in your practice?
   _____ a great deal
   _____ somewhat
   _____ very little
   _____ don't know
   Comments:

3. Please rate the following course areas.

                                        low     average     high
                                         1    2    3    4    5
   usefulness of lectures
   interest in lectures
   usefulness of discussion groups
   interest in discussion groups
   skill of preceptors
   quality of readings
   quality of discussion guides
   quality of examinations

4. Was the number of cases appropriate?
   _____ no, should be greater
   _____ yes
   _____ no, should be fewer

5. Was the amount of biomedical content appropriate?
   _____ no, should be less
   _____ yes
   _____ no, should be more

6. Please rank the cases in order of overall interest and importance
   _____ Case I (anoxic child)
   _____ Case II (Karen Quinlan)
   _____ Case III (Donald C.)
   _____ Case IV (man with MS)
   _____ Case V (leukemic woman)

7. Any additional comments to help improve this course: