DEVELOPMENT AND VALIDATION OF A MODEL EXPLICATING THE FORMATIVE EVALUATION PROCESS FOR MULTI-MEDIA SELF-INSTRUCTIONAL LEARNING SYSTEMS

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
ALLAN JOSEPH ABEDOR
1971

This is to certify that the thesis entitled DEVELOPMENT AND VALIDATION OF A MODEL EXPLICATING THE FORMATIVE EVALUATION PROCESS FOR MULTI-MEDIA SELF-INSTRUCTIONAL LEARNING SYSTEMS, presented by ALLAN JOSEPH ABEDOR, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Secondary Education.

Date: August 13, 1971

ABSTRACT

DEVELOPMENT AND VALIDATION OF A MODEL EXPLICATING THE FORMATIVE EVALUATION PROCESS FOR MULTI-MEDIA SELF-INSTRUCTIONAL LEARNING SYSTEMS

By Allan Joseph Abedor

Tryout and revision are steps considered by many to be essential to the development of an instructional system. Virtually all theoretic models of instructional system development include tryout and revision as an integral part of the process. However, the formative evaluation procedures included in such models are either too general to be useful or, when specific, seem applicable only to simple textual programmed instruction. New tryout and revision procedures are needed to apply the principles of formative evaluation operationally to instructional systems of increased complexity and scope. The purpose of this study was, therefore, to develop and validate (field test) a flowchart or analog model prescribing specific formative evaluation procedures for tryout and revision of prototype multi-media self-instructional learning systems.

The initial (MK I) model was developed from a review of the literature on formative evaluation. This model addressed three main methodological issues: (1) how to identify major discrepancies in prototype multi-media lessons by data collection; (2) how to analyze these data and develop revision hypotheses; and (3) how to design, integrate, and evaluate revisions. The MK I model stipulated an elaborate three-stage process, including technical review, tutorial tryouts, and large group tryouts.

Validation of the MK I began by having its procedures assessed by means of interviews with seven faculty members who had previously developed (and revised) multi-media lessons. Data from these interviews clearly showed that the MK I procedures were far too complex and time consuming for practitioners to use. Therefore, an MK II version was developed which simplified procedures throughout and introduced a small group (N=12) tryout and debriefing procedure as the main method of identifying instructional problems and developing revisions. This technique required nine to twelve volunteer students of varying ability to interact individually with prototype lesson materials. During student use of the prototype, the lesson author personally answered questions in a tutorial fashion. After completion of the lesson, students were given a 15-minute recess so the lesson post-test and attitudinal survey could be scored. Items which indicated that 30% or more of the group were having problems were tallied and became the agenda for the debriefing to follow.
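The tallying rule just described lends itself to a brief illustration. The sketch below is not part of the thesis instruments; the item labels, scores, and helper function are hypothetical, and only the 30% threshold comes from the procedure summarized above.

def debriefing_agenda(item_responses, threshold=0.30):
    """item_responses maps an item label to a list of 0/1 scores,
    one per student (1 = correct).  Returns the items whose error
    rate meets or exceeds the threshold, worst first."""
    flagged = []
    for item, scores in item_responses.items():
        error_rate = 1 - sum(scores) / len(scores)
        if error_rate >= threshold:
            flagged.append((item, error_rate))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical post-test results for a twelve-student tryout group.
responses = {
    "item 1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # 1 of 12 missed: not flagged
    "item 2": [1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1],  # 4 of 12 missed: flagged
    "item 3": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1],  # 7 of 12 missed: flagged
}

for item, rate in debriefing_agenda(responses):
    print(f"{item}: missed by {rate:.0%} of the group")

Each item flagged in this way would become one entry on the agenda for the group debriefing.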
During the debriefing, which was conducted by the lesson author, students were encouraged to discuss freely any and all problems they had encountered, and to propose solutions to these problems where possible. The identification of prototype lesson problem areas and the development of revision hypotheses thus became a joint author/student group responsibility.

Validation of the MK II procedures was carried out in five field experiments conducted with three Michigan State University faculty members during Fall term, 1970. The purpose of the experimental comparisons was to determine, insofar as possible, the overall validity, feasibility, and effectiveness of the MK II model in facilitating tryout and revision of prototype multi-media lessons.

Faculty member A had developed three prototype multi-media lessons, designated A1, A2, and A3. Faculty members B and C had developed one lesson each, designated B1 and C1. Each field experiment consisted of the lesson author applying the MK II procedures to tryout and revision of his prototype lesson. In each field trial, the experimenter (E) performed a technical assessment of the prototype instructional stimuli, after which the materials were tried out with the first student group. Following the first tryout and debriefing, revisions suggested by the students were incorporated into revised versions. As revised lessons were completed, a second iteration of student tryouts was initiated. The purpose of the second tryout was twofold: (1) to compare the revised version with its prototype counterpart to determine the effect of the revisions on measures of student attitude and achievement; and (2) to gather additional feedback for further revisions. On two trials (A3 and C1), however, after the first student tryout the authors concerned felt that the initial prototype was sufficiently effective and did not warrant revision. Hence, in these two cases, an experimental comparison between prototype and revised versions was not possible.

In the three trials in which experimental comparisons were conducted, simple statistical tests were used to compare four dependent measures: (1) student achievement on the post-test, (2) gain score, (3) percentage of students achieving criterion, and (4) student attitudes. In two field trials (A1 and B1), significant differences were obtained (P < .01) favoring the revised version on all four dependent measures. In the third field trial (A2), a significant difference (P < .05) favoring the revised version was obtained on the post-test only.

It was concluded that: (1) the MK II model was valid, in that authors were able to identify and remediate major instructional problems through use of MK II procedures; (2) the MK II was feasible, in that two out of three authors were willing and able to use MK II procedures; and (3) the MK II model was effective, in that statistically significant differences favoring the revised versions were obtained on nine out of twelve dependent measures in the three separate field trials.

The MK II model provides an operational framework within which instructional development personnel can train or consult with faculty regarding formative evaluation of mediated self-instructional systems. Whether the model can be generalized to other types of instructional systems is a question yet to be answered. The MK II procedures are developed at two levels of detail. The "mini" MK II is a simplified version used for orientation purposes.
The "maxi" MK II provides the detailed procedures needed by an instructional development Specialist. The general principles of the model are as follows: (I) use a carefully develOped prototype to provide a common instructional experience for a group of volunteer students of varying abilities; (2) collect data by means of learning and attitudinal measures after the common eXperience; (3) identify, discuss, and prepose solutions to major problems by means of a group debriefing conducted by the author; (4) consult with "experts" on data interpretation; and (5) revise the instructional unit and recycle as necessary. DEVELOPMENT AND VALIDATION OF A MODEL EXPLICATING THE FORMATIVE EVALUATION PROCESS FOR MULTI-MEDIA SELF-INSTRUCTIONAL LEARNING SYSTEMS By Allan Joseph Abedor A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY College of Education 1971 @Copyright by ALLAN JOSEPH ABEDOR l97l DEDICATION This thesis is dedicated to my Mother and Father. 11 ACKNOWLEDGMENTS The writer wishes to express his appreciation to the many persons who have contributed to the design, development, and execution of this thesis. Particular thanks are expressed to Dr. Paul N. F. Witt for his encouragement and counsel as chairman of the guidance committee and to Drs. Robert H. Davis and Norman T. Bell for their inspiration and guid- ance during critical phases of this thesis. Thanks are also expressed to Drs. Kent L. Gustafson and James R. Nord for their insightful sug- gestions. Special thanks go to Dr. Harold A. Henneman, Dr. Howard H. Hagerman, Mr. Thomas w. Burt, and the students in their courses without whose cooperation this thesis would not have been possible. Gratitude is also expressed to Drs. David K. Berlo, Randall P. Harrison, Lawrence T. Alexander, and Stephen L. Yelon who provided con- ceptual guidance during the formative stages of this thesis. Deepest appreciation goes to my wife Betty, my daughter Carolyn, and my son John for their patience and forbearance in accepting the absence of their husband and father during much of the time the thesis was in progress. To them, I can only offer myself and my gratitude for the constant love, encouragement, and devotion they have provided. TABLE OF CONTENTS LIST OF TABLES ................ . ......... LIST OF FIGURES ......................... Chapter I. II. BACKGROUND ....................... Purpose of the Study .................. Assumptions ....................... Limitations of the Study ................ Organization of the Thesis ............... Definition of Terms ................... Methodology of the Study ................ Theoretic Phase ........ . . . . . ...... Exploratory Field Test Phase ............. Potential Payoff From This Line of Research ..... REVIEW OF THE LITERATURE LEADING TO DEVELOPMENT OF A PRELIMINARY (MK I) MODEL OF FORMATIVE EVALUATION . . Assumptions Underlying DeveTOpment of the MK I Model Specific Questions Used to Focus the Review of the Literature ....................... Review of Research by Individual Authors in Formative Evaluation .................. The Tutorial Approach .................. Research by Robeck .................. Research by Silverman and Coulson .......... Theoretic Work by Horn ................ Descriptive Research by Dick ............. Discussion of the Tutorial Approach ......... The Large Group Approach ................ Research by Vandermeer ................ Discussion of the Large Group Approach ........ 
An Approach Combining Individual and Group Data
Discussion of the Combined Approach
Related Methodological Issues
Matrix Summary of the Literature Reviewed
Formulation of the MK I Model

III. ASSESSMENT AND REVISION OF THE MK I MODEL . . . 44
Introduction . . . 44
Overview . . . 45
Procedures . . . 45
Questionnaire Development . . . 45
Selection of Respondents . . . 46
Interview Procedures . . . 46
Interview Data . . . 49
Discussion of Individual Questions . . . 49
Discussion of Interview Data . . . 61
Conclusions from the Interview Data . . . 63
Revisions to the MK I Model . . . 65
Simplification . . . 65
Obtaining Corroborative Data . . . 65
Group Debriefing as a Feedback and Problem Solving Technique . . . 66
Reconceptualizing the Problem . . . 67
Development of Group Debriefing/Problem Solving Procedures . . . 67
Review of Literature of Group Processes . . . 68
Summary of the Group Debriefing Technique Incorporated into the MK II Model . . . 75
Description of the MK II "Mini" and "Maxi" Models . . . 77
MK II "Mini" Model . . . 77
MK II "Maxi" Model . . . 80
Chapter Summary . . . 82

IV. METHODS AND PROCEDURES . . . 84
Research Strategy . . . 84
Descriptive Methodology . . . 85
Data Collection . . . 85
Experimental Procedures and Methodology . . . 86
Experimental Design . . . 86
Selection of SLATE Authors . . . 87
Selection of Students . . . 87
Stratified Random Sampling . . . 88
Treatments . . . 90
Independent Variable . . . 92
Dependent Variables . . . 92
Development of Instruments . . . 93
Experimental Procedures . . . 97
Research and Statistical Hypotheses . . . 101
Data Analysis and Statistical Treatment . . . 102
Chapter Summary . . . 103

V. DESCRIPTIONS AND RESULTS OF FIVE FIELD TRIALS . . . 105
Technical Assessment Cycle . . . 106
Logistics for Consultant Tryouts (Box 1.2) . . . 106
Data Collection on Technical Problems (Box 2.0) . . . 106
Problem Analysis and Interpretation (Box 4.0) . . . 108
Revision Development (Box 5.0) . . . 109
Discussion of the Technical Assessment Cycle . . . 109
Student Tryout Cycle . . . 110
Logistics for Student Tryouts (Box 1.3) . . . 110
Collect Student Tryout Data (Box 3.0) . . . 112
Collect Individual Tryout Data (Box 3.2) . . . 113
Collect Group Debriefing Data (Box 3.3) . . . 115
Data Analysis (Box 4.0) . . . 124
Design of Revisions (Step 5.0) . . . 127
Recycle (Step 6.0) . . . 128
Discussion of Data from Student Tryout Cycle . . . 129
Experimental Data from Field Trials . . . 130
Discussion of Findings Relative to Post-test Achievement . . . 131
Discussion of Findings Relative to Mean Gain Score Data . . . 135
Discussion of Findings Relative to Percentage of Students Achieving Criterion . . . 138
Discussion of Findings Relative to Attitudinal Survey Instrument Data . . . 141
Summary of Findings . . . 142

VI. SUMMARY AND CONCLUSIONS . . . 144
Overview . . . 144
Summary of the Development and Validation of the MK II Model . . . 144
Conclusions . . . 145
Heuristics . . . 155
Recommendations for Further Research . . . 157
Research Leading to Refinements of the Model . . . 158
Research Leading to Generalization of the Model . . . 160
Assessment of the Model . . . 161
Concluding Remarks . . . 162

BIBLIOGRAPHY . . . 164

APPENDICES . . . 169
A. SLATE AUTHOR INTERVIEW QUESTIONNAIRE . . . 169
B. MK II "MAXI" MODEL OF FORMATIVE EVALUATION . . . 172
"AGENDA" FOR MK II TRYOUT/DEBRIEFING . . . 191
CHECKLIST FOR MK II TRYOUT AND DEBRIEFING . . . 193
STUDENT BY ITEM MATRIX . . . 195
BACKGROUND INFORMATION ON THE THREE PARTICIPATING AUTHORS . . . 196
STUDENT ATTITUDE SURVEY INSTRUMENT . . . 197
TRYOUT "CHECKLIST" AND INTERVENTION PRINCIPLES . . . 200
RULES TO BE FOLLOWED FOR THE REVISION OF A CALCULUS PROGRAM . . . 202
SLATE A1 RAW DATA . . . 203
SLATE A2 RAW DATA . . . 204
SLATE A3 RAW DATA . . . 205
SLATE B1 RAW DATA . . . 206
SLATE C1 RAW DATA . . . 207

LIST OF TABLES

Table
1. Matrix Showing Organization of the Review of the Literature . . . 20
2. Classes of Data and Specific Indicators for Formative Evaluation . . . 37
3. Matrix Summary of the Review of the Literature . . . 40
4. Factors Used in Questionnaire Development . . . 47
5. Background Data from Respondents . . . 48
6. Number of Items on Pre- and Post-tests . . . 94
7. Comparison of Experimental and Control Treatment Post-test Scores . . . 132
8. Comparison of Experimental and Control Treatment Gain Scores . . . 134
9. Comparison of the Proportion of Students Achieving 80% Criterion on Post-tests Between Experimental and Control Treatments . . . 137
10. Comparison of Experimental and Control Treatment Mean Attitudinal Scores . . . 140
11. Summary of Findings . . . 143
12. Background Information on the Three Participating Authors . . . 196
13. SLATE A1 Raw Data . . . 203
14. SLATE A2 Raw Data . . . 204
15. SLATE A3 Raw Data . . . 205
16. SLATE B1 Raw Data . . . 206
17. SLATE C1 Raw Data . . . 207

LIST OF FIGURES

Figure
1. Flow Diagram Showing the Specific Steps of the Systems Approach in Developing Instructional Systems
2. Schematic Representation of the Recommended Testing-Revision Procedure
3. Major Stages in MK I Model of Formative Evaluation
4. MK I Model Showing First Level of Detail
5. Configuration of the MK I Model of Formative Evaluation Showing the Fourth Level of Detail
6. MK II Group Debriefing/Problem Solving Technique
7. MK II "Mini" Model of Formative Evaluation
8. The MK II "Maxi" Model of Formative Evaluation
9. Before and After Control Group Design
10. Procedure for Assignment of Ss to Treatments
11. Schematic of Experimental Comparison Methodology
CHAPTER I

BACKGROUND

Tryout and revision are steps considered by many to be essential to the development of an instructional system. Virtually all theoretic models of instructional system development include tryout and revision as an integral part of the process. For example, models developed by Barson (1965), Paulson (1969), Hamreus (1968), Briggs (1970), and Smith (1966) take the form of a flowchart describing a programmatic sequence of activities of which approximately the last one-third is devoted to tryout and revision.

Tryout and revision have long been recognized by writers in the field of programmed instruction as essential components of the program development process. According to these authors, programs should be tried out and revised until they meet some predetermined standard of student performance. Susan Markle (1967) cites the principle of "developmental testing" (her term for tryout and revision) as one of the major factors differentiating programmed instruction from conventional instruction.

There is some evidence that the principle of tryout and revision has been attempted with various types of instructional systems. Gropper, Lumsdaine and Shipman (1961) demonstrated increased student recall and retention after applying the tryout and revision process to conventional television lessons. D. Markle (1967) developed a first aid training course using films, texts, and practice based largely on empirical tryout and revision. It would seem, therefore, in light of the emphasis given this topic in programmed instruction and instructional system development, that the need for empirical tryout and subsequent revision would be well understood today. Nevertheless, in a recent Review of Educational Research, Popham (1970) observes:

From an inspection of the research related to curriculum materials during the past several years, one is impressed by several deficiencies. First, studies of the revision process to improve the quality of curriculum materials have not been clearly demonstrated. Certainly the manner in which revisions can be made most efficiently has not been carefully treated (emphasis added) (p. 335).

Later, in the same review, Popham quotes Lumsdaine as saying, "There was little research which demonstrated that revision based on empirical tests, as opposed to skilled editorial revision, produced better learner achievement (p. 331)."

There are two points to be stressed here. First is the paradox wherein many writers in programmed instruction, instructional technology, and instructional system development strongly advocate the principle of tryout and revision, yet educational researchers seem to have ignored the topic. Perhaps it was felt that the principle was so self-evident that little corroborative research was needed.

The second and more important point is that the few research studies and theoretic papers which address this question usually deal only with tryout and revision of a simple instructional system of the size and complexity, for example, of a single (usually short) programmed text. The techniques and procedures used in tryout and revision of "pure" programmed textual materials (such as error rate, response time, frame analysis, criterion frames, etc.) seem inappropriate or irrelevant for a lecture, a laboratory, a multi-media (slide-tape) presentation, or other instructional modes commonly employed together in a single instructional system.
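For readers unfamiliar with the programmed-instruction metrics just listed, the following sketch shows what a conventional frame-by-frame error-rate tally looks like. The student records and frame labels are hypothetical; the point is only that such counts presuppose discrete frames with recorded responses, which lectures and slide-tape presentations do not provide.

# Hypothetical response records: each dict is one student's pass through
# the frames of a short linear programmed text.
student_records = [
    {"frame 1": "correct", "frame 2": "error",   "frame 3": "correct"},
    {"frame 1": "correct", "frame 2": "correct", "frame 3": "error"},
    {"frame 1": "correct", "frame 2": "error",   "frame 3": "error"},
]

frames = sorted(student_records[0])          # frame labels in order
for frame in frames:
    errors = sum(1 for record in student_records if record[frame] == "error")
    print(f"{frame}: error rate {errors / len(student_records):.0%}")

# Overall error rate across all frames and students -- the kind of summary
# figure watched during developmental testing of a programmed text.
total = len(frames) * len(student_records)
errors = sum(1 for record in student_records
             for frame in frames if record[frame] == "error")
print(f"overall error rate: {errors / total:.0%}")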
Therefore, a research question of importance to the instructional development specialist is: What specific methods are appropriate for tryout and revision of complex, multi-component, instructional systems? In other words, how ought the principle of tryout and revision be imple- mented in developing an instructional system having several components such as lecture, laboratory, small group discussion, and multi-media self-instructional units? A more fundamental question is: How can in- structional system designers utilize systematic feedback from students or others in the design process? This question has several aspects. First, the available theoretic models of instructional system development are written at a very general level. Most of these models provide a "what-to-do“ orientation, but not "how-to-do-it" detailed information. The few models which do try to pro- vide specific "how to" information invariably recommend procedures drawn directly from simple programmed instruction texts and these do not appear generalizable to other modes of instruction. How, for example, does one compute an error rate or frame analysis of a lecture, laboratory, recita- tion or film presentation? Another aspect of this problem is that available procedures for tryout and revision focus almost exclusively on identification of general problems, with little guidance on specific remediation. Obviously, problem identification is critical, but general identification per se does not necessarily indicate what the specific solution, or range of solutions ought to be. It is the central assumption of this study that new methods of tryout and revision must be developed for complex instructional systems, and their components. Further, that tryout and revision methods must go beyond problem identification and develop viable techniques for remediat- ing deficiencies and improving the product. At present the available guidance on tryout and revision is either too general to be useful, or when specific--directed towards simple textual programmed instruction. What is needed is an extension of previous research to develop detailed tryout and revision procedures which are adaptable to systems of in- creased complexity and scope. Purpose of the~Study This study attempted to explicate the tryout and revision aspect of the instructional system development process. This explication in- cluded the development and validation (field test) of.a flowchart model and a set of heuristics for applying the model to the tryout and revision of multi-media self-instructional systems. Such multi-media systems re- qf\ present a far greater level of stimulus complexity than textual programmed instruction, so new procedures were developed for both problem identifica- tion and remediation. In sum, the purpose of this study was to develop techniques which enable systematic feedback from students and/or others, to be used as an integral part of the development process used in the creation of multi-media self-instructional systems. Assumptions It was assumed that the model developed in this study represents an expansion of a part of the larger process of instructional system development (ISO) and should therefore be compatible with existing models of this process. The present day 150 models all include a tryout and revision phase, thus selection of an 150 model within which to embed the expanded tryout and revision model was based on the experimenter's pre- vious familiarity rather than any functional differences. 
The ISD process model within which the tryout and revision model developed in this study was assumed to operate is the Hamreus (1968) "maxi" version shown in Figure 1. It is important to note that the flowchart model developed in this study attempted to explicate specifically steps 16 through 22 in the Hamreus model.

Limitations of the Study

The flowchart models of tryout and revision procedures developed in this study were designed for and validated with a single type of instructional system; namely, a multi-media self-instructional presentation. This type of instructional system was selected because: (1) mediated instructional stimuli may be replicated exactly, providing greater experimental control than many other instructional modes; and (2) increasing numbers of university and community college courses use multi-media self-instructional units to accomplish a large part of the instructional function.

[Figure 1. Flow diagram showing the specific steps of the systems approach in developing instructional systems (Hamreus, 1968).]

Selection of this class of instructional subsystem is not to be construed as an evasion of the problems of tryout and revision of less replicable or less controllable instructional subsystems (lectures, laboratories, recitations, etc.), or evasion of the problems of tryout and revision of the "course" as a total system. On the contrary, it was felt that the tryout and revision model developed for mediated self-instructional lessons may be generalized to the more emergent, spontaneous, non-mediated subsystems as well. However, the generalizability of the model was not specifically investigated in this study.

Another limitation of this study related to the difficulty of differentiating between the unique contributions of personnel using the model and the contribution of the model per se. It was assumed that the lesson author, the experimenter, and the students made unique individual as well as interactive contributions to the revision process. These unique contributions were not necessarily reflected in the formalized model or methodology. Thus, it became very difficult to assess what part of the differences between prototype lessons and revised versions was due to use of the model/method and what part was due to the unique contribution of the personnel involved. In some cases, the differential contribution of method and personnel variables can be identified by sophisticated experimental design. In this study, however, a new model was conceptualized; consequently, it was not feasible to hypothesize specific relationships between methodological and/or personnel variables.
Instead it was assumed that the first step in extending the tryout and revision process to instructional systems of greater complexity than simple programmed texts was to develop and describe a workable, viable model.

Organization of the Thesis

In the balance of the present chapter, the organization of the thesis, its major objectives, and its methodology are described. Limitations and assumptions are stipulated and key terms defined. In Chapter II, literature relevant to tryout and revision is reviewed and a preliminary flowchart model (MK I version) developed. In Chapter III, the results of interviews with seven experienced multi-media lesson designers are presented along with a rationale for revision of the MK I model and development of the MK II version. In Chapter IV, the descriptive and experimental methodology for five field tests of the MK II version is outlined. In Chapter V, the results of the five field tests are described and the experimental data reported and analyzed. Finally, in Chapter VI, the major findings of the study are summarized, conclusions drawn, and recommendations for further research provided.

Definition of Terms

Formative Evaluation

Formative evaluation involves the tryout and revision of new instructional units in an effort to improve quality prior to large scale use with students. As used in this study, the term is synonymous with "tryout and revision" or "developmental testing." Generally, formative evaluation is the process by which information is obtained and used by a decision maker to identify problems and revise instruction to the point where it is ready to be used with substantial numbers of students. The decision maker of interest in formative evaluation is the developer of the new instructional system.

In this study, the process of formative evaluation was conceptualized as having three components: (1) identification of instructional deficiencies through data collection; (2) analysis of these problems leading to revision hypotheses; and (3) design, integration, and evaluation of revisions.

Scriven (1967) defines formative evaluation as "outcome evaluation of an intermediate stage during development of the teaching instrument . . . to discover the deficiencies and successes in intermediate versions of new curriculum (p. 51)." Anderson (1969) emphasizes that "the purpose of pilot tests is formative evaluation, to locate weaknesses in student understanding or performance so that editors, writers, or teachers can revise and presumably improve instructional materials and procedures (p. 5)."

Summative Evaluation

Summative evaluation is the process of describing the effects of such fully developed units of instruction (Paulson, 1969). The decision maker of interest in summative evaluation is the consumer or user of the instructional system rather than the developer. Both formative and summative evaluation are emphasized in ISD models, and both reflect the basic principle that any system requires feedback to achieve its objectives (Wiener, 1954).

SLATE

SLATE is an acronym for Structured Learning and Teaching Environment (Davis, 1968). Typically, a SLATE involves a single student in a carrel interacting with multiple instructional stimuli in the form of slides, tape, film, models, specimens, and a workbook. The learning experience is "structured" in that objectives are predetermined and students' responses are designed to facilitate achievement of these objectives. A SLATE is, therefore, a multi-media self-instructional learning system.
Flowchart Model A flowchart model is a graphic analog showing the total struc- ture, organization, and interrelationships of a process, event, or other phenomenon. In the present study, flowchart symbols represented ideas, information flow, and human action with narrative explanation being pro- vided for each symbol. The LOGOS symbol system (Language for Optimizing Graphically Ordered Systems) developed by Silvern (1969) is used in this study. am: As used in this study, the term "author" refers to a faculty member who has developed one or more multi-media self-instructional lessons. Prototype Prototype refers to a complete, but untried version of a self- instructional multi-media lesson. In other words, all instructional stimuli are finished, but student feedback on the efficacy of these stimuli has not been obtained. Debriefing Debriefing refers to a formalized procedure where through face- to-face interaction the prototype lesson author obtains information from students on lesson deficiencies and how to remediate these deficiencies. ll Methodology of the Study The present study used an exploratory approach to develop and validate the new model of formative evaluation. The exploratory approach is described by Kaplan (1964) as follows: It (an exploratory study) is frankly intended just to see what would happen if . . . . Often it is associated with a new technique, which is tried.on a wide variety of problems and subject-matters until the most promising sorts of applications become apparent . Or, it may be conducted according to a trial and error pattern to exhaust some set of possibilities. In general, an exploratory experi- ment invites serendipity, the chance discovery; it is part of what we do to deserve being lucky (p. 149). In the present study, it was intended to see what happens when the "new technique" (the model) was tried in a series of five different ”field experiments" (Kerlinger, 1964) involving three different academic subjects. Since previous research (Baker, 1970; Silberman & Coulson, 1965) has clearly shown simple programmed texts to be significantly improved through use of certain tryout and revision procedures, the functional outcome of this study was to tell us something new only about a certain way of achieving such results in other cases; e.g., in an instructional system of increased complexity, such as a multi-media self-instructional lesson. Having this exploratory orientation and methodological purpose the study organizes naturally into two phases: (1) design of the theo- retic model, and (2) exploratory field test (validation) of the model. Each phase is fully described in a later chapter of the thesis. For orientation purposes, however, the major objectives and methodology of each phase are described next. 12 Theoretic Phase Objectives.--The major objective of the first phase was to develop a flowchart model showing a sequence of tasks, decision rules, criteria, and implementing methodology for empirical tryout and revision of a proto- type multi-media self-instructional lesson. The criteria for assessing the utility of this model were threefold. l. Validity.--The model was considered valid if (a) through its use the prototype lesson author was able to distinguish those sequences of instruction which were unsatisfactory, and (b) if the model predicted revision alternatives which remediated the unsatisfactory instructional sequences. 2. 
Feasibility.--The model was considered feasible if fewer than 20 students were required for its use, and if faculty were willing and able to use it in the field situation. 3. Effectiveness.--The model was considered effective if com- parative measures of student achievement and/or attitude between proto- type and revised versions showed statistically significant differences in favor of the revised versions in 75% of the field experiments. Methodology,--The model was developed from two primary data sources: (1) review of pertinent literature, and (2) interviews with a selected sample of university faculty or research and development per- sonnel who have personally developed multi-media self-instructional lessons. The purpose of these interviews was to: (a) get faculty reactions to the model, and (b) assess and integrate (if possible) the tryout revision procedures actually used by practitioners. 13 The interview data were summarized to highlight the procedures and/or recommendations common across respondents. These data were then used to modify the flowchart model developed purely from the review of research and theoretic literature. The product of the first phase there- fore was a synthesis of research, theoretic and practitioner data stated as a "first draft" flowchart model of tryout and revision procedures. Exploratory Field Test Phase Objectives.--This phase was somewhat unique in that two distinctly different types of objectives were being sought, e.g., product and process. The product type of objectives relate to experimental comparisons between two lessons to determine which is superior. In this study, revised lessons were experimentally compared to their original (unrevised) counterparts. The intervening or independent variable was the use of the tryout and revision model, and the dependent variables were measures of student achievement and attitudes. By empirically comparing a given lesson before and after tryout and revision, tentative conclusions regarding the valid- ity, feasibility and effectiveness of the revision techniques could be drawn. A second, and possibly more important class of objectives centered on understanding and describing the process through which the deficiencies in any given instructional system were recognized and remediated. The model per se, was simply a conceptual tool designed to influence a series of complex interactions between human beings; e.g., faculty, stu- dents, and consultants. It was these interactions which resulted in modifi- cations to a given instructional system. Therefore, it was important to assess the nature of those interactions to determine what factors, besides l4 the model, were influential in develOping the revised instructional system. In short, it was important to describe and analyze the interac- tive process of conducting tryout and revision so that major variables could be identified and the procedures guiding the process modified to take account of these variables. To summarize, the overall objective of the field test phase was to generate and analyze empirical data from which tentative conclusions could be drawn regarding the efficacy of the model. In addition, this phase provided descriptive data for: (l) identification of variables in the formative evaluation and revision process of a given instructional development system, (2) recommendations for procedural modifications and/ or alternatives which take account of such variables, (3) recommendations for further research. 
Methodology.--In light of the two types of objectives just de- scribed, the methodology used in this phase combined experimental and descriptive techniques. That is, in addition to conducting several field experiments, the experimenter (E) described the process through which the experimental treatments generated. A total of five (5) field experiments were conducted in which measures of student attitude and achievement on a prototype (unrevised) lesson were compared to identical measures of achievement and attitude obtained on a revised version. A field experiment is defined by Kerlinger (l964) as follows: A field experiment is a research study in a realistic situation in which one or more independent variables are manipulated by the experimenter under as carefully controlled conditions as the situa- tion will permit (p. 382). 15 A field setting was chosen because it was desired that the rela- tionship between use of the model and improved student performance be demonstrated in the actual using environment. In this way, unforeseen or contaminating variables are identified and accounted for in revi- sions to the model. Each experiment employed the classic pre-post control group design (Campbell & Stanley, 1963) in which a sample of volunteer students (55) were stratified and randomly assigned to experimental and control groups. Control groups received the prototype (unrevised) SLATE, while experimental groups received the SLATE revised in accordance with the model of formative evaluation. The independent variable in each case was the model plus unique contributions by the users of the model. It was hypothesized that revi- sions resulting from student feedback should produce gains on achievement and attitudinal measures in the experimental groups. T tests were used to determine statistical significance between experimental and control groups. Unlike more conventional experimental/control group comparison research, the experimental treatments in this study were not designed a priori in accordance with some set of theoretic principles. Instead, feedback from students who had utilized the original prototype lessons provided three types of data which were used by the lesson author and experimenter to develOp revisions. These types of data were: (l) mea- sures of student achievement, (2) measures of student attitudes (both 1 and 2 were collected during the lesson via specific instruments), and (3) experiential data generated during an author/student debriefing fol- lowing the lesson. Each experimental treatment evolved by means of )6 several student/author/experimenter interactions prescribed by the theoretic model. The total process of trying out a prototype lesson, revising it, and testing out the revised version was replicated five times. Three individual Michigan State University faculty in different disciplines were the prototype lesson authors. Two of these faculty members revised one lesson each, while a third faculty member revised three separate les- sons. The selection of five replications and three academic disciplines was based on (l) availability of faculty with unrevised prototypes, and (2) the need to provide a sufficient number of cases from several disci— plines from which to identify critical variables in the process and deter- mine the efficacy of the model. 
The series of five field experiments were organized chronologically so that the experimenter was able to assist individual faculty as required, as well as observe and document the entire revision process for each ex- periment. After obtaining verbal commitment from the faculty indicating their interest and willingness to invest a substantial amount of time in revision activities, the experimenter began formal observations and pro- vided assistance to each faculty in applying the model to their prototype lessons. The assistance given each faculty was of two types, logistical and conceptual. The logistical help consisted of making the physical and administrative arrangements necessary for the field experiment, e.g., to tryout the lesson on a selected sample of students. T7 The conceptual assistance given the faculty was essentially to explain the model, justify its theoretic orientation and attendant methodology, and provide guidance as required in performing the tasks Specified in the model. In many ways this was a tutorial training function, since in its initial stages of development the model was not self-explanatory, although this was the long-range goal. After experi- menter guided utilization on several lessons, it is possible that faculty may understand the model sufficiently to apply it independent of the ex- perimenter, but this contingency was not specifically tested in this study. Potential Payoff From This Line of Research Greater sophistication in the tryout and revision phase of in- structional systems development can lead first of all to fewer and fewer revisions and improved learning from students. More importantly, it might lead to the formulation and assessment of principles which, if in- corporated in initial preparation of instructional units, could lead to units requiring minimal revision. In other words, a highly developed instructional technology may always require empirical tryout, given the individual differences in human learners. But, the more SOphisticated the body of procedures, techniques, and principles used in initial de- sign, the better the initial preparation of new instruction can be. In short, greater sophistication in tryout and revision can lead to prin- ciples which may improve the process of initial design. CHAPTER II REVIEW OF THE LITERATURE LEADING TO DEVELOPMENT OF A PRELIMINARY (MK I) MODEL OF FORMATIVE EVALUATION The review of formative evaluation research presented in this chapter addresses three methodological issues: (l) how to identify problems in prototype instructional units, (2) how to analyze such prob- lems and develop revision hypotheses, and (3) how to design and integrate revisions. - The review is organized as follows. First, the assumptions underlying the research reviewed and derivation of the MK I model are stipulated followed by specific questions relating to problem identifi- cation, problem analysis, and problem remediation. These questions form the basis for analyzing the research of nine selected authors in the field of formative evaluation. Next, the research of the selected authors is described followed by an analysis to determine the given author's specific approach to the methodological questions stipulated earlier. After the last research study has been presented and analyzed, several conclusions with respect to the three methodological issues are drawn and a preliminary MK I model presented. The model thus represents an integration of the literature reviewed. 
Assumptions Underlying Development of the MK I Model The selection of literature for review and the conclusions reached thereafter were based largely on the assumptions and definitions 18‘ l9 stipulated in Chapter I. Briefly, the most critical of these were: (1) formative evaluation is basically data collection and use of information by a decision maker to revise deficient instructional sequences; (2) the critical decision maker in this study is the author/developer of a self- instructional multi-media lesson; and (3) the primary source of informa- tion relative to identification of instructional deficiencies should be students for whom the prototype lesson was intended. A secondary source of information may be "experts." Specific Questions Used to Focus the Review of the Literature The following questions were used in analyzing each of the re- search studies reviewed. ISSUE SPECIFIC QUESTIONS ( l. What types of data were collected? HOW TO IDENTIFY PROBLEMS 2. What types of instruments were IN PROTOTYPE INSTRUCTIONAL ( used and how were they developed? UNITS 3. What sampling procedures were used? L.4. What administrative procedures were used? , 5. How were data reduced and inter- preted? PROBEEMSNANDZEEVEESP < 6. How were priorities assigned REVISION HYPOTHESES among instructional deficiencies? L 7. How were revision hypotheses developed? HOW TO DESIGN AND 8. How were revisions designed, in- INTEGRATE REVISIONS tegrated, and evaluated? Essentially, the review of the literature attempted to fill in each cell of the matrix shown in Table l. The completed matrix (Table 3) 20 umpmapm>m 3°: cowamapm>m mawmmwwwmm :o_umgmmch manmw>mm so: .cmwmmo cowmw>wm vmao—m>mo mmmmspoa»: acmEQoPm>mo cowmwwmm 3pm, mwmmspoa»: umcmwmm< mm?“ cowmw>mm -wcowca 30: a umpmgmumucH mwmxpmc< EmFaoga a emuacmm mama so: mmczumu -ocm m>_pmgu -mwcwae< “as; mmcszuocm m=w_a5mm “we: mucosacamcH maaw um mumxp mumolpmgz cowpauwc_p=auH Emynocm m < mcowumozo uwc_aaam mammH (mmzmp>mm mgosp=< mczgmgmpwg mg» mo 3mw>mm asp mo cowpm~wcmmco mcwzosm xwgumz--._ m_nMH 21 thus functions as a transition device to summarize the data which was incorporated into the MK I flowchart model. Review of Research by Individual Authors in Formative Evaluation Generally, the literature on formative evaluation presents three different approaches or evaluation design strategies (Paulson, 1969): the "tutorial" or single student feedback approach, the "large group" or multi-student feedback approach, and some iterative combination of the first two. Each approach has unique advantages and disadvantages. The Tutorial Approach The writings of B. F. Skinner (1954, 1958) introduced the funda- mental concepts of linear programmed instruction and teaching machines. Skinner's laboratory-like technology emphasized study of the individual organism and precise control of behavior to be learned by manipulating (the consequences of behavior. Following Skinner's lead, many writers in programmed instruction have advocated that the optimal unit of analysis for development of pro- grams was a single student. For example, Gilbert (1960) suggested twelve rules for programming a specific subject matter. Rule five is: Get yourself one student. I repeat, gng_student. You are about to perform an experiment in which you are permitted no degrees of freedom--that is, if the word ”self" in "self-instruction" can be taken seriously. Once you have discovered an efficient program for one student, you will have described the gross anatomy of the most generally useful program (p. 479). 
Some empirical evidence in support of the tutorial approach and a single student as the unit of analysis was provided by Robeck (1965) and Silberman and Coulson (1965). 22 Research by Robeck.--Robeck demonstrated empirically that obser- vation of a single student can significantly improve a first draft program. Using a short (50 frame) prototype PI text on "English Money," he incorporated revisions based on test item errors and verbal responses of a single "bright" sixth-grade student to produce a second draft. This draft was further revised on the basis of feedback from a second individual student. The three drafts were then tested on three matched groups of students. The performance on the two revised versions was significantly better than on the original draft (P<(.05 for the second and P<(.01 for the third version) although student performance on the third version was not significantly improved over the second version. While the revisions depended on the ingenuity of the tryout editor- tutor as well as verbal data from single bright students, Robeck did demonstrate the feasibility of a single solitary student as the total sample for formative evaluation. Unfortunately, the study was not clear as to the sampling pro- cedures (how the "bright” students were selected) or the procedures used during the tutorial interaction to identify discrepancies. Moreover, the study was not clear as to how the test performance and error rate data was integrated with verbal responses of the students to identify causal factors, so revision hypotheses could be developed. Further, Robeck pro- vided no information as to how the revisions were designed and integrated into the original program. He reported, however, that evaluation of the revisions was obtained through a suitable experimental/control group design. The implication of Robeck's research for the present study was that achievement and interview data from a single student could be 23 used to complement one another in development of an improved programmed text. Research by_Silberman and Coulson.--A far more extensive and sophisticated set of experiments was conducted by Silberman and Coulson (1965). In this series of exploratory studies, a technique called "tutorial engineering" was developed in which an experimenter served as a tutor while presenting the program (PI text) to one child at a time. The experimenter stopped the presentation and provided tutorial assist- ance whenever the student exhibited difficulty (cues were verbal, "I don't understand"; non-verbal, a puzzled look; and test item errors). Tutorial assistance was ad hoc, but records were kept of student diffi- culties and tutorial procedures which seemed most beneficial. Similar tutorial assistance needed by more than two students was incorporated into the programmed text as revisions. When the experimenter-tutor felt that the program had been re- vised sufficiently (sufficiently was not Operationally defined but was a subjective decision) a comparison of the original and revised program was made. If the revised version proved to be both statistically and practically (not too much longer than the original) superior to the original, the tutorial sessions were ended; if not, the experimenter conducted several more cycles of tutorial engineering. A total of four programs representing verbal and quantitative skills was develOped in this manner (first grade reading, first grade arithmetic, junior high Spanish, and senior high geometry). 
After all four programs were significantly improved, the data collected during the tutorial sessions and the student responses to different versions of the \V 24 programs were analyzed for consistencies and patterns. This analysis produced three hypotheses about major instructional problems which were common to all four programs. These hypotheses were termed the "gap," "irrelevancy," and "mastery” hypotheses. The "gap" hypothesis refers to the necessity for explicit inclusion in the program of information relevant to each criterion test item. The "irrelevancy" hypothesis re- fers to the desirability of eliminating material which is unrelated to criterion test items. Finally, the "mastery" hypothesis refers to the requirement that the student not be permitted to move on to subsequent topics until he had ”mastered" the present one. Since these three hypothesis had been derived by analysis of re- visions to four programs, an independent experiment was then conducted with a different set of PI texts and new students to test the hypotheses. The new experiment validated these findings by reversing the process. It was shown that when these designated improvements were taken out of effec4 tive programs, there was a corresponding performance decrement. The importance of Silberman and Coulson's series of studies was that for the first time, the formative evaluation and revision process was formalized (the "tutorial engineering technique") and at the same time empirically proven to be successful. Moreover, generalizations were drawn leading to higher order revision principles: the gap, mastery, and irrelevancy hypotheses. One implication of Silberman and Coulson's work for the present study was that discrepancies in prototype programs were identified by a combination of student errors on achievement tests and tutor observation of students' non-verbal behavior (such as frowns). A second implication 25 was that a number of different tutors were used indicating that non-verbal data was recognizable and provided important feedback in a number of cases. The research was unclear on the administrative procedures used during the tryout sessions; hence it was difficult to separate the unique contribution of each tutor from the general procedure of tutoring a single student as required and recording all tutorial interaction for later inclusion in the lesson. Furthermore, the procedures used for selection and training of tutors and selection of students were unclear. The research did indicate, however, that interpretation of data and development of remediation was a joint decision between tutor and project directors. If several students were tutored on the same problem, this information was usually added to the program increasing its length. It did not seem as though program objectives were subject to revision; rather the four programs were simply lengthened to permit students to achieve the objectives. An absolute performance criterion was not es- tablished, so revisions were apparently ended when the tutor felt that students could use the materials without further tutorial assistance. While many of the "tutorial engineering" procedures were not de- scribed as precisely as one would like, it was clear that given a proto- type PI text, a tutor, and perserverance, it appears feasible to success- fully revise a program based on multiple tutorial sessions using single students to provide feedback. 
Theoretic work by Horn.--The clearest example of the "tutorial approach" was provided by Horn (1966) who developed a self-instructional program called Developmental Testing. This programmed text was designed .0. ———V‘ _ I, .‘_-_— 26 to train evaluators and/or programmers in the tutorial approach to for- mative evaluation. Horn not only programmed the elements of his technique, but presented simulated problems in formative evaluation-~and provided feedback on appropriate solutions. Horn asserted that his approach could be used successfully with as many as four students simultaneously, although he advocated the use of one student of relatively high ability, one of average ability, and one of low ability, singly, and in that order. This notion of progres- sion from high to low ability students was similar to procedures suggested by Scott and Yelon (1969). Many of the procedures recommended by Horn were similar to those used by Silberman and Coulson, but Horn went far beyond earlier works in his explication of the administrative procedures or "ground rules" which should apply during a tutorial tryout and revision session. In order to provide the reader with a clear understanding of this methodology, Horn's "checklist" for tryout sessions and "principles for determining when to intervene in the tryout process" are shown in Appendix 1. While Horn's text is an important contribution to the literature on formative evaluation, it was unfortunate that no empirical data as to the success (or failure) of the procedures was included. On the other hand, Horn's work may be considered as an eXplication of the "tutorial engineering" approach which was empirically tested by Silberman and Coulson (1965). Descriptive research by Dick.--While the tutorial approach has been advocated theoretically and has some empirical support, a study by Dick (1968) showed as one of its findings that non-professional 27 inexperienced program writers preferred to base revisions on data from a large sample (N=40 to 50) rather than from individual students. In this study, Dick developed a method for integrating seven types of feedback for revising prototype programs (Appendix J). This method consisted of a set of seven decision rules which stipulated tech- niques to be used for data interpretaion and revision design. The task given to four non-professional, inexperienced program writers was to re- vise a previously used programmed text in calculus using the seven decision rules and the various types of data provided. Each of the four revision programmers was given the original program plus seven types of data collected the previous year from four intact classes totaling eighty-five students. These data included: item analysis of post-tests, error rate, student comments, teacher com- ments, list of correct and incorrect answers for all test items, and page number where a specific test item was taught in the text. During the revision process, Dick found that the revision authors were utilizing two primary data sources: error rate and teacher comments. If the error rate became excessive (which depended upon the individual writer's opinion) teacher and sometimes student comments were studied and revisions developed accordingly. Few revision authors used item analyses or tried to relate test item performance with particular frames in the program. Moreover, it was clear that none of the revision authors fol- lowed the suggested sequence of seven rules. 
These four authors reported that the end-of—lesson tests, which they had not constructed, were inade- quate tests of the lesson objectives. Furthermore, the revision authors were interested in knowing more about the ability level of the students 28 who had made specific comments on segments of the program and wanted data on the student's overall impression of program continuity, readability, and difficulty. With reSpect to questions of sampling and administrative proce- dures, Dick summarizes his findings as follows: It was of interest to the author to note that when the writers were given a hypothetical alternative of gaining information about the program by going through it personally with three or four stu- dents vs. gaining statistical data from 40 to 50 students, the latter procedure was much preferred. There seemed to be a greater number of students (which appears to provide greater generaliza- bility) and an acknowledgement of difficulty of obtaining suitable guinea pig students (p. 101). Dick failed to provide background data on the four revision authors, so it was difficult to extrapolate his findings to any subset of potential program writers. Nevertheless, some implications may be inferred for the present study. First, assuming inexperienced program writers (or SLATE authors) it is possible that large amounts of dif- ferent types of data will overload the revision author's decision making capability in spite of decision rules provided. In short, certain types of data are likely to be ignored by inexperienced programmers, probably because they do not know what to do. This leads to the second implication; that revision feedback should be restricted to a few critical types of data and/or consultative help provided during the data analysis phase of the revision process. Discussion of the Tutorial Approach The tutorial approach to formative evaluation means simply that a tutor--either the programmer or a qualified assistant--sits down with a student as he interacts with the prototype materials and carefully observes the student's response to each frame or step in the program. If 29 the student encounters difficulty, he describes the problem to the tutor who verbally provides the needed information and makes an on-the-spot revision to the specific frame(s) causing the problem. Several empirical studies have shown that given the proper conditions of cooperative students and skilled tutors, the method can produce improved instructional sequences. However, this procedure appears deceptively simple, for its success de- pends on interpersonal subtleties which are difficult to formalize into statable principles. Markle (1967) describes some of these subtleties involved in generating the needed information: Procedures for eliciting these data vary. Some testers prefer to talk to the student throughout the process, a procedure which, of course, renders the student's final performance su5pect, if not invalid. Others prefer to query the student who hesitates or errs, leaving him to his own devices when no danger signals are apparent. The data which may be missed under this condition are exemplified by statements which some of us have heard often: l'I know what you want here, but . . ." and "I see your point, but it seems to me . . . ." There are at present no firm rules. Each programmer has his own (p. 122-123). 
The theoretic rationale for tutorial procedures appeared to be based on the assumption that observation of a single student is the best way to identify and remediate deficient instruction. Proponents claimed that observation of more than one student will overload the observer so that important subtleties are missed (S. Markle, 1967). Furthermore, it was asserted that large group testing often suppresses individual candid reactions or the "stupid" question which underlies a major program problem, whereas the more intimate tutorial situation is able to elicit a greater amount of relevant feedback.

On the other hand, S. Markle (1967) and Paulson (1969) pointed out several limitations of the tutorial method: (1) it is costly and time consuming; (2) the subtlety and variety of techniques involved in the tutor's or tryout editor's task make it difficult to describe and perform; (3) it is extremely vulnerable to atypical students; and (4) it can spuriously inflate learning and/or motivational measures. This last limitation means that the mere presence of the tutor might well be a reinforcing stimulus which spuriously inflates the student's motivational state, and that any "tutoring" done by the tutor confounds the measurement of en route and criterion student performance.

With respect to development of a model for the present study, the literature on tutorial tryout and revision methodology has provided some valuable guidance. Nevertheless, two other strategies have been used which warrant analysis before the MK I model can be presented. Therefore, the next section of this chapter reviews the literature relevant to a second methodology of tryout and revision: the large group approach.

The Large Group Approach

Paulson (1969) defined the large group approach as tryout of a prototype instructional system with groups of twenty or more students, with provision for recording and analyzing specific types of data. Paulson's summary of the advantages of this approach is paraphrased as follows:

1. It is often just as easy to obtain intact classes for tryout of prototype instructional units as it is to get individuals.
2. The instruction per se is more similar to the conditions of actual use than in the tutorial approach. If instruction is relevant to a given class, the tryout may be embedded into the larger ongoing instructional system.
3. Students are not harassed by the necessity of commenting on their progress or learning difficulties, nor is their attention focused on the "trouble shooting" nature of their participation or the tentative, developmental nature of the instructional system being evaluated.
4. The data obtained via large group procedures are far less vulnerable to unique or idiosyncratic personal characteristics, since the data are normally summed across students.
5. The larger data base increases the probability that correct decisions will be made on system deficiencies, e.g., which deficiencies warrant revision.

Essentially, the logic underlying the large group approach is that the greater amount of data produced by this method provides more believable evidence on which to base costly and time consuming revisions. Since the instructional system of interest in this study is a SLATE, it must be noted that the inherent complexity of the audio, visual, and other stimuli considerably increases the cost (in time and dollars) of developing revisions over that of a simple programmed text.
With SLATEs, therefore, it may be difficult to justify a costly revision on the basis of feedback from a single student, as is often done in programmed text development. Since the purpose of formative evaluation is systematic remediation of deficiencies, and since the methodology should generate information on what in the prototype is deficient enough to warrant revision, it would appear that a large data base is desirable for SLATE formative evaluation. Interestingly, the large group, with its correspondingly large data base, has long been recognized as essential for summative evaluation activities (Scriven, 1967). However, for formative evaluation the exclusive use of large groups has thus far not gained any appreciable acceptance.

Research by Vandermeer.--Vandermeer (1964, 1965) conducted two studies using large groups (intact classes) of school children to provide information leading to the improvement of a film (1965) and a filmstrip (1964). The methodology in both studies involved development of multiple choice instruments covering every informational aspect of the film or filmstrip. Following showing of the prototype versions to intact classes, test items showing the greatest difficulty (lowest student recall) were correlated with the specific part of the presentation where the information should have been learned. The researchers then revised the deficient portion to: (1) afford more cues or higher visibility (arrows, etc.) or (2) include less complex language in the narration. The revised versions were then shown to equivalent intact groups in other schools. The results were equivocal in both the film and filmstrip studies. Some of the revisions "worked" and others did not, so the net effect was NSD. After revising a second time, Vandermeer showed a significant (P<.05) improvement on one-third of the items which reflected revisions in the film and one-half of the items which reflected revisions in the filmstrip. The author did not discuss or advance reasons for the equivocal nature of his results.

Of interest to the present study, however, was the fact that at no time did Vandermeer or his associates interview or personally observe any of the 203 Ss viewing the film or 216 Ss viewing the filmstrip. The sole basis for revision was test item data and experimenter post hoc analysis of the instructional stimuli. In light of Vandermeer's failure to successfully revise, it would appear that, to increase the probability of success, students must provide first hand feedback, possibly including input on redesign of the deficient sections. Stated another way, test item errors may locate the troublesome part of a prototype lesson, but as shown in Vandermeer's research, "expert" post hoc analysis does not necessarily remediate student learning problems.

Discussion of the Large Group Approach

The large group tryout approach offers the advantage of generating large amounts of data to identify problems, but lacks the direct, specific input from students needed to develop appropriate revisions. With respect to development of a model of formative evaluation, the tutorial and large group techniques each have advantages, and both are reflected in the model developed for this study.

An Approach Combining Individual and Group Data

By far, the most widely accepted approach to formative evaluation reported in the literature is one which combines, in iterative fashion, data from both individual students and large groups.
The paradigm for this approach is clearly illustrated by the flowchart in Figure 2, taken from Programed Learning: A Practicum, by Brethower et al. (1966).

Figure 2.--Schematic Representation of the Recommended Testing-Revision Procedure (Brethower et al., 1966, p. 169). [Flowchart not reproducible; its legible elements are the steps Initial Write, Individual Tryout, Group Tryout, Revision, and Print, with branches labeled "major problems," "minor problems," and "no problems."]

This approach is advocated by numerous other authors: Taber, Glaser and Schaefer (1965, p. 144-145); Pipe (1966, p. 56-59); Paulson (1969, p. IV-46-47); Schutz (1967, p. 21); S. Markle (1967, p. 111); and Briggs (1970, p. 172-173). In addition, empirical studies have been done by D. Markle (1967), Anderson (1967), and Short (1968), each using this combined approach for tryout and revision procedures. Each of these studies resulted in statistically significant differences favoring revised over prototype versions of programs.

For example, David Markle (1967) conducted a developmental study in which individual and group feedback was used not only to revise print and film materials, but to establish the course objectives, to determine the learning sequence, and to develop the evaluation instruments as well. In short, student feedback was used to support as many course development decisions as possible. In Markle's study, the major objective was to develop a basic first aid course which, in seven and one-half hours, would exceed the performance of an existing ten-hour course. A set of test questions, derived from analysis of thousands of accidents, was defined as the course objectives and pre-tested on trained and untrained members of the student population. After removing the items known to nearly all the typical trainees, the remaining items became the first draft of the course.

First, individual tutoring enabled Markle's development group to add instructional material gradually, as required, until students were achieving at the criterion level. The major basis for revision was error rate, response time, and prompting by the tutor. A similar procedure was used to develop a series of films, starting with black and white "still" pictures as a first draft film. Additional pictures, camera angles, color, and motion pictures were added as required, based on student feedback. After three to five students achieved criterion performance with little or no tutoring, the instructional sequence was then tried out with large groups (N=22 to 30) and revised until the group was achieving 90% of the criterion test. The results of this study are truly exceptional and are summarized by Markle:

These instructional engineering methods have resulted in the attainment of the proper objectives. In addition to the desired increase in efficiency as a function of decreased time, the new 7½-hour course is far more effective than the 10-hour standard courses with which it has been compared. On one wide-range test used for comparisons, untrained subjects achieved a mean score of 85, subjects trained in standard first aid courses achieved a mean score of 145, while subjects trained in the new course achieved a mean score of 270, out of a possible maximum of 326 points. Similar results were obtained with other tests and other subjects (p. 1).

Three inferences relevant to the present study may be drawn from Markle's work. First, instructional systems of greater complexity than a programmed text may be markedly improved by tryout and revision based first on tutorial tryouts and then large group tryouts.
Second, this combined iterative approach seems very profitable when working with volunteer, adult students. Third, in agreement with the work of Mager (1961), students can provide very significant input into the fundamental design of the instructional system when they are given the chance. They should be consulted as early as possible in the development process.

Discussion of the Combined Approach

Thus far, several research studies have been reviewed which seem to indicate that significant differences between original and revised versions occur most often when achievement data are combined with first hand, direct feedback from students. A widely used approach which combines achievement data and direct feedback involves tutoring a single student as he encounters problems and incorporating the tutorial instruction into the lesson. Although the tutorial approach is the most sensitive to individual learning problems, this same sensitivity makes it highly vulnerable to atypical students and susceptible to embarking on costly and possibly non-functional revisions. Moreover, the task of the "tutor-editor" is difficult to perform, and variability in tutorial techniques will affect the quality and quantity of data collected.

The large group approach provides a broader, more credible data base and hence reduces the possibility of idiosyncratic revisions which might result from the tutorial method. Furthermore, the large group approach provides far more accurate measures of student learning as a result of the "program" or instructional materials. On the other hand, direct interaction with students is inhibited, and serious deficiencies may not be identified. In addition, a large amount of data can be generated which requires careful organization and display before it is usable.

The third approach, combining tutorial and group data in iterative cycles, appears to provide the best of both techniques and was adopted as the point of departure for development of the flowchart model in this study.

Related Methodological Issues

Although the overall approach to formative evaluation design has been selected, a number of related methodological issues remain. For example, the types of data collected in most of the studies reviewed earlier were: (1) student achievement data and (2) observational (process) data. Are these two types of data sufficient for the present study, or should others be included in the model? If other types of data are needed, what specific indicators should be used?

Of direct relevance to these questions were the comprehensive treatment of measurement by Schalock (1969), in which he reviewed the strengths and weaknesses of various measures, and the paper on evaluation of instructional systems by Paulson (1969), in which he analyzed the problems, needs, and alternatives available for formative evaluation. Of particular interest was Paulson's summary, in which he suggested that certain specific measures were appropriate for providing given types of data. The relationships suggested by Paulson are paraphrased in Table 2.

Table 2.--Classes of Data and Specific Indicators for Formative Evaluation

Class of Data -- Specific Indicators
1. ANTECEDENT DATA (assessment of student entry capabilities) -- Pre-tests; general abilities (standardized tests)
2. TECHNICAL DATA (assessment of instructional stimuli quality) -- Student comments; technical consultant comments
3. PROCESS DATA (assessment of students' behavior during the learning experience) -- Tryout monitor observations and comments
4. LEARNING DATA (assessment of student progress towards learning objectives) -- En route responses and feedback during the lesson
5. CRITERION ACHIEVEMENT DATA -- Post-test, criterion referenced; rating scale
6. ATTITUDINAL DATA -- Questionnaire; student comments

The six classes of data shown in Table 2 seem to represent virtually every important aspect of a prototype lesson. Therefore, these six data types were included in the preliminary model of formative evaluation.

One final question which must be resolved relates to measurement of student achievement. The question at issue is whether student achievement should be measured against a specific standard (criterion referenced) or against the performance of other students (norm referenced). The fundamental issue is: what type of information regarding student learning would be most useful for systematic remediation of learning deficiencies in prototype SLATEs?

Glaser's (1963) paper in The American Psychologist stimulated considerable interest in the kind of measurement that is suitable for assessing the quality of instructional enterprises rather than discriminating among individuals. He states:

The scores obtained from an achievement test provide primarily two kinds of information. One is the degree to which the student has attained criterion performance, for example, whether he can satisfactorily prepare an experimental report, or solve certain kinds of word problems in arithmetic. The second type of information that an achievement test score provides is the relative ordering of individuals with respect to their test performance, for example, whether Student A can solve his problems more quickly than Student B (p. 374).

Measures which assess student achievement in terms of a criterion standard thus provide information as to the degree of competence attained by a particular student which is independent of reference to the performance of others (p. 375).

After showing that criterion levels can be established at any point from zero proficiency to perfection and assessed at any time during instruction, Glaser cogently argues that such criterion referenced achievement tests are far more useful in developing effective instructional treatments than tests which differentiate among individual students. The logic underlying this argument is that identification of what is wrong, in terms of substandard student performance, is prerequisite to improvement of instructional treatments. Such substandard performance is most easily recognized by comparison against a criterion. Student achievement below the standard is, by definition, a deficiency in the prototype. Thus, to be maximally useful for formative evaluation, measures of learning must be defined in terms of observable pupil performance at or above a specific standard. For these reasons, the principle of criterion referenced measurement is reflected in the preliminary model of formative evaluation in this study.

Matrix Summary of the Literature Reviewed

A summary of the information obtained from the foregoing review of the literature is presented in Table 3. The majority of the methodological questions have been answered in a preliminary manner, so a primitive flowchart model may now be stated.

Formulation of the MK I Model

Formative evaluation requires several types of data which must be collected under three different conditions.
It was recommended by Paulson (1969) that data on technical quality of the instructional stimuli and students' entering abilities be collected prior to instruction. The process, en route learning, criterion learning, and attitudinal data should be collected in both tutorial and large group situations. This gives rise to a three-stage model shown in Figure 3.

Table 3.--Matrix Summary of the Literature Reviewed [the body of this table is not legible in the source]
Figure 3.--Major Stages in MK I Model of Formative Evaluation [flowchart: Prototype SLATE → Stage I, Technical Assessment & Revision (1.0) → Stage II, Tutorial Tryout & Revision (2.0) → Stage III, Large Group Tryout & Revision (3.0)]

Stage I.--In recognition of the need to predetermine the technical accuracy of subject matter content, the technical quality of presentation media, and the adequacy of evaluation instruments--before students interact with the instructional stimuli--technical assessment is conducted.

Stage II.--Tutorial tryouts are intended to provide data on specific learning and communication problems which are best gathered during more intimate tutorial sessions than with larger groups.

Stage III.--In recognition of the limitations of tutorial tryouts, e.g., vulnerability to atypical Ss and author reluctance to base expensive revisions on data from a single student, large group (N=20) tryouts are conducted to broaden the data base.

A flowchart showing the MK I model at the first level of detail is shown in Figure 4. A flowchart of the MK I model showing the fourth level of detail is depicted in Figure 5. This flowchart represents the final configuration of the MK I model developed from this review of the literature. In the next chapter, assessment and revision of the MK I model are described.

Figure 4.--Configuration of the MK I Model of Formative Evaluation Showing the First Level of Detail [flowchart not legible in the source]

Figure 5.--Configuration of the MK I Model of Formative Evaluation Showing the Fourth Level of Detail [flowchart not legible in the source]
Table 5.--Background Data From Respondents [the body of this table is not legible in the source]

Interview Data

The data collected during the seven interviews are presented by summarizing responses to each question. This method should provide readers a better insight into the problem of formative evaluation and the rationale underlying changes in the MK I model.

Discussion of Individual Questions

Question 1: DESCRIBE THE SUBJECT MATTER, TARGET POPULATION, AND INSTRUCTIONAL SYSTEM IN WHICH YOUR SLATES WERE USED.

The responses to this question, as well as other background data, are summarized in Table 5.

Question 2: DID YOU REVISE YOUR MATERIALS AFTER A "FIRST DRAFT" OR PROTOTYPE HAD BEEN PRODUCED? BEFORE OR AFTER FULL SCALE USE WITH THE TARGET POPULATION?

Six respondents indicated they had revised their SLATES beyond the "first draft" stage. The degree of revision varied from major to minor; the majority reported only minor revisions. The definitions of "minor revision" varied considerably in terms of specific activities but usually involved less than five man-hours of author effort. One respondent (R4) indicated he had not revised at all--primarily because the original versions seemed adequate to achieve the intended SLATE objectives (i.e., to reduce laboratory time and decrease the frequency of fatalities during surgical procedures with animals).

Of the six respondents who revised their units, only one respondent (R7) had conducted formative evaluation in the sense of having revised SLATES before extensive student utilization. The others had revised only after extensive use by students (one term or longer) had revealed serious problems in the prototype versions. The one author who did revise was an instructional research and development specialist who did so because of "company policy" and the fact that the specific project involved a large potential loss of prestige and federal funds if it failed.

Question 3: WHAT WAS YOUR REVISION STRATEGY? DID YOU HAVE A PREDETERMINED PLAN? IF SO, WHAT WAS IT AND HAD YOU USED IT PREVIOUSLY? IF YOU HAD NO PLAN, HOW DID THE REVISIONS EVOLVE?

Of the seven respondents, only the professional had a predetermined strategy for revision development. Although the professional was well aware of specific techniques and the desirability of formative evaluation, he cited a frustrating inability to apply these techniques in many cases due to the enormous commitment of resources required. The economic constraints were cited as the main factor precluding his commitment to the formative evaluation process. On the other hand, there seemed to be a sliding scale of project importance, wherein a given project, if it involved sufficient prestige and dollar cost, automatically warranted formative evaluation.

The professional felt that the entering capability and experience as a program/SLATE author was a critical variable in determining the need for formative evaluation.
That is, the greater the experience of the author, the less the need for formative evaluation--hence this procedure was probably most relevant to novice or non-professional SLATE authors.

Paradoxically, the non-professional program authors in this sample were not aware of a need for any type of tryout-and-revise developmental strategy. Of the six non-professional SLATE authors, not one had a systematic plan for revision, nor were they aware of the desirability of revising SLATE materials before large scale student use. The most frequent rationale for this non-interest in SLATE tryout and revision was as follows: (paraphrased) "I don't revise my lectures, my labs, or my textbooks before use with my class, so why should I spend valuable time revising my SLATES?" All six admitted the likelihood that prototype SLATES might be deficient in some important respects, but since SLATES were a subsystem of the "class" they personally were teaching, they felt that intact class usage was a justifiable method of prototype tryout (e.g., they could correct SLATE deficiencies during lectures).

Question 4: FROM WHOM DID YOU OBTAIN FEEDBACK: INDIVIDUAL STUDENTS, GROUPS, EXPERTS, OR OTHERS?

The respondents who did revise SLATES obtained feedback both directly and indirectly from students in the target population. Direct feedback was obtained most frequently from individual students complaining personally to the respondent, either after lectures or while the respondent circulated through the carrel room as the student was using the SLATE. Indirect feedback was obtained from lab assistants, carrel room monitors, or discussion group leaders who reported serious discrepancies sometime after they occurred. For example: (paraphrased) "SLATE on X is really bad--the students are not finishing in one hour; they cannot do the workbook problems; the experiment consistently fails--etc."

No systematic sampling or student selection procedures were used except in the case of the "professional" working on the large federal project mentioned earlier. In two cases, scripts were read by colleagues for content accuracy, but the majority of feedback was obtained randomly from students via intermediaries such as graduate teaching assistants (GTAs) in the course.

Question 5: WHAT KINDS OF FEEDBACK DID YOU TRY TO GET?

Six respondents actively tried to gather student attitudinal data, while all seven respondents made efforts to assess student achievement (learning) from the SLATES. In two cases (R2 and R7), student background/demographic data were collected. In most cases, observational data were randomly collected via respondents' personal visits or through intermediaries such as GTAs. In sum, none of the respondents used more than two types of data.

Question 6: WHAT METHODOLOGY WAS USED TO GATHER THE VARIOUS TYPES OF FEEDBACK?

In all cases, respondent-designed measures were used to formally assess student end-of-SLATE learning and attitudes. In six cases, learning measures were typical paper-and-pencil tests using true-false and multiple choice items. In two cases, however, performance in post-SLATE "action" laboratories was the prime source of data on student learning. Attitudinal measures were used by six respondents but collected at different times. For example, three respondents collected attitudinal data following each individual lesson. The others collected attitudinal data at the end of the course.
In most cases the methodology involved random observation/interaction between the respondent (SLATE author) and his students as he visited the carrel room during operation. The major source of process data was verbal report froni intermediaries such as carrel room attendants or lab assistants who described student difficulties with specific SLATES to the respondent-- after a number of students had been observed having a similar difficulty. 53 The number of similar discrepancies which constituted a reportable incident varied considerably, but respondents agreed a reasonable estimate would be more than 20% of the students using the SLATE. Question 7: HOW WOULD YOU CLASSIFY THE TYPES OF PROBLEMS YOU FOUND? ADMINISTRATIVE/TECHNICAL? COMMUNICATION? LEARNING 0R TASK RELATED? (EXPLAIN THESE CATEGORIES.) This was not a particularly useful question as most respondents did not understand the categorization system and often a lot of time was wasted in explanation. What it did Show was the complexity of the MK I model's classification scheme. Nevertheless, a number of respondents stated that many problems fell into the administrative/technical class such as waiting in line due to limited space; equipment malfunctions; not reading or fol- lowing directions; necessary equipment becomes lost, misplaced, and mis- labeled. Communication and message design problems appeared next in frequency, with boredom and general inattention cited most often as the primary problem. Learning and task related variables were hardly men- tioned unless prompted by the interviewer. Non-professional respondents seemed to assume that variables such as the learning objectives, evalua- tion instruments, the sequence, organization, response type and frequency (if any) were "given" and not subject to modification. In about one-half of the cases, respondents had given serious consideration to the student's response and feedback, both the type and frequency, while in the other half little consideration was given, and SLATES were simply illustrated lectures with a post-test. Question 8: HOW DID YOU SUMMARIZE, DISPLAY, AND ANALYZE YOUR DATA? Two types of data were formally analyzed: student achievement and attitudes. Achievement data analysis was done primarily by frequency ‘ 54 counts of missed items and conventional item analysis. Attitudinal data were normally summarized as a percentage response on individual items. Consultants were used only in two cases to analyze the data. Most fre- quently, the designers seemed to make a subjective decision as to whether a problem warranted revision. Question 9: HOW DID YOU DETERMINE IF A REVISION WAS REALLY NECESSARY? HOW MUCH OF WHAT TYPE OF DATA WERE NECESSARY FOR COMMITMENT OF RESOURCES (TIME AND DOLLARS) FOR REVISION? No one particular data source or criterion seemed to emerge. Each respondent seemed to have an intrinsic weighting system which included personal observations, attitude and examination error data, and personal time and dollars available at the time. It did not seem as if any one data soruce was sufficient, but rather that multiple sources would have to correlate before revision action would be warranted. However, any action depended on how costly the revision seemed to be. For example, a bad examination item (missed by a lot of people) could be revised quite easily by cutting a new stencil--unless the item iwas embedded in a large workbook. 
If the latter were the case, the item remained unchanged and an erratum sheet was posted in the carrel room; i.e., "disregard item X; it is no good."

It was obvious that audio visual materials were revised very reluctantly. The time and costs were considerable, and designers simply did not have time or dollars to revise any but the most deleterious materials--and then only after one or two terms had gone by and the data were overwhelming. Interestingly, when a faculty member was teaching the same course in which the SLATE was used, he could compensate for failures of one SLATE by reteaching in lectures or quiz sections the content which students did not learn from the deficient SLATE. Also, laboratory assistants served a tutorial function to reduce the seriousness of learning difficulties that were the result of ineffective prototype SLATES.

Usually, the criterion for revision was multiple inputs which indicated that a specific problem existed in a SLATE; that is, if the lab assistants consistently reported a problem, and/or if the designer personally observed the problem, and if student achievement data reflected a deficiency--then the decision was made to revise the SLATE.

Question 10: WHICH COMPONENT DID YOU REVISE: OBJECTIVES, EVALUATION INSTRUMENTS, AV MATERIALS, OR SOME OTHER?

In the six cases in which revisions were made, a combination of all three components (objectives, evaluation instruments, and AV materials) was revised. This was due to the highly interdependent nature of the instructional stimuli in SLATES; e.g., revision of one component necessitated revision of other components.

All seven respondents indicated they would have liked to revise the SLATE objectives, but only three did so, because of the magnitude of this undertaking. Of the three who revised objectives, only one added objectives, whereas the other two deleted objectives. Other aspects of the SLATES which were revised were the sequence and complexity of lab experiments. In one case, the overall sequence of SLATES was revised to allow for better integration with lectures and laboratories. In another case, a complex lab experiment which failed 50% of the time was replaced with a simpler and more reliable one.

Question 11: HOW DID YOU DETERMINE THE DESIGN OF REVISIONS?

None of the respondents, including the professional, had a systematic approach or theoretic position on AV materials design. Invariably, they used an intuitive approach to revising AV materials similar to the approach used in the original design process. Revision of evaluation instruments consisted largely of deleting items of low difficulty or discrimination or rewriting item stems or foils to reduce ambiguity. Revision of objectives was largely based on an intuitive decision regarding course content and/or the difficulty level of the course concepts. It appeared that the respondents typically overestimated the students' entering capability, and consequently, revised objectives were simplified versions of the original ones.

Question 12: HOW MANY REVISIONS (CYCLES) DID YOU MAKE?

The responses to this question varied considerably, as most respondents made some revisions to most, but not all, of their SLATES. For example, one respondent did not revise at all. Of the six who did revise, most made one set of minor revisions (less than five man-hours of author work) on each SLATE.
However, several respondents indicated they had, after several cycles of "patch up" minor revisions, made major revisions amounting to a complete redesign and reproduction of a given SLATE. Each SLATE/author combination was a unique case, and the number of revisions conducted appeared to be a function of how good (or bad) the prototype was, the resources available, and the institutional press on the individual author. It did not appear, however, that the respondents were willing to commit large amounts of time and resources to a major revision until a large amount of data had been accumulated, over approximately one year's time.

The professional, on the other hand, indicated that four cycles of revisions were accomplished before the final SLATES were widely distributed. The last two revisions were based on field tests where the authors were not present during student use and all data were from end-of-SLATE test scores and teacher interviews. Nevertheless, to most respondents the technique of multiple iterative revisions did not seem feasible due to its high cost.

Question 13: WHAT PERCENTAGE OF PROTOTYPE DEVELOPMENT DID YOU SPEND ON TRYOUT AND REVISION?

The time range was from 0 to 200%, with the average about 20%-30%. When queried as to what the original time investment was on a per-SLATE basis, most were unable to recall, as various SLATES were being developed simultaneously. Furthermore, the various SLATES took different lengths of time to develop depending on the author's teaching load, the complexity of the unit, previous preparation, whether materials were already used in class, the production capability of the author, and other situational variables. After some probing, the experimenter (E) extrapolated an average development time of between 50 and 100 author hours per prototype SLATE--exclusive of support man-hours (typing, collating, photography, sound recording, etc.).

Question 14: WAS IT WORTH IT? HOW DID YOU DETERMINE IF THE SLATE(S) WERE IMPROVED?

All respondents who did revise were absolutely certain the revisions were effective but had little objective evidence on which to base these judgments. Not one of the non-professional respondents made statistical comparisons between achievement test scores or attitudinal scores on original and revised versions. This technique was felt to be too time consuming, and the relative effectiveness of a given SLATE could be determined through informal means such as lab assistants', GTAs', and students' questions in lectures. On the other hand, the professional did use statistical tests to compare original and revised versions on measures of student achievement, student attitudes, and teachers' attitudes.

In general, the respondents did not seem concerned with an objective evaluation of their revisions. With them it was simply a foregone conclusion that the revised versions would be an improvement over the prototype.

Question 15: IF YOU HAD TO DO IT OVER AGAIN, WHAT WOULD YOU DO DIFFERENTLY?

Many replied that in retrospect they would select different objectives and/or content; e.g., their original objectives were overly optimistic in terms of students being able to achieve them in SLATES. This may be a reflection of poor program design as well as curricular refinement. Several respondents commented that the idea of "revision as you go" sounded like a good one, but there seemed too little time to perform the tutorial procedures.
Another major problem in SLATE revision was selecting students with necessary entry skills. Most of the respondents' SLATES were 59 embedded in a larger instructional system--a "course"--and were in large degree dependent upon students obtaining necessary prerequisites from earlier SLATES as well as from other course related learning experiences. Naturally, the later a SLATE was to be used in the course sequence, the more serious was this problem of obtaining students who possessed the prerequisites yet were naive with respect to specific SLATE objectives. This difficulty, along with the time and expense inherent in tryout and revision procedures, tended to reduce interest in formative evaluation as represented in the MK I model. In sum, the majority of respondents indicated they would change the subject matter content of their SLATES and attempt a closer integra- tion between SLATES, but the overall process of SLATE development would remain basically unchanged. Question 16: DO YOU THINK THE MK I MODEL IS PRACTICABLE? IF NOT, WHY NOT? WHAT CHANGES WOULD YOU SUGGEST TO MAKE IT MORE PRACTICABLE? Without exception, all respondents stated that the MK I model was highly impractical in the "real world" and they would be unwilling or unable to use it. Several reasons dominated. First, the model seemed overly complex and time consuming. (It appeared to E that the flowchart itself simply overwhelmed respondents.) Second, the concept of iterative revisions based on tutoring single students appeared totally out of the question from the standpoint of data credibility and cost effectiveness. In other words, given the extremely high development costs of SLATES (both labor and materials) and the difficulty of inte- grating slides, tapes, workbooks, models, laboratory exercises, directions, etc., authors simply will not revise this whole logistical system on the basis of feedback from one student. ' -‘-—-“.___.__'-—_.—. ._.- 60 On the other hand, the prOSpect of revising on the basis of group feedback seemed more acceptable, but logistical and sequencing difficul- ties posed serious problems. That is, SLATEs in highly technical areas such as biochemistry, soil science, geography, and medicine, are highly interdependent and must be hierarchically sequenced. This means that student tryouts must follow the same hierarchical sequence which poses major logistical difficulties in terms of coordinating design, produc- tion, tryout sequencing, and class sequence. SLATE production must coordinate with learning activities within the "class" embedding the SLATES so that students who have the necessary prerequisite knowledge can be obtained at the proper time in the course sequence. If a mora- torium is declared so that the class is not offered while SLATES are being deve10ped, available students may not have prerequisites. Although SLATES are supposed to be self-instructional, they nevertheless depend to some extent on lectures, text, and lab sessions of the embedding course. If there is no ongoing course from which students may be solic— ited, a suitable sample for formative evaluation may be impossible to obtain. The major changes to the MK I model suggested by respondents were: (1) deletion of the tutorial tryout and revision phase, (2) de- letion of the technical assessment and revision phase, and (3) develop- ment of a logistical/sequencing procedure which would allow group tryouts to be optimally sequenced within an ongoing course. 
The procedures contained in the technical assessment and tutorial phases were recognized as potentially valuable but not worth the effort. For example, most respondents seemed very reluctant to allow peer review of their "rough draft" prototype work either for technical or stylistic 61 comments. Most regarded themselves as "content" experts; hence addi- tional technical review was redundant. In addition, most felt they were capable of assessing media and evaluation instrument quality due to pre- vious experience teaching. With regard to deletion of the tutorial tryout and revision phase, most reSpondents felt that basing SLATE revisions on feedback from a series of individual students did not seem feasible or cost ef- fective. SLATES were too complex and costly to revise on the basis of one or two students. Furthermore, the tutorial procedure appeared ex- cessively time consuming from the author's standpoint and excessively costly if revisions were to be made after each student's tryout. On the other hand, most respondents seemed agreeable to use of a large group tryout procedure which would quickly generate a large amount of data. Discussion of Interview Data Several trends clearly emerged from these data. First, none of the respondents, except the professional, felt that revision prior to full scale use was warranted due to press of time and lack of resources. Second, tutorial tryouts with individual students did not seem to be a feasible technique or basis for revision. The complex and highly coordinated nature of SLATE instructional stimuli (slides, slide change signals, audio tape content and directions, workbook, student responses, knowledge of results, etc.) made it very difficult to change anything once the prototype was set up so the first student could use it. 62 In recognition of this situation a heuristic clearly emerged; namely, SLATE authors need a rather overwhelming amount of data to con- vince them that any revision effort is "worth it." Operationally this means that several students must have encountered a given problem, and that more than one data source must have corroborated the same problem (such as personal observation and post-test errors) before revision action is taken. Furthermore, several revisions must be required on the same SLATE before any action is taken. In other words, the vehicle must have several serious discrepancies before it warrants an overhaul. As represented by this sample of SLATE authors, a very clear pattern of revision activity emerged. Typically, the SLATE was de- signed as well as possible. Then it was used in prototype form by the intact class under control of the SLATE author. During this initial usage, random feedback was obtained via authors' personal observations, verbal reports from lab assistants, carrel room attendants, discussion group leaders, and/or students. Systematic feedback was obtained from end-of-course evaluation of student learning and attitudinal data, and in some cases, assessment of student achievement and attitudes after each SLATE. Typically, however, the instruments used to collect data were too general to provide Specific guidance for the design of revi- sions. Nevertheless, data on problems in various SLATES gradually ac- crued from several sources. When sufficient corroborative data was obtained, and if time and resources permitted, revisions were attempted. 
These revisions were deve10ped on an intuitive basis, often in consul- tation with GTAs (What should we do about "X?") but seldom, if ever, using the students as a source of design information. The most common 63 revisions reported by respondents was a reduction and simplification of subject matter content--a reduction in "coverage"--which reduced the average student time in the SLATE by 10%-25%.‘ This differed from find- ings in programmed instruction studies where revised programs are usually longer than original versions. It appeared that the major impact on the SLATE author of typical after-the-fact feedback data was a rapprochement between estimated and actual entering student capabilities and a reassessment of objectives and content coverage in given SLATES. Typically, prototypes were too ambitious; so when revisions were made, the net effect was to reduce their complexity. Thus, feedback most often caused reformulation of course/SLATE content and objectives as well as revision in programming and/or presentation techniques. The regrettable aspect was the large number of students who were subject to the prototype versions until the author recognized what should have been in the SLATE and took appropriate action. Conclusions From the Interview Data These data clearly showed that with faculty similar to those in- terviewed, the MK I model was not practicable. MK I did not correspond even remotely to current practice, and of the seven respondents inter- viewed, none was willing or able to use the model in its present form. A major reason given was that it was logistically impossible to coordi- nate design and production with selection and conduct of tutorial revi- sions followed by large group tryouts and revision. The major problems were: (1) obtaining naive students at the proper time, (2) author "release“ time, and (3) revision costs. While most respondents conceded 64 that the use of the MK I model would likely result in better SLATES than they currently had, none felt that SLATES needed to be that good; e.g., there were other important aspects of the course, and SLATES per se Simply did not warrant all that effort. While the MK I model was designed to reduce uncertainty regard- ing formative evaluation, it seemed to raise more questions than it answered. Moreover, MK I did not recognize the severe time and finan- cial constraints which operate in the practitioner's world, nor did it recognize certain characteristics of SLATE authors which inhibit for- mative evaluation. For example, university faculty typically regard themselves as subject matter experts and highly proficient teachers; consequently, they do not recognize a need to tryout and revise SLATES before using them on their intact classes. The respondents felt that teaching and designing SLATES were complicated enough without introduc- ing more complexity and uncertainty by evaluating their SLATES and possibly getting a bad report. Furthermore, several respondents were very reluctant to allow students to criticize their SLATES, particularly in a face-to-face tutorial Situation. These data led to the conclusion that the concept of formative evaluation itself (basing revisions on feedback from students) must be "accepted" before any model is practicable. 
Assuming acceptance of the concept, three major revisions of the MK I model can be inferred from the data: (1) logistical and conceptual simplification, to include either deletion or modification of the technical assessment and tutorial phases; (2) some procedure for reducing, during formative evaluation, the interdependency of SLATE instructional stimuli which dissuades authors from changing anything (e.g., components are so highly interrelated that the smallest change becomes a major task); and (3) attention to obtaining corroborative data on major instructional problems, so authors will be more likely to take necessary remedial action. In conclusion, the MK I model must be simplified; its fundamental concepts must be justified to SLATE authors; the hierarchical interdependence of instructional stimuli must be reduced; and feedback techniques must generate corroborative data.

Revisions to the MK I Model

Simplification

The model would be greatly simplified if the first two phases were simply eliminated, leaving only the group tryout and revision phase. Data from the interviews supported such a move. However, it is the experimenter's opinion that the technical assessment phase is not sufficiently complex or time consuming to warrant complete deletion. Furthermore, it had been the personal experience of the experimenter that prototype SLATE evaluation instruments often were either lacking altogether or of such low quality that the necessary types of data for formative evaluation could not be generated. Therefore, despite the interview data, modifications to the MK I model did not include deletion of the technical assessment phase.

Obtaining Corroborative Data

Some provision must be made in the revised model to obtain a sufficient amount of corroborative data so authors know what must be revised. Based on the unequivocal response of practitioners, tutorial procedures should be eliminated. But the group tryout as it was formulated was not likely to generate the detailed information needed for identification and remediation of critical learning problems. In other words, tutorial procedures identified and solved learning problems, while large group procedures were normally limited to problem identification. Therefore, some technique must be found which generates both the tutorial and large group data types. The critical aspect of this new technique is that it must generate a large amount of relevant and corroborative data with minimal expenditure of designer or student time. This procedure must also be logistically compatible with an ongoing course context.

Group Debriefing as a Feedback and Problem-Solving Technique

E devoted considerable effort to the development of a technique which combined the tutorial and large group data collection potential, yet did so in a minimal length of time. While searching for a solution to this problem, E was struck by the functional similarity between formative evaluation and military debriefings. For example, it is common practice in the military to evaluate mission and training procedure effectiveness by means of formal debriefings. Usually, operations and support personnel participating in an exercise or training program are interviewed immediately following each mission to determine specific successes and problem areas. Information is collected from all participants, summarized, and formally reviewed by mission/training directors to determine how to improve mission effectiveness.
Thus, firsthand information from operational level participants is fed back to the planning and design personnel. The function of formative evaluation is very similar; information on specific successes and problems of participants (students) is fed back to the lesson author for purposes of improving the lesson. In light of this similarity, it was reasoned that if SLATE formative evaluation were conceptualized as a "one shot" small group debriefing following a lesson, data collection would be simplified.

Furthermore, some data have shown that during training program development, when trainees participate in a mission debriefing, they not only provide planners with information on problems, but can often provide solutions to such problems. For example, when the U.S.A.F. Air Defense Command radar intercept system was developed, personnel operating the system in a training status were debriefed after each major exercise. In this way, critical problems were identified and remediated (Alexander, et al., 1962).

Reconceptualizing the Problem

It was reasoned that if formative evaluation were reconceptualized as tryout and revision by means of a small group debriefing/problem solving process, not only might identification of major discrepancies be facilitated, but quite possibly the debriefing might suggest more effective solutions than would otherwise be possible.

Development of Group Debriefing/Problem Solving Procedures

In the present study, formative evaluation was reconceptualized as a group debriefing/problem solving process. The major source of feedback on instructional problems was to be a group of students who were given the dual task of problem identification and development of solutions to the problems identified. It became necessary, therefore, to develop procedures appropriate to achieve these objectives.

Review of Literature on Group Processes

Much of the current research on group processes appears to have grown out of two separate but related historical movements. One movement emerged from the works of John Dewey, who emphasized the social aspects of learning and the role of the school in training students for problem solving and for democratic, rational living (Schmuck & Schmuck, 1971). The other movement emerged from the empirical research of Lewin and the subsequent development of researchers and practitioners in the field of group dynamics (Bany & Johnson, 1964). The latter movement emphasized the collection of empirical data which supported the philosophical work of Dewey and introduced specific procedures for improving group processes (Bradford, Benne & Gibb, 1964).

During the past twenty years there has been an extensive accumulation of scientific research on small groups as the study of group dynamics developed as a subdiscipline of social psychology (Schmuck & Schmuck, 1971). In 1955, for example, Hare and others annotated a bibliography of 584 items on small groups. By 1959 Raven published a handbook which included 1385 references, and in 1966 McGrath and Altman published a bibliography of 2699 references. In addition, shorter analyses of group dynamics were published showing both the interest and magnitude of research in this area (Golembiewski, 1962; Luft, 1963; Olmstead, 1959; Shepard, 1964). One trend in education resulting from research on group dynamics was the direct application of group research for the improvement of personal learning and/or for learning organizational processes (Schmuck & Schmuck, 1971).
For example, one notable application was the technique for educating adults referred to as the training group (T-group). This technique was developed by the National Training Laboratories Institute for Behavioral Science. Important publications relevant to the T-group were Bradford, Gibb, and Benne (1964) and Schein and Bennis (1965). Refinements to T-group technology grew out of research on organizational group processes (Katz & Kahn, 1965; Likert, 1961; March & Simon, 1958).

Until recently, much of the research in group dynamics has been done in industry and government rather than in school contexts. Lately, however, there has been an increased emphasis on the application of group processes to educational settings. The 59th volume of the National Society for the Study of Education (Henry, 1960) provided a social psychological theory on classroom groups and proposed ways of using research findings to improve instruction. Several recent works review empirical data on group processes in the classroom and other school settings (Bany & Johnson, 1964; Glidewell et al., 1966; Lippitt, Fox & Schmuck, 1964), while other works utilize data on classroom group processes to make recommendations for improving teaching (Schmuck, Chesler, & Lippitt, 1966; Fox, Luszki, & Schmuck, 1966; Chesler & Fox, 1966; Amidon & Hunter, 1966).

Emerging from this large accumulation of research was the recognition that a number of complex variables dynamically interact in any small group. Since the present study was concerned primarily with formative evaluation, no attempt was made to formally investigate the numerous variables known to operate in group dynamics. Instead, the MK II group debriefing/problem solving procedures were based on generalizations drawn from previous research on group processes. Three works were the primary references for development of the MK II group debriefing procedures: Maier (1963), McGrath and Altman (1966), and Schmuck and Schmuck (1971).

McGrath and Altman (1966) suggested ten variables all known to influence the output of any problem solving group. These variables include: (1) member abilities and experience, (2) member attitudes, (3) member roles and/or tasks, (4) group size, (5) group task, (6) group leadership, (7) group developmental stage, (8) group cohesiveness, (9) environmental variables, and (10) group organization and/or structure. An attempt is made to deal with most of these variables in development of the MK II debriefing procedures.

Group organization and structure.--Maier (1963) described several techniques or strategies for organizing group problem solving activity which may be dichotomized as structured or unstructured. Since unstructured strategies normally take more time, they were not considered appropriate for the present study. Among the structured techniques for organizing a small problem solving group, the most applicable appeared to be "problem posting" (Maier, 1963, p. 161). Using this technique, the student participants are given a common experience, e.g., individual use of the prototype SLATE materials. Following this, they convene for a debriefing. The first part of the debriefing, however, is devoted to listing all the problems encountered by various members of the group. During this time the group leader summarizes the problems and writes them on a blackboard--thus collecting data and assisting the group and himself to conceptualize the problems encountered by various individuals in the group.
The list of problems is then made the subject of an organized discussion in which the group assumes responsibility for development of solutions to each problem. When time does not permit an exploration of all problems, the group is allowed to select those of greatest interest. Maier cited evidence that this technique is effective in stimulating interest, helps problem conceptualization, and leads to greater productivity in developing solutions (Maier, 1963, p. 191). In light of the foregoing discussion, it was determined by E that a small group problem posting debriefing would be the best format for generating the types of data required to revise prototype SLATES.

Group leadership.--Most of the research information about leadership performance came from studies of leaderless group situations, although some data came from studies using superiors' ratings of leadership in operational settings (McGrath & Altman, 1966). Effective leadership has been shown to be a function of a number of characteristics and conditions including education, intelligence and/or task ability, high group status, training in leadership techniques, communication skills, and individual personality characteristics such as extroversion, assertiveness, and maturity.

In the present study, it was determined that the prototype lesson author would be designated the group discussion leader by virtue of his expertise in the subject matter and his responsibility as the instructor in the course. Assuming that the personality characteristics, education, communication skills, and intelligence of lesson authors (group leaders) cannot be changed, some benefit might accrue through training in group leadership techniques. However, due to lack of faculty time it was felt that any systematic group leadership training program for SLATE designers was out of the question. Therefore, as an alternative, a "debriefing checklist" was developed by E which outlined the ground rules, tasks, and responsibilities of all participants (Appendix D).

Group size.--The size of the group was determined largely by research on group processes and logistic considerations. For example, Maier (1963) cited evidence that greatest productivity in problem solving groups is often obtained when the group contains between six and ten participants. Logistically, six to ten students from the target population should be readily available when the opportune time for tryout is reached. The optimal size decided upon was nine students plus the group leader (SLATE author) for a total of ten participants.

Group composition.--The composition of the group was determined by the desire to obtain a sample which represented as nearly as possible the spectrum of abilities in the target population. It was assumed that students of different entering abilities but similar prerequisite knowledge would encounter different learning problems with prototype SLATES, and it would be valuable for the SLATE author to be confronted with these problems. Furthermore, it was hoped that by varying the group composition between high and low ability students, the high ability students could assist the SLATE author in determining solutions to problems encountered by themselves and the low ability students. It was possible that the opposite might also occur; e.g., low ability students could assist in solving high ability students' problems.
Assuming the desirability of using students of varying ability in the group, the Scholastic Aptitude Test (SAT) was selected as a normalized measure of entering students' abilities. This measure was selected mainly because SAT scores on most students in the target population (Michigan State University) were already available. It was felt that SAT was equally as valid as other measures for purposes of selecting students possessing a range of abilities. For other target populations, other normalized ability measures might be selected. It is the experimenter's opinion that the choice of a specific measure of ability is not as important as the procedure of using a normalized measure to select students possessing a range of abilities.

Group and individual tasks.--The task of the group as designated in the ground rules was to provide the group leader information regarding identification and remediation of instructional problems. The general orientation given the students was to participate in lesson development as co-authors (Yelon & Scott, 1969). That is, the students were asked to share the responsibility for providing data on their learning problems as well as to suggest solutions to these problems.

The task of student participants was twofold. First was individual student interaction with the prototype SLATE materials. For logistic simplicity, students were requested to use the materials within some reasonable time period. After allowing some time for scoring lesson evaluation instruments, the debriefing began, to take advantage of immediate reminiscence of learning problems.

The task orientation and preparation given the group leader (SLATE author) was: (1) to study and use the "debriefing procedures" checklist; (2) to adopt an attitude that "the materials are on trial, not the students"; and (3) commitment to the principle of "no reprisals" for frank and/or derogatory comments.

The leader's task during students' use of the materials was to be that of a tutor offering assistance as required to individual students. As a student indicated a problem, the SLATE author was to visit the student, note the problem and its location in the SLATE, answer the student's question, and discuss these problems during the debriefing. Presumably, if a number of students (30% or more) had similar problems, a revision was to be made so that the actual tutorial instruction would be incorporated into the SLATE.

During the debriefing, the SLATE author should function as a data collector, posting the problems and organizing the data so that later discussion could focus on solutions to the problems posted. Obviously, different authors would vary in their group interaction skills, and these differences would affect the quantity and quality of data collected. Nevertheless, direct face-to-face confrontation with learning problems provides an experiential dimension which is likely to convince authors that certain problems must be remediated.

The time limits of the total group process were established arbitrarily after consultation with several potential participants in the field trial part of the study. These authors indicated they would not participate in obtaining feedback from students any longer than two or three hours maximum per SLATE. Therefore, a two-hour limit was established for the group debriefing process.
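The group size and composition decisions above reduce to a simple selection routine: rank the volunteer pool on a normalized ability measure such as SAT, split the ranking into thirds, and draw three students from each third so that the nine-member group spans the ability range of the target population. The Python fragment below is only a minimal sketch of that routine under those assumptions; the function name, data layout, and choice of language are invented for the illustration and are not part of the thesis.

```python
import random

def compose_debriefing_group(volunteers, per_third=3, seed=None):
    """volunteers: dict mapping student id -> normalized ability score (e.g., SAT).
    Assumes students lacking scores or prerequisites were already dropped,
    and that each third of the pool contains at least per_third volunteers."""
    rng = random.Random(seed)
    ranked = sorted(volunteers, key=lambda s: volunteers[s], reverse=True)
    cut = len(ranked) // 3
    thirds = [ranked[:cut], ranked[cut:2 * cut], ranked[2 * cut:]]  # high, medium, low
    group = []
    for third in thirds:
        group.extend(rng.sample(third, per_third))  # three volunteers per third
    return group  # nine students; the SLATE author is the tenth participant
```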
Student experience and attitudes.--To obtain valid information on instructional problems, students would necessarily be selected from the target population for whom the prototype SLATE was intended. Students should possess necessary SLATE prerequisites but not score higher than the chance level on the lesson pre-test.

To ensure some degree of success in obtaining the desired feedback, students should possess a positive attitude towards the task of the group. Selection of students from a pool of volunteers is assumed to meet the requirements of obtaining students with a positive task orientation.

Summary of the Group Debriefing Technique Incorporated in the MK II Model

A group process methodology was substituted for both the tutorial and large group tryout procedures specified in the MK I model, thus overcoming many of the objections cited by respondents. The overt objectives of the group process were twofold: (1) to generate data on SLATE deficiencies/instructional problems, and (2) to develop feasible solutions to these problems. A covert or "hidden agenda" objective was to provide the SLATE author an opportunity to observe personally the deficiencies in the prototype and thus help overcome the natural reluctance to revise.

The group process methodology is shown in Figure 6 and essentially involves the following components: (1) selection of nine volunteer students who vary in their entering abilities (SAT scores), (2) individual use of the prototype SLATE materials by these volunteers, (3) administration and assessment of learning and attitudinal measures to provide a basis for conducting an organized debriefing, and (4) participation in a group debriefing following use of the materials which involves problem posting and problem solving techniques.

[Figure 6.--MK II Group Debriefing/Problem Solving Technique]

Description of the MK II "Mini" and "Maxi" Models

Previous discussion has presented the rationale for major modifications to the MK I model. In revising the MK I model it was deemed necessary to create two revised versions which are designated the "mini" and "maxi" MK II models. The "mini" version is highly simplified in order to facilitate conceptual understanding of the process. The "maxi" version is highly detailed and intended for use by consultants or with faculty who are intimately familiar with the "mini" version.

MK II "Mini" Model

Basically, the MK II "mini" model is a flowchart specifying the chronological sequence of tasks which are to be performed by an author during formative evaluation of his SLATE (see Figure 7). Each task contributes to a function essential to the total process. In all, there are five basic functions: (1) logistics, (2) data collection, (3) data analysis, (4) revision design, and (5) recycle. At least two iterations are required to complete the process because the model stipulates that data be collected from two fundamentally different sources of information and that revisions be developed sequentially based on these two sources of feedback. These sources of information are: (1) technical consultants and (2) volunteer students. Each source provides feedback on basically different types of problems.
Technical experts, for example, provide feedback on discrepancies in subject matter content, instructional media, and evaluation instrument design. Volunteer students, on the other hand, function to provide feedback on their specific learning problems. Both sources complement each other so that the widest range of discrepancies can be identified in a minimum length of time.

[Figure 7.--MK II "Mini" Model of the Formative Evaluation Process]

The process begins when a prototype instructional unit is completed to the point where the author believes it is ready to be used with students. Assuming "readiness" of a prototype SLATE, the MK II formative evaluation process consists of two cycles of "problem identification," "problem analysis," and "problem remediation." In the first cycle, technical problems are identified by feedback from technical experts who review the new instructional unit. Following collection of data on technical discrepancies, the SLATE author analyzes these problems in conjunction with an instructional development or learning specialist, and revisions are developed. The process then recycles so that in the second cycle, learning problems are identified through feedback from a group of volunteer students from the target population. Again, following collection of these data, the SLATE author analyzes the problems with an appropriate consultant and revisions are developed.

It is important that technical discrepancies be remediated before student tryout of the prototype SLATE. The reason for this sequence is that SLATE authors vary considerably in their media design and production skills, their knowledge of and ability to organize subject matter, and in their skill in designing evaluation instruments appropriate to formative evaluation. To preclude students' learning erroneous content, their being confronted with illegible or inaudible stimuli, and/or critical omissions in evaluation instruments, the SLATE author must obtain feedback from the technical experts and revise the prototype prior to student tryouts.

The number of cycles required to bring the prototype up to operational readiness would vary depending on how "bad" the prototype was and how stringent the operationally ready criteria are. In the present study "operational readiness" was defined as: (1) 80% or more of the student tryout group achieving 80% or higher on the post-test, and (2) not more than 20% "unsatisfactory" responses on the post-instruction attitude survey (Appendix G).

In sum, the purpose of the MK II "mini" model is to provide a framework to familiarize authors with the process of identification and remediation of problems which interfere with student achievement of intended learning objectives. The MK II model basically consists of two cycles of five similar functions, the major difference between cycles being the source of information which provided the feedback during data collection. Cycle one essentially serves a technical quality control function by obtaining feedback from technical experts and developing revisions based on this feedback.
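The "operational readiness" rule just stated is essentially a two-part decision: the tryout group's post-test results must satisfy the 80%-of-students-at-80% criterion, and no more than 20% of attitude-survey responses may be unsatisfactory. The sketch below illustrates that decision rule under those stated thresholds; the function and variable names are invented for the example and do not come from the thesis.

```python
def operationally_ready(post_test_percents, attitude_responses,
                        score_criterion=80.0, group_criterion=0.80,
                        max_unsatisfactory=0.20):
    """post_test_percents: one percent-correct post-test score per student.
    attitude_responses: survey responses coded "satisfactory"/"unsatisfactory"."""
    if not post_test_percents or not attitude_responses:
        return False
    share_at_criterion = (sum(1 for s in post_test_percents if s >= score_criterion)
                          / len(post_test_percents))
    share_unsatisfactory = (attitude_responses.count("unsatisfactory")
                            / len(attitude_responses))
    # Both conditions must hold before the SLATE leaves the revision cycle.
    return (share_at_criterion >= group_criterion
            and share_unsatisfactory <= max_unsatisfactory)
```

Under this reading, a twelve-student tryout with at least ten students scoring 80% or better and fewer than one attitude response in five rated unsatisfactory would exit the cycle; any weaker result recycles the prototype.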
Cycle two serves to remediate specific student learning problems by obtaining feedback from volunteer students in the target population and devising revisions to alleviate these discrepancies. The two cycles are complementary in that through use of two different sources of feedback, the widest range of discrepancies can be identified and remediated.

MK II "Maxi" Model

The MK II "mini" version just described serves a useful purpose in orienting users to the process of tryout and revision. After orientation, however, there is a need for detailed instruction and specification of techniques for carrying out the process. This detail is provided in the MK II "maxi" model, shown in Figure 8.

[Figure 8.--MK II "Maxi" Model of the Formative Evaluation Process (detailed flowchart of the logistics, data collection, data analysis, revision design, and recycle functions)]

Because the reader need not have intimate knowledge of MK II "maxi" procedures to understand the thrust of the present study, the detailed explanation of MK II "maxi" is placed in Appendix B.

Chapter Summary

This chapter has described the first phase of validation of the model of formative evaluation being developed in the present study. This first phase of validation consisted of interviewing seven selected SLATE authors to determine their opinions on the practicability of the MK I model and assess the degree to which MK I is congruent with their personal tryout and revision procedures.

Interviews were conducted with six non-professional SLATE authors and one professional SLATE author. The net result of these interviews was recognition on the part of the experimenter that the MK I model differed considerably from current practice of the respondents.
In general, the respondents felt the MK I model was impractical. The major problems with MK I were that it appeared too time consuming, costly, and logistically difficult to integrate into ongoing teaching activities.

As a result of these data, major modifications were made to the MK I model, resulting in the MK II "mini" and "maxi" models. The major difference between the MK I and MK II versions is the inclusion in the MK II of a student group debriefing. This debriefing follows student use of the prototype instructional stimuli and is organized to follow a problem posting and problem solving format. It is reasoned that by means of the debriefing procedure students can aid the SLATE author in developing solutions to the problems identified in a minimal amount of time.

Following development of the MK II version, the study progresses to the second stage of validation: field test of the MK II version with three Michigan State University SLATE authors. The methodology for the field tests is outlined in Chapter IV, and the field tests themselves are described in Chapter V.

CHAPTER IV

METHODS AND PROCEDURES

The research methods and procedures used in five field trials to investigate the efficacy of the MK II model are described in this chapter. Two distinct types of research objectives were being sought in this study. The first was related to experimentally comparing student achievement and attitudes resulting from a prototype (unrevised) SLATE with the revised counterpart. The second type of objective centered on understanding and describing the process through which the experimental treatments came into being. In this study, the experimental treatments (revised SLATES) were developed on the basis of procedures in the MK II model. Since the MK II model was itself a prototype, a description of the problems and successes resulting from its procedures was essential for further modification and refinement of the model.

Research Strategy

The overall research strategy called for five field experiments in three disciplines, to include gathering and analysis of both descriptive and experimental data. Essentially, each field experiment represented a replication of the developmental process, that is, application of the MK II model in a field setting. It was felt that five replications involving three different authors and academic disciplines would provide a sufficient number of trials to: (1) identify critical variables in the process, (2) suggest modifications to the model, and (3) establish the validity, feasibility, and effectiveness of the MK II model.

Descriptive Methodology

Data Collection

Descriptive data were collected using the basic technique known as high inference observation (Kerlinger, 1964, p. 510). Using this method, an observer abstracts relevant information from his ongoing observations and later makes inferences about variables. The experimenter (E) had the dual responsibility of interacting with each author (Authors A, B, and C) on a consultant basis, as well as observing and recording the nature of these interactions and subsequent decisions.

Narrative data were collected at each meeting between the experimenter (E) and individual SLATE authors (A, B, and C). During these meetings, E kept a "log" which was then summarized and combined with impressionistic data in a memorandum written immediately following each meeting. Inferences, problems, and suggestions were included in the last section of each memorandum.
Tape recordings supplemented E's note taking during the very critical author-student feedback interactions at control group and experimental group tryouts. But all other descriptive data were gathered by E's observation and note taking. At the conclusion of each field experiment, the memos from each meeting were summarized to form a narrative description of the whole formative evaluation/developmental process. These narrative descriptions were systematically related to procedures in the model and reported in Chapter V, "Description and Results of Five Field Trials."

Experimental Procedures and Methodology

Similar procedures and methodology were used to conduct experimental comparisons between original and revised SLATES in three field experiments, A1, A2, and B1. In two field experiments, A3 and C1, experimental treatments were not developed. Therefore, the following description of experimental procedures applies only to A1, A2, and B1.

Experimental Design

The basic experimental design used in this study was the before-after control group design (Campbell & Stanley, 1963) illustrated in Figure 9. This design has been criticized by Kerlinger (1964, p. 310) for its use of pre-tests, which may be reactive. That is, experimental Ss may become sensitized to the criterion test items and may then be responding to a combination of reminiscence of test items as well as the experimental treatment. In the present study, this sensitization effect was not considered a problem but, quite the contrary, an advantage. Pre-test items were regarded as "advanced organizers" and operational definitions of SLATE objectives. Sensitization to objectives by means of test-like events may enhance learning (Rothkopf, 1966, 1968), so pre-tests were considered essential and integral parts of both experimental and control group treatments.

                 control group:       pre-test   treatment   post-test
Randomized Ss
                 experimental group:  pre-test   treatment   post-test

Figure 9.--Before and After Control Group Design

Selection of SLATE Authors

The three participating authors (A, B, and C) were selected on the following bases:

1. They were currently teaching a course using SLATES which they had personally developed.

2. They had developed a prototype SLATE for use in their course which had not previously been used by students or undergone any formative evaluation.

3. They were willing to participate in this study with the understanding that volunteer students from their current course would provide feedback on their prototype SLATE; that the total process of data gathering and revision would likely take 20-30 hours of their time; and that it was likely, but not certain, that the revised SLATE would be better than the original.

4. They had similar backgrounds and experience in programmed instruction and SLATE design, but were from different academic disciplines.

Author A participated in formative evaluation of three SLATES, designated A1, A2, and A3. Authors B and C each conducted formative evaluation of one SLATE, designated B1 and C1. Additional background information on Authors A, B, and C is contained in Appendix F.

Selection of Students

Population.--The populations from which students (Ss) were selected were defined as the target populations for which the prototype SLATES were intended. Three populations were involved; specifically, the students enrolled in three courses at Michigan State University, Fall term, 1970.
These three courses were: (1) Animal Husbandry 111 (an introductory course for majors); (2) Education 327M (an introductory course for teachers of secondary school industrial arts, metalworking); and (3) Biology 141 (an introductory course in biology for majors). These courses were taught by the three participating SLATE authors.

Stratified random sampling.--Sampling procedures treated Ss from each course as essentially different populations due to differences in subject matter content and prerequisite skills involved. Selection of Ss for experimental and control groups was predicated on four criteria: (1) voluntary status, (2) stratification by SAT score, (3) randomization, and (4) Ss would possess prerequisite skills required by the prototype SLATE, but would be naive with respect to the terminal objectives.

After consultation with SLATE authors, agreement was reached as to the most appropriate time in the course sequence to run the experiment. Authors agreed to withhold information in their courses which might bias Ss until after the control and experimental groups had been conducted. About one week prior to prototype (control group) tryout, authors personally solicited volunteers from their classes. The experiment was described as a learning experience in which all class members would have to participate eventually, but that some volunteers were needed immediately to provide constructive feedback on a prototype version. This feedback would be used by the author to revise the SLATE and hence improve the learning experience for those to follow. Solicitation was successful in that a sufficient number of volunteers were obtained to permit stratification and randomized assignment to treatments.

After obtaining a pool of volunteer Ss from each population, E obtained Scholastic Aptitude Test (SAT) scores from University records. Volunteers not having SAT scores were dropped from the pool. Within the volunteer pool from each class, E stratified Ss into High, Medium, and Low sub-groups. This was done by making a rank order list by SAT for each pool of volunteers, then partitioning each ranking into thirds, for three sub-groups. Ss from each sub-group were selected randomly and alternately assigned to control or experimental groups until each treatment had an N=12 consisting of four high, four medium, and four low SAT Ss. A schematic of the sampling procedure used for the three experimental comparisons is shown in Figure 10.

Pool of volunteer Ss --> stratify by SAT (HIGH / MED / LOW) --> randomize within each stratum --> CONTROL GROUP (N=12) and EXPERIMENTAL GROUP (N=12)

Figure 10.--Procedure for Assignment of Ss to Treatments

In one case (B1), however, so much time elapsed between control group tryout and development of the revised version (seven months) that Ss originally designated for the experimental group were no longer naive with respect to the content of the prototype SLATE. Consequently, a second call for volunteers was made from an equivalent population (same course, two terms later), and stratification and randomization techniques were used to select the experimental group. In all three experimental comparisons, Ss were volunteers from the ongoing course, SAT scores were used as the partitioning variable, equal numbers of Ss from high, medium, and low sub-groups were represented in experimental and control treatments, and pre-experimental equivalence was substantiated by comparison of pre-test scores between experimental and control groups.
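The stratified assignment just described can be read as a small algorithm: rank the volunteers by SAT, cut the ranking into thirds, and alternately assign randomly drawn students from each third to the control and experimental groups until each treatment holds four high-, four medium-, and four low-SAT students (N=12). The Python sketch below illustrates that reading; it is a hypothetical reconstruction, not the procedure E actually ran against University records, and all names are invented for the example.

```python
import random

def assign_treatments(volunteers, per_stratum=4, seed=None):
    """volunteers: dict mapping student id -> SAT score (volunteers without
    SAT scores are assumed to have been dropped from the pool already, and
    each third of the ranking is assumed to hold at least 2 * per_stratum Ss)."""
    rng = random.Random(seed)
    ranked = sorted(volunteers, key=lambda s: volunteers[s], reverse=True)
    cut = len(ranked) // 3
    strata = [ranked[:cut], ranked[cut:2 * cut], ranked[2 * cut:]]  # high, med, low
    control, experimental = [], []
    for stratum in strata:
        drawn = rng.sample(stratum, 2 * per_stratum)      # eight Ss per stratum
        for i, student in enumerate(drawn):               # alternate assignment
            (control if i % 2 == 0 else experimental).append(student)
    return control, experimental                          # N = 12 in each group
```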
Treatments

The five prototype SLATES used in the study were all to be used in ongoing courses at Michigan State University, Fall term, 1970. Author A developed three prototype SLATES, designated A1, A2, and A3, to be used in his undergraduate service course in Animal Husbandry (AH 111). These SLATES were entitled "Pork Carcass Evaluation," "Cattle Breeds," and "Cattle Carcass Evaluation." This course enrolls 175 students per term, primarily freshmen and sophomores, who are heterogeneous in terms of major fields, motivation, and background. The instructional method used in the course consists of two lectures, two SLATES, and one laboratory per week. Students would therefore be very familiar with the SLATE self-instructional environment.

The fourth SLATE, developed by Author B (designated B1), was a lesson on "How to Read and Care for a Micrometer." This SLATE was to be used in an undergraduate course in industrial arts enrolling fifty industrial arts majors, primarily juniors and seniors. No other SLATES are used in this course, so students were not familiar with the format.

The fifth SLATE, developed by Author C (designated C1), was to be used in a freshman biology course serving 150 majors in a residential college. The prototype SLATE used in this study was an overview of several types of ecological systems, entitled "The Schema Biologica." Since other SLATES were used in this course, students would be familiar with this technique.

Control treatment.--All control group treatments involved Ss' use of unrevised prototype SLATE materials which had been reviewed by E for evaluation instrument quality and reviewed by author peers for content accuracy. Control treatment SLATES consisted of pictorial information on 35mm slides and in student workbooks, audio information on a tape recording, printed information in the student workbook, pre- and post-tests, and a post-instruction attitude survey. In A1 and A2, Ss responded to these materials individually in learning carrels. Students thus proceeded at their own rate, controlling the number of repetitions of slides and tapes, and response rate in their workbooks. (Any time they repeated slides or tape, they were asked to note this activity.) Audio information was presented via headphones, and Ss were asked not to interact with one another but to direct any questions to the SLATE author, who was available in the carrel room.

In B1, however, insufficient carrels were available for the simultaneous individual student participation prerequisite to the group debriefing. Therefore, out of necessity, a group presentation mode was adopted instead of individual presentations. In the group mode, the SLATE author controlled a single slide projector and tape recorder, stopping or repeating the presentation at the request of any S. Ss' responses were, nevertheless, still recorded individually in their workbooks. When a S stopped the presentation by asking a question, obviously the whole group was affected. Consequently, B1 was not really a close simulation of a self-instructional environment. Nevertheless, since the purpose of the group was to provide feedback to the SLATE author, the technique was considered valid. However, caution must be used in interpreting scores on post-tests, as these scores are likely to be inflated as a result of group discussions during the original presentation.
Experimental treatments.--Each experimental treatment consisted of Ss using the revised set of slides, audio tape, workbook, and pre- and post-test, with the attitude survey unchanged. A1 and A2 again used the self-paced carrel mode and B1 used the group presentation mode. In two cases (A1 and A2), the elapsed running time (no playbacks) of the revised versions was reduced 20%; on the other hand, B1 elapsed time was increased 50% (17 minutes to 26 minutes). Development of experimental treatments is reviewed in the next chapter.

Independent Variable

The independent variable in each experimental comparison was conceptualized as the total set of procedures, operations, and decision rules contained in the MK II model of the formative evaluation process (Figure 8), plus unique contributions by the users of the model (E and SLATE author). In short, the independent variable was the model and its application.

Dependent Variables

Four dependent variables were used as criteria for assessing the effect of the independent variable.

1. Group Mean Achievement.--Intended as an immediate post measure of student achievement of terminal objectives.

2. Gain Score.--Mean difference between pre-test and post-test scores.

3. Percentage of Students Achieving "Mastery."--Intended as a criterion-referenced measure to determine which treatment enabled a greater number of Ss to achieve a minimum acceptable level of performance, e.g., 80% or more correct on the post-test.

4. Student Attitudes.--Intended as an immediate post measure of student perceptions of lesson deficiencies and strengths.

Development of Instruments

Generally, two types of instruments were developed. First, measures of student achievement specific to a given SLATE were developed by each SLATE author in consultation with E. Second, a Likert-type instrument was developed by E to assess student perceptions of lesson strengths and weaknesses.

Achievement measures.--Student learning on each SLATE was measured by SLATE author designed pre- and post-tests. Pre-tests contained a caveat to reduce anxiety or frustration resulting from a low score, but cautioned Ss that 80% criterion was required on the post-test. The post-tests and pre-tests used identical items and a self-scoring format. This format was selected because additional learning would likely occur as Ss scored their tests. The overriding majority of these particular SLATE objectives were cognitive; e.g., recall, visual or verbal discrimination, or problem solving. B1, however, did require an integration of cognitive and perceptual motor skills (measurement with a micrometer). In light of the preponderance of cognitive objectives, achievement measures were largely of the paper and pencil variety. At E's suggestion, item forms were deliberately varied to include true-false, multiple choice, completion, and matching.

During initial review of these achievement tests, E noted a number of discrepancies in that test items did not reflect stated SLATE objectives. This problem was compounded by the fact that in no case were SLATE objectives stated in behavioral terms. Consequently, E consulted with each author approximately four hours per SLATE, helping operationalize their objectives and translate these operations into test items. As finally developed, pre- and post-tests included many items in common with the en route self-tests.
Particular attention was given to articulating post-test items with en route self-tests so errors on the post-test could be linked back to that place in the SLATE where instruction was accomplished. Feedback from students (control group) showed that numerous items on the prototype achievement measures were faulty. These items were then either deleted completely, thus reducing the total number of items, or were replaced by new and presumably better items. Thus, experimental and control group achievement measures were not totally identical. To assess the statistical significance of differences between experimental and control achievement measures, only those items common to both original and revised measures were used. The total number of items on original and revised measures and the number of items common to both are shown in Table 6.

Table 6.--Number of Items on Pre- and Post-Tests

                        Total Items on          Total Items Common Between
                        Pre- and Post-Test      Experimental and Control Group

A1   Control                 60                 40 items worth
     Experimental            52                 47 points possible

A2   Control                 56                 40 items worth
     Experimental            47                 40 points possible

B1   Control                 15                 15 items worth
     Experimental            15                 15 points possible

Scoring and data display.--All pre- and post-tests were self-scored by Ss. To reduce cheating, Ss were given an answer key when nearly finished with each test. Furthermore, before data analysis was begun, all scores (totals and individual items) were rechecked by E for accuracy. (E noted about a 10% scoring error rate, usually with the error raising the S's score.) During control and experimental group tryouts, test scores were displayed on an item-by-student matrix (Appendix E). This method enabled E and the SLATE author to identify items missed by over 30% of the group, and any such item became a topic of discussion at the group debriefing.

Attitudinal measure.--A post-instruction attitude survey was developed by E specifically to measure Ss' perceptions regarding several aspects of the SLATE they had just finished (Appendix G). Specifically, this instrument was a twenty-seven item Likert-type rating scale seeking to measure four general factors.

1. SLATE strengths and weaknesses resulting from communication/message design factors:

   Factor                                     Item Number
   a. Rate of presentation                    8
   b. Redundancy                              9
   c. Interest and attention                  5
   d. Clarity of instruction and examples     11, 13, 15
   e. Vocabulary level                        16
   f. Audio and video quality                 7

2. SLATE strengths and weaknesses resulting from learning or task factors:

   Factor                                     Item Number
   a. Prerequisites                           1
   b. Objectives                              2
   c. Motivation                              3
   d. Organization and sequence               6, 14
   e. Evaluation and feedback                 17, 18
   f. Type of response and frequency          12, 19
   g. Relevancy of information                10

3. SLATE strengths and weaknesses resulting from management/technical factors:

   Factor                                     Item Number
   a. Equipment manipulation                  4
   b. SLATE methodology                       28
   c. Tryout procedures                       27
   d. Degree of revision needed               22

4. Perceived learning and attitudes resulting from the lesson:

   Factor                                     Item Number
   a. Attitude towards subject matter         30
   b. Terminal understanding of concepts      26
   c. En route understanding of concepts      29
   d. Certainty of learning                   20
   e. Amount of learning                      21

In addition, four open-ended questions were included to encourage students to express opinions and perceptions not previously accounted for in the Likert items. The attitude survey instrument was used in all experimental and control groups. Few criticisms of this instrument were obtained during debriefings; hence items were not modified and the rating scale as originally drafted was used throughout.
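The item-by-student display and the 30% rule described under "Scoring and data display" above amount to a simple tally over the score matrix. The fragment below sketches that tally; the data layout and names are assumptions made for the illustration, not the actual Appendix E worksheet.

```python
def debriefing_agenda(item_scores, threshold=0.30):
    """item_scores: dict mapping test item -> list of 0/1 scores, one per
    student, i.e., the rows of the item-by-student matrix."""
    agenda = []
    for item, scores in item_scores.items():
        missed = scores.count(0) / len(scores)
        if missed > threshold:                 # missed by over 30% of the group
            agenda.append((item, missed))
    # Worst items first, so discussion can start with the largest discrepancies.
    return sorted(agenda, key=lambda pair: pair[1], reverse=True)
```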
Scoring and data display.--During each experimental and control group, the attitude survey was scored by E immediately after completion by each S. A numerical value from one to five points was assigned to each response, five representing the "ideal" response and one representing a very low or dissatisfied response. Total scores for individual Ss were tallied, but more important, a running tally was kept for each item on the attitude survey. If a S's response deviated by more than two points from the ideal, it was tallied. Each item which 30% or more of Ss had rated too far from the ideal became a topic of discussion at the debriefing. In addition, if 30% of the "open-ended" responses were on a similar topic, that topic was discussed during the debriefing.

Experimental Procedures

A1 and A2 used identical procedures; however, B1 varied in several respects and will be described separately.

A1 and A2 procedures.--After experimental and control groups were selected, E coordinated scheduling of the SLATE author, the carrel facilities, and the Ss by selecting a date and time for the experiments and asking Ss to RSVP, regrets only. Ss who had a scheduling conflict were traded among treatments, provided they were in the same SAT sub-group. If no trade-off was possible, the originally selected S was dropped and another selected from the pool of volunteers, within the given SAT sub-group.

Data collection in both A1 and A2 experimental and control groups was conducted from 7:00 until 10:00 p.m. in the carrel facility in the Department of Animal Husbandry, 108 Anthony Hall, Michigan State University, during Fall term, 1970. This facility can accommodate twelve individual students maximum. To reduce possible bias from Ss' social interaction, the experimental treatment was developed and administered as rapidly as possible following the control group data gathering. In A1 and A2 the time interval between administration of control and experimental treatments was one week.

E developed an "agenda" for the conduct of the experimental and control treatments which was discussed extensively with the participating author several days prior to the first tryout (Appendix C). After the discussion, E provided the SLATE author with a checklist to guide the treatment activities. It was determined that the SLATE author rather than the experimenter should conduct the experiment, in the sense of providing instructions to the Ss, answering their questions, and conducting the debriefing. E would be present to observe the process, collate and score instruments, and remediate minor technical difficulties; but operationally, each treatment was conducted by the SLATE author. (This decision was made to see if the procedures in the agenda could be carried out competently by the SLATE author; if not, what changes would have to be made so the procedure would be independent of E.) Since the complete agenda and checklists are included in the appendix, they are not reiterated here. Instead, a narrative summary of the procedures is presented.

On the evening of a treatment, the SLATE author and E arrived one hour early to inspect all carrels to prevent obvious technical malfunctions such as inoperative or missing equipment or slides improperly positioned. As Ss arrived, name tags were provided and the SLATE author began non-course-related "small talk" to place Ss at ease. After all Ss had arrived, E tape-recorded the remainder of the session.
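The attitude-survey tally described under "Scoring and data display" above follows the same pattern as the post-test tally: each Likert response is scored 1-5 with 5 as the "ideal," responses deviating from the ideal by more than two points are counted per item, and any item so rated by 30% or more of the group joins the debriefing agenda. A minimal sketch under those assumptions, with names invented for the illustration, follows.

```python
def flag_attitude_items(survey, ideal=5, max_deviation=2, threshold=0.30):
    """survey: dict mapping attitude item number -> list of 1-5 ratings,
    one rating per student."""
    flagged = []
    for item, ratings in survey.items():
        discrepant = sum(1 for r in ratings if abs(ideal - r) > max_deviation)
        if discrepant / len(ratings) >= threshold:
            flagged.append(item)
    return flagged

# Example: ratings of [1, 2, 2, 4, 5, 5, 5, 3, 5, 5] on one item contain three
# responses more than two points from the ideal (30% of ten students), so that
# item would be flagged for discussion at the debriefing.
```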
The formal treatment began with a 10-15 minute orientation briefing by the SLATE author designed to do the following:

1. Express appreciation for Ss' participation and orient Ss as to the purpose of the session.

2. Relieve Ss' anxiety and facilitate their open and frank interaction.

3. Describe the planned sequence of events, which were:
   a. Pre-test
   b. Individual use of treatment AV materials
   c. Post-test
   d. Attitudinal survey
   e. 15-minute "break" including refreshments
   f. Reconvene for debriefing and feedback session

4. Establish the "ground rules" for the session, which were:
   a. No talking to each other during the lesson
   b. Take notes on type and location of problems; e.g., don't understand, bored, lesson too fast, etc.
   c. Raise hand for tutorial assistance
   d. Score own pre- and post-tests
   e. Do not cheat
   f. Do not discuss the SLATE during the break
   g. Please remain for the debriefing

It was repeatedly emphasized that in no way would Ss' remarks be used in a punitive sense.

Following the orientation briefing, Ss selected a carrel and worked on the pre-test. As they neared completion, E distributed pre-test answer sheets. Ss were allowed to begin the lesson immediately after completing and scoring the pre-test. There was usually a 5-10 minute differential among Ss regarding pre-test completion time. When all Ss were working on the lesson, E collected all pre-tests and answer sheets. Ss' scores were rechecked and placed on an item-student matrix for display. In most cases Ss achieved below chance level, although one or two scored 70% correct. (Later discussion with these Ss showed they were guessing.)

While Ss interacted with lesson materials, Author A circulated freely, answering questions on a tutorial basis, and made notes of the questions and his responses. All such interactions were tape-recorded by E. Post-tests were distributed as Ss neared completion of the SLATE. Answer keys and attitude surveys were distributed as Ss neared completion of the post-test. Ss returned scored post-tests and unscored attitude surveys to E and then took a 15-minute recess. Soft drinks and donuts were available at this time. Refreshments were served to reduce fatigue effects, to occupy the unprogrammed time during the recess, and to reduce anxiety and promote an atmosphere of free interaction among Ss prior to the debriefing. During the recess, E and the SLATE author tallied attitude survey and post-test scores and noted those items which indicated a discrepancy for 30% or more of the Ss. These discrepant items became the agenda for the debriefing.

Debriefings were conducted in the carrel room. The SLATE author began each debriefing by reiterating his need for frank, candid, constructive criticism, since the author and program were being evaluated, not the Ss. Using the agenda developed during the recess, Ss' interaction was guided towards the problem areas. As specific problems were broached by Ss, E wrote the problems on a poster board large enough to be seen by the group. Debriefings concluded naturally after approximately one hour.

B1 experimental procedures.--Data collection in B1 experimental and control treatments was conducted from 7:00 until 10:00 p.m. in the Industrial Arts carrel facility, 115G Erickson Hall, Michigan State University, during Fall term 1970 and Spring term 1971. B1 differed procedurally from A1 and A2 in several significant ways. First, B1 used a group presentation mode instead of Ss interacting with SLATE materials on a self-paced, self-instructional basis.
The SLATE author operated the AV equipment and Ss were instructed to interrupt the presentation any time they had a question. The ensuing interaction involved the entire group and allowed the SLATE author to establish an immediate consensus on any given problem by asking, "How many of you (Ss) feel that way about X . . .?"

Due to interruptions and SLATE author explanations, total instructional time during the prototype (control group) tryout was 98 minutes. This represented a 500% increase over the 17-minute elapsed running time of the prototype self-instructional AV presentation. Ambient light in the room was a factor in that Ss could not clearly see their workbooks in the dark, nor the screen with the lights on. Since many responses were related to visual discriminations on the slides, the inability to see workbook and screen simultaneously may have adversely affected learning.

The orientation briefing, pre- and post-test scoring, and use of post-test and attitudinal data to develop a debriefing agenda were similar to A1 and A2. Moreover, the debriefing itself was procedurally the same. But since much of the information had been discussed earlier in the context of the lesson, B1 debriefings were typically one-half the length of A1 and A2.

Research and Statistical Hypotheses

The following research and statistical hypotheses were tested in all three experimental comparisons: A1, A2, and B1.

H1: Ss using revised instructional stimuli will show greater mean achievement on post-tests than Ss using prototype (unrevised) instructional stimuli.

    H1: Xe > Xc        H0: Xe = Xc

H2: Ss using revised instructional stimuli will show greater gain score between pre- and post-tests than Ss using prototype (unrevised) instructional stimuli.

    H2: Xe > Xc        H0: Xe = Xc

H3: Percentage of Ss achieving "mastery" (80% correct on post-test) will be greater among Ss using revised instructional stimuli than among Ss using prototype (unrevised) instructional stimuli.

    H3: %e > %c        H0: %e = %c

H4: Ss using revised instructional stimuli will show a greater mean score on measures of attitude regarding effectiveness of instruction than Ss using prototype (unrevised) instructional stimuli.

    H4: Xe > Xc        H0: Xe = Xc

Data Analysis and Statistical Treatment

H1: Involves a comparison between the mean achievement scores on post-test instruments using two independent samples (N=12). Assuming interval data, equal population variances, and normal distribution of the achievement scores, a t test is an appropriate test of significance.

H2: Involves a comparison between the mean gain scores (difference between pre- and post-test scores) using two independent samples (N=12). Assuming interval data, equal population variances, and normal distribution of achievement scores, a t test is an appropriate test of significance.

H3: Involves a comparison of the difference between two proportions: the proportion of Ss achieving "criterion" in the experimental treatment compared to the proportion of Ss achieving "criterion" in the control treatment. The significance of this difference may be computed by determining the standard error of the difference between two uncorrelated proportions, converting this to a z score, and determining the probability of such a z score from the table of the normal curve (Edwards, 1950, p. 77).

H4: Involves a comparison of mean scores between two independent samples (N=12) on measures of Ss' attitude towards the instructional stimuli and total learning environment.
Assuming interval data, equal population variance, and normal distribution of the attitudinal scores, a t test is an appropriate test of significance.

Chapter Summary

The descriptive and experimental methodology used to assess the validity, practicability, and efficiency of the MK II model has been described in this chapter. The methodology involved experimenter-developed narrative reports of all revision activities with each of three authors. The experimental design involved three separate field experiments, each using a group design. Three prototype SLATE authors were selected to develop revised versions. Ss were volunteers from the SLATE author's course who were stratified into three groups according to Scholastic Aptitude Test (SAT) scores. Four Ss from within each group were randomly assigned to treatments (N=12). Effects of experimental and control treatments were determined by measures of four dependent variables: mean achievement, gain score, percentage achieving criterion, and mean attitude score. Four hypotheses regarding comparison between experimental and control groups were tested at the .05 level of significance. T tests and/or a table of the normal curve were used as appropriate. A schematic of the experimental comparisons is shown in Figure 11.

[Figure 11.--Schematic of Experimental Comparison Methodology]

[Table 9.--Comparison of the Proportion of Students Achieving Criterion]

In the case of B1, 100% of the experimental group achieved criterion performance, whereas only 42.85% did so during the control group tryouts. The resulting difference is 57.15%, which calculates to be a z score of 2.496. The table of the normal curve indicates the probability of a z of 2.496 or larger to be .0064. This z score is therefore significant beyond the .01 level, allowing rejection of the null and acceptance of hypothesis 3.

Discussion of Findings Relative to Percentage of Students Achieving Criterion

In two cases, A1 and B1, a large percentage of students achieved the 80% criterion during the experimental treatment. This reflects remediation of both organizational and content emphasis problems as well as elimination of poor evaluation items. The improved student performance in B1 was remarkable in that 100% achieved criterion in 47 minutes of instructional time, as opposed to 42.85% at criterion after one and one-half hours of instruction during the prototype. (This SLATE had been completely reorganized to closely follow suggestions given by students at the prototype debriefing.)

The exceptional case again was SLATE A2, which only showed 8.27% improvement in percentage of students achieving the 80% criterion. Part of this relatively poor showing could be attributed to confusion on the post-test items related to discrimination between types of cattle carcasses. Again, the use of "properly" exposed slides misled students into selecting the wrong answers based on color alone. Another problem with this SLATE was transfer of training combined with satiation.
The exceptional case again was SLATE A2, which showed only an 8.27% improvement in the percentage of students achieving the 80% criterion. Part of this relatively poor showing could be attributed to confusion on the post-test items related to discrimination between types of cattle carcasses. Again, the use of "properly" exposed slides misled students into selecting the wrong answers on the basis of color alone. Another problem with this SLATE was transfer of training combined with satiation. Students were expected to learn a number of complex anatomical discriminations based primarily on line drawings in their workbook, yet they were tested on these concepts using actual photographs of carcasses. Since they had been given insufficient practice in making these discriminations on photographs, many were unable to perform this task satisfactorily on the post-test. Furthermore, there was a satiation or fatigue factor operating: many students complained that they had seen so many beef carcasses in the SLATE that they all began to look alike; hence on the post-test they just "gave up." Again, the interesting phenomenon regarding SLATE A2 was that the MK II procedures successfully provided insight into why the data showed no significant difference.

Included in Table 9 is the percentage of students achieving criterion for SLATE A3. It can be seen that 77.7% did achieve criterion when using the prototype; hence the author felt justified in not making any further revision.

HYPOTHESIS 4: STUDENTS USING REVISED INSTRUCTIONAL STIMULI WILL SHOW A GREATER MEAN SCORE ON MEASURES OF ATTITUDE REGARDING EFFECTIVENESS OF INSTRUCTION THAN STUDENTS USING PROTOTYPE (UNREVISED) INSTRUCTIONAL STIMULI.

Data relative to this hypothesis are presented in Table 10. In the case of SLATE A1, the calculated t ratio of 2.539 was greater than the tabled value of 2.508 occurring at the .01 level of significance, 22 df, when using a one-tailed test. Since the calculated t ratio exceeded the tabled value, the null hypothesis was rejected and hypothesis 4 accepted.

In the case of SLATE A2, the calculated t ratio of .496 did not even approach the tabled value of 2.539 found at the .05 level of significance, 19 df, when using a one-tailed test. Since the calculated value of t was smaller than the tabled value, the null hypothesis was not rejected.

[Table 10, printed broadside in the original, is illegible in this extraction. Legible fragments indicate that it compared mean student attitude scores (27-item rating scale) for prototype and revised versions of each SLATE; the raw attitude scores appear in Appendices J through N.]

APPENDIX F

BACKGROUND INFORMATION ON THE THREE PARTICIPATING AUTHORS

[Table 12.--Background information on the three participating authors. The table is printed broadside in the original and is illegible in this extraction; legible column fragments indicate that it summarized each author's degree and position, affiliation, teaching experience, previous instructional development training, experience teaching with the prototype, and experience in SLATE design, for authors A, B, and C.]

APPENDIX G

STUDENT REACTIONNAIRE

STUDENT REACTIONNAIRE

NAME ______________________        DATE ____________

LESSON TITLE ________________________________

Please be frank and honest in answering the following questions. Remember, you are our prime source of information regarding what needs to be revised.

KEY: 1 means you strongly agree; 2 means you agree; 3 means you are uncertain; 4 means you disagree; and 5 means you strongly disagree.

1. I had sufficient prerequisites to prepare me for this lesson.   1  2  3  4  5
2. I was often unsure of what, exactly, I was supposed to be learning.   1  2  3  4  5

3. After completing the lesson, I felt that what I learned was either directly applicable to my major interest, or provided important background concepts to me.   1  2  3  4  5

4. Manipulating the equipment, or equipment breakdowns, often distracted my attention.   1  2  3  4  5

5. Listening to the tapes and watching the slides became tedious, or boring.   1  2  3  4  5

6. This lesson was very well organized. The concepts were highly related to each other.   1  2  3  4  5

7. A professional speaker (announcer) should be used to make the tapes.   1  2  3  4  5

8. The audio tape moved too fast for me; there was too much information.   1  2  3  4  5

9. There was too much redundancy. I was bored by the repetition of ideas.   1  2  3  4  5

10. There was a lot of irrelevant information in this lesson.   1  2  3  4  5

11. The workbook was excellently designed. I could easily follow the instructions and perform the exercises.   1  2  3  4  5

12. Frequent reference to and use of the workbook was distracting.   1  2  3  4  5

13. Often the tape and slides seemed unrelated to each other.   1  2  3  4  5

14. This lesson had very serious gaps and lacked internal continuity.   1  2  3  4  5

15. The examples used to illustrate main points were excellent.   1  2  3  4  5

16. The vocabulary used contained many unfamiliar words. I often did not understand what was going on.   1  2  3  4  5

17. The pre-test and final exam questions did a good job of testing my knowledge of the main points in the lesson.   1  2  3  4  5

18. The questions during the lesson gave me valuable feedback on how I was doing.   1  2  3  4  5

19. Many of the things I was asked to do, or questions I was asked to answer during the lesson, seemed like needless busy work.   1  2  3  4  5

20. At the end of the lesson I was still uncertain about a lot of things and had to guess on many of the final exam questions.   1  2  3  4  5

21. I believe I learned a lot, considering the time spent on this lesson.   1  2  3  4  5

22. I would recommend extensive modifications to the lesson before using it with other students.   1  2  3  4  5

23. For you, what was the most difficult part of the lesson?

24. What was the easiest part of the lesson?

25. What were the three worst things about this lesson?

26. I understood most of the concepts and vocabulary immediately after completing the lesson.   1  2  3  4  5

27. I think this whole procedure of trying out new materials with students is a waste of time.   1  2  3  4  5

28. I would prefer a textbook or lecture version of this lesson rather than the slide/tape/workbook version.   1  2  3  4  5

29. I often needed to go back over a portion of the lesson to fully understand it.   1  2  3  4  5

30. After completing the lesson, I was more interested in and/or favorably impressed with the general subject matter than I was before the lesson.   1  2  3  4  5

31. Please write below any comments, suggestions, or changes which you believe will improve this lesson. Thank you.
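The following sketch is an editorial addition, not part of the instrument. It shows one plausible way to tabulate responses to the scaled reactionnaire items and flag items that many students answered unfavorably, so that they can be raised for discussion. The response layout, the list of negatively worded items, and the 50% flagging threshold are all assumptions for illustration.

```python
# Editorial sketch: tally reactionnaire ratings and flag problem items.
# The data layout, negative-item list, and 50% threshold are assumptions.
def flag_problem_items(responses, negative_items, threshold=0.5):
    """responses: {item_number: [ratings 1-5, one per student]} for scaled items.

    An "unfavorable" rating is agreement (1-2) with a negatively worded item
    or disagreement (4-5) with a positively worded one."""
    flagged = {}
    for item, ratings in responses.items():
        unfavorable = sum(r <= 2 if item in negative_items else r >= 4
                          for r in ratings)
        if unfavorable / len(ratings) >= threshold:
            flagged[item] = unfavorable / len(ratings)
    return flagged

# Hypothetical example: item 14 ("serious gaps") answered by six students.
print(flag_problem_items({14: [1, 2, 2, 3, 1, 2]}, negative_items={14}))
```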
APPENDIX H

TRYOUT "CHECKLIST" AND INTERVENTION PRINCIPLES

The Tutorial Approach

1. The programmer should first explain to the tryout student that the materials he is to be given are intended to help him learn the subject matter designated in the title.

2. The programmer should emphasize that the role of the student is to help the programmer evaluate some new educational materials. Comments and suggestions that the student makes will help the programmer make revisions.

3. The programmer should then explain that he has to know how much the student already knows about the subject matter and whether or not the student has all of the prerequisites to learn from the materials. He should then give the student the pre-test (always) and the prerequisites test (if required), timing the student on both. Both of these may be done when the test subjects are being selected.

4. When the tests have been completed, the programmer should show the student the program and explain again that it is the material, not the student, that is to be tested from now on. This is an especially important point about which the student should have no question.

5. The student should be given a ball point pen with which to write his answers. (This will prevent him from erasing potentially valuable information for revising the program.) He should be provided with answer sheets, if any.

6. Tell the student to put an "X" next to the items he thinks he got wrong after he has checked his answer.

7. If the program contains open-ended questions, tell the student about this. Explain to the student that if he doesn't know an answer, he should take a guess and write "guess" on the answer sheet. If he simply can't think of an answer, he should leave the answer blank and place an "X" next to the item on the answer sheet.

8. Tell the student the time limits placed on the tryout session and that he can take a break whenever he feels like stopping.

9. Re-emphasize that any comments he wants to write or express to the programmer will be useful and welcomed.

10. Then ask the student to commence with the materials. (If the student asks what he should do or asks if he's doing it right, the programmer should gently insist that all the directions necessary are given in the materials. It is important to try out the directions, too.)

11. The programmer should note carefully the time at the beginning and end of each tryout session and keep track of "break time."

Checklist for the First Tryout Sessions (Horn, 1966, p. 6)

The Tutorial Approach

Principles

I. If the student can continue through the program even though he has difficulty with an item, it is best to let him continue. Ask him about the difficulty at the end of the tryout session. Watch him very carefully for three or four frames. If he's consistently in trouble, it may be well to interrupt.

II. If the student has so much difficulty with an item that he cannot proceed with the rest of the program, the programmer should intervene. His first step should be to try to revise the program on the spot, presenting a revised or new item to the student. This may be done orally, or the programmer may make written changes in the program. He should do this revision with a minimum of explanation to the student.

III. If these on-the-spot revisions do not work, or if the programmer can't figure out the difficulty, he may then query the student directly with such open-ended questions as: "Will you tell me about the difficulty?" or "What seemed to be the trouble with this item?"

How to Intervene in the Tryout Process (Horn, 1966, p. 12)

APPENDIX I

RULES TO BE FOLLOWED FOR THE REVISION OF A CALCULUS PROGRAM

The method proposed to the writers for using data for the revision of the programmed materials:

1. Study the item analysis of the end-of-lesson test to determine those concepts which were most often missed by the students.
2. Study the incorrect responses to these particular test items to determine if there was a straightforward misunderstanding of notation, a complete lack of comprehension of the concept, or a variety of errors.

3. Use the guide to determine those frames in the program which dealt most directly with the concept(s) missed on the test. Study the student error rates for these frames. If the program frames are quite similar to the test item and the error rate is quite low, more practice frames should be provided. If the error rate is quite high, these frames need revision.

4. Study the sample of incorrect student responses to this segment of the program. These responses should suggest the nature of the learning difficulty and the type of revision needed.

5. Study the comments of both the students and the program reviewers for further suggestions concerning the problems encountered with these particular frames.

6. If no frames in the program correspond to a test item missed by a large percentage of the students, consider the addition of frames that will "bridge the gap" between the present learning materials and what would be considered a transfer-type item.

Rules to be Followed in Revising a Calculus Program (Dick, 1968, p. 100)

APPENDIX J

SLATE A1 RAW DATA

Table 13.--SLATE A1 Raw Data

CONTROL GROUP (N=12)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (38 correct)
A            29         41           12             79                Yes
B            23         38           15            107                Yes
C            25         37           12            100                No
D            31         41           10            108                Yes
E            18         32           14            103                No
F            20         37           17             75                No
G            20         39           19            104                Yes
H            19         27            8             71                No
I            31         40            9             98                Yes
J            22         39           17             97                Yes
K            20         39           19             98                Yes
L            22         36           14            102                No
X̄          23.33      37.17        13.83          95.17              7 out of 12 for 58.33%

NOTE: Pre- and post-test raw scores are based on 46 common items worth 47 points maximum. Attitude survey raw scores are based on a 27-item rating scale instrument worth 135 points maximum.

EXPERIMENTAL GROUP (N=12)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (38 correct)
A            23         45           22            105                Yes
B            14         28           14             91                No
C            27         41           14             95                Yes
D            20         43           23            110                Yes
E            22         41           19            116                Yes
F            18         45           27             99                Yes
G            21         42           21            113                Yes
H            23         44           21            105                Yes
I            26         46           20            114                Yes
J            22         44           22            110                Yes
K            17         46           29            119                Yes
L            22         43           21            102                Yes
X̄          21.25      42.33        21.08         106.58              11 out of 12 for 91.6%
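As an editorial aside (not in the original thesis), the summary row of Table 13 and the achievement comparison described under Data Analysis and Statistical Treatment can be reproduced directly from these raw scores; the snippet below uses the post-test columns only.

```python
# Editorial check: reproduce the Table 13 post-test summary and run the
# independent-samples t comparison described under Data Analysis.
from scipy import stats

control_post      = [41, 38, 37, 41, 32, 37, 39, 27, 40, 39, 39, 36]
experimental_post = [45, 28, 41, 43, 41, 45, 42, 44, 46, 44, 46, 43]
CRITERION = 38      # the "(38 correct)" cutoff given in Table 13

def summarize(scores):
    mean = round(sum(scores) / len(scores), 2)
    pct = round(100 * sum(s >= CRITERION for s in scores) / len(scores), 1)
    return mean, pct

print(summarize(control_post))        # (37.17, 58.3)  -- matches the X-bar row
print(summarize(experimental_post))   # (42.33, 91.7)  -- Table 13 reports 91.6%
t, p_two = stats.ttest_ind(experimental_post, control_post, equal_var=True)
print(round(t, 2), round(p_two / 2, 4))   # directional (one-tailed) probability
```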
APPENDIX K

SLATE A2 RAW DATA

Table 14.--SLATE A2 Raw Data

CONTROL GROUP (N=12)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (32 correct)
A            15         24            9             93                No
B            17         27           10            118                No
C            23         29            6            116                No
D            18         30           12            107                No
E            12         27           15            105                No
F             6         25           19             97                No
G            14         32           18            120                Yes
H             9         34           25            107                Yes
I            26         32            6            110                Yes
J            16         34           18             86                Yes
K            10         23           13             96                No
L            11         35           24            105                Yes
X̄          14.75      29.33        14.58         105.0               5 out of 12 for 58.33%

NOTE: Pre- and post-test raw scores are based on 40 common items worth 40 points maximum. Attitude survey raw scores are based on a 27-item rating scale instrument worth 135 points maximum.

EXPERIMENTAL GROUP (N=9)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (32 correct)
A            13         30           17            102                No
B            14         30           16             89                No
C            15         34           19            110                Yes
D            17         38           21            109                Yes
E             8         23           15            108                No
F            18         38           20            105                Yes
G            19         35           16            126                Yes
H            17         37           20            102                Yes
I            27         36            9            114                Yes
X̄          16.44      33.44        17.00         106.40              6 out of 9 for 66.6%

APPENDIX L

SLATE A3 RAW DATA

Table 15.--SLATE A3 Raw Data

CONTROL GROUP (N=9)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (40 correct)
A            44         50            6            108                Yes
B            43         47            4            109                Yes
C            22         43           21             93                Yes
D            25         36           11            112                No
E            23         48           25            107                Yes
F            21         40           19             93                Yes
G            19         42           23            111                Yes
H            22         38           16             97                No
I            18         45           27            100                Yes
X̄          26.33      43.22        16.89         103.44              7 out of 9 for 77.7%

NOTE: Pre- and post-test raw scores are based on 50 items worth 50 points maximum. Attitudinal survey raw scores are based on a 27-item rating scale instrument worth 135 points maximum. No experimental treatment was conducted in A3.

APPENDIX M

SLATE B1 RAW DATA

Table 16.--SLATE B1 Raw Data

CONTROL GROUP (N=7)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (12 correct)
A             0          5            5             78                No
B             2         11            9            102                No
C            10         12            2             97                Yes
D             3          9            6             88                No
E             7         12            5             86                Yes
F             2          7            5             77                No
G             2         13           11             94                Yes
X̄           3.71       9.86         6.14          88.86              42.85%

NOTE: Pre- and post-test raw scores are based on 15 common items worth 15 points maximum. Attitude survey raw scores are based on a 27-item rating scale instrument worth 135 points maximum.

EXPERIMENTAL GROUP (N=8)

Student   Pre-Test   Post-Test   Gain Score   Attitude Survey   80% Criterion (12 correct)
A             5         13            8            104                Yes
B             6         13            7            123                Yes
C             5         15           10             92                Yes
D             4         15           11            123                Yes
E             5         15           10            126                Yes
F             6         15            9            117                Yes
G             4         14           10            106                Yes
H             5         14            9            105                Yes
X̄           5.00      14.25         9.25         112.00              100%

APPENDIX N

SLATE C1 RAW DATA

Table 17.--SLATE C1 Raw Data

CONTROL GROUP (N=14)

Student   Attitude Survey
[The individual attitude survey scores for the fourteen Ss (A through N) are illegible in this extraction.]
X̄           95.64

NOTE: No pre- and post-tests were developed for C1. No experimental treatment was developed for C1. Attitude survey raw data are based on a 27-item rating scale worth 135 points maximum.
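A final editorial illustration, not part of the thesis: the criterion percentages reported in these appendices follow directly from the post-test columns, as in this short derivation using the SLATE B1 scores from Table 16.

```python
# Editorial sketch: derive "percentage achieving criterion" from raw post-test
# scores, shown with the SLATE B1 data (15-point test, 12 correct = 80%).
def percent_at_criterion(post_scores, criterion):
    hits = sum(score >= criterion for score in post_scores)
    return hits, round(100 * hits / len(post_scores), 2)

b1_control_post      = [5, 11, 12, 9, 12, 7, 13]         # Table 16, control group
b1_experimental_post = [13, 13, 15, 15, 15, 15, 14, 14]  # Table 16, experimental group

print(percent_at_criterion(b1_control_post, 12))        # (3, 42.86); reported as 42.85%
print(percent_at_criterion(b1_experimental_post, 12))   # (8, 100.0)
```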