University Microfilms International
300 N. Zeeb Road, Ann Arbor, MI 48106

8424472

Rudisill, Michael Davis

DEVELOPMENT AND EVALUATION OF AN OBSERVATIONAL MEASURE TO EVALUATE IN-CAR PERFORMANCE OF MICHIGAN DRIVER EDUCATION STUDENTS

Michigan State University, Ph.D., 1984

Copyright 1984 by Rudisill, Michael Davis. All Rights Reserved.

DEVELOPMENT AND EVALUATION OF AN OBSERVATIONAL MEASURE TO EVALUATE IN-CAR PERFORMANCE OF MICHIGAN DRIVER EDUCATION STUDENTS

By

Michael Davis Rudisill

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Administration and Curriculum

1984

© COPYRIGHT BY MICHAEL DAVIS RUDISILL 1984

ABSTRACT

DEVELOPMENT AND EVALUATION OF AN OBSERVATIONAL MEASURE TO EVALUATE IN-CAR PERFORMANCE OF MICHIGAN DRIVER EDUCATION STUDENTS

By Michael Davis Rudisill

This study dealt with the development of an observational measure for evaluation of in-car performance of Michigan driver education students and the determination of the measure's reliability characteristics. This instrument was designed for use by the Michigan Department of Education to determine the effectiveness of driver education programs in the state.

An integral part of the study process was the design of a test route that would yield the situations to observe and record the driving performances stipulated by the Michigan Department of Education's in-car performance objectives. Also involved were the design of an instrument that was concise and definitive enough for the raters to use efficiently, design and implementation of a training program for raters, development of a counterbalanced design for rater and subject assignment, and the statistical treatment of the data.
Analysis of variance and Pearson's product-moment correlations were used to determine statistically the reliability characteristics of the test. The study addressed the following research hypotheses:

1. That there would be differences in difficulty among the items in the test. The F ratio was significant beyond alpha .01, indicating that there was a difference in item difficulty.

2. That run administrations would not affect driver performance scores. The findings were not significant, suggesting that performance was stable over time.

3. That subjects' driving performance scores would not vary according to items interacting with the time of test administration. The finding was not significant, suggesting there was no significant interaction between test items and run administration.

4. That a positive relationship would exist between true driver performance scores and observed driver performance scores. Three methods of analysis were used. The correlation coefficients were .957, .937 and .730.

5. That a positive relationship would exist between raters on measures of sum, search, speed control, direction control, familiarization, and signs. The interrater reliability for pairs of raters ranged from .49 to .83 on test components. The overall test had a reliability coefficient of .86 for pair one and .83 for pair two.

DEDICATION

I would like to dedicate this study to my parents, Carl and Grace Rudisill, for early guidance and support; to my wife Jane, for her sacrifices and support; and to my daughter Michelle, who frequently said, "Daddy, do your homework."

ACKNOWLEDGEMENTS

The completion of this study would not have been possible without the support and encouragement of a number of people. I would like to thank the members of my doctoral guidance committee: Dr. Robert E. Gustafson, Chairman, Dr. Ben A. Bohnhorst, Dr. William D. Frey, and Dr. Robert O. Nolan.
A special thanks to Gus and Bill for their extra efforts. I would also like to acknowledge Mr. Frederick E. Vanosdall, Dr. MaryEllen McSweeney, and Dr. Kara Schmitt for their valuable support during the planning stages of the study.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter I. INTRODUCTION
   The Problem
      Statement of the Problem
      Purpose of the Study
      Hypotheses to be Tested
      Significance
      Limitations of the Study
   Methods of Procedure
      Tasks
      Basic Assumptions
      Definition of Terms
   Organization of the Remaining Chapters

Chapter II. REVIEW OF LITERATURE
   Plans for Investigating Driver Education
   Methods of Observing and Measuring Driver Performance
   Driver Tasks and Driver Education Objectives
   Observation Methods, Techniques and Design
   Summary

Chapter III. METHOD OF PROCEDURE
   Route and Instrument Design
      Route Design
         Practical considerations of previous efforts
         Determining route content
         Selecting a starting point
         Identifying intensive and general observation areas
         Developing standardized directions
         Identifying and controlling abort situations
      Instrument Design and Format
         Designing the rating form
         Procedure for scoring driver performance
         Dividing the rating form into two formats
   Development and Administration of Training Program for Raters
      Selection of Raters
      Training Program
      Selection of Practice Subjects
      Evaluation and Feedback
   Subjects
   Delimitations
   Null Hypotheses
   Study Analysis
   Statistical Analysis
      Counterbalanced Design
      Test of Null Hypotheses 1-3
      Test of Null Hypothesis 4
      Test of Null Hypothesis 5
   Summary

Chapter IV. FINDINGS OF THE STUDY
   Item Difficulty and Rating Stability
   Internal Consistency
   Interrater Reliabilities
   Summary

Chapter V. SUMMARY AND CONCLUSIONS
   Summary
      Statement of the Problem
      Methods of Procedure
      Major Findings
   Conclusions
   Recommendations
   Recommendations for Further Research
   Discussion

APPENDICES
   A. Directions for MDE Road Test
   B. Michigan Driver Education Evaluation Project Driver Performance Rating Form
   C. Outline of Rater Training Program

BIBLIOGRAPHY

LIST OF TABLES
   Table 1. ANOVA Table: Relationships Between Student Performance, Item Difficulty and Run Administrations
   Table 2. Rater Agreement

LIST OF FIGURES
   Figure 1. Michigan Driver Education Evaluation Project Driver Performance Rating Form
   Figure 2. Counterbalanced Design: Repeated Pattern of Rater Assignment
   Figure 3. ANOVA Table

Chapter I

INTRODUCTION

Michigan has long been recognized as a leader in driver education. In 1955, Michigan was the first state to pass legislation requiring local school districts to provide driver education. Although recognized as a leader, Michigan had not conducted any systematic research concerning the effectiveness of driver education programs. In 1975, the Michigan Department of Education submitted a proposal to the Michigan Office of Highway Safety Planning to evaluate the effectiveness of driver education programs in the state. The Department of Education was successful in receiving a three-year grant from the Michigan Office of Highway Safety Planning. This request for funds was initiated by four primary factors.

First, in 1972, the Michigan State Legislature requested the Superintendent of Public Instruction to report on the effectiveness of driver education programs in the state of Michigan. Four questions were presented in an effort to make this determination:

1. Do current driver education programs insure that the student acquires the knowledge and skills necessary to pass successfully the state driver licensing examination?

2. Is one type of driver education program more effective than another in providing students with the knowledge and skills necessary to successfully pass the state driver licensing examination?

3. Does successful completion of a driver education program have a positive impact on road safety?

4.
Is there any evidence to suggest that one type of driver education program is more effective than another in terms of positive impact on road safety?

The 1972 study was incomplete. The study, using questionnaires and interviews, concentrated on question four. Question three was difficult to answer due to the lack of an adequate control group. The 1975 study was, in part, an attempt at answering the general concepts of questions one and two.

Second, in 1974, the state Legislature was requested to increase the amount of driver education reimbursement. The Legislature refused to increase the $30.00 per student reimbursement until proof could be presented that driver education programs were effective.

A third factor prompting the study was the recognition of a trend for some states to favor commercial driver education over public school driver education programs. No studies had been conducted in the state of Michigan to indicate that one type of program was better than the other.

The fourth factor leading to the initiation of the 1975 study was the recognition, by Michigan Department of Education staff, that there were no objective means available to schools and teachers to evaluate the effectiveness of their programs. In response to this recognition of the need for objective criteria to evaluate the effectiveness of driver education programs, the Department decided to develop minimal performance objectives to be used as the basis of instruction. The Department then decided that the most practical approach to evaluating the effectiveness of Michigan driver education programs would be to measure the students' attainment of these objectives.

The first year of the project was spent on the development of driver education performance objectives. The performance objectives were reviewed by 225 driver education teachers, specialists and experts. Revisions were made on the basis of these reviews. The performance objectives were then distributed to driver education teachers throughout the state of Michigan as guidelines to enhance uniformity of course content.

The second year was spent developing a written test, based on the performance objectives, to measure the driver education students' classroom performance. The test was administered to approximately 200,000 students throughout the state. The students tested were from private, parochial, commercial and public schools.

The third year was spent on the development and evaluation of an instrument, based on the in-car performance objectives, to measure the in-car performance of a random sample of Michigan driver education students.

The Problem

Statement of the Problem

No instrument existed that was designed to measure the Department of Education's in-car performance objectives. Consequently, a new instrument would have to be developed for the in-car evaluation phase of the project. The Michigan Department of Education, recognizing the extensive work done by the Highway Traffic Safety Center at Michigan State University in the area of driver performance measurement, asked the Center for assistance in developing a measure of in-car performance of driver education students. The writer, having interest and experience in driver performance measurement, agreed to assist the Department of Education by developing an observational measure to meet the needs of the Department and to determine the reliability of that measure.

In order to make it practical, this phase of the project had to be divided into more manageable parts. One integral part of this process was the design of a test route that would yield the necessary situations to observe and record student driving performance as stipulated by the in-car performance objectives.
A second major component of the process was the design of an instrument that was concise, thorough and definitive enough to be easily manageable by the raters observing and recording driver performance. The third integral part of the study was the design and implementation of a training program for the raters who would be observing and recording the driving performances.

This study dealt with the development of the observational measure for evaluation of in-car performance of Michigan driver education students and the determination of the measure's reliability. This instrument was to be used by the Michigan Department of Education to determine the effectiveness of driver education programs in the state. The results of the effectiveness study will be presented to the state Legislature in order to comply with the Legislature's request for proof of program effectiveness before granting an increase in the reimbursement to school districts for driver education expenditures.

Purpose of the Study

The purpose of this study was to develop an observational measure to evaluate in-car performance and to determine the reliability of the observational measure. It was important to ascertain which of the observational measures provided by the instrument were reliable, which were not, and under what conditions. Reliability and validity had to be determined if the evaluation of driver education program effectiveness was to be deemed significant and beneficial.

To be responsive to the concerns for reliability, it was necessary to determine internal consistency, or the reliability of items measuring the same objective. Stability in time, with respect to the times of administering the instrument to the same subjects, was another concern. A major concern with regard to reliability was stability with respect to raters. Therefore, it was important to determine that there was positive agreement between raters.
The initial requirement for testing new instrument procedures is the determination of the various reliability characteristics. In other words, will the scores the test yields, across raters, across administrations and across the items, be consistent? In keeping with this concern, this study attempted to test the following hypotheses.

Hypotheses to be Tested

1. It was hypothesized that there would be differences in difficulty among the items in the driver test.

2. It was hypothesized that run administrations would not affect driver performance scores.

3. It was hypothesized that the subjects' driving performance scores would not vary according to items interacting with the time of test administration (i.e., between run 1 and run 2).

4. It was hypothesized that a positive relationship would exist between true driver performance scores and observed driver performance scores (internal consistency).

5. It was hypothesized that a positive relationship would exist between raters on measures of sum, search, speed control, direction control, familiarization and signs (interrater).

Significance

Accountability is a major concern in all facets of driver education. Administrators, driver educators, parents and public officials are concerned with the accountability of driver education. In addition to being concerned with program offerings and outcomes, they are also concerned with cost effectiveness. This study may very well provide the framework and information necessary to answer the concerns for accountability. More specifically, it may provide the necessary information to answer the concerns of course offerings, program effectiveness and the relative efficiency or cost effectiveness of various types of driver education programs. It may also have implications as to desirable instructional materials, teaching techniques and teacher preparation. It can also serve as a means to reevaluate the existing performance objectives.
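The statistical core of hypotheses 4 and 5 above is the Pearson product-moment correlation between two sets of scores (true versus observed scores, or one rater's recordings versus another's). As a minimal sketch, using entirely hypothetical scores rather than data from the study, the coefficient can be computed directly from its definition:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical sum scores recorded by a pair of raters for the same subjects.
rater_1 = [62, 71, 55, 80, 68, 74, 59, 77]
rater_2 = [60, 73, 57, 78, 70, 72, 61, 75]

r = pearson_r(rater_1, rater_2)
```

A coefficient near 1.0 indicates close agreement between the two raters; a coefficient near zero indicates none. The study reports interrater coefficients between .49 and .83 on test components, and .86 and .83 for the overall test.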
The study provides a means of evaluating and possibly improving driver education nationwide, as well as in the state of Michigan. This study more immediately provides the Michigan Department of Education with the means necessary to gather data to present to the Michigan Legislature in an effort to procure an increase in the reimbursement allotment to local school districts for the funding of driver education programs.

Limitations of the Study

This study was limited to an examination of the reliability of an observational measure and its use by trained raters. The observational measure was designed to measure only driving performance specified by the performance objectives developed by the Michigan Department of Education. The observational measure did not include those objectives calling for atypical or hazardous situations. The selection of raters was limited to volunteers who could arrange their schedules to be available during the time frame of the study. The training program for the raters was designed to meet the time frame in which the raters were available. This study made no attempt to compare different programs or instructors. No attempt was made to compare teaching experience to rater performance.

Methods of Procedure

Tasks

In conducting the methods and procedures of this study, various tasks were identified and completed. A route was designed to be representative of typical driving environments that yielded traffic situations requiring the driver to display performances stipulated by the performance objectives. The route was divided into areas of observation and areas of recording. Dividing the route into observational and recording areas contributed to the design of an instrument that was comprehensive yet manageable by trained raters.

A training program was prepared for the raters consisting of lecture and field exercises that involved actual observation and recording of practice subjects' driving performance.
Vehicles, equipment and classroom facilities were procured for use in the training and data collection phases. Subjects, having recently successfully completed their driver education course, had to be identified and randomly selected for the training and data collection phases. The subjects used during the training program for raters were not used during the data collection phase. Data were collected on the observations of the subjects used during the training session to determine rater agreement. The data collection phase immediately followed the 26-hour training program for the raters. Data were collected on 30 subjects. The actual data collection involved seven and one-half days. Statistical treatment was applied to the data for analysis.

Basic Assumptions

In conducting this study, some basic assumptions were made. The assumption that the training was sufficient to insure standardization of procedures to be used by the raters was made on the basis of observing rater performance during the training session. The assumption that raters worked independently was based upon the monitoring of rater performance during the training session. The assumption of rater agreement was based upon training and comparisons of rater recordings of driver performance during the training program.

Subjects were randomly selected from school districts in the greater Lansing area. In the study design, where subjects are not a variable of analysis, the presumption of randomness is not important. The random selection of subjects from several selected programs is included only to reduce the possibility of obtaining a subject population which is not essentially normal.

Definitions of Terms Used

Anchor Points

Anchor points are the extremes in driving behaviors characteristic of satisfactory or unsatisfactory behavior patterns within each of the Specific Performance Objective Test Sites (SPOTS).
Among Raters

Among raters is the agreement of all raters observing and recording the driver performance of the same subject at different times.

Between Raters

Between raters is the agreement between a pair of raters observing and recording the driver performance of the same subject at the same time.

Direction Control

Direction control is "the driver's coordination of steering and turning maneuvers with speed and timing of steering adjustments."[1]

Driver Behavior Elements

Driving behavior elements are defined as driving behaviors occurring sequentially or simultaneously in response to traffic situations and driving task requirements; i.e., searching, adjusting velocities, accelerating, decelerating and turning in proper time relationships.[2]

[1] Nolan, R. O., Vanosdall, F. E., and Smith, D. L., et al., Driver Performance Research, Final Report, Vol. II, Guide for Training Observer/Raters in the Driver Performance Measurement Procedure. Prepared for National Highway Traffic Safety Administration, Contract FH-11-7627, Michigan State University, Department of Psychology, and Highway Traffic Safety Center, Feb. 1973, p. vi.
[2] Ibid., p. viii.

Driver Performance

Driver performance is that driving performance stipulated by the in-car performance objectives and displayed by the subject driver over a specific route.

Dual Control Vehicle

A dual control vehicle is one which contains an additional brake control mounted for convenient use by the front seat rater.

General Observation Area

The general observation area is "that portion of the route, lying between intensive observation areas, in which the rater is observing the vehicle and driver in relationship to general vehicular placement and maneuvering with respect to other traffic and manmade laws. This area also incorporates the recording area."[3]

In-Car Performance

In-car performance is the performance required of and/or displayed by a driver while preparing to operate and while operating a vehicle in a real-world setting.

Instrument

An instrument is "a set of procedures by means of which an observer can record and categorize the behavior of a subject or hazards."[4]

Intensive Observation Area

The intensive observation area is "an area or portion of the route where driver behavior is observed intensively in relation to traffic situations and required driving tasks."[5]

Interrater

See Between Raters.

Narrative

A narrative is a written summary of the driver's performance that can serve as additional documentation regarding the adequacy of the test. It may also provide insight into what the rater is seeing in terms of driver performance. It may also serve to clarify differences between raters in recording driver performances.

Objectivity

Objectivity is the recording, by a rater, of only those behaviors actually observed.

Observational Measure

An observational measure "is a procedure for using an observational record to assign scores to each of the subjects of observation; each score so assigned being assumed to reflect some characteristic of behavior of that subject."[6]

Observer

An observer is synonymous with a rater.

Overt Driving Behaviors

Overt driving behaviors are "behaviors such as head, eye and hand movements that are readily observable physical movements displayed by the subject driver."[7]

Program Effectiveness

Program effectiveness is the extent to which a driver education program of instruction produces a desired effect as determined by a subject displaying performances stipulated by performance objectives.

[3] Ibid., pp. 49, 273.
[4] Rowley, Glenn L., American Educational Research Journal, Winter 1976, Vol. 13, No. 1, pp. 51-59.
[5] Nolan, op. cit., pp. 43, 273.
Rater Mirror

The rater mirror is an extra rearview mirror, mounted by the rater with the aid of suction cups, to assist the rater in identifying driver behavior.

Raters

Raters are those persons who observed and recorded the driving performances of subjects.

[6] Rowley, op. cit.
[7] Nolan, op. cit., p. 69.

Recording Area

"The recording area is an area of the driving test route where the recording of observations is carried out."[8]

Reliability

Reliability is the stability, consistency and accuracy with which an instrument measures whatever it does measure.[9]

Run

Run is "one complete circuit of the driving test route."[10]

Search

Search is "an observable behavior in which the driver looks systematically for possible sources of traffic information."[11]

Speed Control

Speed control is "the use of the accelerator or brake to accelerate or slow the vehicle to fit the traffic and driving task requirements."[12]

[8] Nolan, op. cit., p. viii.
[9] Borg, Walter R., Gall, Meredith D., Educational Research: An Introduction (New York: David McKay Co., Inc., 1974), p. 142.
[10] Nolan, op. cit.
[11] Nolan, op. cit., p. viii.

Subjects

Subjects are the drivers having recently successfully completed their program of instruction who were randomly selected from a list of driver education students in the Lansing and East Lansing area.

Time 1 and Time 2

Time 1 and Time 2 refer to a run or negotiation of the same route by the same subject, with different pairs of raters, at different times.

Training Program

The training program is the program designed to train raters in the use of the instrument. It also familiarized them with the route and what it yielded, and included methods used to trigger and observe driver behavior at proper time and space intervals.
Trigger Directions

Trigger directions are directions given at particular locations along the route which initiate a response by the driver to particular driving situations and tasks. These directions also alert the raters to begin intensive observation.

[12] Nolan, op. cit.

Validity

Validity of an instrument or test is whether the instrument or test actually measures what it is designed to measure.[13]

[13] Borg, op. cit., p. 135.

Organization of the Remaining Chapters

Chapter II contains summaries of the literature chosen for review. Some studies dealing with plans for investigating driver education, methods of observing and recording driver behavior, driving task analysis and driver education objectives, observational techniques, methods and instrument design were selected for review and reporting. Presented in Chapter III are (1) route and instrument design, (2) development and administration of a training program for raters, (3) method of obtaining subjects and (4) the collection and analysis of data. Chapter IV contains the findings; and Chapter V presents the summary and conclusions.

Chapter II

REVIEW OF LITERATURE

During 1975, the Michigan Department of Education began developing minimal performance objectives to measure the cognitive and psychomotor skills considered basic to any Michigan driver education program. These objectives were finalized in 1976. Appropriate test items were developed for the classroom objectives and pilot tested. A total of sixty objectives were chosen for testing. Each objective was measured by five items, with the pass level for each objective set at 80% correct responses. The test results for approximately 100,000 students were analyzed. The findings indicated only thirteen objectives were attained at the 80% correct response level.

This investigation was an integral part of the Michigan Driver Education Evaluation Project.
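The mastery rule described above (each objective measured by five items, with the pass level set at 80% correct responses) amounts to a simple proportion check per objective. A minimal sketch, using hypothetical item responses rather than data from the study:

```python
def objective_attained(item_scores, pass_level=0.80):
    """True if the proportion of correct items (scored 1 or 0) meets the pass level."""
    return sum(item_scores) / len(item_scores) >= pass_level

# With five items per objective, an 80% pass level means at least four correct.
assert objective_attained([1, 1, 1, 1, 0])      # 4 of 5 correct: attained
assert not objective_attained([1, 1, 1, 0, 0])  # 3 of 5 correct: not attained

# Counting attained objectives across hypothetical response sets (statewide,
# the study found only 13 of the 60 objectives attained at this level).
responses = [[1, 1, 1, 1, 1], [1, 0, 1, 1, 0], [1, 1, 0, 1, 1]]
attained = sum(objective_attained(r) for r in responses)
```

The `objective_attained` helper is hypothetical, introduced only to make the scoring rule concrete; the project's actual scoring procedures are described in Chapter III.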
The study dealt with the development of an objective-referenced in-car performance measurement, the development of a route which would yield the necessary opportunities to observe student attainment of the objectives, and the determination of instrument and rater reliability. The project was the first attempt to conduct any systematic research regarding the effectiveness of driver education programs in Michigan.

To allow a determination of the various driver education programs' strengths and weaknesses to be made, objective-referenced tests, rather than norm-referenced tests, were selected. It was determined by project staff that a more immediate and plausible approach to investigating the effectiveness of driver education programs in Michigan than accident and violation records was to measure driver education students' attainment of the in-car performance objectives. It was also determined by the project staff that this dynamic approach, rather than an attempt to use the criteria of accidents and violations, would determine the effectiveness of content internalization. An examination of driver behaviors of this nature was more desirable in terms of design, control, expediency and observability.

There have been numerous studies concerned with measuring driver performance and the effectiveness of driver education. While many studies were reviewed and considered, only those having relevance to this study were chosen for reporting. Several studies were selected because they dealt with plans for investigating driver education. Some studies were selected because they contained methods of observing and measuring driver performance. Other studies contained driving tasks and driver education objectives. Some reviews involved studies which described observational methods, techniques and instrument design.

Plans for Investigating Driver Education

The American University Study, by Lybrand et al.
(1968) supported the approach taken by the Department of Education by expressing the need for accident measures to be carefully qualified when used as driving performance criteria. The study reiterated that accidents are rare phenomena and not a stable characteristic of driver behavior. The study also pointed out that accident measures must include a valid measure of exposure if they are to be used as a measure of driver proficiency. The report suggested that driver education programs may be evaluated in terms of their general objectives, enabling objectives or terminal objectives. In summary, Lybrand said, "In order to evaluate proposed driver education programs, there must be a set of instructional objectives derived logically from an adequate description of driving performance and defined in terms of intended behavioral outcomes."14

14 Lybrand, William A., Carlson, Glenn H., Cleary, Patricia A., and Bower, Boyd H. A Study on Evaluation of Driver Education, pp. 210. Report, National Highway Safety Bureau, 1968.

Goldstein (1975) offered some observations of interest about accidents and violations. He stated, "Highway accidents have multiple causes and are rare events. Increasing violations may not necessarily increase accidents." Goldstein also pointed out that there appeared to be a long delay in driver education students obtaining a license after completion of their course.

William Cole (1976) expressed his concern for the need for performance-based driver education. He stated that "driver and traffic safety education is performance-oriented, traditionally taught through a sequence of standardized cognitive, affective, and psychomotor learning experiences based on minimum fixed time standards." Cole also made a case against comparing students' performance against their peers. This approach would essentially be a norm-referenced approach to determining the effectiveness of driver education.
He pointed out that enabling objectives specify what driver education students should be able to do at the completion of their driver education course. He referred to these enabling objectives as immediate criteria. In his opinion, criterion-referenced tests uniformly applied are preferable to norm-referenced tests. He also stated that there was need for an intermediate criterion that is operationally feasible and statistically reliable if it is used to measure "real-world" driving performance.16

15 Goldstein, Leon G., "Rejoinder to Peck and Jones' Reply." Journal of Traffic Safety Education, Vol. XXIII, No. 1, October 1975, pp. 15 and 17.

16 Cole, William M. "The Case for Performance-Based Driver and Traffic Safety Education." Journal of Traffic Safety Education, April 1976, Vol. XXIII, No. 3, pp. 9-10.

Warren Quensel (1976) stated that "evaluation is essentially a process of determining to what extent the program objectives are actually being realized." His approach, however, was to evaluate the ultimate criterion of safe driving by using a questionnaire and self-reporting system to determine accident involvement. He did acknowledge the difficulty in obtaining accurate information from existing state records.17

The New York University report by the Center for Safety (1968) expressed concern for the limited value of studies that determine the effectiveness of driver education programs using the accident criterion. The report stated that "if we seek to evaluate driver education in terms of accident reduction, we are confronted with so many variables that we become enmeshed in an endless chain of proof." The study encouraged the consideration of short-term, intermediate and long-term criteria. It recommended that driving performance be measured via simulator, road test and self-rated driving knowledge and driving attitudes.
The report also stated there was good reason to consider favorably evaluative techniques concerned with more or less immediate learning resulting from driver education programs. This approach would include driving task analysis in terms of expected behavioral outcomes of instruction, development of tests and instrumentation to provide relatively objective ratings of student performance, and a practical design for estimating the relative cost effectiveness of varying instructional programs.18

17 Quensel, Warren P. "How to Measure Program Effectiveness." Journal of Traffic Safety Education, April 1976, Vol. XXIII, No. 3, p. 6.

18 New York University. Driver Education and Training — Plans for Evaluating the Effectiveness of Programs, pp. 95. Report No. PH180-473, under Contract FH-11-6560. Washington, D.C.: National Highway Safety Bureau, 1968.

Teal et al. (1968) pointed out that the general purpose of their research was to develop a concrete plan or plans for evaluating the effectiveness of current or proposed driver education programs. They reported that present methods offered very little insight into the quality of the programs. For a short-term approach, they recommended an evaluative criterion instrument used by a visiting team of teachers at each school. They also recommended a comparative evaluative study, among the various states, for a long-term approach.19

19 Teal, Gilbert E., Truesdale, Sheridan L., and Fabrizio, Ralph A. Driver Education and Training, pp. 211. Report No. B2D 68-575, Dunlap and Associates, Inc., under Contract FH-11-6559. Washington, D.C.: National Highway Safety Bureau, May 1968.

Methods of Observing and Measuring Driver Performance

The Institute for Educational Development report by Kennedy and Chapman (1968) was critical of driver performance studies in which driver performance
variables were not derived from, nor validated against, real-world driving situations. The general consensus was that performance criteria must be derived from the behavior expected of drivers in the real world. The group stated that, since other means were absent, "the best that can be done at this time is to pool the judgment of experts, using what evidence is available in constructing a systematic set of hypotheses about relevant variables and how to measure them." The group also expressed concern for the measurement techniques used. It was felt that checklists could be useful, but considerable care should be taken in setting up the test situation, in defining the scoring basis, and in training the raters.20

20 Kennedy, J. L. Driver Education and Training Project, pp. 92. Report No. PH 180-4, under Contract FH-11-6561. Washington, D.C.: National Highway Safety Bureau, 1968.

Quensel, reporting on an in-car evaluation instrument developed by the staff at Illinois State University, stated that "the test is comprehensive in nature but does not include assessment of basic control skills, procedures for maneuvers, visual habits, identification habits, or the evaluation of hazard." On the same page, however, he listed basic response categories containing the elements of search, speed control, direction control and timing. The assessment of the very things he claimed were not assessed appeared contradictory. His criteria for selecting a route included dynamic situations. Likewise, the concept of drawing inferences from observable behavior was included. He stated the importance of selecting a programmed route, yet apparently 10 of his 35 situations were either not programmed or were highly atypical. He also reported the need for a reliable scoring system, yet he reported that he could not guarantee that comparable situations would occur for all subjects.21

21 Quensel, Warren P. "An In-Car Evaluation Instrument." Journal of Traffic Safety Education, January 1976, Vol. XXIII, No. 2, pp. 15-17.

Uhlaner et al. (1952) reported on a more reliable and meaningful measure of safe and effective vehicle operation.
He explored the assessment of driver behavior through observation by, and the collective judgments of, supervisors and associates of Army drivers. Originally, eleven experimental scales were devised. The four finally selected were: (1) near accidents; (2) reaction to sudden change; (3) effect of temper on driving; and (4) knowledge of own limitations. The supervisors and peer raters selected 21 items, from a list of 105 driving habits, that they felt they could reliably rate. After review by a panel of experts, these items were reduced to 15.22

22 Uhlaner, J. E. Development of Criteria for Safe Motor-Vehicle Operation. Highway Research Board Bulletin 60, pp. 36-43. Washington, D.C.: Highway Research Board, 1952.

The "Driver Performance Measurement" (DPM) study at Michigan State University by Forbes et al. (1973) was a unique approach to measuring driver performance. Many of the approaches, problems, techniques and concepts used in DPM were helpful in designing this study. Driver Performance Measurement is a reliable method for research in vehicles on the highway. This procedure for measuring driver performance was intended to be used instead of accident data. It dealt with a wide range of driving behaviors determined to be suitable or unsuitable depending upon the interaction of patterns of behavior with dynamic traffic situations. A behavior pattern was evaluated according to whether it increased or decreased the potential hazard in the situation. Although the DPM staff considered the most important test component to be the behavioral pattern, the basic elements of search, speed control and direction control were considered adequate for the present study.
The DPM project staff reviewed the Human Resources Research Organization (HumRRO) task analysis and found that 92.5% of the applicable "critical" and "very critical" items were covered. The project staff also performed on-site observations of driver behavior to obtain a "real-world" task analysis of driver behavior.

The route was standardized and arranged in progression with regard to degree of difficulty. An initial warm-up period was provided at the beginning of the route. The six Behavioral Environmental Traffic Situational Sequences (BETSS) selected included urban and rural, two-lane highway, two-lane street, four-lane street, and freeway driving. Controlled and uncontrolled intersections, as well as various types of lane changes, were also included. This was done in an effort to sample a wide range of driving tasks determined by project staff to be important for safe and efficient driving.

In order to assist the observer/rater in determining whether or not the driver's behavior was suitable or unsuitable, the DPM rating form incorporated anchor points. Expected suitable behavior was provided for each behavior pattern to be rated. The exact opposite of each expected suitable behavior was listed for the unsuitable end of the continuum. The unsuitable and suitable behaviors thus provided the anchor points for rating the behavior patterns of the driver. After rating the behavior pattern of the driver as suitable or unsuitable, in relationship to the dynamic traffic situation, the rater then rated the suitability or unsuitability of the elements of driving behavior.

The first attempt at rating driver behavior elements involved seven behavior elements. After pilot observations and ratings, the project staff determined that rating this number of elements was too difficult, especially for the front seat rater. The driving elements to be rated were reduced to: (1) search, (2) speed control, and (3) direction control. Timing was actually a fourth element of driver behavior rated, but it was considered an element that influenced the suitability or unsuitability of the other three elements rather than one standing alone.

The route was divided into observation zones and recording zones. The zone in which the driver's to-be-recorded behavior was observed was labeled the intensive observation zone. The zone in which the recording of the driver's behavior took place was labeled the general observation zone. Specific standardized directions marked the exact time and location at which the rater began intensive observation of the driver's behavior. Areas that marked a logical conclusion of the driver's behaviors were identified. These areas marked the end of the intensive observation area and the beginning of the general observation area. At this time and location, the observer/rater recalled the behavior just observed and recorded it. The raters were encouraged to make marginal notes regarding the driver's performance to assist them in recalling the driver's behaviors.

In addition to a training program for raters, the design of the route, the incorporation of observation zones and recording zones, and the reduction of the number of driver behavior elements to be recorded contributed to the achievement of high agreement between raters using the DPM methodology. The reliability estimates for between-raters on the same run were .876 and .946 for behavior pattern and element scores, respectively. For between-raters on different runs, the reliability estimates were .833 and .941 for behavior pattern and element scores, respectively. The training program for DPM raters consisted of 120 hours of training in the concepts and field application of the DPM procedure.23

"The Michigan Road Test Evaluation" study by Vanosdall et al.
(1977) incorporated the basic concepts of DPM in measuring the driver performance of driver license applicants. Traffic "sequences" and "segments" replaced the "BETSS" and "SubBETSS" used in DPM. Perhaps the change of most interest was the reduction of the number of environmental situations. The original DPM project incorporated six "BETSS," whereas the Michigan Road Test project reduced this number to four. The driving time of the route was reduced from approximately 45 minutes to approximately 20 minutes. The training program for the raters was reduced from 3 weeks to 2 weeks. The final report of this study pointed out that the raters felt they could have benefited from training much sooner if they had been exposed to the test route earlier in the training program. The reliability coefficient for raters scoring 288 subjects was reported at .60. The project staff felt this perhaps could have been improved by increasing the number of sequences and by early route utilization during the training program for the raters.24

23 Forbes, T. W., Nolan, R. O., Schmidt, F. L., et al. Driver Performance Measurement Research Final Report, Vol. 1. Technical Report, under Contract FH-11-7627. Washington, D.C.: National Highway Traffic Safety Administration, 1973.

The USC on-road performance test by Jones (1977) provided some insight into intensive observation and recording of driver behavior. However, three raters were used, with each rating different tasks or behaviors. This approach was determined to be impractical for the present study because of the desire to develop an instrument that would have the potential to be used by driver education instructors in their courses.25

When summarizing earlier studies, Forbes (1950) made his readers aware of some very noticeable factors regarding the observation of drivers. More experienced drivers pick up minor cues that enable them to anticipate hazards of which novice drivers are apparently unaware.
He also pointed out the difference in the search behavior of the experienced versus the novice driver. He suggested that observing the total picture of driver performance, rather than isolated items listed on a checklist, will yield more consistent ratings among observers.26

24 Vanosdall, F. E., et al. Michigan Road Test Evaluation Study, Final Report, Vol. III. Prepared for the National Highway Traffic Safety Administration, under Contract MDL-75-002B, Michigan State University, Department of Psychology, Highway Traffic Safety Center, Nov. 1977.

25 Jones, Margaret Hubbard. Measuring the Outcomes of Driver Training: The USC On-Road Performance Test. Presented at the Transportation Research Board, January 25, 1977.

26 Forbes, T. W. "Street and Highway Traffic," Handbook of Applied Psychology, Fryer and Henry (Eds.), Rinehart, Vol. 1, 1950, pp. 325-335.

Driver Tasks and Driver Education Objectives

Fine et al. (1965), while developing a criterion for driver behavior, called attention to the usefulness of unobtrusive measurement in accident research. They strongly encouraged the use of actual field experiments, since such experiments appeared to be more definitive.27

27 Fine, Jerome L., Malfetti, James L., and Schoben, Edward J., Jr. The Development of a Criterion for Driver Behavior, pp. 43. New York: Columbia University, 1965.

The HumRRO staff (1970) developed a comprehensive inventory of the behaviors involved in operating an automobile. One of the primary reasons for developing the task descriptions was to identify a set of driving performances to be used as terminal objectives for driver education courses. Another purpose of the task analysis was to serve as a basis for designing a driver performance test to evaluate the effectiveness of driver education programs.28

28 McKnight, J., and Hunt, A. G. Driver Education Task Analysis, Vol. I, Nov. 1970.

The HumRRO staff developed instructional objectives based on the task descriptions.
This was to assist in developing instructional programs and the evaluation of those programs. A driving situations test was developed. It was intended to evaluate the student's ability to deal with a range of situations that occur in "real-world" driving. The test was designed to be conducted on the road in ordinary traffic. The test was comprised of (1) a list of planned and unplanned driving situations; (2) a checklist of observations for each situation and a format for recording responses; and (3) a set of performance standards.

The report stated that the test was not standardized, but that if the test were approximately 30 minutes in duration, the number of responses recorded would be sufficient to obtain reliable results. The HumRRO staff pointed out that it was important to plan the route and the observations to be made in advance. Situations should be listed in the sequence in which they occur.

The use of normative data to evaluate driver education students' performance was believed to be inappropriate and of no value. That driver education courses should provide specified minimum standards of qualification was a major premise. Since the test was used to determine the feasibility of administration, reliability and validity statistics were not computed. A point of interest brought up in the report was that observers were often not able to observe and record all the situations because of the rate of occurrence and spacing of the planned situations. This would indicate a need for separate observational and recording periods.29

Observation Methods, Techniques and Design

McGlade (1960) developed an experimental road test. He based the checklist on the information he gathered from forty-six licensing agencies. The test showed a relatively high test-retest reliability (r=.77) when used with students who had completed driver education. He used two pairs of raters and achieved interrater reliabilities of r=.93 and r=.88.
The test primarily dealt with the selected skills of braking, parking, right and left turns, lane changes, traffic controls and intersections. The test method evaluated the skills individually rather than looking at a sequence of driver behaviors.30

29 McKnight, J., and Hunt, A. G. Driver Education Task Analysis, Volumes III and IV, March 1971.

30 McGlade, Francis Stanley. An Evaluation of the Road Test Phase of the Driver Licensing Examination of the Various States: An Investigation of Current Road Tests and Testing Procedures, and the Development of a Valid and Reliable Road Test Based on Driver Implications. Dissertation, pp. 250. New York: New York University, 1960.

Quenault (1968), in describing a method of systematic observation of driver behavior, pointed out some methods that could be very helpful in observing driver behavior. For example, the use of dual raters was a desirable technique, but he did not report on interrater reliability. The idea of intensive observation and objectivity was brought up in the study. The use of a rater mirror to assist in observing driver search behavior was mentioned. Other important concepts mentioned were memorization of the route and how to give instructions.31

Medley and Mitzel (1963) noted that the observer should not be required to rate behaviors on a quantitative scale; rather, the ratings should be qualitative judgments when possible. The ideal classification task would only involve whether or not the proper behavior was displayed. The simpler the task, the more likely it would be done correctly. It was their conclusion that the simplest judgment, of whether the behavior occurred or not, was best.32 The authors pointed out that "selecting behaviors to be observed is done by identifying a limited range of behaviors relevant to the study and constructing items to be used by the observers."33

31 Quenault, S. W. Development of the Method of Systematic Observation of Driver Behavior.
RRL Report LR 213, Crowthorne, Berks. (Gt. Brit.): Road Research Laboratory, 1968.

32 Medley, D. M., and Mitzel, H. "Measuring Classroom Behavior by Systematic Observation." In N. L. Gage (Ed.), Handbook of Research on Teaching. Chicago: Rand McNally, 1963, pp. 251.

33 Ibid., pp. 251-253.

They summarize an observational technique as "an observational technique in which an observer records relevant aspects of classroom behaviors as (or within a negligible time limit after) they occur, with a minimum of quantification intervening between the observation of a behavior and the recording of it. Typically, behaviors are recorded in the form of tallies, checks or other marks which code them into predefined categories and yield information about which behaviors occurred, or how often they occurred, during the period of observation."34

A point of interest was a statement by Medley and Mitzel regarding teacher effectiveness: "Since it may be assumed that whatever effect a teacher has on pupils must result from his behaviors, it is only necessary to identify the crucial behaviors, record them, and score them properly to measure effectiveness in process."35

34 Ibid.

35 Ibid.

The report by Boyd and DeVault (1966) dealt with observational techniques, as well as with collecting and recording observational data. The observational techniques reported dealt with participant and nonparticipant observers. Participant observation involves the presence of an observer for the purpose of scientific investigation. By participating in a common natural setting, the observer gathers better data. Most of the research using observational techniques involved the nonparticipant observer. This type of observation frequently utilized mechanical devices for observing and recording behavior. The report pointed out two distinct disadvantages of nonparticipant observation:

1. The increase in cost due to needed hardware.
2. The effect of environmental change on behavior.
The authors pointed out that structured or unstructured observations may be used. The inability of any one observer to see and record all behavior that was displayed, and the inability to identify distortions and inadequacies, seemed to favor the use of structured observations. They also pointed out that accuracy of recall of information and feedback may affect rater agreement.36

36 Boyd, Robert E., and DeVault, M. Vere. "The Observation and Recording of Behavior," Review of Educational Research, 36(5), 1966, pp. 529-551.

Herbert and Attridge (1975) developed a guide for users of observation systems and manuals: "A set of thirty-three criteria were identified and sorted into three main types: identifying, validity, and practicality criteria. Identifying criteria enable users to select the correct instrument for their purposes. Validity criteria, which include criteria pertaining to the degree of inference, context, reliability and validity, relate to the accuracy with which the instrument represents the observed events. Practicality criteria provide information about the ease of administration and dissemination of results."37

This unique guide was very useful in designing the instrument to be used for recording driver behavior. It was also helpful in clarifying procedures involved in an observational study of this nature. The criteria developed by Herbert and Attridge, together with the included examples, were very helpful in developing a training program for the raters.

Summary

In this chapter a portion of the literature reviewed and deemed relevant to this study was reported. Summaries of the literature selected for reporting dealt with studies relating to plans for investigating driver education; methods of observing and recording driver behavior; driving task analysis and driver education objectives; and observational techniques, methods and instrument design.

The literature indicated that a standard route should be developed.
The route should contain a warm-up section, intensive observation zones and recording zones. The route should provide a range of driving tasks and contain a variety of real-world traffic situations. The literature suggested that a route-specific rating form, incorporating the use of anchor points to rate behavior patterns of the driver, be developed. The literature also indicated that procedures for using the route and the rating form be developed and used to train raters.

37 Herbert, S. D., and Attridge, C. "A Guide for Developers and Users of Observational Systems and Manuals," American Educational Research Journal, 1975, 12, pp. 1-20.

There were several important issues in the literature regarding the observation and recording of behaviors. It was suggested that an examination of driver behaviors stipulated by performance objectives, rather than a norm-referenced approach, was desirable in terms of design, control and observability. The literature indicated that inferences, regarding judgments and decisions, may be drawn from observed behaviors. It was suggested that qualitative judgments of behaviors be used instead of a quantitative scale.

Chapter III

THE METHOD OF PROCEDURE

The primary objective of this study was the development of an instrument to measure the in-car performance of Michigan driver education students and the estimation of the instrument's reliability. An additional concern of the study was the amount of agreement between pairs of trained raters. This chapter contains the methods of procedure by which the study was conducted. Included are (1) route and instrument design, (2) development and administration of a training program for raters, (3) subjects, (4) delimitations, (5) null hypotheses and (6) statistical analysis.

Route and Instrument Design

There were several driving performance tests available for use.
However, this project was concerned with the ultimate goal of measuring the effectiveness of Michigan driver education programs. If this practical concern of the project staff was to receive attention, efforts were needed to devise a measure that was responsive and generalizable to the content of the Michigan driver education programs that would be evaluated in the future. A practical approach to the development of such a measure seemed to be the development of objective-referenced criteria, using the in-car performance objectives developed for Michigan driver education programs by the Michigan Department of Education.38

38 Michigan Department of Education, Driver Education Performance Objectives, June 1976.

Route Design

Practical considerations of previous efforts. At this stage of the project, efforts had been made by the Department of Education to develop an instrument and a route. The instrument consisted of 28 typed pages. The format of the instrument was basically that of a checklist requiring the rater to check "Yes" or "No" in response to observations of the driver's performance. The route incorporated approximately 23 miles of roadway and required approximately 50 minutes of driving time.

After reviewing the preliminary instrument and route design, the project staff decided further evaluation and revision was necessary. A panel of experts, consisting of project staff and Michigan State University Highway Traffic Safety Center staff, was formed to assess the project's needs and to examine the preliminary route and instrument design. The panel was comprised of experts in the areas of human factors research, driver education, driver licensing, psychology and evaluation. The panel and project staff reviewed the instrument and the route individually and
The combination of expertise, experience and group discussions complemented a practical approach to analyzing the preliminary design and content of the instrument and route. The combined efforts of the panel of experts pro- vided the basis for the development of an instrument and route that was manageable and comprehensive. This approach met both the practical and research concerns of the project staff. Determining route content. Following the recommen­ dations of the panel of experts, the project staff's next task was the evaluation of the in-car performance objectives. The project staff directed their review of the objectives to the determination of which objectives were critical to safe operation of a vehicle in a real-world traffic environ­ ment, as well as to those objectives which were atypical, extremely hazardous, or were logical prerequisites to the attainment of other objectives. It was the decision of project staff that certain objectives, such as those covering parking, turnarounds, backing, and entering and leaving the car, were procedurally and manipulatively oriented. These objectives were deleted from the set to be evaluated as they were determined not to be critical for safe operation of the vehicle. These ob­ jectives could be evaluated by a separate instrument, using an off-street area, if desired in future projects. 42 The project staff also decided that objectives re­ quiring responses to emergency vehicles, school buses, and passing maneuvers were conditions that offered a very low probability of occurrence for each subject being evaluated. On the basis of this observation, the decision was made to eliminate these objectives from the evaluation. The objectives requiring subjects to demonstrate the procedures for off-road recovery and operating the vehicle without power assistance were considered too hazardous to include in this evaluation. The safety and liability issues involved warranted their omission from the study. 
It was agreed by project staff that various objectives and procedures, such as placing the gear selector in park or neutral and turning the key to start before starting the engine, were obvious prerequisites to terminal objectives. However, provisions for measuring these objectives were made in the vehicle familiarization section of the evaluation by a Yes/No checklist.

The route was then reviewed and analyzed to determine what tasks were required for proper negotiation. These task requirements were then matched with the task requirements of the performance objectives. A determination was then made on whether the route yielded the necessary situations to evaluate the performance specified by the objectives chosen for evaluation. Portions of the route that yielded no situations, or at least no new situations, were identified. The route was then modified to eliminate or reduce the length of nonproductive segments of the route.

The route was reviewed again by project staff. Consideration was given to areas that might cause unwanted delays in negotiating the route. After locations such as railroad crossings, parade routes, and construction zones were identified, the route was further revised.

Selecting a starting point. Having determined the approximate beginning and end of the test route, it was necessary to locate an area that could serve as the origin and final destination of the road test during data collection. Ideally, such an area would provide off-street parking, a facility to shelter staff and subjects from the weather, and restroom facilities. The facility should be close to the test route and provide relative ease of entry to the beginning of the route. With the aid of a Lansing-East Lansing map, such an area was located. The Red Cedar School was identified as a potential location. After an inspection of the facility, it was determined that the facility was nearly ideal in meeting the specifications. Project staff from the Michigan Department of Education made the necessary inquiries and requests, and permission was obtained to use the Red Cedar School.

With the route's origin now determined, an adequate warm-up section could be added to the route. This would allow the subject an opportunity to become more familiar with the operating features of the vehicle, the presence of
Project staff from the Michigan Department of Education made the necessary inquiries and requests, and permission was obtained to use the Red Cedar School.

With the route's origin now determined, an adequate warm-up section could be added to the route. This would allow the subject an opportunity to become more familiar with the operating features of the vehicle, the presence of the raters and the manner in which directions would be given. It also provided an opportunity for the raters to create a more relaxed atmosphere for the subject and to identify extreme deficiencies in the subject's ability to operate the vehicle.

A practical return and closure of the test route could also now be determined. The portion of the route incorporated for the return to the facility was analyzed to determine if it contained any situations that would yield opportunities and requirements for measuring performance specified in the performance objectives.

Identifying intensive and general observation areas. The route was further reviewed to identify the exact areas where an objective or combination of objectives would be evaluated. The task was to identify a logical beginning and ending of the driver behaviors required to complete the driving task and attain the objective specified. The portions of the route between designated points where intermittent testing began and ended were referred to as intensive observation areas. Areas between the intensive observation areas were identified or created. These were areas where driver performance was not being observed for the purpose of recording the driver's behavior. These areas were labeled general observation areas and provided the time and distance along the test route for the rater to complete a record of the driver performance that had just been observed in the intensive observation area. This provided for a comprehensive route, yet one which was manageable by raters.
The route covered approximately 11 miles and required an average driving time of 28 minutes.

Developing standardized directions. The next component in the route design was the development of a standardized set of directions. These would be used by the raters to direct drivers over the test route. The directions had to be clear and direct. They would have to be specific and use common terminology as much as possible. The wording was designed to initiate a response on the part of the driver, but not to change the driver's behavior from what would normally be displayed during the execution of various driving tasks. Landmarks and traffic signals were utilized to clarify the directions. At times hand gestures were used to complement the directions. The wording, timing and location along the route were to be standardized components of the directions. The front seat rater was to give the directions. If the driver asked for the directions to be repeated or indicated the directions were not clear, the rater was to repeat the directions.

The directions were piloted by administering them to volunteers as they drove over the test route. On the basis of the information received from the pilot test, the directions were revised. The directions were then presented to the panel of experts used in the review of the route and instrument content. However, the directions were not finalized until the raters had an opportunity to use them during the training program. The standardized directions can be found in Appendix A.

Identifying and controlling abort situations. While reviewing the route and pilot testing the directions, the writer noticed that there were several locations along the route that were conducive to either a driver or a traffic abort situation. If the driver deviated from the prescribed route, due to driver error or traffic interference, the planned observations and the recording of driver behavior would be interrupted.
Either alternate test segments would have to be designed, or an alternate route returning the driver to the original test route or coaching to avoid the abort would have to be used. Due to the nature of the potential abort locations, the project staff decided to use the latter two options. For those locations that provided the capability of easy return to the route, that technique would be used. For those abort situations that would cause a considerable increase in the time needed to return to the route, coaching was determined to be the most practical means of avoiding the abort. An example would be to coach a driver to position his/her vehicle into a certain lane ahead of time to insure that the abort situation was avoided. The coaching was to be done while the driver was in a general observation area. Although the coaching would alter the driver's behavior, it would occur in an area where the driver's behavior was not being observed for the purpose of recording. However, if the rater had to intervene while the driver was in an intensive observation area, that maneuver or behavior would be recorded as unsatisfactory.

Instrument Design and Format

Designing the rating form. To assist in reliably scoring driver performance, a form was needed that would permit the rater to recall the driver's performance and record it quickly and accurately. Each page of the rating form was labeled a LOPE. (LOPE is an acronym for Location Of Performance Evaluation.) Each LOPE is an intensive observation area and is comprised of test segments or traffic situations referred to as SPOTS. (SPOTS is an acronym for Specific Performance Objective Test Site.) The rating form was designed to record four scores for each test segment (SPOTS). The four scores were pattern, search, speed control and direction control. The exception to recording four scores would be when a performance objective stipulated timing as an additional performance to be scored.
The Driver Performance Measure (DPM) and Michigan Road Test (MRT) rating forms were used as models for the present rating form and provided the following:

1. A summary of specific behaviors, stipulated by the performance objectives, listed for each SPOTS. These anchor points assisted the raters in recalling the satisfactory/unsatisfactory pattern performance for the SPOTS.

2. Satisfactory and unsatisfactory rating spaces for each behavior pattern and element for each SPOTS.

3. Space for qualitative notes or abbreviations that would assist the rater in recalling what had happened.

4. A logical progression for scoring, as LOPE's and SPOTS followed the exact route.

5. A separate page for scoring each LOPE. Immediately after scoring each LOPE, the page was to be turned and the rater prepared to enter the next LOPE.

The rating form can be found in Appendix B.

Procedure for scoring driving performance. The method of scoring driver performance required the trained rater to observe intensively the driver's performance throughout the LOPE. Upon the driver's completion of the LOPE, the rater was to record immediately the driver's performance as "satisfactory" or "unsatisfactory" for each SPOTS as stipulated by the performance objective. The record of performance was completed by first scoring the overall pattern of performance for the SPOTS and then, secondly, scoring the element behaviors. Each score required the rater to know the driving task and the range of satisfactory behavior required to complete the task as stipulated by the performance objective. Observation of the driver's performance for the purpose of scoring occurred only during the LOPE. Recording of the observed driver performance occurred during the time and distance between the LOPE's. After recording was completed and while still between LOPE's, directions were given to the next portion of the route.
This method of scoring driver performance in the driver education road test was the same as that used in the DPM and the MRT.

Dividing the rating form into two formats. The first format was labeled vehicle familiarization. The subject was asked, by the front seat rater, to identify the gauges and devices by pointing them out or touching them, when possible. The front seat rater then proceeded to state verbally the information gauges, starting and control devices, and safety devices that appeared in order on the rating form. Beside each gauge or device on the rating form was a Yes/No column. The raters then placed a check in the appropriate column to indicate the subject's response. This rating was done shortly after the subject entered the vehicle and before the subject prepared to move the car from the parking area.

Other items found on the vehicle familiarization form were listed under the headings of pre-ignition control tasks, starting the engine, putting the car into motion, stopping the vehicle and securing the vehicle. The rater made no verbal request of the subject to perform these tasks other than to give directions to initiate the exit from and return to the parking lot. After giving the directions necessary to initiate these behaviors, the raters independently observed and recorded the subjects' performance.

The second format of the rating form consisted of recordings of the subject's driving performance as the subject responded to various driving tasks while negotiating the route. Each page of this format was identified as a LOPE. Each LOPE was identified by a Roman numeral. The numerical order of the LOPE's represented the sequential progression of the portions of the route where driver behavior was intensively observed and recorded. A sample of the MDE rating form can be found in Figure 1. Each LOPE was subdivided into segments referred to as SPOTS.
Directly under the heading SPOTS, numbers representing the performance objectives being tested were listed. Beside the SPOTS was listed the specific location on the route where the observation of the performance specified by the performance objective would occur. Immediately following the specific location where the driver behaviors were to be observed was listed the range of expected driver behaviors. The listed driver behaviors served as anchor points and represented the extreme unsatisfactory and satisfactory behaviors on a continuum. The satisfactory behaviors were representative of the behaviors specified in the performance objectives. The extreme unsatisfactory behaviors were determined by using the opposite behaviors specified by the performance objectives.

Previous research, project staff and the panel of experts agreed that all observable driver behaviors could be recorded under three basic elements of the driving task.

Figure 1. Michigan Driver Education Evaluation Project, Driver Performance Rating Form (sample page for LOPE II; form fields include Subject, Rater, Date, Run No. and Program, with rating columns for Search, Speed Control and Direction Control for each Specific Performance Objective Test Site).

SPOTS 3.1, Michigan Avenue westbound. Unsatisfactory (U): does not check mirrors, fails to signal right, does not check blind spot, does not change to right lane, changes lanes abruptly causing traffic to slow or swerve, does not adjust lane position or speed, fails to cancel directional signal. Satisfactory (S): checks mirror, signals right, checks blind spot, changes to right lane, blends smoothly with traffic, adjusts lane position and speed, cancels directional signal.

SPOTS 3.3A,
Michigan Avenue and right turn onto Homer. Unsatisfactory (U): fails to check left and rear traffic, fails to signal right, fails to reduce speed, starts turn early, turns into lane #1 or #2, recovers by palming or shuffling wheel, does not adjust speed to flow. Satisfactory (S): checks traffic, especially left and rear; signals right; reduces speed; turns into lane #3; recovers using hand over hand; adjusts speed to match flow.

These elements were search, speed control and direction control. It was also agreed that a fourth element existed but that it was a determining factor in the satisfactory or unsatisfactory rating of the three major elements. This element was timing.

After giving a direction to the driver, the raters began intensive observation of the driver's behaviors during the LOPE. Upon completion of the designated LOPE, the driver entered a general observation area. While in the general observation area, the driver's behavior was not intensively observed; rather, the raters were recalling and recording the driver's performances of the previous LOPE. The elements of search, speed control and direction control were rated as unsatisfactory or satisfactory based upon the driver's compliance with the satisfactory anchor points of each of the SPOTS within the completed LOPE.

Some of the objectives specified timing as a criterion for satisfactory performance. For those objectives, a T was printed immediately under the unsatisfactory symbol of the rating form. If the element of performance was unsatisfactory due to timing, both the U and T were marked. If the driver performed a particular element satisfactorily, the performance was rated satisfactory by marking the S.

Development and Administration of Training Program for Raters

Selection of Raters

The study involved six raters. Five of the individuals selected as raters had completed the educational requirements for certification to teach driver education in Michigan.
Of these, four had recently completed a course at Michigan State University consisting of supervised practical teaching experience in driver education, and the other had been teaching driver education in the public schools for approximately 12 years. The sixth rater was a member of the Michigan Department of Education who had participated in the rater training program.

The design called for two pairs of raters to be used in the data collection phase of the study. These two pairs of raters were counterbalanced, and they rotated front and rear seat positions on two successive drives with a subject. The four most compatible raters, excluding the rater from the Department of Education, as determined from practice runs, were to be used in the data collection phase. The remaining rater, along with the individual from the Michigan Department of Education, formed a pair of alternate raters and were assigned an alternate subject. Used in this manner, the additional raters, subject and vehicle were always available to serve as a backup to cover vehicle malfunction or any absenteeism of subjects and raters but were not used in the data analysis. The data from the practice runs indicated that there was no obvious rater incompatibility; therefore, the four raters used in the data collection and analysis were randomly assigned to pairs.

Training Program

The six individuals selected as raters were administered a four-day training program. The major elements of the training effort were route-specific. Stating the directions verbatim at precise locations along the route, knowing when to observe and when to record, and recognizing where expected behaviors would occur were critical to objective and accurate recording of the observed driving behaviors. This necessitated a major block of time for practical work in a vehicle on the route.
Other methods for presenting route-specific training included the use of 35 mm slides of the route and overhead transparencies of the route and rating form. The color slides represented the route components and were arranged in progression. The slides were taken from inside an automobile from the front seat passenger's side. This was done to better represent the front seat rater's view. In addition to representing the progression of the route, the slides depicted the nature of dynamic traffic patterns and represented a range of satisfactory driver behavior for each SPOTS. The overhead transparencies were used either alone or as complements to the slides of the route. When used in combination, the slides and transparencies were incorporated in a split-screen technique.

In addition to the route-specific training, the raters were exposed to new terminology related to the study. Techniques for giving directions and for responding to the subjects' questions were presented and demonstrated. Techniques for observing and recording driver behavior, avoiding abort situations and maintaining the safety of the driver and vehicle were also presented and demonstrated. Practical application of these training components was attained by actual practice on the route using practice subjects. The raters were paired. Then they directed the subject over the route and observed and recorded the driver's performance. The writer monitored these practice runs and discussed them with the raters.

During the training program, Dr. Robert O. Nolan and Mr. Fred Vanosdall discussed and demonstrated writing narratives of driver performance and objectivity in recording driver performance, respectively. The narratives provided a qualitative summary or explanation of the driver's performance. The session on the need for objectivity in recording driver performance provided the rationale for drawing inferences and making judgments based only upon observed behavior.
There was one training session during the program that involved the use of films. The two films used were from the Aetna Driver Simulator series and were entitled IPDE and Separate and Compromise. Although these films were not related to the test route, they did offer the opportunity for the raters to view the same driving tasks and driver behaviors in a controlled environment. Only the introductory portion of the film, which incorporated the use of a model driver, was used. The raters were asked to observe driver behaviors and to recall and list them when the projector was stopped. This was done in short segments, with the only narrative being verbal directions for the model driver. The list of behaviors was then shared and discussed. The film was re-run during the discussion to clarify the observable behaviors. The exercise provided an example of objectivity, reliability and intensive observation.

The training material was delivered primarily through a lecture method. These lectures were supplemented with audio-visual presentations, when appropriate. Practical applications of the training program were conducted during field exercises that involved practice ratings of subjects on the test route. The outline of the training program can be found in Appendix C.

Selection of Practice Subjects

While the training program for raters was being conducted, driver education students from Lansing Catholic Central High School were completing their driver education course. These students were asked to volunteer to participate in the training program by driving a prescribed route while a pair of raters observed and recorded their driving performances. The subjects were told that they would be asked to drive the route twice. Volunteers were asked to complete the name, address and phone number sections on a parent permission form, have their parents sign the form and return the form to the writer.
The students were told that they would be called ahead of their scheduled session to finalize arrangements. The students were called two days prior to their scheduled participation in the training program to confirm their attendance. On the day they were scheduled to drive, the students were picked up at their homes and returned when their driving session was completed.

The students who participated as practice subjects during the training program had just completed a four-phase driver education program. The students received the classroom phase of their course at Lansing Catholic Central High School; the simulation, driving range and on-street phases were conducted at Michigan State University as part of a driver education teacher preparation course. In this course university students performed the practical teaching responsibilities under the supervision of MSU Highway Traffic Safety Center staff. All of the practice subjects had passed their driver education course, but none had yet received his or her driver's license.

Evaluation and Feedback

To assist in determining the effectiveness of the training program and the progress of the raters, several techniques of evaluating each rater's progress were used. Evaluation of rater performance and feedback were concentrated on comprehension of the route and route components and on rater agreement on driver performance ratings.

To evaluate the raters' comprehension of the route and route components, the raters were requested to trace the test route on a map. The raters were then asked to identify the beginning and ending of each LOPE on the map. The raters were asked to state verbatim the directions for the route while viewing slides of the route. The raters were also required to state the driving task and satisfactory driving behaviors for each SPOTS while viewing the slides of the route.
To evaluate rater agreement on recorded behaviors, the raters were required to view a portion of film representing a demonstration driver and record the driver's behaviors. The behaviors recorded by the raters were then compared for agreement. The film was re-run to reinforce standard observations and recordings.

During early practice runs on the route, the ratings were reviewed and discussed after each LOPE. While monitoring practice runs with practice subjects, the writer also rated the subjects and compared these ratings to those of the raters. Rating forms were monitored after each practice run. In addition to comparing the ratings of paired raters, the writer checked the rating forms for omitted data and margin notes and asked the raters to recall their observations when there was disagreement between raters or when data was missing from the rating form.

Subjects

During the month of July, the Michigan Department of Education identified seven school districts in the Lansing-East Lansing area having driver education programs that would be completed prior to the scheduled data collection phase of this study.

Eighty students from the seven school districts were randomly selected as potential subjects for the driver performance test. Although only 45 subjects would be required for the study, the extra students were necessary in the event some students had time conflicts, did not receive permission to participate or failed to return the permission forms. Letters were sent to the 80 students' parents or guardians requesting permission for the students to participate. From the 45 students required for the study, 30 would be evaluated by two pairs of raters. The additional 15 students would be assigned to an alternate pair of raters and used as alternate subjects if some of the original 30 students could not participate.
The 80 subjects, randomly selected for participation in the data collection phase of the study, were from the Dansville, Haslett, Lansing, Mason, Okemos, Waverly and Williamston school districts. The students had completed a driver education program just prior to the data collection phase of the project. All subjects selected had passed a driver education course but were not yet licensed to drive.

The subjects chosen for participation came from programs ranging from a combination of classroom and on-street content to a combination of classroom, simulation, range and on-street educational programs. The type of program was not a consideration of this study. The raters, however, were not told what type of program the subjects had completed. This was done to avoid the possibility of rater bias in regard to their opinions as to which programs may produce better drivers.

Delimitations

Based upon the characteristics of the subject population discussed in the preceding section, generalization of the findings is limited to:

1. Students who have successfully completed a driver education program which used Michigan Driver Education Performance Objectives and who were not yet licensed to drive.

2. Students in the age range of 14 to 18.

3. Students whose socio-economic background is consistent with that found in the greater Lansing area, which includes inner city, rural and suburban populations.

Hypotheses

There were five major hypotheses investigated in this study. The focus of the first three hypotheses concerned the consistency of test scores across items and test runs. The following were the specific null hypotheses tested:

1. Research Hypothesis: There would be differences in difficulty among the items in the driver performance test.

   H₀: Item difficulty will not have a systematic effect on driver performance scores.

2. Research Hypothesis: Run administrations would not affect driver performance scores.
   H₀: Test runs will not have an effect on driver performance scores.

3. Research Hypothesis: The subjects' driving performance scores would not vary according to items interacting with the time of test administration (i.e., between run one and run two).

   H₀: Driver performance scores will not vary according to items interacting with run administrations.

The focus of the fourth hypothesis was on the internal consistency of the test. This hypothesis considered the relationship between true scores and observed scores.³⁹ The hypothesis also took into account the effects of time and individual items.

³⁹ Glass, Gene V. and Stanley, Julian C. Statistical Methods in Education and Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

4. Research Hypothesis: A positive relationship would exist between true driver performance scores and observed driver performance scores.

   H₀: No relationship exists between true driver performance scores and observed driver performance scores.

The focus of the fifth hypothesis was interrater reliability, or the agreement between pairs of raters.

5. Research Hypothesis: A positive relationship would exist between raters on measures of sum, search, speed control, direction control, familiarization and signs.

   H₀: No relationship exists between raters on measures of sum, search, speed control, direction control, familiarization and signs.

Statistical Analysis

The purpose of the study was to determine the reliability of an observational measure designed to evaluate in-car performance of Michigan driver education students. It was important, therefore, to determine the various reliability characteristics of the in-car performance test procedures. Topics to be discussed in this section are:

a. counterbalanced design.

b. test for null hypotheses 1-3 using ANOVA.

c. test for hypothesis 4 - reliability coefficients and significance.

d. test for hypothesis 5 - reliability coefficients and significance.
Counterbalanced Design

There were thirty study subjects making two test runs. There were four raters evaluating the two runs. A counterbalanced design was used for efficiency in the analysis by compensating for external influences such as rater bias in terms of rating run 1 for a subject versus run 2, as well as assuring that no rater would be consistently paired with another rater. Since four raters were used, a repeated pattern of rater assignment, counterbalanced at run 1 and run 2 for each group of six subjects, would cover all possible rater pairings. For example, raters A and B tested subject 1 during run 1 and subject 6 during run 2. (The counterbalanced design is shown in Figure 2.) This design also provided for control of contemporary history, maturation processes, measuring instruments, statistical regression, experimental mortality and interaction of selection and maturation.

Test of Null Hypotheses 1-3

The first three null hypotheses for the study were tested using analysis of variance. This test was chosen for its ability to test for separate effects of two or more independent variables and the interaction effects of those variables. In this study, the variables were items, runs and subjects. Sources of variation were determined using the Millman-Glass Rules of Thumb (Ref). The relevant ANOVA table is provided in Figure 3. For hypotheses 1-3 the following F-ratios were used.

Figure 2
Counterbalanced Design - Repeated Pattern of Rater Assignment

Subject    Run 1 Raters    Run 2 Raters
   1           A B             C D
   2           A C             B D
   3           A D             B C
   4           B C             A D
   5           B D             A C
   6           C D             A B

(The pattern of rater assignment repeats for each group of six subjects.)

Figure 3
ANOVA Table

Source                       SS        df          MS
Between Subjects             SS_S      29          MS_S = SS_S/29
Within Subjects:
  Items                      SS_I      I-1         MS_I = SS_I/(I-1)
  Runs                       SS_T      1           MS_T = SS_T/1
  Items x Runs               SS_IT     (I-1)(1)    MS_IT = SS_IT/(I-1)(1)
  Items x Subjects           SS_IS     29(I-1)     MS_IS = SS_IS/29(I-1)
  Runs x Subjects            SS_TS     29          MS_TS = SS_TS/29
  Items x Runs x Subjects    SS_ITS    29(I-1)     MS_ITS = SS_ITS/29(I-1)
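The repeated pattern of rater assignment described above can be generated mechanically. The following Python sketch is an illustration added here for clarity (it is not part of the original study materials): each subject's run 1 pair is drawn in turn from the six possible pairs of four raters, and the run 2 pair is always the complementary pair, so every pairing occurs and no rater is consistently matched with another.

```python
from itertools import combinations

def rater_assignments(raters="ABCD", n_subjects=30):
    """Generate the repeated counterbalanced pattern of rater pairs.

    Each subject is rated by one pair on run 1 and by the complementary
    pair on run 2; the six possible pairs of four raters cycle every six
    subjects, covering all possible pairings.
    """
    # combinations emits pairs in order: AB, AC, AD, BC, BD, CD
    pairs = ["".join(p) for p in combinations(raters, 2)]
    schedule = []
    for subject in range(1, n_subjects + 1):
        run1 = pairs[(subject - 1) % len(pairs)]
        run2 = "".join(r for r in raters if r not in run1)  # complementary pair
        schedule.append((subject, run1, run2))
    return schedule

# The first group of six subjects reproduces the Figure 2 pattern.
for subject, run1, run2 in rater_assignments()[:6]:
    print(subject, run1, run2)
```

Note that raters A and B appear together on run 1 for subject 1 and on run 2 for subject 6, matching the example given in the text.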
1. Differences in item difficulties:

   F = MS Items / MS Items x Subjects

2. Differences in average performance over runs:

   F = MS Runs / MS Runs x Subjects

3. Items x Runs interaction (i.e., is the pattern of performance on the items consistent at Runs 1 and 2):

   F = MS Items x Runs / MS Items x Runs x Subjects

Test of Null Hypothesis 4

Reliability coefficients are affected by the assumptions one makes regarding the sources of variation built into the study. The assumptions used here treated subjects as a random variable and items and runs as fixed variables. These assumptions lead to a liberal interpretation of the data and, hence, a higher expected reliability coefficient. One could, however, treat subjects and items as random, while leaving runs fixed, or subjects, items and runs as random. Both sets of assumptions lead to progressively more conservative estimates of reliability since they will account for a smaller true score. This study reported the estimates of reliability for all three methods to provide for the possible range of reliability estimates. Each of the formulas for the reliability coefficients had the form of true score variance over observed score variance.

Method I (Subjects random, Items fixed, Runs fixed):

   (MS Subjects - MS Items x Subjects x Runs) / MS Subjects

Method II (Subjects random, Items random, Runs fixed):

   (MS Subjects - MS Items x Subjects) / MS Subjects

Method III (Subjects random, Items random, Runs random):

   (MS Subjects + MS Items x Subjects x Runs - MS Items x Subjects - MS Subjects x Runs) / MS Subjects

The findings were considered significant if the F ratio was beyond that expected at alpha = .01.

Test of Null Hypothesis 5

Since the assumption of normality and equality of variance was made, a parametric statistic was needed. The analysis used continuous data. The Pearson Product Moment correlation was used to determine the correlation coefficient. The findings were considered significant if the coefficient was greater than that expected at alpha = .01.
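Given the mean squares from the ANOVA table, the three reliability formulas reduce to simple arithmetic. The sketch below is an illustration of that arithmetic only; the mean-square values passed in are hypothetical placeholders, not figures from this study, chosen to show how the three methods yield progressively more conservative estimates.

```python
def reliability_estimates(ms_s, ms_is, ms_sr, ms_isr):
    """True score variance over observed score variance under three sets
    of assumptions about which facets are random.

    ms_s   -- MS Subjects
    ms_is  -- MS Items x Subjects
    ms_sr  -- MS Runs x Subjects
    ms_isr -- MS Items x Runs x Subjects
    """
    method1 = (ms_s - ms_isr) / ms_s                  # subjects random; items, runs fixed
    method2 = (ms_s - ms_is) / ms_s                   # subjects, items random; runs fixed
    method3 = (ms_s + ms_isr - ms_is - ms_sr) / ms_s  # subjects, items, runs random
    return method1, method2, method3

# Hypothetical mean squares, chosen only to illustrate the ordering:
m1, m2, m3 = reliability_estimates(ms_s=10.0, ms_is=2.0, ms_sr=1.5, ms_isr=1.0)
print(round(m1, 2), round(m2, 2), round(m3, 2))  # prints 0.9 0.8 0.75
```

Because the three-way interaction mean square is typically the smallest error term, Method I yields the highest (most liberal) coefficient, with Methods II and III successively lower, as the text anticipates.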
Summary

This chapter contained the methods of procedure by which the study was conducted. Included were (1) route and instrument design, (2) development and administration of training program for raters, (3) subjects, (4) delimitations, (5) null hypotheses and (6) statistical analysis. Remaining are Chapter IV, in which the findings of the study will be presented, and Chapter V, which will report a summary of the results and conclusions.

Chapter IV

FINDINGS OF THE STUDY

The primary purpose of this study was to determine the reliability characteristics of an in-car performance test. The test was developed to measure driver education students' attainment of the in-car performance objectives set by the State of Michigan. More specifically, the study was concerned with whether or not there was variability in test performance (measured by variance in item difficulty), whether raters could consistently rate a driver's performance within the same run and whether one run affected student performance on a second run.

The remainder of this chapter will present the results of the reliability study within each of the null hypotheses set out in Chapter III. The first section covers item difficulty and score stability. This section presents data concerning the first three null hypotheses. The next section, reliability of the in-car performance measure, presents the comprehensive reliability coefficients, taking into account multiple raters, runs, items and subjects. The last section, interrater reliability, presents correlations between pairs of raters.

Item Difficulty and Rating Stability

The test was expected to show a range in difficulty for the various component parts in order to determine variance in driver performance. This was necessary in order to ensure that a subject's score on one component of the test was not necessarily a predictor of the total test and that subjects truly vary in ability to perform the in-car objectives.
The study was also intended to demonstrate that there was stability between performances from one run of the test to the next. Significant differences between scores on run one and run two could reflect a learning effect. Rater bias is not an issue, since no rater made back-to-back runs with the same subject.

Following are the three null hypotheses tested to determine the consistency and stability of ratings for items and runs of the two test administrations.

1. Item difficulty will not have an effect on driver performance scores.

2. Run administrations will not have an effect on driver performance scores.

3. Driver performance scores will not vary according to items interacting with run administrations.

All three null hypotheses were tested with ANOVA. The ANOVA table can be found in Table 1.

Table 1

ANOVA Table: Relationships Between Student Performance, Item Difficulty and Run Administrations

Source of Variance         SS      DF     MS                   F
Subjects                   SS_S    29     MS_S = SS_S/29
Items                      SS_I    I-1    MS_I = SS_I/(I-1)    41.75 (S)
Runs                       SS_R    1      MS_R = SS_R/1        3.36 (NS)
Items x Subjects
Items x Runs                                                   1.03 (NS)
Items x Subjects x Runs

Following are the F ratios for tests of the first three null hypotheses.

1. Item difficulty will not have an effect on driver performance scores.

   F = MS Items / MS Items x Subjects = 41.75

The F ratio was significant beyond alpha = .01. Therefore, the null hypothesis was rejected. This finding suggested that there was a difference in item difficulty and that the items were discriminating.

2. Run administrations will not have an effect on driver performance scores.

   F = MS Runs / MS Runs x Subjects

The finding was not significant. Therefore, the null hypothesis was not rejected. It did not appear that back-to-back runs made a difference in driver performance. This result was encouraging in that performance appeared to be stable over runs, thereby suggesting that the student may not necessarily "learn" by taking the test when there is no feedback given after the run.

3. Driver performance scores will not vary according to items interacting with run administrations.

   F = MS Items x Runs / MS Items x Runs x Subjects = 1.03

There was some concern that, while no overall item or run differences might occur, there might be an interaction between the two. If this were supported, then one could assume that some items in the test were sensitive to learning effects. However, the finding (F = 1.03) was not significant. Therefore, the null hypothesis was not rejected. This suggested that there was no significant interaction between test items and run administrations.

Internal Consistency

Hypothesis four stated that no positive relationship exists between true driver performance scores and observed driver performance scores. There are several ways to approach this relationship, and the resulting coefficient is affected by the assumptions one makes. If subjects are treated as a random variable, then it can be suggested that subjects are a random sample from the population of subjects that could be tested with the instrument. Items and runs were treated as fixed variables, considering the items as the only items of interest to measure the objectives and the runs as the only two runs of interest. Therefore, the resulting reliability coefficient is generalizable to other subjects from the same population, but only to those items included in the test and the two runs administered. This particular method of analysis is perhaps the most conservative because of its limitations on generalizability.

For the purposes of this study, two additional methods of analysis were performed. Each method is progressively more generalizable. In addition to considering subjects as a random sample of other subjects from the same population, items may be considered a random sample from those items measuring the same objectives. The third method of analysis is the most liberal treatment of the variables.
With this method one can generalize to all subjects, all items and all testing times from the respective populations of subjects, items and testing times.

Following are the results of the three methods of analysis. The formula applied for the analysis is taken from the Millman-Glass Rules of Thumb for the analysis of variance. The reliability coefficient has the following form:

   True Score Variance / Observed Score Variance

Method I. Subjects Random, Items Fixed, Runs Fixed

   (MS Subjects - MS Items x Subjects x Runs) / MS Subjects
   = (6.083728 - .261101) / 6.083728 = .957

There was a positive relationship between true and observed driver performance scores. Therefore, the null hypothesis was rejected. The reliability coefficient for the overall test was very high and clearly acceptable.

Method II. Subjects Random, Items Random, Runs Fixed

   (MS Subjects - MS Items x Subjects) / MS Subjects
   = (6.083728 - .379508) / 6.083728 = .937

The reliability coefficient remains very high and acceptable.

Method III. Subjects, Items and Runs All Random

   (MS Subjects + MS Items x Subjects x Runs - MS Items x Subjects - MS Subjects x Runs) / MS Subjects
   = (6.083728 + .261101 - .379508 - 1.518399) / 6.083728 = .730

Interrater Reliabilities

Hypothesis five concerned the interrater correlations between ratings of Sum, Drive, Search, Speed Control, Direction Control, Familiarization and Signs. In addition to the reliability coefficients of the test, rater agreement was determined for each of the test components. Because the same two raters were not always paired, the rater agreement was determined for pair one (irrespective of individuals) on run one and pair two (irrespective of individuals) on run two. The correlations are presented in Table 2.

Table 2

Rater Agreement

Component            Pair One   Pair Two
Search                 .67        .79
Speed Control          .72        .51
Direction Control      .75
Familiarization
Drive
Signs                  .49        .54
Sum                    .86        .83

The correlations showed a high degree of agreement on the overall (Sum) test ratings. Rater agreements on the other components were also high, ranging between .49 and .83. This suggested that regardless of pairing or front and back seat positions, raters of similar background, who were administered the same training objectives, could be expected to use this instrument with a high degree of consistency. There was a positive relationship between ratings; therefore, the null hypothesis was rejected.

Summary

This chapter presented the analysis of the reliability characteristics of the Michigan Driver Education Test. The analysis addressed the areas of item difficulty and stability, internal consistency and interrater reliabilities. The following chapter contains a summary of the results and conclusions.

Chapter V

SUMMARY AND CONCLUSIONS

Summary

This study dealt with the development of an observational measure for evaluating in-car performance of Michigan driver education students and the determination of the measure's reliability characteristics. The development of the observational measure involved the development of a standard route, a route-specific instrument and procedures for scoring.

Reliability and validity had to be determined if the instrument was to be used to assist in the evaluation of driver education program effectiveness. To be responsive to the concerns for the reliability characteristics of the instrument, it was necessary to determine item difficulty and rating stability, internal consistency and interrater reliabilities. This study determined that the instrument was reliable and could be used consistently by trained raters. A validation study, using the Michigan State University Driver Performance Measurement criterion, was conducted at a later date. That study determined that the instrument was valid.⁴⁰

Statement of the Problem

No instrument existed that was designed to measure the Department of Education's in-car performance objectives. An instrument of this nature was needed by the Department of Education to determine the effectiveness of driver education programs in the state.
The purpose of this study was to develop an observational measure to evaluate in-car performance and to determine the reliability characteristics of the observational measure. It was important to ascertain which of the observational measures provided by the instrument were reliable, which were not, and under what conditions. In keeping with this concern, the study addressed the following research hypotheses:

1. That there would be differences in difficulty among the items in the driver performance test.

2. That run administrations would not affect driver performance scores.

3. That the subjects' driving performance scores would not vary according to items interacting with the time of test administration (i.e., between run one and run two).

4. That a positive relationship would exist between true driver performance scores and observed driver performance scores.

5. That a positive relationship would exist between raters on measures of sum, search, speed control, direction control, familiarization and signs.

40. Michigan Department of Education. Michigan's Driver Education Evaluation Project. Lansing: The Department, 1978.

Methods of Procedure

A concern of the project staff was the development and evaluation of a driver performance measure that was responsive and generalizable to the content of the programs that would be evaluated in the future. One approach to the development of such a measure was the development of objective-referenced criteria, using the in-car performance objectives developed for Michigan driver education programs by the Michigan Department of Education. A panel of experts, consisting of project staff and Michigan State University Highway Traffic Safety Center staff, was formed to assess the project's needs and to examine the preliminary route and instrument design. The combined efforts of the panel of experts provided the basis for the development of an instrument and route that was manageable and comprehensive.
The initial task of the project staff was the evaluation of the in-car performance objectives. The project staff directed their review to determining which objectives were critical to safe operation of a vehicle in a real-world traffic environment, as well as which objectives were atypical, extremely hazardous, or logical prerequisites to the attainment of other objectives.

An integral part of the study process was the design of a test route that would yield the necessary situations to observe and record student driving performance as stipulated by the Michigan Department of Education's in-car performance objectives. The second component of the process was the design of an instrument that was concise, thorough and definitive enough to be easily manageable by the raters observing and recording driver performance. The third component was the design and implementation of a training program for the raters who observed and recorded the driving performances. The final component was the development and implementation of a counterbalanced design for rater and subject assignment during the data collection phase of the study, and the statistical treatment of the data to determine the reliability characteristics of the driver performance test.

Major Findings

The first null hypothesis, "Item difficulty will not have an effect on driver performance scores," was rejected. The F ratio was significant beyond alpha = .01, suggesting that there was a difference in item difficulty.

The second null hypothesis, "Run administrations will not have an effect on driver performance scores," was not rejected. The finding was not significant, suggesting that performance appeared to be stable over time.

The third null hypothesis, "Driver performance scores will not vary according to items interacting with run administrations," was not rejected.
The finding was not significant and suggested that there was no significant interaction between test items and run administrations.

The fourth null hypothesis, "No positive relationship exists between true driver performance scores and observed driver performance scores," was rejected. Three methods of analysis, each progressively more liberal in the assumption of randomness of the variables, were employed. The resulting reliability coefficients were .957, .937 and .730, respectively.

The fifth, and final, null hypothesis, "No relationship exists between raters on measures of sum, search, speed control, direction control, familiarization and signs," was rejected. The interrater reliability for pairs of raters ranged from .49 to .83 on test components. The sum, or overall test, had a reliability coefficient of .86 for pair one and .83 for pair two.

Conclusions

The results of this study indicated that the driver performance measurement test, developed to measure the in-car performance of Michigan driver education students as stipulated by the Michigan Department of Education's in-car performance objectives, was a reliable test. It had a range of difficulty in the items, was internally consistent, and had consistency and stability of ratings across two runs of administration. The interrater reliabilities appeared to be more than adequate, meaning that the test was dependable under the conditions of the rater training program. Whereas a reliable and valid test did not exist to measure driver performance, as stipulated by Michigan's driver education performance objectives, prior to this study, one now exists.

Recommendations

Based upon the results of the test developed, the issue is to adopt a more widespread use of the test. If the Department of Education should choose to modify or change the performance objectives for the in-car phase of driver education, then the test would need additional development.
If the test is put to use, then consideration must be given to the efficient training of people to use the test. Based upon the results of the study, the following recommendations are made.

1. It is recommended that the Department of Education use this test procedure to measure the attainment of in-car performance objectives for successful completion of driver education courses.

2. It is recommended that the Department of Education use this test procedure to measure program effectiveness.

3. It is also recommended that this test procedure be used to measure the effectiveness of methods, materials and delivery formats of various driver education programs.

The above recommendations should be seriously considered as a means of addressing the issues of accountability, cost effectiveness and teacher merit.

4. It is recommended that the Department of Education use this test procedure as the criterion for evaluating competency-based driver education programs and the students' attainment of the in-car performance objectives for competency-based programs. In addition to the test being a valid criterion for pre- and post-evaluation of student performance, the test, with the use of well-designed feedback, could be used as a teaching aid.

5. It is recommended that the Department of Education not permit this test procedure to be used to evaluate student performance or program effectiveness without ensuring that the raters have been adequately trained to develop a route or use the instrument. If this test is used by untrained persons or by persons trained under conditions other than those set forth in this study, the test cannot be considered dependable.

Recommendations for Further Research

Due to the results of the study and its potential for widespread application, the following recommendations for further research are made:

1. It is recommended that the research be replicated with a population from different programs, different geographical locations and different socioeconomic backgrounds.

2. It is recommended that a study be undertaken to formalize the training program administered in this study for raters. It is also recommended that the formalized training program be pilot tested before final adoption.

3. It is recommended that an off-street testing procedure be developed to accommodate the performance objectives considered too hazardous or occurring too infrequently to be measured on the street. These types of objectives were not measured in this study.

4. It is also recommended that a time series study be conducted to determine the instrument's potential for predictive validity in regard to predicting what type of accident or violation a subject might experience at some future point in time.

5. It is recommended that an effort be undertaken to formalize route-development procedures, as this is an important part of the test.

Discussion

During the preliminary stages of the study, it became apparent that there was a need for an objective-referenced test to evaluate the effectiveness of driver education. As development efforts proceeded, the need for realistic and clearly defined performance objectives became apparent. It was obvious that the initial effort put into the development of these objectives would affect the quality of an objective-referenced test. It is crucial not only that the objectives be stated in clearly observable behavioral terms, but also that they provide for the conditions under which the objective will be taught and the behavior observed. It is also important to specify what degree of attainment is satisfactory.
During the route development phase of the study, it became apparent that driver education teachers and evaluators need to know how to effectively develop a standardized route that will yield the opportunity to evaluate and record student driving performance. In order for the route to yield the opportunity for reliable evaluation, the development and coordination of driving tasks, performance objectives and traffic situations must be done by on-site observation and verification by the developer, rather than by an armchair consensus of so-called experts.

During the training program for raters, a combination of lectures, audiovisuals and field exercises was used. The slides of the route and expected driver behaviors were photographed from the front seat passenger side of the vehicle. This photographic perspective allowed the situation to be displayed from the front seat rater's perspective. This technique appeared to be effective. If more than one training route or testing route is developed in the future, it is recommended that programmed training materials, incorporating the use of detailed sketches of the route, be used rather than a slide program. This approach would probably be as effective and would definitely be less expensive.

It is the writer's opinion that the sooner the trainees are introduced to the route and the more practice rating they are exposed to, the more effective the training will be. By a comparison of percentage of agreement, the raters' learning curve, or rate of agreement, seemed to peak on the second day of the data collection phase of the study. The rater agreement might have peaked sooner if an additional day of training had been conducted or if practice rating had started sooner. Based on this experience, there is a danger in exposing the rating form too soon or to persons who have received no training.
The untrained person is likely to perceive the rating form as a recipe or checklist, capable of being used by anyone who has taught driver education.

During the data collection phase of the study, the raters were asked to write a brief narrative of the driver's performance. Although these data were not used in the analysis, they did provide a qualitative explanation of the recorded driver performance. This technique appeared to convince the raters to make definitive marginal notes about the driver's performance on the rating form. This technique would be very useful during feedback sessions with a student. Although feedback was not provided to subjects during the study, it would be a necessity if the instrument were used as a teaching aid.

Now that a valid objective-referenced driver performance test exists, there are implications for further development, research and change for driver education in Michigan. This study has implications for curriculum change, diagnostic and learning effects, teaching methods versus student performance, route parameters and efficiency, and psychological functions relating to judgments and decisions involved in operating a vehicle. Now that a valid instrument for measuring program effectiveness in Michigan exists, it is time to effect program changes in Michigan driver education.

APPENDICES

APPENDIX A

Directions for MDE Road Test

1. Turn left and proceed to the end of the street; then turn right.

2. Proceed to the traffic light and turn left.

3. Proceed to the third traffic light and turn left.

4. (Frandor sign) Proceed to next light and turn right.

5. (Mister D sign) Turn left at the second traffic light (Student should remain in lane 2).

6. (Past Howard St. light) Turn right at the next traffic light.

7. Proceed to the third street on the left and turn left; then proceed to the end of the street and turn left.

8. Proceed to the end of the street and turn right (after turn, tell student to be in lane 3).

9. (After the Grand River fork - student in lane 3) Proceed to the third traffic light and turn left.

10. Turn right at the second traffic light; (after completing turn) proceed to the second light and turn left (Student should be in lane 2).

11. Continue to the second traffic light and turn right.

12. Proceed to the second traffic light and turn left; (after turn) proceed to the next light and turn left (from second lane).

13. Proceed to the first street after the traffic light and turn right; (after the turn) continue to the third street on the left and turn left.

14. Proceed to the first street and turn right; continue to the second traffic light and turn left (lane 2).

15. Proceed ahead and enter the expressway East 496.

16. Exit at the East Lansing/Flint exit; continue to the first traffic light and turn right (take the East Lansing turnoff).

17. Proceed straight ahead.

18. (After crossing the bridge) Turn right at the first street on the right; turn right at the next street.

19. Turn left at the second street and continue ahead.

20. (After crossing Larkspur) Turn right at the next street; proceed ahead and return to the parking lot on the right.

APPENDIX B

MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

VEHICLE FAMILIARIZATION

Program ____  Subject ____  Rater ____  Date ____  Run No. ____  Spots: Parking Lot

1.1 A. Identify Information Gauges
  a. Alternator Light (Gauge)  YES NO
  b. Brake System Warning Light  YES NO
  c. Fuel Gauge  YES NO
  d. Left and Right Turn Light  YES NO
  e. Odometer  YES NO
  f. Oil-Pressure Warning Light (Gauge)  YES NO
  g. Seat Restraint Light  YES NO
  h. Speedometer  YES NO
  i. Temperature Indicator Light (Gauge)  YES NO

1.1 B. Starting and Control Devices
  a. Accelerator  YES NO
  b. Footbrake  YES NO
  c. Gear Shift Selector  YES NO
  d. Ignition and Starter Switch  YES NO
  e. Park Brake  YES NO
  f. Steering Wheel  YES NO

1.1 C. Safety Devices
  a. Door Locks  YES NO
  b. Emergency Flasher Control  YES NO
  c. Head Restraints  YES NO
  d. Headlight Beam Switch and Indicator  YES NO
  e. Heater and Defroster  YES NO
  f. Horn  YES NO
  g. Light Switch  YES NO
  h. Rearview and Sideview Mirrors  YES NO
  i. Seatbelt Restraint System  YES NO
  j. Sunvisor  YES NO
  k. Windshield Wiper and Washer  YES NO

1.3 Pre-Ignition Control Tasks
  a. Enters Vehicle (Checks for traffic as situation requires)  YES NO
  b. Places Key in Ignition  YES NO
  c. Locks all Doors  YES NO
  d. Adjusts Seat to Suitable Position  YES NO
  e. Adjusts Head Restraint  YES NO
  f. Adjusts Mirrors  YES NO
  g. Fastens Safety Restraining Devices  YES NO
  h. Makes Sure Park Brake is ON  YES NO

1.4 Starting the Engine
  a. Presses Accelerator and Releases  YES NO
  b. Depresses Foot Brake  YES NO
  c. Puts Gear Selector in PARK or NEUTRAL  YES NO
  d. Turns Key to START and Releases when Engine Starts  YES NO

1.5 Putting the Car in Motion
  a. Depresses Foot Brake  YES NO
  b. Selects Proper Gear  YES NO
  c. Releases Park Brake  YES NO
  d. Checks Mirrors  YES NO
  e. Uses Proper Signal  YES NO
  f. Checks Blind Spot  YES NO
  g. Releases Foot Brake  YES NO
  h. Gradually Accelerates into Proper Lane  YES NO

1.7 Stopping the Vehicle
  a. Checks Mirrors  YES NO
  b. Positions Car Appropriately  YES NO
  c. Releases Accelerator  YES NO
  d. Brakes to Smooth Stop  YES NO

1.8 Securing Vehicle
  a. Shifts to PARK gear  YES NO
  b. Sets Park Brake On  YES NO
  c. Turns Off Ignition  YES NO
  d. Removes Key  YES NO

MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ____  Rater ____  Date ____  Run No. ____  Program ____

LOPE I (Performance on Search, Speed Control and Direction Control each rated U = Unsatisfactory or S = Satisfactory)

Test Site: Harrison Road northbound approaching Kalamazoo (Spots 1.6)
U: Places hand on lower half of wheel, steers with one hand, does not maintain proper lane position, does not adjust speed to conditions, does not systematically search.
S: Hands on upper half of steering wheel, maintains proper lane position, adjusts speed to conditions, and searches systematically.

Test Site: Harrison Road northbound to Michigan Avenue (Spots 5.1, 5.2, 4.2)
U: Does not search all directions, fails to adjust speed to conditions, fails to maintain lane position, fails to observe traffic signals.
S: Searches all directions, adjusts speed to conditions, maintains proper lane position, observes traffic signals.

Test Site: Harrison Road turning left onto Michigan Avenue (Spots 4.4A, 4.4B, 2.1 DYP-DYV)
U: Does not reduce speed, signal left, and check traffic in all directions; positions car too far to the right, fails to yield to oncoming traffic and pedestrians; palms or shuffles wheel when turning or recovering, turns into right lane, fails to adjust speed to flow.
S: Reduces speed and signals left, checks traffic in all directions, positions car close to center line, yields to traffic and pedestrians, uses hand over hand when turning and recovering, turns into lane #1 or #2, adjusts speed to match flow.

Figure 1. Michigan Driver Education Evaluation Project Driver Performance Rating Form

MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ____  Rater ____  Date ____  Run No. ____  Program ____

LOPE II (Performance on Search, Speed Control and Direction Control each rated U or S)

Test Site: Michigan Avenue westbound (Spots 3.1)
U: Does not check mirrors, fails to signal right, does not check blind spot, does not change to right lane, changes lanes abruptly causing traffic to slow or swerve, does not adjust lane position or speed, fails to cancel directional signal.
S: Checks mirror, signals right, checks blind spot, changes to right lane, blends smoothly with traffic, adjusts lane position and speed, cancels directional signal.

Test Site: Michigan Avenue and right turn onto Homer (Spots 3.3A)
U: Fails to check left and rear traffic, fails to signal right, fails to reduce speed, starts turn early, turns into lane #1 or #2, recovers by palming or shuffling wheel, does not adjust speed to flow.
S: Checks traffic, especially left and rear; signals right, reduces speed, turns into lane #3, recovers using hand over hand, adjusts speed to match flow.

MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ____  Rater ____  Date ____  Run No. ____  Program ____

LOPE III (Performance on Search, Speed Control and Direction Control each rated U or S)

Test Site: Homer Street and left on Grand River (Spots 3.3C)
U: Does not search in all directions, fails to position car in lane #2, fails to signal left, fails to reduce speed or keep wheels straight when stopped, turns into lane #1, recovers by palming or shuffling wheel, fails to adjust speed to flow.
S: Searches all directions, positions car in lane #2, signals left, reduces speed, keeps wheels straight when stopped, turns into lane #2, #3, or #4, recovers using hand over hand, adjusts speed to flow.
Specific Performance Objective: Spots 3.3B
Test Site: East Grand River turning right onto Foster

  Unsatisfactory: Fails to search in all directions, fails to position car in lane #4, fails to signal or signals left, does not reduce speed, fails to check mirror, starts turn too soon causing right rear tire to strike curb or too late causing vehicle to enter the lane of oncoming traffic, does not recover by using hand over hand steering.

  Satisfactory: Searches in all directions, positions car in lane #4, gives right signal, reduces speed, checks mirrors, starts turn when front wheels are opposite point where curb begins to curve, turns into lane #1, recovers using reversed hand over hand, adjusts speed to flow.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE IV

Specific Performance Objective: Spots 5.1, 5.2
Test Site: Foster Street northbound crossing Woodruff

  Unsatisfactory: Does not search in all directions, fails to reduce speed as approaching the intersection, fails to maintain lane position.

  Satisfactory: Searches systematically in all directions, reduces speed as approaching intersection, maintains lane position.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___

Specific Performance Objective: Spots 5.1, 5.2, 5.3
Test Site: Foster Street turning left onto Hopkins

  Unsatisfactory: Does not search in all directions, fails to signal left, fails to maintain proper speed and lane position, turns too fast, turns too soon, or does not maintain control.

  Satisfactory: Searches systematically in all directions, signals left, reduces speed and maintains lane position, accelerates smoothly, begins turn just before front bumper reaches center of intersection, maintains control while recovering to proper lane position.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___

Specific Performance Objective: Spots 5.1, 5.2, 5.3
Test Site: Crossing Hayford

  Unsatisfactory: Does not search all directions, accelerates or maintains speed, searches only after entering intersection, reduces speed only after entering intersection, fails to stay in own lane.

  Satisfactory: Searches all directions before entering intersection, reduces speed as approaching intersection, stays in own lane.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE IV (2)

Specific Performance Objective: Spots 5.1, 5.2, 5.3
Test Site: Crossing Magnolia

  Unsatisfactory: Does not search all directions, accelerates or maintains speed, searches only after entering the intersection, reduces speed only after entering intersection, fails to stay in own lane.

  Satisfactory: Searches all directions before entering intersection, reduces speed as approaching intersection, stays in own lane.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___

Specific Performance Objective: Spots 5.1, 5.2, 5.3
Test Site: Crossing North Fairview

  Unsatisfactory: Does not search all directions, accelerates or maintains speed, searches only after entering the intersection, reduces speed only after entering intersection, fails to stay in own lane.

  Satisfactory: Searches all directions before entering intersection, reduces speed as approaching intersection, stays in own lane.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___

Specific Performance Objective: Spots 5.1, 5.2, 5.3
Test Site: Left turn onto Wood Street

  Unsatisfactory: Does not signal left, fails to search rear and continuously right and left, stops where visibility is poor, accelerates jerkily, turns too soon or too late, does not use hand over hand, fails to accelerate to flow.

  Satisfactory: Signals left, searches rear, searches continuously right and left, stops in position to see traffic right and left, gradually accelerates and starts turn just before reaching center of intersection, uses hand over hand, accelerates to flow.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE V

Specific Performance Objective: Spots 3.3A
Test Site: Wood Street turning right onto Grand River

  Unsatisfactory: Does not check traffic thoroughly, does not position car to right side of lane, fails to signal right, fails to stop and keep wheels straight, does not check mirrors, starts turn too soon or too late, does not turn into lane #4, does not use hand over hand to recover, fails to adjust speed to traffic flow.

  Satisfactory: Checks traffic thoroughly, positions car to right, gives right signal, stops with wheels straight, checks mirrors, starts turn when front wheels are opposite point where curb begins to curve, turns into lane #4, recovers using hand over hand, adjusts speed to flow.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE VI

Specific Performance Objective: Spots 3.1, 5.3
Test Site: North Cedar Street (multiple lane change to right)

  Unsatisfactory: Does not check mirrors, fails to signal right, does not check blind spot, changes to far right lane in one motion, interferes with traffic, does not remain in own lane and accelerate to flow, fails to cancel signal.

  Satisfactory: Checks mirrors, gives right signal, checks blind spot, moves into lane #2 and adjusts speed to flow.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___
Specific Performance Objective: Spots 3.1, 5.3
Test Site: North Cedar Street

  Unsatisfactory: Does not check mirrors, fails to signal right, does not check blind spot, interferes with traffic, does not move into lane #3 and adjust speed to flow, does not cancel turn signal.

  Satisfactory: Checks mirrors, gives right turn signal, checks blind spot, moves into lane #3, adjusts speed and position, cancels turn signal.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___
  (Repeat to lane #2; repeat to lane #1.)

Specific Performance Objective: Spots 3.1, 5.3
Test Site: East Ottawa (multiple lane change left)

  Unsatisfactory: Does not check mirrors, fails to signal left, fails to check blind spot, interferes with traffic, does not move into lane #3, does not adjust speed and direction, fails to cancel turn signals.

  Satisfactory: Checks mirrors, signals left, checks blind spot, moves into lane #3, does not interfere with traffic, adjusts position and speed, cancels turn signals.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___
  (Repeat to lane #2; repeat to lane #1.)


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE VII

Specific Performance Objective: Spots 4.1, 4.2b, 4.2f, 4.4A.I)1, 4.4A.II, 4.4B.I)1, 4.4B.II, 4.3a
Test Site: Townsend crossing Washtenaw

  Unsatisfactory: Does not stop before entering intersection, does not search continuously, fails to yield to pedestrians/vehicles, encroaches on other lane when crossing.

  Satisfactory: Stops before entering intersection, searches continuously, yields to pedestrians/vehicles, maintains lane position while crossing.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___

Specific Performance Objective: Spots 4.1, 4.2b, 4.2f, 4.4A.I)1, 4.4A.II, 4.4B.I)1, 4.4B.II, 4.3a
Test Site: Townsend crossing W. Kalamazoo Street

  Unsatisfactory: Does not stop before entering intersection, does not search continuously, fails to yield to pedestrians/vehicles, encroaches on other lane when crossing.

  Satisfactory: Stops before entering intersection, searches continuously, yields to pedestrians/vehicles, maintains lane position while crossing.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___

Specific Performance Objective: Spots 4.1, 4.2b, 4.2f, 4.4A.I)1, 4.4A.II, 4.4B.I)1, 4.4B.II, 4.3a
Test Site: Townsend turning left onto W. Lenawee Street

  Unsatisfactory: Does not stop before entering intersection, does not search continuously, fails to yield to pedestrians/vehicles, encroaches on or turns into oncoming lane.

  Satisfactory: Stops before entering intersection, searches continuously, yields to pedestrians/vehicles, turns into correct lane.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE VIII

Specific Performance Objective: Spots 4.5, 5.1, 5.2, 5.3
Test Site: E. Main Street to 496 East

  Unsatisfactory: Fails to use acceleration lane, does not signal left, fails to check traffic thoroughly including mirrors and blind spot, does not accelerate to match flow, merges across first lane, fails to center car in lane and adjust to flow quickly, fails to cancel signal.

  Satisfactory: Enters acceleration lane, signals left, checks traffic thoroughly including mirror and blind spot, accelerates to flow, merges into nearest lane, centers car in lane and adjusts speed to flow immediately, cancels turn signal.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___

Specific Performance Objective: Spots 4.6, 5.1, 5.2, 5.3
Test Site: 496 East approaching East Lansing/Flint Exit

  Unsatisfactory: Fails to position in far right lane, fails to signal right, does not check traffic thoroughly, fails to check mirror, fails to check blind spot, fails to enter deceleration lane early, does not adjust to exit speed.

  Satisfactory: Positions car in far right lane, gives right signal, checks traffic thoroughly including mirror and blind spot, enters deceleration lane and slows to exit speed.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE IX

Specific Performance Objective: Spots 4.4A.II, 4.4B.II, 5.1, 5.2, 5.3
Test Site: Daisy eastbound approaching Larkspur Drive

  Unsatisfactory: Fails to adjust speed to conditions, fails to search continuously in all directions, encroaches on oncoming lane, fails to yield to pedestrians/traffic.

  Satisfactory: Adjusts speed for conditions, searches continuously in all directions, maintains lane position, yields to pedestrians/vehicles.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___

Specific Performance Objective: Spots 4.4A.I)2, 4.4B.I)2
Test Site: Crossing Larkspur

  Unsatisfactory: Fails to reduce speed, fails to search continuously left/right, does not yield to pedestrians/vehicles, accelerates abruptly.

  Satisfactory: Reduces speed, searches continuously left/right, yields to pedestrians/vehicles, accelerates smoothly.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___

Specific Performance Objective: Spots 4.4A.II, 4.4B.II, 5.1, 5.2
Test Site: Daisy eastbound approaching Narcissus

  Unsatisfactory: Fails to adjust speed to conditions, fails to search continuously in all directions, encroaches on oncoming lane, fails to yield to pedestrians/vehicles.

  Satisfactory: Adjusts speed for conditions, searches continuously in all directions, maintains lane position, yields to pedestrians/vehicles.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___

Specific Performance Objective: Spots 4.4A.I)2, 4.4B.I)2, 5.1, 5.2, 5.3
Test Site: Daisy turning right on Narcissus

  Unsatisfactory: Does not signal right, maintains speed or accelerates, fails to search continuously left/right, fails to yield to pedestrians/vehicles.

  Satisfactory: Signals right, reduces speed, searches continuously left/right, yields to pedestrians/vehicles.

  Performance On:  Search U ___ S ___   Speed Control U ___ S ___   Direction Control U ___ S ___   DYP-DYV U ___ S ___


MICHIGAN DRIVER EDUCATION EVALUATION PROJECT
Driver Performance Rating Form

Subject ________  Rater ________  Date ________  Run No. ________  Program ________

LOPE X

Compliance with Michigan Vehicle Code when Encountering:

Specific Performance Objective: Spots 4.1, 4.2, 4.3
Test Site: General Observation Over Route

  Signs
    a. Warning                    Yes ___  No ___
    b. Regulatory                 Yes ___  No ___
    c. Service and Guide          Yes ___  No ___

  Traffic Signals
    f. Traffic Control Signals    Yes ___  No ___

  Pavement Markings
    a. Center Lines               Yes ___  No ___
    b. Crosswalk Lines            Yes ___  No ___
    d. No Passing Zones/Lines     Yes ___  No ___
    e. Solid Yellow Lines         Yes ___  No ___
    f. Turn Lanes/Lines           Yes ___  No ___

  Comments: ______________________________________________


APPENDIX C

OUTLINE OF RATER TRAINING PROGRAM

Day 1: Thursday, July 28

8:15 a.m.    I. Reliability
             A. Definition of reliability
             B. Necessity of identifying the same behaviors

8:30 a.m.   II. Observation
             A. Observation of driver behavior rather than the environment
                1. Head
                2. Eyes
                3. Feet
                4. Mirror
                5. Hands
8:45 a.m.    B. Observation of driver behavior in relationship to the environment
                1. Lane position
                2. Spatial relationships
                3. Traffic
                4. Pavement markings and signals
9:15 a.m.    C. Intensive observation of driver behaviors
             D. General observation of driver behaviors
             E. Behaviors directly observable
                1. Search - eye and head movement
                2. Speed control - accelerating, decelerating, braking, kinesthetic value
                3. Direction control - hand movement, vehicle alignment, spatial relationships, lane position, signals

9:45 a.m.  III. Basic driving functions
             A. Search
             B. Speed control
             C. Direction control
             D. Timing (early or late) - an element that affects all functions
                1. Increases or decreases hazards
                2. Affects smooth or abrupt steering or acceleration
                3. Affects search (ability to gather information)
                4. Affects signals (turns, lane changes)
                5. Affects crossing, joining and leaving traffic

10:30 a.m.      Break

10:45 a.m.  IV. Lane Numbering
             A. Lanes numbered from left to right
                1. One-way street - begins at far left side of street
                2. Two-way street - begins in first lane to the right of the center line
                3. Divided two-way street - begins in the first lane to the right of median or barrier
             B. Diagrams of lane numbering

11:15 a.m.   V. Aborts
             A. Traffic abort - traffic mix is so dense it is impossible to maneuver. "Return to Route"
             B. Driver abort - missed directions, wrong turn or lane change, drives past entrance or turn. "Score unsatisfactory"
             C. Rater abort - lane directions, safety reasons. "Score satisfactory"
             D. Coaching

11:40 a.m.  VI. Inferences
             A. Drawing inferences based on observed behavior
             B. Judgments, predictions, reasoning, decisions - must observe driver behavior first

           VII. Practicality
             A. Legality - compliance
             B. Safety - rater responsibility
             C. Rater must distinguish between legal vs. safe behavior

                Lunch

1:00 p.m. VIII. Dual Raters
             A. Front seat rater - directions, safety, observation, dual controls
             B. Rear seat rater - observation, safety if necessary

1:15 p.m.   IX. Directions
             A. Precise, consistent
             B. Timing - trigger directions (triggers behavior)
             C. Clues - landmarks
             D. Driver - recognition and compliance
             E. Reminder - point out or gesture with hand, repeat
             F. Changing or altering driver behavior

1:45 p.m.    X. Comments
             A. Marginal notes - words, phrases, abbreviations, symbols
                Examples - Lt, Ls, RL, Dyp, Dyv, 2 fast
             B. Narratives of driver performance
                1. Dr. Robert O. Nolan, speaker

                Break

2:45 p.m.   XI. Ride Route (2 vehicles - 3 runs)
             A. Rotate monitoring; rotate rater positions
             B. Directions
             C. Mirror placement
             D. Review directions - rater comments and input on directions

Day 2: Friday, July 29

8:15 a.m.    I.
Practical Work on Route
             A. Front seat rater - gives directions, observes, lists observed behaviors
             B. Use mirrors
             C. Rear seat raters - observe and record
             D. Stop for discussion after each LOPE

                Break

10:30 a.m.  II. Discussion
             A. Directions
             B. Trigger directions
             C. Behavior - overt and inferred
             D. Reasons for differences

11:00 a.m. III. Definitions
             A. Dynamic Traffic Environment - vehicles, pedestrians, road surfaces, weather, vegetation, movement of vehicle, traffic controls
             B. Driver Interaction - with dynamic traffic environment
             C. Combinations of Behaviors - example: turns often involve combinations of behaviors
             D. Overt Behavior - directly observable or perceived through sensory processes (kinesthetic value)

11:30 a.m.  IV. Type of Observations
             A. During intensive observation
             B. During general observation

                Lunch

1:00 p.m.    V. Audio Visuals
             A. Transparencies for each LOPE
             B. Transparencies for instrument (rating form)
             C. Slides of route
             D. Film - "IPDE" and "Separate and Compromise" (introduction portion only)
                Exercise: rate model driver as satisfactory or unsatisfactory for each - search, speed control, direction control
             E. Slides of route and instrument simultaneously (use split screen)

                Break

2:45 p.m.   VI. Practical Work on Route (2 vehicles - 6 raters)
             A. Adult licensed drivers (volunteers)
             B. Raters - give directions, complete rating form (drive portion only), rotate positions
             C. Discussion after each complete drive

Day 3: Saturday, July 30

8:30 a.m.    I. Practice Runs (3 vehicles - 6 raters)
             A. High school subjects
                1. Six subjects in the a.m.
                2. Six subjects in the p.m.
             B. Complete entire rating form
             C. Rater pairs (rotate front and rear seat positions after each run)
             D. Subjects (rotate cars after each run)
             E. Write narrative for the last subject run in the a.m. and the p.m.
             F. Monitor rating form completion after each subject

Day 4: Monday, August 1

8:15 a.m.    I. Review
             A. General LOPE 10
             B. First recall
             C. Independent recording
             D.
Rolling stops, running light, crosswalks (recording)

9:30 a.m.   II. Objectivity - speaker, Mr. Fred Vanosdall

                Break

10:45 a.m. III. Review
             A. Vehicle familiarization

                Lunch

1:00 p.m.   IV. Practical Work on Route (3 vehicles - 6 raters)
             A. High school subjects (6 subjects in the p.m.)
             B. Complete entire rating form
             C. Rater pairs (rotate positions after each run)
             D. Subjects (rotate cars after each run)
             E. Write narrative (for the last subject)
             F. Oral review of ratings by each pair of raters


BIBLIOGRAPHY

Borg, Walter R., and Gall, Meredith D. Educational Research: An Introduction. New York: David McKay Company, Inc., 1971.

________. Educational Research: An Introduction. 2nd ed. New York: David McKay Company, Inc., 1974, p. 142.

Boyd, Robert D., and Devault, M. Vere. "The Observation and Recording of Behavior." Review of Educational Research, 36 (5) (1966), 529-551.

Cole, William. "The Case for Performance-Based Driver and Traffic Safety Education." Journal of Traffic Safety Education, April 1976, Vol. XXIII, No. 3.

Cronbach, L. J., Gleser, G. C., Nanda, H., and Rajaratnam, N. The Dependability of Behavioral Measurements: Generalizability of Scores and Profiles. New York: Wiley, 1972.

Denton, G. G. "The Effect of Speed Change on Drivers' Judgment." RRL Report LR 97, pp. 6. Crowthorne, Berks. (Gt. Brit.): Road Research Laboratory, 1967.

Ebel, R. L. "Estimation of the Reliability of Ratings." Psychometrika, 1951, 16, 407-424.

Edwards, D. S., and Hahn, C. P. Filmed Behaviors as a Criterion for Safe Driving. Report No. AIR-C80-2/70-FR, prepared for National Institutes of Health, 1970.

Fine, Jerome L., Malfetti, James L., and Shoben, Edward J., Jr. "The Development of a Criterion for Driver Behavior," pp. 43. New York: Columbia University, 1965.

Forbes, T. W. "The Normal Automobile Driver as a Traffic Problem." The Journal of General Psychology, 20, 1939, pp. 471-474.

________. "Street and Highway Traffic." In Handbook of Applied Psychology, eds. Fryer and Henry. Rinehart, Vol. 1, 1950, pp. 325-335.

Forbes, T. W., Nolan, R. O., Schmidt, F. L., et al. Driver Performance Measurement Research, Final Report, Vol. 1, pp. 173. Technical Report, under contract FH-11-7627. Washington, D.C.: National Highway Traffic Safety Administration, 1973.

Frick, T., and Semmel, M. I. "Observational Records: Observer Agreement and Reliabilities." Paper presented at the 1974 meeting of the American Educational Research Association, Chicago, April 16, 1974.

Glass, Gene V., and Stanley, Julian C. Statistical Methods in Education and Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

Goldstein, Leon G. "Rejoinder to Peck and Jones' Reply." Journal of Traffic Safety Education, Vol. XXIII, No. 1, October 1975, pp. 15 and 17.

Haggard, E. A. Intraclass Correlation and Analysis of Variance. New York: Dryden Press, 1958.

Herbert, S. D., and Attridge, C. "A Guide for Developers and Users of Observational Systems and Manuals." American Educational Research Journal, 1975, 12, 1-20.

Hoyt, C. "Test Reliability Estimated by the Analysis of Variance." Psychometrika, 1941, 6, 153-160.

Jones, Margaret Hubbard. Measuring the Outcomes of Driver Training: The USC On-Road Performance Test. Presented at the Transportation Research Board, January 25, 1977.

Kennedy, J. L. Driver Education and Training Project, pp. 92. Report No. PH 180-472, under contract FH-11-6561. Washington, D.C.: National Highway Safety Bureau, 1968.

Lybrand, William A., Carlson, Glenn H., Cleary, Patricia A., and Bower, Boyd H. A Study on Evaluation of Driver Education, pp. 210. Report, National Highway Safety Bureau, 1968.

McGlade, Frances Stanley. An Evaluation of the Road Test Phase of the Driver Licensing Examination of the Various States: An Investigation of Current Road Tests and Testing Procedures, and the Development of a Valid and Reliable Road Test Based on Derived Implications. Dissertation, pp. 250. New York: New York University, 1960.

McKnight, J., and Hunt, A. G. Driver Education Task Analysis, Vol. I, Nov. 1970.

________. Driver Education Task Analysis, Volumes I, III and IV, 1970 and 1971.

Medley, D. M., and Mitzel, H. "Application of Analysis of Variance to the Estimation of the Reliability of Observations of Teachers' Classroom Behavior." Journal of Experimental Education, 1958, 27, 23-35.

________. "Measuring Classroom Behavior by Systematic Observation." In N. L. Gage (Ed.), Handbook of Research on Teaching. Chicago: Rand McNally, 1963, pp. 247-328.

Michigan Department of Education. Michigan's Driver Education Evaluation Project. Lansing: The Department, 1978.

New York University. "Driver Education and Training--Plans for Evaluating the Effectiveness of Programs," pp. 95. Report No. PH 180-473, under contract FH-11-6560. Washington, D.C.: National Highway Safety Bureau, 1968.

Nolan, R. O., Vanosdall, F. E., and Smith, D. L., et al. Driver Performance Research, Final Report, Vol. II, Guide for Training Observer/Raters in the Driver Performance Measurement Procedure. Prepared for National Highway Traffic Safety Administration, Contract FH-11-7627, Michigan State University, Department of Psychology, Highway Traffic Safety Center, February 1973, p. vi.

Quenault, S. W. "Development of the Method of Systematic Observation of Driver Behavior," pp. 50. RRL Report LR 213. Crowthorne, Berks. (Gt. Brit.): Road Research Laboratory, 1968.

Quensel, Warren P. "An In-Car Evaluation Instrument." Journal of Traffic Safety Education, January 1976, Vol. XXIII, No. 2, pp. 15-16.

________. "How to Measure Program Effectiveness." Journal of Traffic Safety Education, April 1976, Vol. XXIII, No. 3, pp. 6.

Rowley, Glenn L. American Educational Research Journal, Winter 1976, Vol. 13, No. 1, pp. 51-59.

Teal, Gilbert E., Truesdale, Sheridan L., and Fabrizio, Ralph A. Driver Education and Training, pp. 211. Report No. B2D68-575, Dunlap and Associates Inc., under contract FH-11-6559. Washington, D.C.: National Highway Safety Bureau, May 1968.
Uhlaner, J. E. "Development of Criteria for Safe Motor-Vehicle Operation." Highway Research Board Bulletin 60, pp. 36-43. Washington, D.C.: Highway Research Board, 1952.

Vanosdall, F. E., et al. Michigan Road Test Evaluation Study, Final Report, Vol. III. Prepared for National Highway Traffic Safety Administration, under contract MDL-75-002B, Michigan State University, Department of Psychology, Highway Traffic Safety Center, November 1977.