This is to certify that the dissertation entitled THE DEVELOPMENT AND VALIDATION OF A COMPUTERIZED DIAGNOSTIC TEST FOR THE PREDICTION OF SUCCESS IN THE FIRST-YEAR MUSIC THEORY SEQUENCE BY INCOMING FRESHMEN AT MICHIGAN STATE UNIVERSITY, presented by James Peter Colman, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Music Education. Major professor: Albert LeBlanc. Date: February 15, 1990.

THE DEVELOPMENT AND VALIDATION OF A COMPUTERIZED DIAGNOSTIC TEST FOR THE PREDICTION OF SUCCESS IN THE FIRST-YEAR MUSIC THEORY SEQUENCE BY INCOMING FRESHMEN AT MICHIGAN STATE UNIVERSITY

By James Peter Colman

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

School of Music

1990

ABSTRACT

THE DEVELOPMENT AND VALIDATION OF A COMPUTERIZED DIAGNOSTIC TEST FOR THE PREDICTION OF SUCCESS IN THE FIRST-YEAR MUSIC THEORY SEQUENCE BY INCOMING FRESHMEN AT MICHIGAN STATE UNIVERSITY

By James P. Colman

Freshmen music students enrolling at many colleges and universities in the United States frequently face a one-year music theory course requirement. First-year music theory courses seek to provide all freshmen students with a common theory foundation for the rest of their music training. Some assumptions must be made concerning the present knowledge of incoming students. These assumptions are seldom accurate for all students.
The goal of this study was to create a computerized diagnostic test capable of measuring the current music theory achievement of incoming students so that statistical data could replace the assumptions made by college theory professors. Secondly, this study sought to determine whether the newly created test could function as a predictive variable in the evaluation of future success in music theory at Michigan State University.

The test included 90 questions designed from objectives covering all content areas of the first term of the music theory sequence at Michigan State University. The test was implemented on the Macintosh computer using the HyperCard software from Apple Computer, Inc. Each test item included from two to four multiple-choice answers. The subjects selected an answer by clicking with the computer's input device (mouse) on the chosen answer. The computer handled all aspects of the test including administration, data storage, test result printouts, and statistical analysis.

The test was administered to 59 freshmen subjects at the beginning of the Fall term in 1988. The results of the test were correlated with three grade criteria over a period of three college terms: theory lab grades (0-100%), final percentage grades (0-100%), and grade points (0.0-4.0). The test was also correlated with a three-term average computed for all subjects who completed the entire first-year theory sequence. The strongest correlation was found between the test and final grade points. This was surprising since the grade point scale was the least sensitive of the three criteria. The study concluded that the first iteration of the music theory test was sufficiently successful to warrant further development for future use as a diagnostic/predictive tool.

Copyright by James Peter Colman 1990

Dedicated to my wife who has been my constant support and encouragement

ACKNOWLEDGEMENTS

I wish to express my sincere appreciation to Dr.
Albert LeBlanc for his input into my professional development, his willingness to give me opportunities to use my developing research skills, and the tremendous guidance he has provided throughout this endeavor. I am proud to call him my mentor. I also appreciate the substantial input and support of Dr. Charles Ruggiero, Dr. Theodore Johnson, Dr. Corliss Arnold, and the freshman theory students who participated in this study.

Apple®, ImageWriter®, Macintosh®, and HyperCard® are registered trademarks of Apple Computer, Inc. SuperPaint® is a registered trademark of Silicon Beach Software. Professional Composer® is a registered trademark of Mark of the Unicorn. MacRecorder® and SoundEdit® are registered trademarks of Farallon Computing, Inc. IBM® is a registered trademark of International Business Machines Corporation.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1 - The Problem
    Introduction
    Need for the Study
    Problem Statement
    Definition of Terms
    Limitations

Chapter 2 - Review of Literature
    The Usefulness of Prediction
    Development or Validation of Tests as Predictors
    The Identification of Predictive Variables
    The Use of Computers in Testing
    Construction of a Predictive Musical Test
    Conclusion

Chapter 3 - Development of the Test

Chapter 4 - Test Administration and Results
    Test Administration
    Demographic Characteristics of the Sample
    Descriptive Statistics on Test Scores
    Test Item Difficulty and Discrimination Indices
    Test Reliability
    Descriptive Statistics on Term Grades
    Correlation of the Theory Test and Class Grades

Chapter 5 - Discussion
    Problems with the Study
    Other Applications
    Conclusions
    Suggestions for Further Research and Improvement

Appendix A - Frequency Distribution of Test Scores
Appendix B - Item Difficulties and Discriminations
Appendix C - Frequency Distributions of Grades
Appendix D - Colman Theory Test

Bibliography

LIST OF TABLES

Number of Test Items in Each Content
Area of the Colman Test Divided Into Three Types of Mastery
Student Responses to Questions of Previous Musical Experiences
Number of Test Item Discriminations Within Selected Ranges
Test-Retest Scores of 20 Students
Descriptive Statistics on Grades for Three Terms of the First-Year Music Theory Sequence
Correlational Statistics for Comparison of the Theory Test With Classroom Grades
Frequency Distribution for 59 Theory Test Scores
Test Item Difficulties and Discriminations Displayed as Percentages
Frequency Distribution for Student Lab Grades in Music Theory
Frequency Distribution for Student Percentage Grades in Music Theory
Frequency Distribution for Student Final Grades in Music Theory

LIST OF FIGURES

Distribution of theory test scores converted into z scores
Correlation scattergram between Fall term lab grade percentages and number correct on the theory test
Correlation scattergram between average final grades and number correct on the theory test

CHAPTER 1 - THE PROBLEM

Introduction

Freshmen music students enrolling at many colleges and universities in the United States frequently face a one-year music theory course requirement. The course usually consists of two semesters or three quarters. Typically, students enrolling as college music majors have relatively little specific training in music theory. Any theory knowledge they have accumulated was gained through private music lessons or performance in high school band or choir organizations. Financial cut-backs have severely limited the number of high school music theory classes. First-year music theory courses seek to provide all freshmen students with a common theory foundation for the rest of their music training.
It is impossible to begin from the very first concepts of music theory training, however; some assumptions must be made concerning the present knowledge of incoming students. These assumptions are seldom accurate for all students. Increased risk of failure results when students attempt to accomplish the requirements of a first-year theory course without the assumed knowledge since they start at a disadvantage.

The goal of this study was to create a diagnostic test capable of measuring the current music theory achievement of incoming students so that statistical data could replace the assumptions made by college theory professors. Secondly, this study sought to determine whether the newly created test could function as a predictive variable in the evaluation of future success in music theory at Michigan State University. The end objective of the study was to provide a musically specific test that would give advisors of college music majors another aid for proper advising decisions.

Some background in the types of advising problems and educational treatments addressed by this study is in order. Accurate diagnosis of student abilities and deficiencies is extremely important to successful college and university counseling. Students who receive inadequate advising risk the possibility of incomplete preparation for their chosen profession or even misdirection into a field for which they are ill-suited. Current enrollment policies of United States institutions of higher education permit completely open enrollment, that is, there are no admission requirements, other than available space, hindering a student's acceptance. The policy of open enrollment, however, brings with it the problem of providing each student the most useful education possible while dealing with a myriad of differences in each student's background and needs.
Willingham (1974) identified two recent trends which increase the academic advising demands placed upon higher education.

First, a greater diversity of educational alternatives and incentives, including community colleges, federal student aid, and flexible academic programs, has encouraged an influx of new students, particularly minority students, adults, and students previously discouraged from continuing education because of academic weaknesses, resulting in a need for advising flexibility.

Second, economic considerations of students place pressure upon institutions to provide for the specific academic needs of each student. The stabilization and, in some institutions, decline in student enrollment have greatly increased competition for students. Institutional programs must accommodate the financial needs of the individual student or the student will look elsewhere.

Willingham suggests four classes of treatment for satisfaction of student academic requirements and interests in answer to the demands raised by these trends. The first treatment places or assigns students to various classes based upon similar abilities or personal characteristics such as similar test scores. The educational techniques may vary among the classes but the subject matter and end objectives should be the same.

A second treatment places students into an instructional sequence on the basis of their current knowledge of the subject. As with the first treatment (assignment), the knowledge of the subject matter and the end objectives are the same for each student, but the student does not invest time in material previously mastered. For example, it is possible that an incoming student might already possess the skills usually developed in the first term of a music theory course. If an accurate assessment of the student's current knowledge were possible, the student could be placed into a subsequent term of music theory.
The third treatment possibility suggested by Willingham, selection, groups students with different ability levels into various instructional programs with different educational content and end objectives. This method is most frequently observed in the offering of advanced classes designed to exceed the usual course content demands and to motivate the student to progress past the normal end objectives for that particular class. An opposite result is possible when students are selected for placement in remedial classes. Students required to take remedial classes might only receive a portion of the material included in the standard class.

Exemption, the last of Willingham's four treatments, excuses students who demonstrate substantial proficiency in a given subject area from completing course requirements that emphasize the area of proficiency. Different academic programs apply exemptions differently and several workable strategies are available. The student may or may not receive credit for the exempted classwork, or may have to take another course in place of the exempted class or classes.

Each of the previously discussed treatments has many valid applications. Some applications might necessitate the implementation of more than one treatment. Willingham suggested five methods of testing student abilities to determine proper treatment: proficiency testing, diagnostic testing, evaluation of personal characteristics, aptitude testing, and evaluation of grades. All five methods of evaluation described below were reviewed for this study and diagnostic testing was selected as the most appropriate for achieving the stated goals.

Proficiency testing measures competency in a given course or group of courses. This type of measure may assess factual knowledge, problem-solving abilities, or ability to make practical applications as an indication of the extent of the student's knowledge in the tested subject area.
The test must only include material taught in the course or courses falling within the scope of the test. Proficiency tests are most useful with placement or exemption treatments.

Diagnostic testing is, in many respects, similar to proficiency testing, but the diagnostic test provides a more detailed evaluation of what the student knows and what the student does not know. Diagnostic tests are most beneficial when they provide part-scores which allow the test administrator to make accurate assessments of current accomplishments which in turn provide the required information for proper placement of the student.

Evaluation of personal characteristics is a helpful tool when used with selection or assignment treatments although it has no usefulness in the present study. Personal characteristics can include almost any trait not connected to abilities or achievements such as background characteristics, interests, cognitive styles, and attitudes. For example, students might be placed in a participatory class because they have demonstrated greater material retention when allowed to physically interact with items related to the lesson. Evaluation of personal characteristics and interests is perhaps the least useful testing method because it is difficult to produce adequate testing tools. This method is also open to criticism in the area of objective decisions concerning student placement.

Aptitude testing is also helpful as an assessment measure. Aptitudes usually include any cognitive abilities not readily improved through short-term learning. Selection treatment decisions are enhanced by assessment of aptitudes related to general scholastic performance while assignment treatment decisions are enhanced by the assessment of specialized aptitudes.

Finally, a student's high school record offers another assessment tool.
Generally, a student's grades provide information on academic performance across a wide range of subject areas including the chosen undergraduate field. The high school record is difficult to interpret, however, because of the lack of standardization. Variance in grading scales, teacher standards, and even the level of academic competition can greatly influence grades and make an evaluation of true accomplishment very difficult.

The previous discussion has examined the need for flexibility in academic guidance because of economic considerations and the influx of new students; the use of assignment, placement, selection, and exemption as suggested treatments which allow for greater flexibility; and the methods of assessment useful in gathering information necessary for correct assignment of treatment types. Attention now turns to the application of this information to the present research study.

Need for the Study

The lack of musically specific and/or standardized academic information available for freshmen students makes the task of advising freshmen music students difficult. On the one hand, college advisors have access to college entrance examinations such as the Scholastic Aptitude Test (SAT) or American College Test (ACT). These standardized tests allow the advisor to make generalized inferences about the advisee's abilities but usually have limited application to specific evaluation of the enrolling student's achievements or aptitudes in his or her major area. Neither the SAT nor the ACT contains sections devoted to musical concepts. The exclusion of such material is not a defect but only a limitation since these standardized tests are designed to provide general information, not to evaluate most specific content areas. The general information gathered by these tests is an inadequate basis for advising in music.
College and university advisors may use a student's grades along with scores from standardized tests to make advising choices, but, as the introductory discussion noted, a student's grades are an ambiguous measurement tool at best. Standardized test scores and grades must have additional support from musically specific indicators. Academic advisors need the information produced as the result of development and administration of diagnostic tools specifically designed to enhance more generalized academic indicators. The lack of standardized measurement tools in the area of music placement and advising at the college level was the main impetus for this study.

The development of a measurement tool specifically for use at Michigan State University could expand the knowledge available to academic advisors at that institution. Designing a test for a specific locale is not without trade-offs. Willingham suggests that "a principal advantage of the local test is the fact that it can be designed for the purpose in mind; the main disadvantage is the fact that the technical quality of locally constructed tests varies a great deal" (p. 27). This study began the process of test development which could eventually culminate in the completion of a usable measurement tool for academic advisors of freshmen music majors at all institutions with content similar to that covered in the theory sequence at Michigan State University. More information on the other applications of the study results is included in the limitations section and the conclusion section.

One might raise the objection that there are a number of diagnostic tests for music theory already available. This study stands in contrast to previous test development studies because it is entirely administered and scored by computer.
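The abstract describes the computer-administered procedure: each item is presented on screen with two to four multiple-choice answers, and the computer records and scores the responses. A minimal modern sketch of that loop is given below in Python; it is purely illustrative, and the sample items, the `administer` function, and the simulated respondent are hypothetical stand-ins for the original HyperCard stack, not the study's actual materials.

```python
# Illustrative sketch of a computerized multiple-choice test
# administrator, loosely modeled on the procedure described in the
# study: present each item, record the chosen answer, tally the score.
# The items below are hypothetical examples, not items from the test.

ITEMS = [
    {"question": "How many half steps are in a perfect fifth?",
     "choices": ["6", "7", "8"], "answer": "7"},
    {"question": "Which scale degree is the leading tone?",
     "choices": ["5th", "6th", "7th"], "answer": "7th"},
]

def administer(items, respond):
    """Present each item, record the chosen answer, and return the
    response log plus the number answered correctly."""
    log = []
    correct = 0
    for item in items:
        choice = respond(item["question"], item["choices"])
        log.append({"question": item["question"], "chosen": choice})
        if choice == item["answer"]:
            correct += 1
    return log, correct

# Simulated subject who always picks the second listed choice:
log, score = administer(ITEMS, lambda question, choices: choices[1])
print(score)  # only the first item's key is in second position -> prints 1
```

In the actual study the `respond` step was a mouse click on a HyperCard button, and the log was written to disk for later printout and statistical analysis.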
Numerous articles have been written which describe computer applications in the roles of teacher, drill instructor, test administrator, and evaluator. The expanding number of computer uses in educational areas may provide the solution to one of the tedious aspects of diagnostic measurement, test administration. Prior to the widespread use of computers, test administration required an added expenditure of time and energy by the instructor or added monetary expense to hire someone to supervise the test. Inadequate testing resulted as instructors or departments became unable to provide the necessary time or money.

A secondary goal of this study was to demonstrate the desirability and feasibility of incorporating the available technology in computers into the administration of a diagnostic test and the subsequent statistical analysis of the test results. The use of computers in the administration of a diagnostic test frees the advisor of extra time expenditures by automating information gathering.

Problem Statement

The problem of this study was to complete the initial stages of designing a music theory test capable of diagnosing the current knowledge of incoming college freshmen at Michigan State University and of predicting whether these students would successfully complete the required freshman theory courses. Several subproblems were addressed as the study was carried out.

First, the expectations theory professors placed upon freshmen students entering music study at Michigan State University were defined. Once identified, the chosen content areas were attached to specific behavioral objectives that reflected the expectations of the theory professors. Next, the behavioral objectives were illustrated with test items designed to elicit a correct behavioral response.
The test items were then administered to a population sample with the goal of generalizing the results to other samples taken from the same population, incoming freshmen music students. Finally, the results were validated through a test-retest design and correlations were performed to check for reliability.

Definition of Terms

Achievement test - refers to a test designed to evaluate the current levels of knowledge and understanding of a particular content area. Achievement tests are frequently used at all age levels as an indicator of readiness for promotion.

Aptitude test - refers to a test designed to evaluate the potential abilities of an individual to perform certain skills. Elementary school evaluation of musical aptitudes for playing band instruments is an excellent example.

Buttons - refers to designated points on the Macintosh screen which, when clicked with the mouse, moved the student to the next test item, recorded the student's answer to the current test item, or played a digitized sound.

Card - refers to one computer screen of information in the HyperCard development system. A card is very similar to a frame in traditional programmed instruction. Each card contained one test item and from two to four multiple-choice answers. The student was required to select the correct answer by clicking the mouse button.

Computer Lab - refers to the Music Computer Lab established in Room 319 of the Music Practice Building on the campus of Michigan State University. At the time of test administration the lab contained four Macintosh computers, two Apple II GS computers, one IBM computer, and various electronic keyboards.

Diagnostic test - refers to a test which evaluates the current achievement in the content area being tested. Diagnostic testing also implies the discovery of weaknesses or deficiencies in concepts necessary for completion of the test.
Digitized sound - refers to the capturing of actual musical sounds onto a computer disk. The process was accomplished with the MacRecorder from Farallon Computing, Inc. The MacRecorder hardware and software allow the user to record sounds through a microphone connected to the computer. The actual sound was recorded on the computer disk and was played back through the internal speaker of the Macintosh computer. In the present study, digitized sounds were used to present aural examples for the students to evaluate.

Graphics - refers to any non-text items used in the computer presentation of the theory test. Graphics are an integral part of the HyperCard development system. This made possible the inclusion of small musical excerpts and other non-text enhancements.

Hardware - refers to all computer equipment but does not include the programs which run on the computer.

HyperCard - refers to the software used to create the computer version of the test. HyperCard allowed the presentation of each test item in a format similar to frames in a programmed text. It also provided a full complement of graphic design tools and accepted the inclusion of both sound and animation.

Macintosh - refers to the Apple computer used to administer the test. The Macintosh Plus and the Macintosh SE were used. The systems included the central processing unit, a black and white 9" monitor, a mouse, a keyboard, one or two floppy disk drives, a 20 megabyte hard disk, and an ImageWriter II printer.

Mouse - refers to a Macintosh computer input device. The user controlled the screen cursor by moving the mouse. Clicking the mouse button when the cursor was located in various parts of the monitor sent commands to the computer for processing.

Predictive test - refers to a test designed to indicate what may happen in the future.
When a predictive test is valid it has been demonstrated that certain scores on the predictive test have a positive correlation with achievement in some criterion variable. In other words, a high score on the predictive test indicates the likelihood of a high score on the criterion variable. The present study involved the development of a predictive test and an evaluation of the test's correlation with the criterion variables, which were taken from various grades of freshman students in freshman music theory.

Software - refers to the programs and files which tell computers what to do. For example, a word processing program is software.

Limitations

Several limitations were placed upon the development and administration of the test. Perhaps most important was the limitation on test content. Since expectations may vary greatly from school to school, the content of this test was developed in the context of music theory instruction at Michigan State University. The freshman music theory sequence at Michigan State University was quite traditional. Freshman students met three times each week for lectures covering musical concepts including key signatures, intervals, construction of major scales and the three forms of minor scales, triads, chord inversions, and modulations. There was essentially no introduction of 20th century theory methods such as those espoused by proponents of Schenker analysis or jazz studies. The students also met two days each week in a smaller aural skills lab. Here the students developed skills including a variety of exercises in sight singing, rhythmic dictation, melodic dictation, interval dictation, and chord dictation.

A second limitation involved the subjects used in the study. The study was limited to freshmen music theory students enrolling at Michigan State University during the Fall term of 1988.
The assumption was that the students enrolling in the Fall term of 1988 would also enroll for the second and third term of the freshmen theory course and that these students would be comparable to future students at Michigan State University.

A limitation was also placed upon the subsequent revision of the test. While the test may profit from revision based upon results of this trial, revision of the test was not done as part of this study. This limitation stems from the fact that new subjects are only available once each academic year. An inherent problem with this limitation is the small size of the available sample and the great impact upon the study of students dropping out of the course.

Finally, the duration of the test was limited to 50 minutes. This was done to prevent undue boredom or loss of attention in the students. On the other hand, this time period allowed adequate time for a broad range of music theory topics.

CHAPTER 2 - REVIEW OF LITERATURE

A review of the literature relevant to this project encompasses five areas: (a) the usefulness of prediction, (b) the development or validation of tests useful in the prediction of some criterion variable, (c) the identification of predictive variables, (d) the use of computers in testing, and (e) the construction of a predictive musical test.

The Usefulness of Prediction

It is often difficult to distinguish between aptitude tests and achievement tests when reviewing the literature written about predictive testing. Typically, aptitude tests are designed to measure the "innate capacity for musical learning, even though no such learning may actually have taken place." Achievement tests, on the other hand, are "designed to measure how much a student has accomplished in music or in a particular phase of music" (Lehman, 1968, p. 8).
Confusion arises when tests are used as predictive tools for making academic decisions or guidance suggestions. Does a predictive test measure what the student already knows, thereby placing it in the realm of achievement tests, or does it measure the student's capacity for learning, which places it in the realm of aptitude tests? Very likely, various predictive tests evaluate both achievement and aptitude.

Lehman continued his discussion by outlining nine reasons for interest in musical testing:

1. Identification of talent. Tests provide early detection of talent which might go unrecognized.

2. Adaptation for individual differences. Tests give teachers the information necessary to set challenging but attainable goals for the musically untalented.

3. Educational guidance. Tests give the instructor information useful in selection of the proper musical instrument or in selection of appropriately difficult academic coursework.

4. Vocational guidance. Tests provide objective information for the student considering a musical career.

5. Discovery of learning difficulties. Test results may allow the instructor to detect and diagnose weaknesses. Even if these weaknesses are not correctable, the realization of their presence provides a better knowledge base for academic guidance.

6. Ability grouping. Tests may help the teacher place individual students with others of similar ability.

7. Assignment of instruments. Test results can be used to assign students to school-owned instruments when the number of applicants exceeds the number of instruments.

8. Studies of musical talent. Tests may reveal the extent and distribution of musical talent and the magnitude of individual differences.

9. Psychological studies. Tests can aid in research particularly when musical aptitude is used as a specific variable. (Lehman, 1968, p.
9)

Some of the previously listed rationales for testing are more substantial than others, but the list provides a strong foundation for further discussion of musical testing. Whybrew (1971) cited some practical reasons for measurement and evaluation in music. Musical evaluation is a frequent occurrence for teachers and students. Musicians frequently place themselves in adjudication situations where they expect objective evaluation. Music teachers are constantly required to evaluate their students and diagnose weaknesses. Evaluation is a particularly necessary skill for college instructors in light of the responsibility placed upon them to properly advise incoming students and, when necessary, direct them into fields other than music. Each college or university makes its own decisions regarding any musical demands upon incoming students and there are very few apparent barriers to student admission designed to sift out untalented students. In another article, Whybrew stated that "recent emphases on accountability in education . . . have intensified the need for tools which would help music educators in directing their efforts more effectively and in demonstrating the results of those efforts more convincingly" (Whybrew, 1973, p. 9). One method of identifying untalented students is predictive testing within the music department itself.

Karma (1983) pointed out several pitfalls to avoid in predictive testing. First, the factors used in prediction should reflect the aims of the school. The goals of a particular course of study might not necessitate the selection of the best students. Second, effective prediction can only be achieved by using factors which actually affect success in the music study area under consideration. Therefore, careful research is necessary to identify valid predictors.
Finally, successful prediction is the result of many predictors. For example, a musical aptitude test given to music students may not be a successful predictor by itself because the tested criterion, musical aptitude, will have a much smaller variance within the preselected group of music students than would be observed had the test been administered to college students from various majors. Other variables which might be combined with musical variables include intelligence, motivation, motor ability, and personality. Karma stressed one important fact, however: any variable used as a predictor must be selected on the basis of (a) its stability over time, (b) its limited trainability, and (c) its clear measurability.

In light of the problems associated with various prediction variables, it is easy to understand the division among writers when discussing musical testing. David Goslin of the Russell Sage Foundation stated,

Attempting to predict future performance on the basis of test scores is much like trying to guess the ultimate size and shape of an oak tree by measuring a sapling in pitch darkness with a rubber band for a ruler, without taking into account the conditions of the soil, the amount of rainfall, or the woodsman's axe. The amazing thing is that sometimes we get the right answer. (Lehman, 1969, p. 19)

Mr. Goslin has made a humorous point which holds true with music testing. The non-exact nature of some testing methods does not necessitate the discontinuance of testing, however. Rather, it requires new thinking about the goals of testing and the methods which produce the most useful results.
Throughout history the perfection of adequate tools, whether for the carpenter or the researcher, has taken time and practice but has been accomplished in many areas.

Development or Validation of Tests as Predictors

This portion of the review of literature examines research focused upon the development of predictive tests and the validation and use of previously developed tests as predictors. The research described here includes tests developed for the specific purpose of prediction. The tests themselves are the predictive variable. A later section of the review examines research devoted to the identification of specific predictor variables other than a test.

One of the most prominent individuals in musical test construction is Edwin Gordon. His Musical Aptitude Profile (Gordon, 1965) has been widely used with students of all ages. Gordon suggested five ways the test scores may be used:

1. To encourage musically talented students to participate in music performance organizations.

2. To adapt music instruction to meet the individual needs and abilities of students.

3. To formulate educational plans in music.

4. To evaluate the musical aptitude of groups of students.

5. To provide parents with objective information.

Gordon designed the test for younger school children, and he was interested more in discerning aptitude than predicting success. Later studies, however, evaluated both the age of the individuals tested and the prediction value of the test.

Two studies examined the use of the Musical Aptitude Profile (MAP) in testing college students. In a 1967 project, Robert E. Lee administered the test to 332 college freshman music students to determine whether norms for college students could be established.
He found that the test scores were reasonably reliable for college and university freshman music students and that reliable norms could be established. The study, based on Lee's doctoral dissertation, concluded that the scores of college and university freshman music students on the MAP were beneficial as one of many criteria used in student evaluation. Lee's study and documentation of his research provide norms that are very useful to college educators, particularly in light of a replicative study by Edwin Gordon in the same year (Gordon, 1967). Gordon's study involved administration of the MAP to freshmen at Rochester, Minnesota and Lincolnwood, Illinois. No information is provided concerning the number of subjects tested. Much of Gordon's article is a restatement of the information Lee presented in his dissertation. Gordon, as the creator of the MAP, was willing to make a stronger positive statement regarding the use of his test for college students. He said, "The Musical Aptitude Profile can and should be used as an educational diagnostic tool for the implementation of an adequate curriculum for college and university music students, and only to a very minor extent, if at all, should the battery be used as a 'talent' test" (Gordon, 1967, p. 40).

The previously mentioned studies dealt with the MAP and college music students. In a 1972 article, William T. Young expanded the research to include college and university nonmusic majors. His goal was similar to Lee's, that is, he wanted to establish norms for this particular target group.
In his testing of 205 university students with little or no previous musical training, he discovered that nonmusic majors of the southern United States have musical aptitude "somewhat greater than that of high school students in general and lesser than that of midwestern college music majors" (Young, 1972, p. 390). Young's reference to midwestern college music majors was apparently a reflection upon the previously cited study by Lee. Young concluded that different norms must be established for music students and nonmusic students and that the MAP was a useful tool for diagnosing student strengths or weaknesses.

Finally, a study involving the MAP test was conducted by Schleuter (1974). He compared the Aliferis Music Achievement Test, first introduced in 1947 by James Aliferis, and two tests created by Edwin Gordon: the Iowa Tests of Music Literacy, Levels 5 and 6, and the MAP. The tests were administered to university freshmen music majors in an attempt to determine the diagnostic strength of each test. James Aliferis was actively involved in test development, and his important contributions include the Aliferis Freshman Test. Aliferis documented the construction of this test in an article coauthored with J. E. Stecklein (1953). The two Aliferis music tests are actually different tests with similar goals.

Schleuter acquired data for 150 subjects over a period of two years and found that each of the three tests provided useful information about the subjects. He concluded that the MAP test combined with an achievement test provided the most information. The choice of achievement tests is variable since each school has different objectives.
One early research study is especially interesting. It is actually two studies, one focused on the prediction of success in college music and the other focused on the prediction of success in the professional arena (Taylor, 1941). In the first study, Taylor evaluated the prediction strength of four batteries of music tests and one intelligence test upon college success. She defined college success as the ability to succeed in dictation, sight singing, harmony, and music history. The second study examined the predictive strength of the same four batteries of music tests upon success in the music profession. Using some of the same subjects tested in the first study, Taylor applied predetermined criteria to each subject to determine professional success. She concluded that none of the music test batteries have sufficient predictive power to be used by themselves in student guidance. According to Taylor, the student who is successful in dictation and sight singing is most likely to succeed professionally. A final conclusion stated that the evaluations by a student's instructors are very reliable indices of the student's subsequent success in professional music. Although many of the more recently developed tests were not available when Taylor did her research, this study provides insight into the predictive strength of the tests used. A researcher could produce an interesting study by replicating Taylor's research using the modern music tests currently available.

There was a marked increase in research of music testing in the late 1960s and early 1970s. Several studies deserve mention here since they pertain to predictive testing.
Hufstader (1974) undertook the identification of variables useful as predictors of success in beginning instrumental music. He found that musical aptitude, academic achievement, intelligence, and psychomotor skills all contributed to the prediction of success. In another study, the opposite conclusion was reached (Gordon, 1968). That is, intelligence and achievement tests do not enhance the predictive power of aptitude tests. The strength of Gordon's conclusion was weakened by the small number of subjects and the author's admission that the findings were tentative. Gordon continues to be active in testing, especially with children. In 1984 and 1986, Gordon completed longitudinal studies of his auditory discrimination and timbre preference tests. These studies in predictive validity are of general interest.

In the area of college testing, two important studies were completed in 1970. In an examination of test content, Whellams (1970) found that aptitude test batteries should include non-musical tests as well as aural-musical tests. This inclusion increases the predictive strength of the aptitude test. He points out, on the other hand, that the types of non-musical tests included vary according to the social and educational background of the subjects. In a study with specific impact on the research reported in this document, Ernest (1970) found that the best single predictor of college grade point (r=.43) and music grade point (r=.44) was high school rank. The addition of nonmusical aptitude and achievement tests did not significantly enhance the predictive ability of high school rank used alone.
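The strength of a single predictor in studies such as Ernest's is reported as a Pearson product-moment correlation (e.g., r=.43 between high school rank and college grade point). As a brief illustration of what such a coefficient measures, the following Python sketch computes Pearson's r; the sample data are invented for the example and are not drawn from any of the studies cited:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of the two variables (numerator) ...
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ... divided by the product of their standard deviations (denominator).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical data: high school rank (percentile) and first-term grade points.
ranks = [95, 80, 88, 60, 75, 99, 70, 85]
grades = [3.8, 3.0, 3.5, 2.0, 2.8, 4.0, 2.5, 3.2]
r = pearson_r(ranks, grades)
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates that the predictor carries little linear information about the criterion. A moderate coefficient such as r=.43 therefore supports a predictor only as one of several criteria, which is consistent with the conclusions of the studies reviewed here.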
In a 1983 study, Arenson sought to validate the music portions of the Ohio State University Entrance Battery as a predictor of success in the two remedial courses offered at the University of Delaware. His findings indicated that the OSU theory combined score was a good predictor of grades in the remedial course emphasizing cognitive knowledge. The OSU ear-training combined score was a good predictor of grades in the remedial course emphasizing ear-training and listening. This study is very pertinent to the present research study since it involves the prediction of academic success of freshmen students in theory.

Schmitz (1956) investigated the prognostic value of the revised Seashore tests and the Kwalwasser-Ruch Test of Musical Accomplishment. After testing 582 students who were administered various combinations of the two tests, Schmitz found that grades below the mean were predicted with much more accuracy than grades above the mean. The B form of the Seashore tests appeared to be the strongest single predictor while the Kwalwasser-Ruch Test was not a strong predictor.

Ball completed a study involving the construction of a college entrance test in music in 1964. He constructed a battery of thirteen musical ability tests to measure rhythmic, melodic, and harmonic abilities as well as interval discrimination, chordal analysis, and memory. His research indicated the sections involving memory, interval discrimination, and discrimination of single music elements were the best predictors.

A study by Perry (1965) examined the predictive proficiency of selective tests in music theory administered individually and in groups. His goal was to determine which tests were good predictors of performance in college music theory courses.
Tests that provided a significant level of predictive strength could be used in guiding, counseling, placing, selecting, or grouping students. The predictor tests were administered prior to the start of classes. Proficiency tests were given after the first semester of theory was completed. After administering the test to 91 freshmen students, Perry found that seven of the tests under investigation were significant predictors with correlations greater than r=.60.

The Identification of Predictive Variables

A number of research studies have sought to identify variables useful in prediction of academic success in various musical areas. Many variables have been examined as predictors. One of the problems with this type of research is that the variables useful in prediction of success may vary greatly from place to place. This is typically the result of non-standardization of requirements. Concepts which receive great attention at one location may receive less emphasis at another. Therefore, any test emphasizing certain concepts is likely to be more useful in one school than in another.

One of the most important prediction variables identified thus far is school grade point average. A weaker form of this variable is found in a student's class rank. Several studies (Horst, 1959; Turrentine, 1965; Ernest, 1970; Chadwick, 1976; Hedden, 1982) found that grade point average or class rank were significant predictor variables. Each of these studies, with the exception of Hedden, focused upon college level testing.

Another strong predictor variable, intelligence, is usually measured with a standardized intelligence test. Neely (1965) found a positive correlation between intelligence and notational ability in ear-training. A 1973 study showed that musicality and intelligence could function as predictors
of choral achievement (Helwig & Thomas, 1973). Another study placed intelligence in a long list of variables which included aptitude tests, musical training, personality, age, sex, race, home environment, and various combinations (Webber, 1976). Each of these studies found intelligence to be a viable prediction variable.

Reynold Krueger did rather extensive research into the variable of personality as a predictor of teaching success. In two studies (1972 & 1976), Krueger found that personality and motivation were very powerful predictors of teacher success. The power of these variables, however, is a factor of the measurement instrument used to gather data and the control of other variables. Motivational variables have also been studied from the perspective of success at high school band directing (Caimi, 1984) and from the perspective of college ensemble participants (Mountford, 1982). Caimi suggested that insufficient numbers of motivational variables exist in band directing tasks to warrant prediction of success. Mountford examined whether there are variables useful in predicting college band participation. He found that variables such as extracurricular use of instrument in high school and nonselection of rock as a favorite style were significant predictors of participation.

Two studies do not fit neatly into a variable category since they examine unusual predictor variables. One study (Humphreys, 1986) found that strong ability to echo-play a melodic segment indicated success at harmonic audiation and performance. Humphreys suggested that training in echo-playing may enhance a student's ability to play implied harmonic accompaniments.
In a 1981 study, Brand and Burnsed researched whether the number of instruments played, ensemble experience, GPA in music theory, GPA in sightsinging and ear training, or years of private instrumental instruction could function as predictors of error detection ability. Unfortunately, none of the examined variables proved to be effective predictors, which may indicate that error detection skill is not developed in the same fashion as other instrumental music abilities, or it may indicate that the measurement instrument was not sufficiently reliable to demonstrate a correlation.

One important study (Young, 1969) combined the Gordon aptitude test with intelligence and academic achievement tests to predict musical attainment. Young found that the MAP and either an achievement test or intelligence test were the best predictors of success in performance and listening areas of music. Conversely, success in the academic areas of music was best predicted by an intelligence test. Overall achievement in music was best predicted by the three types of tests (aptitude, intelligence, achievement) used as a group.

A 1982 study by Chevallard used 77 undergraduate and graduate students in applied voice, woodwind, and brasswind instruction in an attempt to determine whether pitch memory, pitch discrimination ability, pitch adjustment ability, or pitch steadiness ability could be used as predictors of intonational performance. However, not all research studies produce the hypothesized conclusion, and Chevallard found that none of the variables could significantly strengthen the prediction of intonational performance.

Several conclusions are drawn from the research cited above. First, there is a continuing interest in musical testing.
Researchers are desirous of measuring the characteristics which mold musical ability. The studies cited also document the interest of researchers in predicting which students will succeed. This interest spreads across all age groups. Finally, motivated by an interest in predicting student success, researchers have tested a broad range of musical variables to identify those which function as strong predictors. It is this same motivation which propels the study reported in this document.

The Use of Computers in Testing

The parameters of this section of the review must be defined at the outset. In the past fifteen years a large body of articles has appeared on the topic of computer-assisted instruction, computer-assisted learning, and the uses of computers in education. For the most part, this body of research falls outside the scope of this review. The literature included in this section includes studies directed toward examination of computer uses in testing. This specific area is still in its infancy and is especially undeveloped in music. Much of the research in this field is aimed toward school guidance counselors who use the computer as a tool to direct students in academic and career choices. This has bearing upon the present research since it is hoped that the conception of a theory test will lead to the development of an academic counseling tool.

In a study on computers in counseling, Eberly and Cech (1986) pointed out that "computer technology permits presentation of more precise information without oversight or observer bias at a greater speed than could be provided by a counselor" (p. 18).
They go on to state that computer usage simplifies the collection of data and increases the privacy of the individual inputting the data, but these advantages are not without negative aspects. Some individuals may view the computer as an inadequate replacement for a human teacher. Thus, they are less likely to cooperate with attempts to implement the new technology or to see the computer as a benefit.

In the area of testing, the prime question is whether computer testing is better or worse than standard pencil and paper testing. A study using 72 college students sought to answer this question (Fletcher & Collins, 1986). The study found that the mean scores of students taking a computer version of a test were roughly equivalent to the scores of students taking paper and pencil tests. The study also demonstrated that most students preferred the computer version of a test over paper and pencil test versions for the following reasons:

1. Computers can provide immediate scoring.

2. Computers can provide immediate feedback on incorrect answers.

3. Computers are more convenient, straightforward, and easy to use.

4. Computer tests are completed more quickly than written tests.

The students also identified some disadvantages to computer testing. They included:

1. Inability to review all responses.

2. Inability to make changes to responses.

3. Inability to skip questions and return to them later.

All these disadvantages were a product of the test used in the research study in which these students participated and were design considerations determined by the test developer. Current technology allows for the alleviation of each of the listed disadvantages.

The results of the previously cited study appear to have support in the professional arena as well.
A recent study suggested that adolescent students are more willing to input information into a computer since they view the computer as less threatening than an adult (Millstein, 1987).

The development of computer testing is moving ahead at a rapid pace. It is now possible for students to take practice forms of the Graduate Record Examination (GRE) with microcomputers (McArthur & Choppin, 1984). Also under development are systems which will diagnose patterns of error in responses to multiple choice questions. One of the most recent developments is adaptive testing. An adaptive test varies with each response. A correct answer to a certain question supposedly demonstrates mastery of all the information necessary for that answer. The computer "adapts" the test so that no other questions covering that material are asked. Elaborate systems can be designed that remember which errors the student has made in the past, and the computer can provide constant remedial help with problems.

Sampson (1983) pointed out the potential benefits of computer testing. As has already been stated, there is a positive response to computer testing. The number of advantages inherent to computer testing may be responsible for this response. A partial list of advantages gained by computer testing is cited below.

1. Computer testing has proven to be at least as cost effective as traditional testing.

2. Adaptive testing allows for specialized attention to individual needs.

3. The computer can generate a wealth of data along with test responses.

4. Since the computer handles many of the administrative tasks, less time must be spent by staff persons.

5. Administration and scoring of tests is more flexible and efficient.

6. Student error rates are decreased.
That is, errors such as placing responses in the wrong number are eliminated. These advantages were reiterated by Meier and Geiger (1986). However, Sampson lists some problems along with the advantages. Knowledgeable persons can tamper with records, making security an important issue. Some individuals have a fear of using computers, and this might be reflected in their performance. Although these are very real problems, they are surmountable and do not necessarily diminish the advantages of computer testing. One must accept trade-offs of advantages and disadvantages with any form of testing.

Turner (1987) added another advantage to those listed by Sampson. The computer allows the test administrator to create large banks of test items. Tests can then be generated from these banks of items. If sufficient analysis of the test items is completed, it is possible to generate a different test for each student while maintaining equivalent item difficulty.

Bejar (1984) agreed with the stated advantages of computer testing. In fact, he went one step further and pointed out that in some instances a computer test is preferable to the traditional method. Some variables which decrease precision of score assessment cannot be controlled in a paper and pencil test. Typical scoring of paper and pencil tests focuses on variance within correct responses. Computer scoring allows analysis of variance within incorrect responses as well as variance within correct responses. Computer scoring also provides complete error control during scoring, and the computer can generate information which is not readily available with traditional methods of scoring.

Two important studies of computer testing in music have been performed.
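Before turning to those studies, the adaptive strategy described earlier can be sketched in miniature. The Python fragment below is purely illustrative: the items, topic tags, and the one-correct-answer mastery rule are invented for this example and are not drawn from any of the systems cited. It shows only the core idea that the test "adapts" by skipping further items on a topic once mastery is demonstrated and by queueing missed topics for remedial follow-up:

```python
# A minimal sketch of an adaptive test driver. Each item is tagged with
# the topic it covers; a correct answer is taken to demonstrate mastery
# of that topic, so remaining items on the topic are skipped.
# (Hypothetical items and rule; not from any cited system.)

items = [
    {"topic": "intervals", "question": "A major third above C is?", "answer": "E"},
    {"topic": "intervals", "question": "A perfect fifth above G is?", "answer": "D"},
    {"topic": "key signatures", "question": "How many sharps in D major?", "answer": "2"},
]

def run_adaptive_test(items, respond):
    """respond(item) -> the examinee's answer string."""
    mastered = set()   # topics answered correctly; further items skipped
    missed = []        # topics needing remedial follow-up
    for item in items:
        if item["topic"] in mastered:
            continue  # mastery already demonstrated; adapt by skipping
        if respond(item) == item["answer"]:
            mastered.add(item["topic"])
        else:
            missed.append(item["topic"])
    return mastered, missed

# Simulated examinee who knows intervals but not key signatures.
answers = {"A major third above C is?": "E", "How many sharps in D major?": "3"}
mastered, missed = run_adaptive_test(items, lambda it: answers.get(it["question"], ""))
```

In this run the second intervals item is never asked, and "key signatures" is recorded for remedial help, mirroring the behavior described above. Real systems of the period used far more elaborate item-difficulty orderings, as Radocy's second finding below notes.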
In 1972 Radocy evaluated the viability of using computers for criterion-referenced testing of nonperformance music behaviors. The behaviors examined by Radocy included dictation, interval recognition, and style classification. Radocy developed a test based on behavioral objectives to measure competency in the stated objectives. The test was administered to 32 students by the computer and 28 students by conventional methods. Radocy's findings have tremendous impact upon the present study.

1. Present skills (in 1972), techniques, and equipment are adequate for the construction of a workable computerized criterion-referenced test of certain nonperformance musical behaviors.

2. Rank order of items, in terms of item difficulty, is critical to the success of an incremental programming strategy wherein assumptions are to be made regarding responses to nonadministered items.

3. The computerized criterion-referenced test of certain nonperformance musical behaviors is not at present equivalent to a conventional paper-and-pencil version of the test. (The more recent studies cited above may refute this finding.)

Music preference has also been the object of computerization. Gregory and Sims (1987) developed a computer program to present nine four-voice music transcriptions to the subject in random order. The computer allowed the subject to change the music selection at any time by pressing a key on the keyboard. The computer then recorded the elapsed listening time for each subject. In a second study with the same hardware and software, the computer also recorded the subject's like or dislike of each music selection when the subject touched the appropriate box on the screen. This study is of special
interest since it demonstrates the use of computers as unattended test administrators and scorers.

Construction of a Predictive Musical Test

Wedman and Stefanich (1984) stated that computer based assessment tools should test concepts, principles, and procedures as well as facts. In a typical learning sequence the student begins by committing a particular set of facts to memory. Second, the student learns to restate and interpret the known facts. Finally, the student is able to apply and use the facts to solve new problems created through various situations. It follows, then, that the assessment tools must incorporate items which will encourage the student to respond at higher levels than recitation of facts. Wedman and Stefanich suggest that successful computer assessment requires the following:

1. Determine the type or types of content to be included in the evaluation.

2. For conceptual content, test items should have the learner select examples from non-examples for each of the concepts included.

3. For principle content, test items should have the learner apply the principles in ways consistent with how the principle will be applied outside the learning situation.

4. For procedural content, test items should require the learner to perform the procedure under conditions similar to those in which the procedure will be performed away from the learning situation. (p. 27-28)

Not all of these guidelines will apply to the present study, but they are a tremendous help in channeling development ideas. Other documents by Markle (1969) and Bloom and Peters (1961) presented helpful information on test design.

One of the desired outcomes of the proposed study is the ability to predict success in music theory of incoming college students.
The test itself, however, will be a diagnostic, criterion-referenced test. Willingham (1974) stated that criterion-referenced tests "should provide diagnostic information that is especially relevant to placing students and monitoring their progress" (p. 64). Colwell (1970) suggested several characteristics which are important to the development process. He makes the suggestions as guidelines for selecting an appropriate test. However, they are necessary considerations in test development. Factors to evaluate are time, difficulties in administration, cost versus