This is to certify that the thesis entitled "A Multivariate Study of the Relationships Among Types of Medical School Performance and Its Prediction" presented by David B. West has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Psychology.

[Figure 2.1. Comparison of factor analytic models: 1. Two factor analysis (a general factor plus test-specific factors); 2. Group factor analysis (general, group, and specific factors); 3. Multiple factor analysis (multiple group factors). The loading diagrams are illegible in the source scan.]

(1939) performed a group factor analysis on the grades of elementary school children. The general factor accounted for 28% of the total variance in the correlation matrix, and the verbal, numerical, and practical group factors together accounted for an additional 21% of the variance.

In contrast to the British approaches, which involved extraction of a general factor first, American factorists led by Thurstone derived methods which yielded a number of what Thurstone called multiple factors. Thurstone's centroid technique involved successively extracting a number of factors and then rotating the reference axes ("factors") so that particular variables would load maximally on one factor and have negligible loadings on other factors.
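The contrast between an unrotated first-factor solution and a Thurstone-style rotated solution can be made concrete computationally. The following minimal sketch is an added illustration with synthetic test scores; scikit-learn's maximum-likelihood FactorAnalysis is used as a modern stand-in for the centroid method:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Six synthetic tests: all share a general ability; tests 0-2 also share a
# "verbal" group ability and tests 3-5 a "numerical" one.
rng = np.random.default_rng(0)
n = 500
general, verbal, numerical = rng.normal(size=(3, n))
tests = np.column_stack(
    [general + verbal + rng.normal(size=n) for _ in range(3)]
    + [general + numerical + rng.normal(size=n) for _ in range(3)]
)

# Unrotated: the first factor absorbs the shared (general) variance.
print(FactorAnalysis(n_components=2).fit(tests).components_.round(2))
# Varimax-rotated: loadings split into two group factors ("simple structure").
print(FactorAnalysis(n_components=2, rotation="varimax").fit(tests).components_.round(2))
```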
Using the centroid technique and subsequent rotations of the reference axes to what he called "simple structure," Thurstone (1938) factor analyzed 56 ability tests given to 240 college students. The result, as with factor analyses of personality variables, was a series of multiple factors (Vernon, 1961). Thurstone called these rotated factors "primary mental abilities" and argued strongly against Spearman's conception of a single general ability factor.

Conclusions about the structure of academic performance depend upon which factor analytic technique has been used. When a general factor is extracted and no further analysis of the residual matrix is performed, Spearman's two-factor solution results. When additional factors are extracted from the residual matrix, a group factor structure results. When a number of factors are extracted and rotated, a multiple factor solution emerges. The intellectual and statistical challenge offered by factor analysis is that the same set of data can be analyzed using any of the methods discussed above, and while conclusions based upon the results will differ, all of the solutions are mathematically "legitimate."

An additional complexity results when the factors are rotated. Statistically, what is being done when rotations are performed is that the variance due to the first or general factor of a set of achievement measures is redistributed among the group or multiple factors which have been created through rotation (Vernon, 1961). When an oblique rotation is performed (i.e., the factors are permitted to correlate), the factors are often at least moderately correlated, implying an underlying general factor (Wolfle, 1940). Any set of factors can be orthogonally rotated in an infinite number of ways, thus producing a theoretically infinite number of mathematically legitimate patterns of factor loadings. Statisticians (e.g., Lawley & Maxwell, 1971) refer to this as the problem of "indeterminacy."

One logical way of dealing with this lack of a uniquely identified solution is to specify in advance on which factors the variables should load and then proceed to test this hypothesized performance structure (Joreskog & Lawley, 1968). An early technique used for this purpose was Burt's multiple-group factor analysis (Harman, 1968; Hunter & Gerbing, Note 1). A later, more flexible technique is Joreskog's confirmatory factor analysis. In both of these techniques the factors on which the variables load are specified in advance, eliminating the rotation phase of the analysis.
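The indeterminacy problem is easy to demonstrate numerically. This sketch, an added illustration with an arbitrary loading matrix, shows that any orthogonal rotation of the loadings reproduces exactly the same common-variance structure, so the data alone cannot choose among the rotated solutions:

```python
import numpy as np

rng = np.random.default_rng(1)
loadings = rng.normal(size=(6, 2))             # arbitrary 6-test, 2-factor loadings
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)))   # a random orthogonal rotation
rotated = loadings @ Q

# Both loading patterns imply the identical common-covariance matrix LL'.
print(np.allclose(loadings @ loadings.T, rotated @ rotated.T))   # True
```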
Most of the studies of the structure of achievement have been done on intellectually heterogeneous groups at pre-university levels of education (e.g., elementary school children, military service personnel) and are thus not as generalizable to the research problems in this study as would be desirable. Studies which exemplify the research done on college and professional student populations will now be discussed.

Schoenfeldt and Brush (1975) calculated the GPA's in 12 subject matter areas (e.g., Humanities, Biological Science, Social Science) for over 1,900 undergraduate students. These 12 GPA's were then factor analyzed along with the students' high school GPA and Scholastic Aptitude Test (SAT) verbal and math scores. After a varimax rotation was performed, the analysis yielded three factors: (a) a general academic achievement factor on which 10 of the 12 GPA's loaded; (b) a factor consisting of grades in applied areas (i.e., Agriculture and Education); and (c) an SAT factor. From these results the researchers concluded that college achievement is essentially a unitary trait. In a similar study, Boldt (1973) factor analyzed law school grades for 116 students and tested the goodness of fit of one, two, three, and four factor solutions. He similarly concluded that the matrix of law school grades consisted of essentially one factor.

Studies of the Structure of Medical School Performance

Sirotkin and Whitten (1978) collected test scores and performance ratings for one class of students in an organ systems curriculum at Wayne State University's School of Medicine. This curriculum was very similar to the current curriculum at MSU-COM: Year 1 consisted of basic science courses; Year 2 was comprised of courses in organ systems biology which included both clinical and basic science input; and Year 3 was a year of clinical clerkship training. Using canonical correlations, the authors correlated test scores and clinical performance ratings from each year of the curriculum with those from each other year. These canonical correlations showed considerable consistency in performance during contiguous years (i.e., R(1,2) = .76; R(2,3) = .71) and a surprisingly strong relationship between performance in the basic science courses in Year 1 and clinical clerkship performance ratings during Year 3 (R(1,3) = .59). The matrix of canonical correlations is displayed in Table 2.1.

Table 2.1
Canonical Correlations Among Performance Measures
(Adapted from Sirotkin & Whitten, 1978)

Year     1      2      3
1      1.00
2       .76   1.00
3       .59    .71   1.00

Markert (1978) investigated the relationship between classroom performance in the neuromuscular system at MSU-COM and student performance on carefully evaluated neurological history and physical examinations. He reported a significant canonical correlation of .46 between these two groups of measures.

Gough, Hall, and Harris (1964) conducted a large scale study of over 1,200 graduates of the University of California Medical School at San Francisco from 1951 to 1963. One aspect of their study was an investigation of the correlations among yearly medical school GPA's. The median correlations among GPA's for each of the four years and the median correlation between each yearly GPA and the four-year cumulative GPA are displayed in Table 2.2. Two interesting findings stand out. First, as would be logically expected, the median correlations between GPA's in contiguous years are the highest in the matrix. Second, as in the Sirotkin and Whitten study, the correlations between performance during the first two years (the basic science phase of the curriculum) and the second two years (the clinical clerkship phase) are surprisingly high, indicating some degree of consistency between performance in basic science courses and clinical performance.

Table 2.2
Median Correlations Among Yearly GPA's
(Adapted from Gough et al., 1964)

Year                        1      2      3      4
1                         1.00
2                          .64   1.00
3                          .52    .64   1.00
4                          .38    .44    .64   1.00
Four-year cumulative GPA   .82    .83    .82    .74
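The canonical correlation technique used by Sirotkin and Whitten and by Markert finds the pair of weighted composites, one from each set of measures, whose correlation is maximal. A minimal sketch follows; the synthetic data and the use of scikit-learn's CCA are added assumptions, standing in for the programs actually used in these studies:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)   # a shared component linking the two years
year1 = np.column_stack([ability + rng.normal(size=n) for _ in range(3)])
year2 = np.column_stack([ability + rng.normal(size=n) for _ in range(3)])

cca = CCA(n_components=1).fit(year1, year2)
u, v = cca.transform(year1, year2)
print(f"first canonical correlation: {np.corrcoef(u[:, 0], v[:, 0])[0, 1]:.2f}")
```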
Rhoads, Gallemore, Gianturco, and Osterhout (1974) compared the award of Dean's honors to students in both the basic science and clinical phases of the curriculum. Combining their data from the entering classes of 1962 to 1970 (N = 728) and calculating an odds ratio (Reynolds, 1977), it can be estimated that the odds in favor of clinical honors are 3.68 times as great for students who received basic science honors (1.60 to 1.00) as for those who did not receive basic science honors (0.44 to 1.00). The 95% confidence interval for this odds ratio is 3.26 to 4.10. Since the interval does not contain 1.00 (which would indicate equal odds of receiving clinical honors for both groups), it can be concluded that a significant positive association between clinical and basic science performance exists. The strength of association between the two types of performance can be estimated by using Yule's Q (Reynolds, 1977); the Q statistic for these data is .57. Rhoads et al. observed, however, that there were several students (26% of their total sample) who received clinical honors but who did not receive basic science honors. They therefore concluded that proficiency in the basic sciences is not the sole determiner of success in the clinical phase.
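The odds ratio and Yule's Q follow directly from the two conditional odds quoted above; the arithmetic is sketched below (the small discrepancy from the published 3.68 presumably reflects rounding of the quoted odds):

```python
# Odds of clinical honors, as quoted above from Rhoads et al. (1974).
odds_with_bs_honors = 1.60      # students who received basic science honors
odds_without_bs_honors = 0.44   # students who did not

odds_ratio = odds_with_bs_honors / odds_without_bs_honors
yules_q = (odds_ratio - 1) / (odds_ratio + 1)   # Yule's Q as a function of the odds ratio
print(f"odds ratio = {odds_ratio:.2f}")   # ~3.64 (3.68 when computed from the raw counts)
print(f"Yule's Q = {yules_q:.2f}")        # ~.57, matching the value in the text
```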
Schumacher (1964) factor analyzed medical school grades, National Board Examination scores, and peer ratings of what he called "functional knowledge," diagnostic skill, and skill in relating to patients for a group of interns. He reported finding a "general knowledge" factor which accounted for 44% of the total variance of the correlation matrix and which contained high loadings for medical school grades, scores on Parts 1 and 2 of the National Boards, and peer ratings of functional knowledge and diagnostic skill. Ratings of functional knowledge, diagnostic skill, and skill in patient relationships loaded on a second orthogonal factor which accounted for 9% of the total variance.

A similar finding of a general factor of clinical competence was reported by Maatsch, Downing, Sprafka, and Holmes (1978). Maatsch and his associates factor analyzed scores on multiple-choice tests of clinical and clinically relevant basic science knowledge, patient management problems (PMP's), and ratings of simulated clinical encounters. Participating in the study were currently practicing emergency physicians, physicians in other specialties who were eligible to be certified as emergency physicians (the board eligible group), and medical students. Excluding four patient management problems which did not discriminate between medical students and physicians, all of the tests loaded on a single general factor which accounted for 43% of the variance and 83% of the communality of the correlation matrix. The other factors which were found were a PMP format effect and a multiple-choice question format effect, each of which accounted for an additional 6% of the communality.

An alternative way of looking at complex learning and performance is to conceptualize it as taking place in a hierarchical sequence. Using this conception, Gagne' (1974) has hypothesized a learning hierarchy in which learning and concomitant performance at one stage depend upon possessing the knowledge or skills which were acquired at the next lower stage. Thus, as a simple example, the learning and performance of multiplication should depend upon knowledge of addition. Bloom and his colleagues (Bloom, 1956) developed a hierarchical taxonomy of processes involved in subject matter learning. The well-known Bloom taxonomy consists of the following stages: Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation. As in the Gagne' hierarchy, performance at one stage is assumed to be dependent upon the degree of the student's accomplishment at the preceding stage. Thus, for example, application of a principle is assumed to take place only after the student adequately comprehends the principle.

The simplest ways of testing for a performance hierarchy are to see whether performance at different levels is correlated (e.g., Gagne', 1974), or to compare the proportion of students succeeding at stage n given success at stage n-1 with the proportion succeeding at stage n given failure at stage n-1. If the stages are indeed hierarchical, the first proportion should be greater than the second. Using the correlational model, the relationship between performance at two different levels of a hierarchy is schematically illustrated in Figure 2.2(a).

A serious potential deficiency of the correlational approach is suggested by the factor analytic studies of performance discussed above. That is, performance at any or all levels of the hierarchy may be influenced by the student's general level of ability (e.g., Jensen, 1969; Spearman, 1904) or knowledge (Ebel, 1969). If this is a tenable hypothesis, performances at different levels may, in the path analytic sense, be spuriously correlated because of the underlying "influence" of a general factor. This potential influence of a third variable, g, is schematically represented in Figure 2.2(b). In Figure 2.2(a), in contrast, the student's previous level of background knowledge or ability (g) is shown as influencing his or her level of performance at stage X of the hierarchy, which, in turn, influences his or her level of performance at stage Y.

[Figure 2.2. Two alternative causal models: (a) developmental model, g → Stage X → Stage Y; (b) general factor model, Stage X ← g → Stage Y.]

Thus, the first possibility is that background factors (i.e., the student's level of general academic aptitude or background knowledge) causally influence performance at Stage X of the hierarchy, which in turn causally influences performance at Stage Y (Figure 2.2[a]). Or, the student's general level of aptitude or knowledge underlies performance at both stages of the hierarchy, thus producing a spurious correlation between X and Y (Figure 2.2[b]). Expressing this latter relationship in factor analytic terms, both X and Y load on g. Hence, when g is statistically controlled, the partial correlation between X and Y, r_XY.g, should be close to zero. On the other hand, if a hierarchical or developmental relationship exists among g, X, and Y (as shown in Figure 2.2[a]), r_XY.g will probably be less than r_XY but will not disappear completely when g is controlled (Hyman, 1955).

Kropp and Stoker (1966) constructed four taxonomic-type tests designed to operationally define the six levels of the Bloom taxonomy in both science and social studies. On the basis of an analysis of both mean performance on the tests and an analysis of patterns of correlations among the tests, they concluded that the results generally supported the hypothesized hierarchical structure of the taxonomy.
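The competing predictions of the two models in Figure 2.2 can be made concrete with the first-order partial correlation. This sketch, an added illustration with synthetic data, simulates both causal structures and computes r_XY.g for each:

```python
import numpy as np

def partial_corr(x, y, g):
    """First-order partial correlation r_XY.g."""
    rxy, rxg, ryg = (np.corrcoef(a, b)[0, 1] for a, b in ((x, y), (x, g), (y, g)))
    return (rxy - rxg * ryg) / np.sqrt((1 - rxg**2) * (1 - ryg**2))

rng = np.random.default_rng(0)
g = rng.normal(size=1000)
# General factor model (Figure 2.2[b]): g drives both stages independently.
xb, yb = g + rng.normal(size=1000), g + rng.normal(size=1000)
# Developmental model (Figure 2.2[a]): g -> X -> Y.
xa = g + rng.normal(size=1000)
ya = xa + rng.normal(size=1000)

print(f"general factor model: r_XY.g = {partial_corr(xb, yb, g):.2f}")   # near zero
print(f"developmental model:  r_XY.g = {partial_corr(xa, ya, g):.2f}")   # clearly positive
```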
Madaus, Woods, and Nuttall (1973) employed a causal model approach to test the cumulative structure of the six major levels of Bloom's taxonomy. Using multiple regression procedures to estimate the strengths of associations between performance at different levels of the taxonomy, they reanalyzed the Kropp and Stoker data. According to Madaus et al., the multiple R²'s between measures of performance at adjacent levels should be significant (indicating "direct" links between performance at adjacent levels of the hierarchy). However, the increment in R² after variance due to performance at intervening levels has been statistically controlled should not be significant, indicating the absence of "indirect" links between levels in the hierarchy. In terms of the authors' causal model, this second finding would also support the hypothesis of the absence of the effects of other variables, such as g, which have not been included in the model. A causal model which depicts the situation described above is shown in Figure 2.3.

[Figure 2.3. Segment of the Madaus et al. causal model: Knowledge (K) → Comprehension (C) → Application (A), with R²_C:K > 0 and R²_A:C > 0 estimating the direct links, and (R²_A:C,K - R²_A:C) = 0 indicating the absence of an indirect link from K to A.]

The strength of the direct link between Knowledge (K) and Comprehension (C) is estimated by R²_C:K. Similarly, the magnitude of the direct link between Comprehension and Application (A) can be estimated by R²_A:C. The strength of the indirect link between Knowledge and Application can be estimated by (R²_A:C,K - R²_A:C). This difference is the variance in Application which is accounted for by Knowledge when the variance due to the intervening level of Comprehension is statistically controlled. In multiple regression terms, this procedure tests the increment in the R² for Application when Knowledge is entered into the regression equation after Comprehension. Testing the significance of the difference (R²_A:C,K - R²_A:C) is also statistically equivalent to testing the significance of the correlation between A and K partialing out C; that is, the partial correlation r_AK.C. When either statistic is significantly different from zero (indicating an indirect link in the hierarchy), this is a hint that a variable which has not been included in the model may be producing spurious correlations between performance at adjacent levels.

In order to test this alternative explanation, Madaus et al. performed the regression analyses again, this time controlling for students' scores on the Kit of Reference Tests for Cognitive Factors, a well-known measure of g developed by the Educational Testing Service. As was expected on the basis of theories of general knowledge and ability, controlling for g attenuated the size of the correlations between adjacent levels of the hierarchy and reduced to almost zero the strengths of all but one indirect link in the hierarchy (the link between Comprehension and Analysis). In terms of the Madaus et al. causal model, these findings indicate that performance at one level of the hierarchy is partially dependent upon performance at the next lower level and partially dependent upon the student's general level of ability or knowledge. The final results of this reanalysis are shown in causal model form in Figure 2.4. The numbers in the figure are the R²'s between performance at different levels of the hierarchy.

[Figure 2.4. R²'s between performance at adjacent levels of Bloom's taxonomy, with g controlled: Knowledge → Comprehension → Application → Analysis → Synthesis → Evaluation (adapted from Madaus et al., 1973); the numerical values are illegible in the source scan.]
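The increment test described above reduces to comparing two R² values. A minimal sketch follows, with synthetic data generated so that Application depends on Comprehension only, as the Madaus et al. model hypothesizes:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
K = rng.normal(size=300)          # Knowledge
C = K + rng.normal(size=300)      # Comprehension depends on Knowledge
A = C + rng.normal(size=300)      # Application depends on Comprehension only

r2_direct = r_squared(C, A)                            # R^2_A:C
r2_both = r_squared(np.column_stack([C, K]), A)        # R^2_A:C,K
print(f"indirect link estimate = {r2_both - r2_direct:.3f}")   # near zero
```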
Summary

The factor analyses of achievement and ability measures have yielded the following general findings (e.g., see Carroll, 1978; Cooley, 1976; Vernon, 1961):

1. Achievement measures tend to be moderately to highly interrelated.

2. Unrotated factor analytic solutions yield first or general factors which typically account for 30 to 50% of the total variance of the correlation matrix. For a set of measures administered to a reasonably heterogeneous group, Vernon (1961) has estimated that the general factor will account for an average of 40% of the total variance.

3. When the group of examinees or students is relatively homogeneous in ability, the average proportion of variance accounted for by the general factor decreases.

4. When the initial solution is rotated, smaller groups or clusters of measures appear (e.g., Thurstone, 1938). These group or multiple factors tend to represent cognitive and performance aptitudes or abilities.

5. When the factors are permitted to correlate (i.e., the analyst uses an oblique rotation), the multiple or group factors are typically correlated, with high correlations among the cognitive ability factors and lower correlations between cognitive and performance factors (e.g., Wolfle, 1940).

The results of studies which have investigated the consistency of medical school performance across different types of knowledge and skills have shown that such performance is relatively consistent. These findings are congruent with the results of earlier correlational studies, which show achievement measures to be moderately to highly intercorrelated, and with the results of the factor analytic studies summarized above, which support the hypothesis of a general factor of achievement or knowledge. The most critical finding reported in these studies is the surprisingly strong relationship between measures of basic science performance and measures of clinical performance (e.g., Gough et al., 1964; Maatsch et al., 1978; Sirotkin & Whitten, 1978).

An alternative way of looking at complex performance over time is to view it as stages in a hierarchy. Studies by Gagne' and his colleagues have shown that subject matter learning and concomitant performance in elementary school mathematics and science is hierarchically structured, but, paradoxically, that instruction does not have to be sequenced in a way which is consistent with this structure for learning to take place (Gagne', 1974). Investigations of the taxonomy proposed by Bloom (1956) have yielded approximately similar results. That is, performance on achievement tests measuring learning in elementary and secondary school subjects has been shown to approximate the sequence proposed in the hierarchy. Consistent with the results of factor analytic studies, however, is the finding that when general ability is statistically controlled, correlations among performance at adjacent levels of the taxonomy are attenuated (Madaus et al., 1973). This result indicates the probable "influence" of general knowledge or ability on performance as well as the "influence" of what was learned at an earlier stage. The possible applicability of these hierarchical models to the description and analysis of learning processes in the MSU-COM curriculum will be discussed in the next chapter.

The Prediction of Medical School Performance

An Historical Perspective

The contributions of the Flexner Report to the curricula and admissions practices of American medical schools were discussed briefly in Chapter I. Another important influence on curriculum and, hence, on admissions policies came from the European roots of modern American medical education.
The first U.S. medical school was established at the College of Philadelphia (later the University of Pennsylvania) in 1765 by Dr. John Morgan. Morgan, like most physicians of his time, received his medical training in Europe, where the curriculum (like the typical one today) began with courses in the basic sciences and culminated in clinical clerkship training. The non-university route to an M.D. or D.O. degree was through a free-standing medical school. Flexner (1910) reported that some of these schools were barely disguised commercial trade schools whose graduates were ill-prepared for medical practice. Most proprietary schools had woefully inadequate facilities and instructors. The quality programs of the time, which were cited as exemplary by Flexner, were in university-affiliated schools whose curricula followed the European model. As proprietary schools were refused licenses by state boards of education and licensing and regulation tightened, most of the medical schools which survived the impact of the Flexner Report were those allopathic, osteopathic, and homeopathic schools with adequate basic science programs. In order to admit students who would succeed in these programs, applicant selection came to emphasize indicators of science aptitude and achievement. As discussed in Chapter I, the most widely used indicator of science aptitude was the MCAT Science Test.

The MCAT was originally developed in the late 1940's to equate the academic backgrounds of applicants who had attended a variety of undergraduate institutions (Erdmann, Mattson, Hutton, & Wallace, 1971). The test later came to be used simply to predict performance during the first two years (the basic science phase) of the traditional four-year curriculum. The original MCAT (the one used in this study) is composed of four subtests: Verbal, Quantitative, General Information, and Science. As will be discussed in the next section, the two MCAT subtests which have been most highly correlated with performance during the first two years have been the Science and Quantitative tests.

Using the age-old principle that the best indicator of future performance is past performance at a similar activity, admissions committees have relied heavily upon premedical GPA as a predictor of success in medical school. Logically, GPA has the following advantages as an indicator (Krupka, Elstein, Molidor, King, Parsons, & Son, 1977): It is a composite of grades earned in many courses using a variety of instructional strategies (e.g., didactic and laboratory instruction) and evaluation methods (e.g., tests, term papers, performance ratings). It can function as a relatively reliable summary estimate of performance over a long period of time (at least longer than the MCAT, which samples knowledge in a variety of areas but in less than one day's testing time).

In order to get a picture of an applicant's personal qualities and, hopefully, to screen out undesirable personality types (e.g., the most obvious sociopaths), letters of recommendation and personal interviews are used. Applicants' letters of recommendation are almost always highly favorable and thus are not useful in discriminating among candidates. When the applicant passes the first admissions screen (usually based upon some combination of GPA's and MCAT scores), he or she is invited to the school for a personal interview with members of the school's faculty.
The applicant is typically interviewed by two faculty members who then usually rate the candidate on a series of rating scales which purport to measure personal qualities judged important in a physician (e.g., problem solving ability, decisiveness, ability to interact with others). Unless the school has a good training program, the interrater reliability of the interview scores tends to be low (partially due to the low variance in the interviewers' ratings). In general, interview scores have not been significantly correlated with subsequent performance.

The Prediction of Classroom and Laboratory Performance

In the remainder of this section, representative studies concerned with the prediction of course and laboratory performance will be reviewed and discussed. In selecting these studies, the following criteria were used: (a) the study should be fairly recent, so that the results can be more readily generalized to current medical school selection problems; or (b) the study has been widely quoted in the literature in support of a certain type of admissions policy. In most of the studies to be reviewed, medical school achievement has been operationally defined as the student's cumulative GPA during the first year, the first two years, or the entire four years of allopathic medical school. With the exception of a companion study done by the author and his associates (West, Markert, & Bernier, Note 4), the author was unable to find any investigations of the prediction of student performance in colleges of osteopathic medicine.

The most prolific investigators of the prediction of medical school performance have been Gough and his colleagues at the University of California at Berkeley (e.g., Gough, 1971, 1978; Gough, Hall, & Harris, 1963, 1964). Gough et al. (1963) studied relationships between MCAT scores, premedical GPA's, interview scores or ratings, and subsequent medical school performance. Their investigation was carried out on data from over 1,200 graduates of the University of California Medical School at San Francisco between 1951 and 1962.

Gough et al. (1963) reported the following ranges of correlations between MCAT subtest scores (V = Verbal; Q = Quantitative; GI = General Information; S = Science) and first year GPA's: V = -.23 to .24 (Median = .14); Q = -.09 to .32 (Mdn = .18); GI = -.14 to .20 (Mdn = .12); S = .06 to .37 (Mdn = .28). Similar results for single classes of students have been reported by Crowder (1959): V = .14; Q = .21; GI = .09; S = .38, and by Richards and Taylor (1961): V = .16; Q = .24; GI = .09; S = .23. For ease of comparison, these and other results are displayed in Table 2.3. In spite of the intra-institutional variability reported by Gough et al., the two MCAT subtests which show the highest correlations with first-year performance across institutions are the Science and Quantitative tests. It is not surprising that these two tests (especially MCAT Science) have been among the most highly weighted criteria in the applicant selection process. The predictive validity of the MCAT Science Test was further confirmed by Gough (1978), who found the following median correlations between MCAT Science scores and yearly GPA's for the sample of University of California graduates described above: Year 1 = .28; Year 2 = .22; Year 3 = .04; Year 4 = -.03.

Buehler and Trainer (1962) analyzed the differences in scores on predictor variables for students graduating in the top 10% (22 students) and the bottom 20% (25 students) of their medical school classes.
Data from six 39 om. mm. as. he. mmemazocx awesome mo umou mumsomumiumom Anomav ucoocw> w Ham3om ma.. «0.- 40.- HH.- mocmenouumd Amomac monum mflnmcuoucw mo mmcwuum w Hodmma .moumnoflm vm. II in II mmw Hmowlusom Anomav cowsnon mo.I Ii Ii In «mo new» suusom go. I: II II ado Roux ouwna mm. II ii II «mo Hmmm pcoomm mm. in it u: «no use» umunm Amsmav canoe mm. mo. gm. 0H. cam now» umuflm Aaomav Hodmma w mpHm50Hm mm. mo. Hm. ad. ago “was umuflm Ammmav umosouo mm. NH. ma. «a. «mo mam» amuse Amomav .Hm um nmsoo mocoflom flamenco w>Humufiucmsa Hmnum> coauouwuo xosum 9&0: @UCMEHOM H0“ OUMDUMHUIUWOQ pcm Hoonom Havapmz 0cm nouoom Bvsum Hmoxiusom v new» m How» m How» H How» sownmufluo mUGMEHOMHom Hoonom Hmowpoz v.N wanna new m.¢mw Hmowowsmum :mm3uom msowumHmHHoo 44 The authors reported the following multiple correlations between these predictors and yearly compre- hensive examination performance: Year l==.51 (R2==.26); Year 2=.42 (R2= .18); Year 3= .40 (R2=.l6); Year 4=.39 (R2==.15). The best single predictor of comprehensive exam performance was GPA. Best and his associates reported an overall multiple correlation of .55 (R2==.30) between the multivariate combination of performance measures and MCAT Quantitative, MCAT Science, premedical GPA, and quality of undergraduate college. Presumably in response to the social turmoil and demands of the 1960's, medical schools began to put more emphasis on noncognitive admissions criteria (e.g., eval- uations of letters of recommendation, indicants of the applicant's social commitments). This change in emphasis plus the implementation of affirmative action programs has broadened the traditional acceptance pool. Theoretically, as students with lower GPA's and MCAT scores are admitted to medical schools and the variances of these variables in the pool of accepted applicants increase, their cor- relations with medical school performance should increase as well. The results of a longitudinal study by Frederick McGuire empirically confirm this relationship. Using percentile ranks of premedical GPA, Science GPA, MCAT Science, and MCAT Quantitative scores. developed a multiple regression-based index for predicting academic s and first at Irvine magnitude increases uccess. Correlations between values on the index year class rank at the University of California are shown in Table 2.5. 45 McGuire (1977) The increase in the of the correlations clearly covary with the in the variability of the index itself. This relationship suggests that when the range of GPA's and MCAT scores of matriculants is increased, the correlations between these variables and some criterion or criteria of academic success will increase as well. Table 2.5 Correlations Between Regression-Based Prediction Indices and First Year Class Ranks (Adapted from F. McGuire, 1977, Table l, p. 417) Index score Entering No. of class students r 5.0. 1965 85 .37 63 1966 85 .46 69 1967 63 .34 42 1968 61 .31 32 1969 62 .26 27 1970 63 .33 34 1971 66 .49 50 1972 69 .49 40 1973 58 .49 44 1974 69 .84 9O 46 As noted above, applicants who pass the first admissions screen are invited for interviews with two or more of the school's faculty members. In some schools potential interviewers are carefully trained in interview workshOps and instructed in how to structure the interview and rate the candidate on the school's interview rating form. In other institutions faculty members are asked to volunteer to be interviewers in their spare time and are given little or no training. 
As noted above, applicants who pass the first admissions screen are invited for interviews with two or more of the school's faculty members. In some schools potential interviewers are carefully trained in interview workshops and instructed in how to structure the interview and rate the candidate on the school's interview rating form. In other institutions faculty members are asked to volunteer to be interviewers in their spare time and are given little or no training. As would be expected from other social science research, the interrater reliability of the interview ratings or scores generally increases with the amount of training and the degree of structure of the interviews. In general, however, interrater reliability has not been high, nor have significant relationships between interview scores and subsequent student performance been reported (e.g., Krupka et al., 1977).

Prediction of Clinical Performance

The major measures of clinical performance have been ratings of clerkship, internship, and residency performance. In situations in which the practicing physician functions as an employee of an organization (e.g., the U.S. Public Health Service, the Veterans Administration), supervisors' ratings of performance have been available. However, since most physicians are self-employed, ratings of post-graduate clinical performance have not usually been available. The new trend in research on physicians' performance on specialty board exams (such as the Maatsch et al. study discussed earlier) and the study of the medical inquiry process by Elstein, Shulman, Sprafka, and others (1978) will provide additional information.

In a study conducted at the College of Human Medicine at Michigan State University, Krupka et al. (1977) studied the prediction of medical students' problem solving and empathy skills. Problem solving skills were measured by ratings of the clerkship student's clinical problem solving skills by both peers (i.e., other medical students) and clinical faculty, by multiple choice tests of clinical knowledge, and by diagnostic patient management problems. Variables used to predict these problem solving criteria were the MCAT, premedical GPA, the Watson-Glaser Critical Thinking Appraisal Test, the State-Trait Anxiety Inventory, the IPAT Anxiety Scale, and the Study Habits Inventory. A separate multiple regression was done for each dependent variable. MCAT scores and GPA were entered on the first step and the other predictors on the second step of the regression. Contrary to the findings about to be discussed, moderate and statistically significant multiple correlations were found between MCAT scores and premedical GPA, on the one hand, and the separate problem solving criteria, on the other. After the other predictors were entered into the regression equation on the second step, multiple R's of above .4 were reported for the following criteria: faculty ratings of problem solving, patient management problems, and scores on the multiple choice exams.

Similar regressions were carried out on peer and faculty ratings of students' empathy skills. Using only MCAT and GPA as predictors yielded multiple R's of .358 (R² = .128) and .593 (R² = .352) for peer and faculty ratings of empathy skills, respectively. When scales which were developed by the authors to measure empathy skills were added to the regressions, the multiple R²'s increased to .335 and .783, respectively.

In contrast to these findings, most studies of clinical clerkship performance have reported little or no relationship between MCAT scores and ratings of clinical clerkship performance. For example, Best et al. (1971) reported a multiple correlation of .32 (R² = .10) between the set of predictors discussed earlier and ratings of clerkship performance. Gough et al. (1963) found non-significant bivariate relationships between MCAT scores and premedical GPA's, on the one hand, and clinical clerkship performance, on the other.
Richards, Taylor, and Price (1962) analyzed the relationship between interns' MCAT scores and supervising physicians' ratings of internship performance. They found the following correlations: MCAT-V = -.11; Q = -.04; GI = -.04; S = -.13. These researchers did, however, report the following significant correlations between ratings of internship performance and yearly medical school GPA's: Year 1 = .21; Year 2 = .24; Year 3 = .45. Similar relationships were reported by Kegel-Flom (1975): cumulative GPA was significantly correlated with supervisor ratings (.32), self ratings (.46), and peer ratings (.35) of internship performance for 110 graduates of the University of California Medical School at San Francisco. In contrast to these findings, Korman and Stubblefield (1971) found no relationship between medical school grades and interns' clinical performance.

Howell (1966) dichotomized supervisors' comments about the performance of 312 federally employed physicians into "high" and "low" performance ratings. She found no significant differences between the two groups' performances on the four MCAT subtests. Howell and Vincent (1967), in a study of U.S. Public Health Service physicians, found the following significant correlations between MCAT scores and scores on written examinations of medical knowledge: V = .47; Q = .49; GI = .36; S = .60. However, no significant correlations were found between MCAT scores and scores on the clinical medicine portion of this same written exam. Their most surprising finding (and the one which is most widely quoted by critics of the MCAT) was the report of a greater than chance number of low but significant negative correlations between MCAT scores (especially on the Verbal and General Information subtests) and supervisors' ratings of clinical performance.

Bartlett (1967) followed 49 medical school graduates through medical school and into the beginnings of their professional careers. He failed to find any significant differences in career performance between high and low MCAT scorers. Wingard and Williamson (1973) reviewed 27 studies of the relationships between professional or graduate school grades and subsequent professional performance in medicine and other fields. No consistently strong relationships were found between grades and post-graduate performance in any of these fields.

The most consistent patterns in these findings are: (a) the most successful premedical predictor of clinical performance is premedical GPA (studies have not demonstrated the predictive validity of MCAT scores), and (b) predictors of clinical performance correlate most highly with those criteria of clinical performance which have the closest temporal relationship with the predictor. That is, premedical grades and MCAT scores are moderately correlated with problem solving and empathy skills which have been assessed during the clerkship years (Krupka et al., 1977) but are not correlated with internship performance (which is measured one or more years later). Medical school grades are significantly related to internship performance (e.g., Richards et al., 1962) but are not significantly related to post-graduate professional performance (Wingard & Williamson, 1973). Other studies have reported no relationships between cognitive predictors and clinical performance (e.g., Korman & Stubblefield, 1971). Some of the possible explanations for the lack of strong correlations between predictors and criteria of clinical performance are the following:
1. The lack of well defined criteria.

2. The low reliabilities of clinical performance ratings.

3. The lack of variance in the ratings.

4. The case specificity of the ratings. For example, Elstein et al. (1978) reported low correlations among ratings of the performance of practicing physicians on a variety of standardized, simulated cases.

In addition to these possible reasons, most of the studies which have been reviewed in this and the previous section used bivariate correlational techniques to estimate the relationships which were investigated. Two more appropriate techniques for the investigation of relationships involving multiple predictors and/or criteria would have been multiple regression and canonical correlation. These methods would have yielded more statistically powerful estimates of the relationships among sets of predictor and/or criterion variables. Similarly, the reliabilities of the measures of clinical performance might have been improved by forming linear combinations of these individual measures (e.g., Nunnally, 1967).
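The reliability gain available from such linear combinations can be approximated with the Spearman-Brown formula, assuming the combined measures are roughly parallel. A minimal added sketch with hypothetical values:

```python
def spearman_brown(r_single, k):
    """Reliability of an unweighted composite of k parallel measures."""
    return k * r_single / (1 + (k - 1) * r_single)

# A single clinical rating with reliability .40, versus a composite of four
# such ratings (hypothetical values for illustration):
print(round(spearman_brown(0.40, 1), 2))   # 0.40
print(round(spearman_brown(0.40, 4), 2))   # ~0.73
```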
Summary

Of the principally used variables for selecting students for admission to medical school, the ones which show the strongest relationships with classroom and laboratory performance are premedical GPA, MCAT Science and Quantitative scores, and the quality of the applicant's undergraduate institution. Two variables which are negatively related to medical school performance during the basic science phase are the student's age and extent of previous employment. While statistically significant, the correlations between these predictors and first and second year grades in medical school (i.e., performance during the basic science phase of the curriculum) are generally relatively low in magnitude and tend to be unstable from year to year at the same school. Proponents of the continued use of these selection variables in admissions decision making attribute the low to moderate magnitudes of the correlations to the restriction in range in the selection variables. This hypothesis has been supported to some extent by the results of the study by McGuire (1977). As McGuire's medical school relaxed its admission criteria, the range of these selection variables increased and their multiple correlation with first-year academic performance increased concomitantly.

The selection variable which correlates most highly with clinical performance in medical school is premedical GPA. While MCAT scores correlate surprisingly well with performance on written tests of medical knowledge taken after graduation (e.g., Howell & Vincent, 1967), they do not generally correlate with ratings of clerkship, internship, or post-graduate clinical performance. The most consistent pattern in the findings on the prediction of clinical performance is that cognitive predictors of clinical performance correlate most highly with those criterion measures which have the closest temporal relationships with the predictor. No consistent relationships have been found with non-cognitive predictors of clinical performance.

Summary and Discussion

Schumacher (1964) factor analyzed written tests of medical knowledge and ratings of clinical performance. Both types of measures loaded on the first or general factor, which accounted for 44% of the total variance. Ratings of doctor-patient relationships formed the principal loadings on the second (orthogonal) factor, which accounted for 9% of the total variance. Maatsch et al. (1978) also factor analyzed objective measures of clinically relevant basic science knowledge and clinical knowledge and ratings of diagnosis and case management. They reported a single factor solution in which the general factor accounted for 43% of the total variance.

Non-factor analytic studies have demonstrated similar consistency in performance across basic science and clinical subject matter areas, and across knowledge and performance domains. Both Gough et al. (1964) and Sirotkin and Whitten (1978) reported moderate to sizeable correlations among performance in basic science courses, clinical medicine courses, and ratings of clinical clerkship performance. These findings are similar to those which have been reported in analyses of achievement and ability variables in other populations: First, achievement or ability measures tend to be moderately to highly interrelated. Second, when these measures are factor analyzed, unrotated solutions typically yield a first or general factor on which most of the measures load and which accounts for about 40% of the total variance of the correlation matrix (Vernon, 1961).

In the majority of the studies which have included both objective measures of performance and ratings of (mainly) clinical performance, however, the observed correlations among the objective measures have been higher than the correlations between the objective measures and the ratings. There are three probable reasons for this discrepancy: (a) objective measures are generally more reliable than ratings; (b) written tests generally have larger variances; and (c) performance on the written examinations may simply be due to test wiseness, ability to memorize, better study habits, or other qualities which some would argue are not truly related to being a competent physician. These critics would argue, therefore, that objective tests are "tapping" these "irrelevant" qualities rather than important knowledge.

An alternative view of the structure of complex performance has been offered by educational psychologists. Gagne' (1974) and Bloom (1956) have proposed hierarchical models of performance in which performance at one level is hypothesized to be dependent upon the acquisition of the knowledge or skills which comprise performance at lower levels rather than upon a student's general level of ability. Research by Gagne' and others on mathematics and science learning has demonstrated that the performance structures in these subjects consist of a series of hierarchically ordered skills and that performance of these skills is correlated (Gagne', 1974). Similar findings have resulted when Bloom's taxonomy has been studied (e.g., Kropp & Stoker, 1966; Madaus et al., 1973). It is still possible, however, that this consistency of performance across levels is due to the student's general level of knowledge (Ebel, 1969) or ability (e.g., Jensen, 1969; Spearman, 1904). When Madaus et al. (1973) statistically controlled for a measure of general ability in their analysis, the correlations between performance at non-adjacent levels of the hierarchy virtually disappeared, and the correlations between performance at adjacent levels of the hierarchy were attenuated. Madaus and his colleagues were led to the "compromise" conclusion that performance at one level was partially due to mastery of the learning process at lower levels and partially due to general ability.
Viewed from the perspective of theories of general knowledge or ability, the consistency of performance in medical school could be attributed to the student's "general level of ability" (e.g., Maatsch et al., 1978). Looked at from the perspective of theories of learning or performance hierarchies, clinical performance is based upon knowledge and principles of medical biology and clinical medicine acquired earlier in the curriculum. Thus, students who have more thoroughly acquired these "basics" would be predicted to be more highly rated in their clinical performance. A third conclusion, which represents a compromise between the first two, is that performance at all "levels" of the curriculum is a joint function of knowledge, skills, and principles acquired at earlier levels and of general level of ability.

Most modern American medical curricula still follow the European model: two years of basic science education followed by two years of clinical training. The problem of selecting medical students has, in practical terms, been reduced to selecting students who will academically succeed in these curricula. Studies which have investigated the relationships between typically employed selection variables and later performance in medical school were reviewed. The general conclusions of this review were in accord with the conclusions of previous reviews of the literature in this area. The best predictors of performance during the basic science years of the curriculum have been MCAT scores (especially MCAT Science), premedical GPA (especially in the sciences), and the quality of the applicant's undergraduate institution. These objective measures of the applicant's academic achievement and aptitude have been found to have low to moderate correlations with later academic performance (e.g., Best et al., 1971; Gough, 1979; Gough et al., 1963; Johnson, 1962). A multi-year study of matriculants recently done by McGuire (1977), however, demonstrated that as the variances of such objective predictor variables increased, their multiple correlation with first year academic performance increased concomitantly.

MCAT scores and premedical GPA's were not generally found to be well correlated with ratings of clinical performance during the last two years of medical school (e.g., Best et al., 1971; Gough et al., 1963), internship performance (e.g., Richards et al., 1962), or post-graduate professional performance (Howell, 1966; Howell & Vincent, 1967). On the other hand, the best predictors of clinical performance were those which had the closest temporal relationship to the clinical performance being predicted. For example, the best predictors of clinical performance during the clerkship and internship years were medical school grades during the previous years (e.g., Richards et al., 1962; Sirotkin & Whitten, 1978).

With the exception of the applicant's age and his or her reported number of hours of outside employment during undergraduate school (both of which have been found to be negatively correlated with later performance), biographical variables have not been found to be consistently related to medical school performance. Similarly, subjective ratings of applicants' personal traits after interviews with the applicant have not been reported to be significantly related to later performance.

Based upon the findings of studies reviewed in this chapter, three general research hypotheses can be offered to guide further investigation:
1. The student's performance across different areas of the curriculum (e.g., clinical and basic science) should be consistent.

2. This performance may be structured along the lines of the learning or performance hierarchies proposed by Bloom (1956) and Gagne' (1974). The exact organization of the structure would probably be different for different medical schools. However, a general structure which would be applicable to most or all schools would probably consist of at least two stages: (a) the acquisition of basic science knowledge, and (b) the application of this knowledge in clinical performance.

3. A multivariate relationship between predictors of medical school performance and the performance itself can be hypothesized to exist.

CHAPTER III
METHOD

In order to obtain data to test the hypothesized relationships discussed in Chapter I and at the conclusion of Chapter II, MSU-COM faculty members were requested to provide summary measures of student performance in the classes which they taught. These course performance measures and the medical student samples on which they were taken are described in the next sections. Also included in this chapter are a restatement of the research hypotheses to be investigated and a description of the data analysis procedures to be used.

The Sample and the Method

Academic and clinical performance data were collected for matriculants entering MSU-COM in 1974 and 1975. Reasonably complete data (i.e., grades or other measures of summary performance in at least 75% of the courses for which data were collected) were available for 84 of the 88 students matriculating in 1974 and 96 of the 99 students matriculating in 1975.

Selected preadmissions characteristics of these students are displayed in Table 3.1 (a complete list appears in Table 3.2). The relatively wide ranges on some of these variables reflect MSU-COM's commitment in its developing years to experimentation with the admission of non-traditional students; that is, the admission of a relatively high proportion of ethnic minority students, students with non-premedical academic backgrounds, and older students applying to medical school for training for a second career. The statistical benefit of these wide ranges in the admissions predictor variables is the increased probability that these predictor variables will correlate more highly with medical school performance than they would in traditional allopathic medical programs (such as those discussed in Chapter II).

Table 3.1
Selected Preadmissions Characteristics of Matriculants

                                            Class
Characteristic                        1974         1975
Sample size                             84           96
Age
  Mean                                24.3         24.5
  Standard deviation                   4.3          6.6
  Range                              20-42        19-40
Sex
  Male                                 71%          75%
  Female                               29%          25%
Ethnic status
  Minority                             21%          17%
  Majority                             79%          83%
Undergraduate major
  Biological Science                   55%          48%
  Health related                       21%          31%
  Non-Biological Science                7%           8%
  Other                                17%          13%
Premedical GPA
  Mean                                3.01         3.15
  Standard deviation                  0.37         0.34
  Range (mean ± 2 s.d.)          2.27-3.75    2.46-3.83
Premedical Science GPA
  Mean                                2.97         3.09
  Standard deviation                  0.43         0.39
  Range (mean ± 2 s.d.)          2.11-3.83    2.31-3.87
Premedical Non-Science GPA
  Mean                                3.05         3.22
  Standard deviation                  0.41         0.37
  Range (mean ± 2 s.d.)          2.23-3.87    2.48-3.96
MCAT Verbal
  Mean                              414.20       513.82
  Standard deviation                 85.80        93.71
  Range (mean ± 2 s.d.)            323-366      326-701
MCAT Quantitative
  Mean                              523.95       555.24
  Standard deviation                 97.49        81.88
  Range (mean ± 2 s.d.)            334-720      392-719
MCAT General
  Mean                              502.70       511.31
  Standard deviation                 83.62        87.72
  Range (mean ± 2 s.d.)            336-670      336-686
MCAT Science
  Mean                              511.71       521.07
  Standard deviation                 96.93        90.05
  Range (mean ± 2 s.d.)            415-609      431-612
Table 3.2
Complete List of Preadmissions Characteristics of Matriculants

Biographic
1. Sex
2. Application-reapplication (Was the student accepted on his first or a succeeding attempt?)
3. Original-alternate (Was the student selected originally or as an alternate?)
4. Age
5. Majority-minority status (Majority = Caucasian; Minority = Other)
6. Marital status (Married or not married?)
7. Military service (Was the student in the military or not?)
8. Residency (Is the student a Michigan resident or an out-of-state resident?)
9. Number of schools (How many postsecondary institutions did the student attend?)

Course Work
10. Undergraduate science GPA
11. Undergraduate science credit hours
12. Undergraduate nonscience GPA
13. Undergraduate nonscience credit hours
14. Overall undergraduate GPA
15. Overall undergraduate credit hours
16. Biology GPA
17. Biology credit hours
18. Inorganic chemistry GPA
19. Inorganic chemistry credit hours
20. Organic chemistry GPA
21. Organic chemistry credit hours
22. Physics GPA
23. Physics credit hours
24. English GPA
25. English credit hours
26. Behavioral science GPA
27. Behavioral science credit hours

Medical College Admission Test (MCAT)
28. MCAT Verbal
29. MCAT Quantitative
30. MCAT General
31. MCAT Science
32. MCAT Average

Other
33. Score for admission interviews
34. D.O. hospital experience (Prior to admission, did the student have D.O. hospital experience?)
35. D.O. relative (Does the student have a D.O. relative?)
36. D.O. nonrelative (Does the student have a D.O. nonrelative contact?)
37. D.O. friend (Does the student have a D.O. friend?)
38. Work (Prior to admission, how many hours per week did the student work?)
39. Health-related activity (Prior to admission, was the student involved in a health-related activity?)
40. Extra-curricular activity (Prior to admission, was the student involved in an extra-curricular activity?)

[Figure 3.1. MSU-COM curriculum by type of course. The rotated figure is largely illegible in the source scan; legible labels indicate Terms 1-8 plotted against course types (basic science, systems biology, clinical science, community medicine, osteopathic manipulative therapy, and family practice-preceptorship), with Unit 1 courses such as anatomy, histology, physiology, biochemistry, and physical examination in Terms 1-2.]

As discussed in Chapter I, MSU-COM has a three-part integrated curriculum. Courses offered during the first eight terms of the curriculum are displayed in Figure 3.1. During the first two terms of the program (Unit 1) students take mainly basic science courses (e.g., anatomy, physiology, biochemistry) as well as courses concerned with introductions to physical diagnosis, osteopathic principles and practice, and family and community medicine. The remaining six terms of on-campus osteopathic medical education (Unit 2) consist of systems biology courses which include basic
The number and types of courses for which data were collected are shown in Table 3.3. Data were available for 81% of the courses taken by students in the Class of 1974 and for 47% of the courses taken by students in the Class of 1975.

Table 3.3
Achievement Data for Classes of 1974 and 1975

                                          No. of         No. of courses for which
                                          courses in     data were collected
    Type of course                        MSU-COM
                                          curriculum     Class of 1974    Class of 1975
    Basic science                         9              8                8
    Community medicine                    8              7                3
    Clinical science                      8              4                2
    Systems biology                       8              7                4
    Osteopathic manipulative therapy      7              6                3
    Family practice-preceptorship         7              6                2
    Total                                 47             38               22

The course performance measures reported by the faculty members represent summaries of the student's performance in a course which were used to determine the student's Pass/No Pass grade. These course performance measures are described in Tables 3.4 and 3.5. Most of the course performance measures in the didactic courses are weighted averages of the student's performance on objective (i.e., multiple-choice and true-false) exams. Some of the course performance measures from the basic science courses also include ratings of clinical laboratory skills, and some of the course performance measures from the systems biology courses also include ratings of clinical skills. The evaluation of diagnostic and case management skills in the systems courses is, however, mainly done by describing a case on paper and asking multiple-choice questions about it. Similarly, paper patient management problems are sometimes used in the small group discussion sections of the systems courses.

Course performance measures from courses in physical examination and clinical science (now called the Comprehensive Patient Evaluation sequence) are based upon faculty members' ratings of physical examination and diagnostic skills as well as objective tests of knowledge of these skills.

Table 3.4
Course Performance Measures--Class of 1974

    Course                            Course identifier    Composition of course performance measure
    Physiology                        PSL 500              Objective
    Histology                         ANT 560              Objective
    Anatomy                           ANT 565              Objective + practical exam
    Biochemistry                      BCH 501              Objective
    Pharmacology                      PHM 520              Objective
    Clinical Pharmacology             PHM 521              Objective
    Microbiology                      MPH 521              Objective + lab skills
    Pathology                         PTH 502              Objective + lab skills
    Hematopoetic System               Hemato               Objective
    Neurology System                  Neuro                Objective + video cases
    Cardiovascular System             CV                   Objective + EKG reading
    Respiratory System                Respir               Objective
    Urinary System                    Urinary              Objective
    Gastrointestinal System           GI                   Objective
    Growth and Development System     GD                   Objective
    Orthopedics System                Ortho                Objective
    Physical Examination              Phyex 1              Objective + examination skills
    Physical Diagnosis                Phyex 2              Objective + examination skills
    Clinical Science                  Clsci 6, Clsci 7     Objective + examination skills
    Osteopathic Diagnosis and
      Manipulative Therapy (OMT)      OMT 1-OMT 6          Diagnostic and treatment skills exams + objective exams
    Community Medicine:
      Medicine and Society            CM 510               Objective
      Biostatistics                   CM 512               Objective
      Medical Jurisprudence           CM 513               Objective
      Health Care Delivery I          CM 514               Objective
      Health Care Delivery II         CM 515               Objective
      Psychopathology                 CM 516               Objective
    Family Medicine                   FM 632-692           Physicians' ratings
Table 3.5
Course Performance Measures--Class of 1975

    Course                            Course identifier    Composition of course performance measure
    Physiology                        PSL 500              Objective
    Histology                         ANT 560              Objective
    Anatomy                           ANT 565              Objective + practical exam
    Biochemistry                      BCH 501              Objective
    Pharmacology                      PHM 520              Objective
    Clinical Pharmacology             PHM 521              Objective
    Microbiology                      MPH 521              Objective + lab skills
    Pathology                         PTH 502              Objective + lab skills
    Hematopoetic System               Hemato               Objective
    Neurology System                  Neuro                Objective + video cases
    Integumentary System              Integ                Objective
    Endocrine System                  Endoc                Objective
    Physical Examination              Phyex 1              Objective + examination skills
    Physical Diagnosis                Phyex 2              Objective + examination skills
    Osteopathic Diagnosis and
      Manipulative Therapy (OMT)      OMT 1-OMT 3          Diagnostic and treatment skills exams + objective exams
    Community Medicine:
      Medicine and Society            CM 510               Objective
      Biostatistics                   CM 512               Objective
      Medical Jurisprudence           CM 513               Objective
    Family Medicine                   FM 632-642           Physicians' ratings

Similarly, measures from all of the courses in the osteopathic manipulative therapy (OMT) sequence consist of ratings of the student's osteopathic examination, diagnosis, and manipulative therapy skills as well as objective tests on comprehension of the basic science knowledge and clinical principles underlying these techniques. Scores from the Family Medicine preceptorships (FM 632 to FM 692) are unweighted averages of preceptors' ratings of the student's clinical skills on ten Likert scale items.

All of the objective tests and clinical rating scales used to measure students' achievement and performance were locally constructed by MSU-COM faculty members. The internal consistency reliabilities of the single tests and other instruments which were combined to form the course performance measures are typical of instructor-made tests (i.e., in the range of .60 to .90). However, since each course performance measure was normally a composite of at least two measures, the reliabilities of the composite course performance measures are higher than the reliabilities of the single measures of which they are comprised (e.g., Nunnally, 1967). Due to the lack (at that time) of a strong faculty development program, the reliabilities of the ratings of clinical skills are less adequate. This is especially true of the ratings of student clinical performance during the Family Medicine preceptorships, as these ratings were made by off-campus, volunteer physicians. It is also partially true of the ratings of clinical skills during the OMT sequence. However, since the course performance measures for these courses also include written objective tests, these course performance measures are moderately reliable.
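The composite-reliability point can be made concrete with a minimal sketch (in Python; the reliabilities and intercorrelation are hypothetical values, not estimates from the MSU-COM data). For two standardized measures, the reliability of their unit-weighted sum follows directly from classical test theory:

    # Reliability of a unit-weighted composite of two standardized measures.
    # True-score variance of the sum is r11 + r22 + 2*r12 (errors are assumed
    # uncorrelated); total variance of the sum is 2 + 2*r12.
    def composite_reliability(r11, r22, r12):
        return (r11 + r22 + 2 * r12) / (2 + 2 * r12)

    # Two exams of typical instructor-made reliability (.70) correlating .50:
    print(composite_reliability(0.70, 0.70, 0.50))  # 0.80, higher than either part

The composite is more reliable than either of its components, which is the basis for the claim in the preceding paragraph.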
In Figure 3.2 (adapted from Tinning, Note 2, Figure 2) the MSU-COM curriculum is schematically illustrated. The percentages in each block indicate the percentage of instructional hours devoted to each major topic during the specified time period. These percentages were arrived at through an hour-by-hour analysis of COM course protocols. As shown in Figure 3.2, as the curriculum progresses from Year 1 to Year 3, the amount of clinical instruction gradually increases.

Figure 3.2. Percentage of student contact hours devoted to basic science and clinical instruction (adapted from Tinning, Note 2, Figure 2).

The osteopathic physicians teaching in the later systems biology and clinical skills courses assume that students have mastered the basic medical biology and the clinical principles and skills presented in earlier courses. Hence, these clinical instructors teach more sophisticated principles of diagnosis and patient management which are based upon the rudimentary concepts, skills, and vocabulary presented in earlier courses. Concomitant with these curricular changes is a change in the mode of evaluation, from assessment of the recall of specific facts and concepts to assessment of the application of this knowledge to diagnostic and treatment situations.

Tinning, Taylor, and West (Note 3) analyzed the types and numbers of clinical experiences in MSU-COM courses offered during 1974 and 1975 (i.e., the first two years of the program for students in the Class of 1974). These researchers examined the clinical content of the courses along two dimensions:

1. What the student does during the learning session, i.e.,
   a. Acquisition of factual material;
   b. Acquisition of diagnostic data on real or simulated patients;
   c. Formulation of diagnostic hypotheses; and
   d. Treatment of real or simulated patients.

2. The mode of instruction, i.e.,
   a. Didactic presentations;
   b. Demonstrations of diagnostic or treatment procedures; and
   c. Hands-on performance of diagnostic or treatment procedures.

Tinning et al. concluded that these two dimensions interacted in the following way in the curriculum:

    It was found that courses at the beginning of the program stress the acquisition of factual material through large and small group didactic presentations which describe clinical procedures and techniques whereas, courses which occur later in the curriculum shift the instructional mode to demonstration of clinical procedures and the performance of these procedures by the students themselves; while at the same time, the student performance mode is shifting to an increasing emphasis on data utilization and treatment. (p. 2)

Using the dimensions discussed in the previous paragraphs, it is possible to break the COM curriculum down into roughly three stages:

1. Acquisition of basic facts, principles, vocabulary, and skills during the first two terms (Unit 1).

2. Acquisition of principles of clinical medicine, additional basic science knowledge, and clinical skills during the first part of Unit 2.

3. Acquisition and application of more sophisticated principles of diagnosis and treatment during the later systems biology courses in Unit 2.

Using both the hour-by-hour analysis of course content and the Tinning et al. analysis of the instructional dimensions and content of clinical courses, the courses for which course performance measures are available were classified according to the structure proposed above.
This classification is shown in Figure 3.3.

Figure 3.3. Hypothesized three-factor performance structure (each course classified as loading on basic knowledge, clinical principles, or clinical application).

An alternate structure, based upon course content alone, is the following:

1. Basic science courses;
2. Early systems biology courses;
3. Later systems biology courses; and
4. Clinical skills courses, consisting of all physical examination, clinical science, and osteopathic manipulative therapy courses, and the family medicine preceptorships.

This hypothesized structure (shown in Figure 3.4) partially ignores both the temporal dimension and the hierarchical building upon previously learned knowledge and skills which were incorporated into the first model. That is, the "clinical skills" category has been formed by combining all courses concerned with teaching clinical skills regardless of when they were offered. One advantage of this model, however, is that it allows a strong test of the hypothesis that performance of clinical skills is unrelated to academic performance in basic and clinical science.

Figure 3.4. Hypothesized four-factor performance structure (basic knowledge, clinical principles, clinical application, and clinical skills factors).

A test of the two-factor, clinical and basic science performance model can be made by using the performance structure shown in Figure 3.5, in which the basic science courses load on the first factor and all other courses load on the second:

    Basic science factor:  Physiology, Anatomy, Histology, Biochemistry,
                           Pharmacology, Microbiology, Pathology,
                           Clinical Pharmacology
    Clinical factor:       Hematopoetic, Neurology, Cardiovascular,
                           Respiratory, Urinary, Gastrointestinal,
                           Growth & Development, Orthopedics,
                           Psychopathology, Physical Examination,
                           Physical Diagnosis, Clinical Science, OMT,
                           Preceptorships

Figure 3.5. Hypothesized two-factor performance structure.

A test of the single-factor performance structure obviously implies that all course performance measures should load on a single factor, and thus is not illustrated.

When several measures of performance are available, a research method which can provide a great deal of information about their latent or underlying structure is covariance structure analysis (e.g., Jöreskog, 1974; Wiley, Schmidt, & Bramble, 1973). A subset of covariance structure analysis techniques is confirmatory factor analysis. As briefly discussed in Chapter I, confirmatory factor analysis allows the researcher to test specific hypotheses about the latent structure of a set of data and to estimate the relationships among the hypothesized latent factors. One of the simplest procedures which can be done with this very general and flexible technique is to test for what Thurstone called "simple structure": that is, to test the plausibility of a hypothesized model in which the variables are assumed to have large loadings on one or at most two factors and loadings of zero on the other factors. Using confirmatory factor analysis, the analyst restricts a variable's loadings on certain factors to be zero and has the program estimate the remaining loadings and the correlations among the hypothesized factors.
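Before turning to the estimation program, note that each hypothesized structure amounts to a pattern of fixed and free loadings. A minimal sketch (Python; the course list is abbreviated, and the assignment follows the two-factor structure of Figure 3.5) of how such a pattern can be written down:

    # Loading pattern for the hypothesized two-factor structure:
    # 1 = loading to be estimated, 0 = loading fixed at zero.
    basic_science = ["Physiology", "Anatomy", "Histology", "Biochemistry"]  # abbreviated
    clinical = ["Neurology", "Cardiovascular", "Phyex 1", "OMT 1"]          # abbreviated

    pattern = {course: (1, 0) for course in basic_science}
    pattern.update({course: (0, 1) for course in clinical})

    for course, (basic, clin) in pattern.items():
        print(f"{course:16s}  basic science: {basic}  clinical: {clin}")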
The COFAMM program for confirmatory factor analysis developed by Sörbom and Jöreskog (1976) provides maximum likelihood estimates, and standard errors of these estimates, for all of the parameters estimated by the model. These maximum likelihood estimates are the most probable estimates of the parameters given both the hypothesized model and the observed data.

Conventional (or exploratory) factor analysis yields an unrestricted estimate of the factor pattern matrix (i.e., the matrix of factor loadings). However, as discussed in Chapter I, the major disadvantage of this approach is that the solutions are not uniquely identified. That is, when an orthogonal rotation is performed in order to make the results more interpretable, there are a theoretically infinite number of orthogonal rotations which will result in different but equally mathematically legitimate factor pattern matrices and, hence, potentially different conclusions about the underlying structure of the data (Lawley & Maxwell, 1971). The major advantage of confirmatory factor analysis, on the other hand, is that when some of the parameters of the model are judiciously set in advance, the COFAMM program will usually yield a uniquely identified solution. Rules of thumb which are necessary but not sufficient for achieving a uniquely identified solution will be discussed below.

The general statistical model for factor analysis is (Jöreskog & Lawley, 1968, Eq. 1):

    x = Λf + e,

where x is a vector of p measures, f is a vector of k common factors, e is a vector of p residuals which represent the combined effect of specific factors and measurement error (i.e., the "unique variances" of the measures), and Λ is a p × k matrix of factor loadings. The residuals e are assumed to be normally and independently distributed and to have a mean vector of 0. They are also assumed to be uncorrelated with each other and with the common factors.

The dispersion or covariance matrices of f, e, and x can be represented as Φ, Ψ, and Σ, respectively. Since the residuals are assumed to be uncorrelated, Ψ is assumed to be a diagonal matrix whose diagonal elements are the estimates of the unique variances associated with each of the measures. In addition, it can be assumed without loss of generality that the common factors have unit variances, so that the diagonal elements of Φ can be specified as unities. Given these assumptions, the expected covariance or correlation matrix (if the measures have been standardized) under a hypothesized model can be represented as (Jöreskog & Lawley, 1968, Eq. 2):

    Σ = ΛΦΛ' + Ψ,

where, to review:

    Σ = the expected covariance matrix given a hypothesized model;
    Λ = the factor pattern matrix;
    Φ = a matrix containing the correlations among the latent factors; and
    Ψ = a matrix whose diagonal elements are the unique variances associated with each of the measures.
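As a minimal numerical illustration of this model (the loadings, factor correlation, and unique variances below are hypothetical, chosen only to show the computation of Σ = ΛΦΛ' + Ψ for standardized measures):

    import numpy as np

    # Lambda: 4 measures x 2 factors, simple structure (zeros fixed in advance)
    L = np.array([[0.8, 0.0],
                  [0.7, 0.0],
                  [0.0, 0.6],
                  [0.0, 0.9]])
    # Phi: factor correlation matrix with unit diagonal (factor variances = 1.00)
    P = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
    # Psi: diagonal unique variances chosen so the implied diagonal equals 1.00
    U = np.diag(1.0 - np.diag(L @ P @ L.T))

    sigma = L @ P @ L.T + U  # the expected correlation matrix under the model
    print(np.round(sigma, 3))

Estimation then amounts to choosing the free elements of Λ, Φ, and Ψ so that this implied matrix is as close as possible, in the maximum likelihood sense, to the observed matrix S.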
A necessary but unfortunately not sufficient condition for achieving a uniquely identified solution is that at least k² elements of the Λ and Φ matrices be fixed (where k = the number of hypothesized factors). As mentioned above, it can be assumed without loss of generality that the variances of the k factors are 1.00. Hence, the k diagonal elements of the Φ matrix can be fixed at unity. In addition, fixing at least k − 1 elements (i.e., factor loadings) in each column of the factor pattern matrix to zero will usually cause the solution to be uniquely identified (Jöreskog & Lawley, 1968). (After two preliminary runs using COFAMM on the data for this study, the present author found that failure to fix the diagonal elements of Φ to unities resulted in a program diagnostic that one of the diagonal elements of this matrix was not uniquely identified.)

Using the COFAMM program, models derived from the four performance structures hypothesized above can be tested. That is, the COFAMM program can be constrained to yield the following solutions:

1. A one-factor model.
2. A two-factor model in which the correlation between the factors will be estimated by the program.
3. A three- or more-factor model.

The goodness of fit of the expected covariance matrix Σ to the observed covariance matrix S can be tested by using the likelihood ratio chi-square statistic. When the chi-square value is low relative to its degrees of freedom, the model can be said to "fit" the data. The simplest criterion for determining goodness of fit, therefore, is to compare the chi-square test statistic to its expectation (i.e., its degrees of freedom). If the chi-square is less than its degrees of freedom (i.e., is non-significant), the model can be said to fit the data.

However, the problem in the present study, as in many or most applications of confirmatory factor analysis, is to choose the most appropriate model from a number of hypothesized models. In this case, Jöreskog (1974) has recommended the following heuristic strategy. Compute the ratio of each chi-square to its degrees of freedom, i.e.,

    χ²_i / df_i,

where χ²_i is the test statistic for the ith model and df_i equals its degrees of freedom. Also taking into consideration the theoretical aspects underlying each model, select the model which yields the lowest of these ratios.

Two additional methods have been suggested for heuristically determining how much additional information is yielded by a more complex model (e.g., a model with more factors) over a simpler model. Since the chi-square statistics for a pair of nested models are additive, the analyst can test the significance of the decrease in the chi-square associated with the more complex model by referring the difference in the two chi-squares to the difference in their degrees of freedom, i.e.,

    (χ²_s − χ²_c) ~ χ²(df_s − df_c),

where χ²_s and χ²_c are the chi-squares associated with the simpler and more complex models, respectively, and df_s and df_c are their respective degrees of freedom. This test is analogous to testing the significance of the increment in R² (and, thus, the increase in "fit") of a regression model when an additional variable has been added to the regression equation.
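Both heuristics are easy to compute once the fit statistics are in hand. A minimal sketch (Python with scipy; the chi-squares and degrees of freedom are hypothetical, standing in for the statistics COFAMM reports):

    from scipy.stats import chi2

    chi2_s, df_s = 310.0, 104   # simpler model, e.g., two factors
    chi2_c, df_c = 150.0, 101   # more complex nested model, e.g., four factors

    # Joreskog's ratio heuristic: a smaller chi-square/df indicates better fit.
    print(chi2_s / df_s, chi2_c / df_c)

    # Difference test: the decrease in chi-square is itself distributed as
    # chi-square, with df equal to the difference in degrees of freedom.
    diff, df_diff = chi2_s - chi2_c, df_s - df_c
    print(diff, df_diff, chi2.sf(diff, df_diff))  # p-value of the improvement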
A descriptive statistic which is analogous to a reliability coefficient has been proposed by Tucker and Lewis (1973). R. Burt has provided a useful explanation of this technique; the following formulas and explanations are adapted from Burt (1973, pp. 148-150).

The sum of the squared covariances not explained by the model can be computed as

    M_k = χ²_k / (N · df_k),

where χ²_k is the likelihood ratio statistic associated with the hypothesized model, N is the sample size, and df_k is the degrees of freedom associated with the model. Similarly, the average squared covariance available to be explained by the model can be computed as

    M_0 = Σ c²_ij / [r(r − 1)/2]    (i < j),

where c_ij is the covariance between measures i and j, and r is the number of rows in the covariance matrix. Thus, the numerator of M_0 is simply the sum of the squared elements below the diagonal in the observed covariance matrix S. The expected value of the sum of the squared covariances not explained by a proposed model involving k hypothesized factors is

    E(M_k) = 1/N.

Combining the above information, the Tucker and Lewis reliability coefficient can be expressed as (Burt, 1973, p. 150):

    ρ̂_k = (M_0 − M_k) / (M_0 − E(M_k)),

i.e., the amount of covariation explained by a proposed structure divided by the amount of covariation available to be explained by that structure. According to Burt, and Tucker and Lewis, a small value of this statistic is an indication that the proposed model is inadequate to explain the covariation among the observed variables, and that more hypothesized latent variables or factors are required. As more factors are added to the model, the value of ρ̂_k will increase asymptotically. The maximum value of the statistic is 1.00, which indicates that the proposed model fits the data perfectly.
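Following the formulas above, the coefficient requires only the model chi-square and degrees of freedom, the sample size, and the mean squared off-diagonal element of the observed matrix. A minimal sketch (Python; the inputs are hypothetical, and the formulas are the Burt (1973) versions as reconstructed above):

    import numpy as np

    def tucker_lewis(chi2_k, df_k, n, s):
        # s is the observed covariance (or correlation) matrix
        r = s.shape[0]
        lower = s[np.tril_indices(r, k=-1)]          # elements below the diagonal
        m0 = np.sum(lower ** 2) / (r * (r - 1) / 2)  # covariation available
        mk = chi2_k / (n * df_k)                     # covariation left unexplained
        return (m0 - mk) / (m0 - 1.0 / n)            # E(M_k) = 1/N

    s = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])                  # hypothetical 3 x 3 matrix
    print(tucker_lewis(chi2_k=4.0, df_k=1, n=84, s=s))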
One of the difficulties with the likelihood ratio chi-square is its sensitivity to large sample sizes. Hence, using the probability level of the chi-square as the criterion for deciding whether or not a proposed model fits the data may lead the analyst to accept solutions with one or more theoretically meaningless factors in order to lower the p-value of the chi-square to an "acceptable" level. In contrast, the Tucker-Lewis statistic is not as sensitive to sample size. The value of this property of their statistic is illustrated in its application to Harman's classic physical measurements example (Harman, 1967). Eight physical variables were measured on 305 girls. When the intercorrelations among the measures were originally analyzed using exploratory factor analysis, two factors, "lankiness" and "stockiness," were identified. When the same data were reanalyzed using maximum likelihood factor analysis, the likelihood ratio chi-square associated with the two-factor model had a probability level of less than .001; yet ρ̂_k for the two-factor solution was .934. When a third factor was added, the significance level was still less than .01 and ρ̂_k increased to .975, indicating only a slight increase in the amount of covariation accounted for by the addition of the third factor to the model. When a fourth factor was added, the p-value associated with the chi-square statistic was .23 and ρ̂_k was calculated to be .994 (Tucker & Lewis, 1973). Tucker and Lewis cite Harman's comment on the use of the likelihood ratio chi-square statistic as the "sole arbiter" of deciding how many factors to extract (Harman, 1967):

    This example illustrates the general principle that one tends to underestimate the number of factors that are statistically significant. For twenty years, two factors had been considered adequate, but statistically two factors do not adequately account for the observed correlations based on a random sample of 305 girls. However, the third factor (whose total contribution to the variance ranges from 2 per cent to 5 per cent for the different solutions) has little "practical significance," and certainly a fourth factor would have no practical value. (p. 229)

Echoing Harman's concerns, leading proponents of the use of maximum likelihood confirmatory factor analysis have emphasized that judgments about the adequacy of the solutions yielded by the technique must be based upon the theory and hypotheses underlying the investigation and the nature of the data being analyzed, as well as on statistical criteria (e.g., Jöreskog, 1969; Lawley & Maxwell, 1971).

In order to have external validity, the pattern of relationships among the different factors (i.e., the phi matrices) should be approximately the same from year to year. Thus, the phi matrices for both the 1974 and 1975 classes should be approximately the same, given that the same variables are specified to load on the same factors in both models. In this study, the models of the performance structures described above will be estimated on the data from the Class of 1974. Once an appropriate model has been selected, the model for the Class of 1975 can be estimated by restricting the same variables to load on the same factors, and the phi matrices for the two models can be compared.

Once an adequately fitting and theoretically appropriate model has been identified, composite observed scores on the factor or factors can easily be computed by summing students' standard scores on the course performance measures. The advantages of unit-weighted, linear composites have been discussed by Dawes and Corrigan (1974), F. Schmidt (1971), and Wang and Stanley (1970). The advantages of using standard scores rather than raw scores in forming the composites are apparent upon examination of the variances of the course performance measures displayed in Table 3.6. As mentioned above, the course performance measures are themselves composites of scores on instructor-made measurement instruments. Unlike standardized tests, the scales of these instruments are entirely arbitrary and even change from year to year. As shown in Table 3.6, the variances of these scales for the Class of 1974 range from a low of 3.80 for the course performance measure for the Hematopoetic System to a high of 6,952.06 for Microbiology 521, a ratio of 6,952.06/3.80, or 1,829 to 1. It is well known that measures with large variances will be weighted more heavily in a linear composite strictly because of their larger variances (e.g., Nunnally, 1967). Thus, using raw scores to form linear composites of the variables in this study would clearly bias the composites toward performance in the basic science courses, which have the arbitrarily larger variances.
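The compositing step itself is simple; the sketch below (Python, with hypothetical raw scores) standardizes each course measure before summing, so that no course dominates the composite merely because its scale happens to have a larger variance. The T-score scaling (mean 50, s.d. 10) anticipates the convention used for the factor composites reported in Chapter IV.

    import numpy as np

    def unit_weighted_composite(scores):
        # scores: students x courses array of raw course performance measures
        z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
        return (50 + 10 * z).sum(axis=1)             # sum of T-scores

    # Two hypothetical courses on very different scales (cf. Table 3.6):
    raw = np.array([[72.0, 510.0],
                    [78.0, 430.0],
                    [81.0, 560.0]])
    print(unit_weighted_composite(raw))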
Table 3.6
Means and Variances of Course Performance Measures

                                      1974                    1975
    Course                      Mean      Variance      Mean      Variance
    Physiology                  74.40     70.32         76.55     46.19
    Histology                   74.28     190.90        71.82     112.97
    Anatomy                     372.69    909.37        396.68    975.50
    Biochemistry                78.13     101.71        72.40     75.09
    Pharmacology                79.69     26.24         81.52     23.85
    Microbiology                502.22    6,952.06      82.73     27.26
    Pathology                   56.28     50.28         62.34     27.64
    Clinical Pharmacology       77.35     25.65         78.83     22.21
    Hematopoetic                28.21     3.80          20.39     3.35
    Integumentary               --        --            48.18     13.54
    Endocrine                   --        --            41.41     11.22
    Neurology                   50.60     46.28         50.17     45.93
    Cardiovascular              106.31    56.54         --        --
    Respiratory                 226.20    299.46        --        --
    Urinary                     50.58     74.71         --        --
    Gastrointestinal            246.32    409.34        --        --
    Growth & Development        79.61     40.68         --        --
    Orthopedics                 69.00     27.01         --        --
    Psychopathology             77.83     64.25         --        --
    Physical Examination        88.31     34.50         211.77    157.11
    Physical Diagnosis          37.25     17.18         105.91    19.01
    Clinical Science 6          34.01     11.46         --        --
    Clinical Science 7          33.75     6.29          --        --
    Osteopathic Diagnosis and
      Manipulative Therapy (OMT):
      OMT 1                     91.19     19.24         89.33     16.89
      OMT 2                     84.02     25.83         87.47     41.67
      OMT 3                     84.58     21.91         87.72     24.11
      OMT 4                     60.85     11.45         --        --
      OMT 5                     94.98     4.21          --        --
      OMT 6                     92.81     46.52         --        --
    Family Medicine
      Preceptorship Ratings:
      FM 642                    43.61     33.81         42.59     37.56
      FM 652                    43.09     28.56         42.95     28.94
      FM 662                    40.93     38.29         --        --
      FM 672                    42.22     28.57         --        --
      FM 682                    42.95     33.41         --        --
      FM 692                    43.65     29.60         --        --

The relationship between the linear composites of course performance measures and traditionally used predictors of success in medical school can then be analyzed using multivariate multiple regression. In the review of the literature concerning the prediction of medical school performance, the following consistent predictors of performance were identified: premedical GPA, MCAT Science and Quantitative scores, the unweighted average of MCAT scores, quality of undergraduate institution, the student's age, and the extent of the student's previous employment. Other variables on which data were collected, and which may provide relatively independent information about medical school performance, are measures of verbal or general aptitude and achievement (e.g., MCAT Verbal and General scores, English GPA) and achievement in the behavioral sciences (Behavioral Science GPA).

The analysis strategy will be to enter the traditionally used predictors into the regression first and the other predictor variables second. The dependent measures (the linear composites of course performance measures) will be stepped into the regression in the order in which they were listed in the performance structures formulated above. Given previous findings reviewed in Chapter II, and given the sizeable variances of the predictor variables, it can be hypothesized that a multivariate relationship exists between the group of predictors and the group of composite course performance measures.

The statistical assumptions of multivariate multiple regression are (Finn, 1974):

1. All of the predictor and criterion variables are linearly related.

2. The residuals on the dependent measures are independent and follow a multivariate normal distribution with expected values of zero and a variance-covariance matrix Σ.

In order to test the assumption of linearity, bivariate scatterplots of selected predictor variables and course performance measures were examined. In all of the plots, the pairs of variables were linearly related. These scatterplots simply confirmed the well known observation that most aptitude and achievement measures tend to be related in a predictably linear fashion (e.g., Dawes & Corrigan, 1974).
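The entry strategy can be sketched as follows (Python; the data are simulated placeholders, and the variable groupings merely mirror the ones named above): fit the regression with the traditional predictors first, then add the remaining predictors and test the increment in R² with the usual F statistic.

    import numpy as np

    def r_squared(x, y):
        # R-squared from an ordinary least squares fit of y on x (with intercept)
        x1 = np.column_stack([np.ones(len(y)), x])
        beta, *_ = np.linalg.lstsq(x1, y, rcond=None)
        resid = y - x1 @ beta
        return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

    rng = np.random.default_rng(0)
    n = 84
    traditional = rng.normal(size=(n, 3))  # e.g., premedical GPA, MCAT Science, MCAT Quantitative
    additional = rng.normal(size=(n, 2))   # e.g., MCAT Verbal, English GPA
    y = traditional @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n)

    r2_reduced = r_squared(traditional, y)
    r2_full = r_squared(np.column_stack([traditional, additional]), y)
    q, k = 2, 5                            # predictors added, total predictors
    f = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k - 1))
    print(r2_reduced, r2_full, f)          # F tests the increment in R-squared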
Summary

Data on student performance in basic science, clinical medicine, and clinical skills courses were collected for students matriculating in the Classes of 1974 and 1975 at MSU-COM. Depending upon the course, course performance measures consisted of composites of scores on objective (i.e., multiple-choice and true-false) midterm and final exams, evaluations of clinical skills and performance, and, in some courses, ratings of laboratory skills. Data on medical school selection variables such as MCAT scores and GPA's were also collected.

From analyses of course content, four performance structures were developed. In the three-factor structure (illustrated in Figure 3.3), course performance measures were hypothesized to load on the following factors:

1. Factor I: Acquisition of basic facts and skills,
2. Factor II: Acquisition of clinical principles, and
3. Factor III: Application of clinical principles.

This proposed structure resembles the learning and performance hierarchies proposed by Bloom and his colleagues (1956) and Gagné (1974).

An alternative structure which directly tests the relationship of clinical performance to performance in other facets of the curriculum is the following:

1. Basic science knowledge,
2. Clinical principles,
3. Clinical applications, and
4. Clinical skills.

In this structure all clinical skills courses, no matter when they were given, were hypothesized to load on the single clinical skills factor.

Two other structures were also proposed. They were: (a) a two-factor structure in which the basic science courses were hypothesized to load on one factor, and the systems biology and clinical skills courses on the second factor; and (b) a single, general factor structure as proposed by Maatsch et al. (1978).

When the researcher knows a priori what structures are to be investigated, confirmatory factor analysis permits the analyst to specify on which factors the measures load and then to test the fit of this hypothesized structure to a set of data (Jöreskog & Lawley, 1968). Using the COFAMM program for confirmatory factor analysis (Sörbom & Jöreskog, 1976), the statistical models underlying these hypothesized performance structures can be estimated. The COFAMM program provides maximum likelihood estimates of the factor loadings, the correlations among the factors, and the unique variances of the measures. These estimates of the correlations among the factors can be used to answer the principal research questions of the study concerning the relationship between clinical and basic science performance. The fits of the four models can be compared by doing sequential chi-square tests for goodness of fit (Jöreskog, 1974). Once the most theoretically and statistically appropriate model has been chosen, the multivariate relationship between the performance factors and typically used medical school selection variables can be estimated by using multivariate multiple regression (e.g., Finn, 1974). In this study both the initial estimates of the factor analytic and multivariate regression models will be done on the data from the Class of 1974.
The data from the Class of 1975 will be used only to cross-validate the relationships among the factors in the 1974 model (i.e., the phi matrix from the confirmatory factor analysis) and the multivariate regression equations estimated on the 1974 sample.

CHAPTER IV

RESULTS

In this chapter, the results of the data analyses will be reported and discussed. First, the bivariate relationships among the course performance measures, which furnish the basis for the confirmatory factor analyses, will be described. Second, the results of the confirmatory factor analyses and the estimated relationships among the hypothesized performance factors will be reported. Finally, the results of the multivariate regression analyses and the estimated relationships among the admissions predictor variables and the medical school performance factors will be reported.

Interrelationships Among the Course Performance Measures

It is well known that the observed correlation between two variables is dependent upon the reliabilities of the variables. That is, the maximum observed correlation between X and Y will be less than or equal to the square root of the product of their reliabilities, viz.,

    r_xy ≤ √(r_xx · r_yy).    (4.1)

For example, two measures each with a reliability of .70 can correlate no higher than .70, no matter how strongly their true scores are related. Thus, a low observed correlation between two fallible measures may actually approach or even equal the highest possible correlation between them. This restriction on the size of the observed correlations should be kept in mind when evaluating the magnitudes of the observed relationships among the course performance measures.

The correlations among the course performance measures are displayed in Tables 4.1 and 4.2. The course performance measures have been grouped according to the factor patterns hypothesized in Chapter III. Several aspects of the correlation matrix for the Class of 1974 (Table 4.1) are worthy of note: The largest observed correlations in the matrix are found among the basic science courses. The next highest relationships occur among the systems biology courses, which are also relatively highly correlated with the basic science courses. Performance in the clinical skills courses (Physical Examination, Clinical Science, and Osteopathic Manipulative Therapy [OMT]) is more highly correlated with performance in the basic science courses than with performance in either (a) the systems biology courses or (b) the other clinical skills courses. The most likely explanation of these findings is that the course performance measures for the basic science courses were composites of scores on objective tests while the clinical course performance
Table 4.1. Correlations Among Course Performance Measures--1974 Sample (decimals omitted).

Table 4.2. Correlations Among Course Performance Measures--1975 Sample (decimals omitted).
Table 4.15. Correlations Among Predictor and Criterion Variables--1974 Sample.

From the squared correlations in Table 4.15, it can be seen that: (a) MCAT Science accounted for an approximately equal amount of variation in all of the criteria; and (b) in spite of the part-whole correlation between them, Biology GPA accounted for substantially more variance than did Science GPA on the Clinical Skills factor. The magnitudes of the squared zero-order correlations are somewhat related to the magnitudes of the standardized regression coefficients shown in Table 4.14 (although, for reasons to be discussed below, this is definitely not always the case in regression analyses with correlated independent variables).

The standardized regression coefficient gives the estimated change in the dependent variable, in standard deviation units, when the predictor variable is increased by one standard deviation and the values on all of the other predictors are held constant. In spite of their greater sensitivity to the sample variances of the predictors, the betas are easier to compare and interpret than raw score regression weights because they are expressed in terms of the same unit of measurement. As the degree of correlation (or collinearity) among the predictors increases, the magnitudes of the beta weights become more sample dependent and hence, in order to be taken seriously, should be cross-validated on another sample.
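The conversion between raw and standardized coefficients is a simple rescaling by the predictor and criterion standard deviations, beta = b(s_x/s_y). As a check, the sketch below (Python) reproduces the reported beta of .434 for MCAT Science predicting the Basic Science factor in the 1975 sample, using the raw coefficient from Table 4.19 and the standard deviations from Table 4.21 (both reported later in this chapter):

    # beta = b * (s_x / s_y)
    def beta_weight(b, sd_x, sd_y):
        return b * sd_x / sd_y

    # MCAT Science predicting the Basic Science factor, 1975 sample:
    # b = 0.256 (Table 4.19); s_x = 90.02, s_y = 53.15 (Table 4.21)
    print(beta_weight(0.256, 90.02, 53.15))  # approximately .434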
With these caveats in mind, the sizes of the beta weights can be interpreted as reflecting the amount of "independent" contribution a predictor makes to the dependent variable above and beyond the contributions made by the other predictors (i.e., when the other predictors are held constant). Using this interpretation, it can be seen that (a) the three independent variables contribute approximately equally to the prediction of performance on the Clinical Principles factor and (b) the Biology GPA is a reasonably good predictor of Clinical Skills performance.

The strength of the overall multivariate association between a set of predictor and a set of criterion variables can be estimated by the series of canonical correlations between the sets (Finn, 1974). According to Finn, the canonical correlation is the simple correlation between two random variables (say, w1 and v1), each of which is a linear combination of the predictor and criterion variables, respectively. The weights for these linear combinations are chosen so as to maximize the zero-order correlation between the linear combinations. The weights for the second canonical correlation are chosen so as to maximize the correlation between a second pair of linear combinations (say, w2 and v2), subject to the restrictions that r_w1w2 = 0 and r_v1v2 = 0 (i.e., that the linear combinations within each set are orthogonal). If the measures are assumed to be standardized to unit variance, the overall proportion of variance accounted for in the p criterion measures by the q predictors is

    (Σ R_i²) / p,    (4.2)

where R_i = the ith canonical correlation. The number of canonical correlations which will be yielded by the analysis is the minimum of the number of independent or dependent variables. In this case, with three independent and three dependent variables, three canonical correlations will be generated. The significance of the correlations can be tested by sequential chi-square tests, testing the significance of the last correlation first, the last two correlations second, and so forth. As was the case with the step-down F tests discussed earlier, the logic of this test is to stop testing when a significant result is encountered.

The estimates of the canonical correlations and the results of these sequential tests are shown in Table 4.16. As can be seen from these results, all three canonical R's are significant. Applying Equation 4.2 to the canonical R's, it can be calculated that 27.6% of the variance in the three criterion scores is accounted for by the predictor variables.

Table 4.16
Canonical Correlations and Significance Tests--1974 Sample

    i    Canonical correlation R_i    Test of R_i      χ²       Significance
    1    .799                         1 through 3      96.77    .0001
    2    .339                         2 through 3      16.00    .0031
    3    .275                         3 through 3       6.27    .0123
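Equation 4.2 can be verified directly from the canonical correlations in Table 4.16 (a one-line computation, sketched in Python):

    # Overall proportion of criterion variance accounted for (Equation 4.2):
    # the mean of the squared canonical correlations across the p criteria.
    canonical_r = [0.799, 0.339, 0.275]  # Table 4.16, 1974 sample
    p = 3                                # number of criterion measures
    print(sum(r ** 2 for r in canonical_r) / p)  # 0.276, i.e., 27.6%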
Due to the varying degrees of dependence of the regression statistics on the sample variances of the measures, and on omnipresent sampling variability, it is important to cross-validate these statistics on a second sample when possible. Using the procedure described above, a multivariate regression analysis was performed on the data for the Class of 1975.

As in the 1974 sample, the overall test of the null hypothesis of no association between the three predictor and three criterion measures was significant (F [9, 219] = 5.255, p < .0001). In contrast to the results of the 1974 analysis, the stepwise tests of the contributions of the predictor variables showed that Biology GPA did not contribute significantly to the prediction of the criterion variables above and beyond MCAT Science (F [3, 91] = 9.61, p < .0001) and Science GPA (F [3, 92] = 6.46, p < .0001), both of which were highly statistically significant.

The results of the step-down tests are shown in Table 4.17. The last criterion variable entered into the regression equation was the Clinical Skills assessment. The step-down F for this factor was 2.32 (p < .08), which indicates a marginal contribution to the regression equation above and beyond the Basic Science and Clinical Principles factors which were entered earlier. In contrast to the results of the 1974 analysis, the addition of the Clinical Principles factor was not significant. In spite of this, the logic of the step-down testing procedure would recommend retention of this measure in the model because the Clinical Skills factor (which was entered later) was retained.

Table 4.17
Step-Down Tests of Association for Criterion Variables--1975 Sample

    Variable                  F          Significance
    Basic Science             13.839     .0001
    Clinical Principles        0.877     .4564
    Clinical Skills            2.325     .0802

The results of the univariate regressions are shown in Table 4.18, and the regression weights are displayed in Table 4.19. In spite of the insignificant contribution of Biology GPA to the regression, the model was not reestimated. The rationale behind this decision was to allow a direct comparison between the univariate regression weights for the 1974 and 1975 samples. (Because of the lack of a significant contribution of Biology GPA, the estimates of the multiple R's will not differ a great deal from those displayed in Table 4.18.)

Table 4.18
Univariate Regression Statistics for Criterion Variables--1975 Sample

    Variable                  Multiple R    Multiple R²    F        Significance
    Basic Science             .558          .311           13.84    .0001
    Clinical Principles      .334          .111            3.85    .0122
    Clinical Skills           .500          .250           10.20    .0001

Table 4.19
Regression Coefficients and Standard Errors--1975 Sample

    Factor score/            Raw score regression    Standard    Standardized
    predictors               coefficient             error       coefficient (Beta)
    Basic Science:
      Science GPA            28.290                  18.78        .209
      MCAT Science            0.256a                  0.05        .434
      Biology GPA             6.119                  17.29        .051
    Clinical Principles:
      Science GPA            19.330                  16.07        .189
      MCAT Science            0.091                   0.05        .205
      Biology GPA             2.786                  14.79        .031
    Clinical Skills:
      Science GPA            29.107a                 11.03        .382
      MCAT Science            0.107a                  0.03        .322
      Biology GPA            -5.341                  10.15       -.079

    a (p < .05).

The multiple R²'s (i.e., the proportions of variance in the criterion variables accounted for by the linear combinations of the predictor variables), while highly statistically significant, were substantially smaller than those from the 1974 sample. For example, the three predictor variables accounted for 63.4% of the variation in Basic Science performance in the 1974 sample but only 31.1% (about half as much) of the variation in Basic Science performance in the 1975 group. Possible explanations of these precipitous decrements in the R²'s can be found in the correlations among the predictor and criterion variables, and in the variances and reliabilities of the factor scores.

As can be seen from Table 4.20, the correlations among the predictor variables and the factor scores are considerably smaller for the 1975 sample than for the 1974 sample. This is likely due, in part, to the lower variances of the factor scores (shown in Table 4.21 along with their 1974 counterparts), and in part to the lower reliabilities of the factor scores for the 1975 sample.

Table 4.20
Correlations Among Predictor and Criterion Variables--1975 Sample

    Variable               Basic      Clinical      Clinical    Science    MCAT       Biology
                           Science    Principles    Skills      GPA        Science    GPA
    Basic Science          1.000
    Clinical Principles     .719      1.000
    Clinical Skills         .659       .560         1.000
    Science GPA             .353       .263          .398       1.000
    MCAT Science            .503       .262          .386        .243      1.000
    Biology GPA             .370       .252          .334        .780       .359      1.000

Table 4.21
Summary Statistics for Predictor and Criterion Variables--1975 Sample
(1974 standard deviations in parentheses)

    Variable               Mean      Standard deviation    Range (X̄ ± 2 S.D.)
    Basic Science          350.00    53.15 (58)            244-456
    Clinical Principles    250.00    40.04 (51)            210-330
    Clinical Skills        200.00    29.90 (46)            170-230
    Science GPA            3.09      0.39 (0.43)           2.31-3.87
    MCAT Science           521.07    90.02 (96)            341-701
    Biology GPA            3.24      0.44 (0.48)           2.36-4.00
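The restriction-of-range explanation can be illustrated with a small simulation (Python; the data are hypothetical bivariate normal scores, not the MSU-COM records): selecting applicants on the predictor shrinks its variance and, with it, the observed predictor-criterion correlation.

    import numpy as np

    rng = np.random.default_rng(1)
    n, rho = 100_000, 0.60
    x = rng.normal(size=n)                                    # predictor
    y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)  # criterion

    full = np.corrcoef(x, y)[0, 1]
    keep = x > np.quantile(x, 0.75)              # admit only the top quarter on x
    restricted = np.corrcoef(x[keep], y[keep])[0, 1]
    print(full, restricted)                      # the correlation drops under selection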
Contrary to the hypothesis of little or no relationship between basic science course performance and the performance of clinical skills, the estimated correla- tion between the factors which measured these performances was .927. The correlations among the other factors were also high which strongly suggests that performance across both clinical and basic science areas in the curriculum is consistent. In order to test the generalizability of these relationships, a three-factor structure was estimated on similar course performance measures from students in the entering class of 1975. As in the previous model, this hypothesized structure included Basic Science, Clinical Principles, and Clinical Skills factors. (The Clinical Applications factor was not included in the model due to the lack of data for most of these courses.) While slightly lower than those for the 1974 sample, the estimated true score correlations among the factors 135 were also high (i.e., .71 to .82) and statistically significant. The Tucker-Lewis coefficient for the model was .93, which indicated an acceptable fit of the model to the data. Three "clusters" of independent variables were chosen to be used to predict performance on the four criterion factors. These clusters were: measures of science aptitude and achievement (i.e., Science GPA, Science MCAT, and Biology GPA), verbal ability and achievement (i.e., MCAT Verbal and English GPA), and behavioral science achievement (Behavioral Science GPA). The results of a multivariate regression analysis showed that performance on the criterion measures was significantly related to the science predictors only. The results of the series of sequential, step-down F tests similarly demonstrated that the Clinical Applications factor did not contribute to the regression above and beyond the other three per- formance factors which were entered earlier. Hence, the verbal and behavioral science predictors and the Clinical Applications factor were deleted from the analysis and the regression model was reestimated. The results of the univariate regression analyses showed that the science predictors accounted for 63% of the variance on the Basic Science performance factor; these predictors also accounted for 49% and 42% of the 136 variation in the Clinical Principles and Clinical Skills factors, respectively. Together the predictors accounted for 28% of the variation in the three criterion measures. While the associations between the same predictor and criterion variables were also highly statistically significant in the 1975 sample, the magnitudes of the associations were smaller. That is, the science pre- dictors accounted for an estimated 12% of the variance in the criterion variables in the 1975 sample. Similarly, the univariate Rz's between the predictors and each cri- terion were also smaller than in the 1974 sample, i.e., .31 for Basic Science, .11 for Clinical Principles, and .25 for Clinical Skills. One logical explanation of the shrinkage of these statistics is the decreased variances and, hence, restrictions in range in the predictors. CHAPTER V CONCLUSIONS AND IMPLICATIONS In this chapter the conclusions of the study and their implications for medical school admissions policy and further research will be enumerated. First, the results of the investigation will be briefly summarized and discussed within the context of previous research. Second, the conclusions and limitations of the study will be outlined and the potential generalizability of the results will be discussed. 
Third, suggestions for further research will be made. Fourth, the implications of the results for medical school admissions policy will be developed.

Discussion and Conclusions

The principal research question investigated by this study concerned the relationship among facets of performance in medical school. In Chapter III, four models of the structure of performance in osteopathic medical school were identified. These models were formulated on the basis of (a) course content; (b) the distribution of contact hours devoted to basic science, clinical science, and clinical skills instruction; and (c) the sequencing of these courses in the curriculum. The models, how they were formulated, and the medical student performance data which were collected to test the hypothesized relationships are described in detail in Chapter III.

The relationships among the facets of performance specified by the models were estimated by using maximum likelihood confirmatory factor analysis. This technique permits the researcher to specify on which latent factors the observed variables should load, and provides maximum likelihood estimates of the factor loadings, the "true score" correlations among the factors, and the unique variances of the measures. The four-factor model, which included a separate clinical skills performance factor, fit the data best. The correlations among the four factors hypothesized by the model were all high (i.e., .77 to .93) and highly statistically significant. Using data from the entering class of 1975, these relationships were successfully cross-validated. The estimated correlations among the three hypothesized factors in the 1975 sample were also high (i.e., .71 to .82) and statistically significant. These strong relationships among didactic and clinical skills performance (which were measured by objective tests and by ratings of "hands-on" clinical performance) demonstrated a substantial degree of consistency of student performance across both medical school subject matter domains and measurement methods.

These results are consistent with the moderate to high canonical correlations between performance in basic science and clinical courses reported by Sirotkin and Whitten (1978) for students in a similarly structured curriculum at another university. The results are also consistent with the strong relationships among scores on objective tests measuring clinically-relevant basic science knowledge and clinical knowledge, and performance on clinical simulations, reported by Maatsch et al. (1978, 1979) for practicing physicians.

It should be stressed that the relationships in the studies described above and in the current study are correlational rather than causal relationships. Should an investigator wish to test for unidirectional links between different types of performance, structural equation or path models would be an appropriate alternative analytic strategy.

Unit-weighted indices of performance on the four factors were developed by summing T-scores from the courses which loaded on each of the four factors. The results of the multivariate regression analyses showed that Science GPA, MCAT Science, and Biology GPA were significantly related to scores on the four performance factors in both the 1974 and 1975 samples. The relationships in both samples were generally higher than those previously reported in similar studies from allopathic medical schools (see review in Chapter II).
The strength of the relationship between the predictor variables and the performance factors was, however, higher in the 1974 sample than in the 1975 sample. The probable explanation of this difference in the findings was alluded to in Chapter II: It is well known (e.g., Gough, 1979) that schools of allopathic medicine place a great deal of emphasis on objective measures of achievement and aptitude in selecting applicants. Thus, the range of these predictor variables has been severely censored in allopathic student populations, leading to attenuated relationships between these predictors and later medical school performance.

The principal conclusions of the study can be briefly summarized as follows:

1. Medical school performance is consistent across both subject matter domains and methods of measurement.

2. Consistent with the results of recently reported studies, basic science and clinical performance are more strongly related than had previously been reported.

3. Given a population with a wide variation in scores on objective predictors of medical school performance, these measures are substantially related to subsequent medical school performance.

Limitations of the Study and of the Generalizability of the Results

MSU-COM was the first college of osteopathic medicine to be established in the United States in over 50 years and, after its sister allopathic school at the University (the College of Human Medicine), the first new medical school to be established in Michigan in that period. Because they were not as tied to tradition, both of these schools began their existence by trying out new curricula, methods of teaching, and admissions criteria. As discussed in Chapter I, the first few classes admitted to COM contained a relatively high percentage of hitherto "non-traditional" medical students, i.e., higher percentages of minorities, women, applicants with predominantly non-science academic backgrounds, and older students preparing for a second career. These non-traditional students were admitted partly because the admissions committee at the time was not weighting GPA, MCAT scores, and other objective predictors of performance as heavily as indicators of social commitment, previous health-related activities, and motivation toward a career in osteopathic medicine. In 1975, however, the Committee began to place more weight on objective predictors of performance, and this emphasis (while still not as heavy as at most allopathic colleges) has continued to increase. Concomitant with this increasing emphasis, the premedical grade point average of new classes has risen, mean age has dropped, and fewer students with predominantly non-science academic backgrounds are being admitted.

The reason hypothesized for the strong findings yielded by the regression analyses, in comparison to similar results from allopathic colleges, was the greater range in values on the predictor variables. Hence, the limits to the generalizability of the results of the regression analyses are essentially related to the range of scores within which applicants are selected for admission to a particular school. If only applicants with high GPA's (e.g., 3.40 or better) and/or high MCAT scores (e.g., 1.5 S.D. above the mean or better) are admitted, these variables will not predict performance as reliably within this restricted range of scores.
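The attenuating effect of such selection on a validity coefficient can be demonstrated with a small simulation. The sketch below is illustrative only; the population correlation (.60) and the cutting score (1.5 S.D.) are invented for the example and are not values from the study.

```python
# Illustrative simulation (invented numbers, not study data): selecting
# only applicants far above the mean on a predictor shrinks the observed
# predictor-criterion correlation in the selected group.
import numpy as np

rng = np.random.default_rng(1979)
n, rho = 100_000, 0.60                     # assumed population values
predictor = rng.standard_normal(n)
criterion = rho * predictor + np.sqrt(1 - rho**2) * rng.standard_normal(n)

admitted = predictor > 1.5                 # admit only those 1.5 S.D. above the mean
r_full = np.corrcoef(predictor, criterion)[0, 1]
r_restricted = np.corrcoef(predictor[admitted], criterion[admitted])[0, 1]

print(f"full applicant pool: r = {r_full:.2f}")        # about .60
print(f"admitted group only: r = {r_restricted:.2f}")  # about .28
```

Under these assumptions, an underlying validity coefficient of .60 falls to roughly half its full-range value in the admitted group.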
A second limitation of the study is the following: While the study included explicit and fairly reliable measures of clinical performance, all of these measures (with the exception of the ratings of students' preceptorship performance from the last Family Medicine preceptorship) were made during on-campus courses and clinical experiences. It certainly can be questioned whether this performance can legitimately be generalized in order to make inferences about clerkship performance and, more questionably, about post-graduate clinical performance. This difficulty in finding measures of clinical performance which generalize across cases has been discussed by Elstein et al. (1978), who found physicians' clinical problem solving behavior to be disease or case specific. Thus, the low reliability of clinical performance measures across cases and across time may be due to the varied nature of medical practice.

Third, this study employed the old MCAT test as a predictor of student performance. Clearly, an important question is whether the same results would be obtained with the recently developed new MCAT test. Until these validity studies are conducted, generalization of the results of the current study to applicant populations who have been administered the new test should be done with caution.

Before the recent implementation of new rating forms, hospital-based physicians were asked to rate third-year students on their clinical performance and attitudes using "global" rating scales. These ten-point scales had very low variances and were highly negatively skewed. Hence, data from them were not included in this study. The revised rating forms, on the other hand, consist of short descriptions of the clinical skills, attitudes, and behaviors to be rated. This revised format asks the rater to identify the descriptor which best describes the student's level of skill, professional behavior, or attitude from among five or six alternatives. While the data from these revised scales have not yet been formally analyzed, it would seem a priori that this new rating format would increase the reliable variance of the ratings of clerkship performance, and the ratings would probably be worth including in a future study. Another method for assessing clinical competence is through ratings of students' diagnostic and case management behavior with simulated patients. Recent research by Elstein et al. (1978), Maatsch et al. (1978, 1979), and Tinning (Note 3) has demonstrated the feasibility of this approach.

While unfortunately lacking measures of clinical clerkship performance, the study did adequately sample basic science and clinical performance (including osteopathic diagnostic and treatment skills) during the first two years of osteopathic medical school. To the extent that other osteopathic and allopathic medical curricula contain approximately the same "balance" of basic science and clinical instruction (a condition more applicable to osteopathic than to allopathic schools because of the training in OMT clinical skills during the first two years), the results concerning the relationships among clinical and basic science performance should be generalizable to these schools. In addition, the results should be particularly generalizable to schools which have organ systems curricula and/or substantial amounts of clinical input during on-campus training.

Most of the students in the entering class of 1974 are currently in family or general practice.
While the specialty choices of students matriculating in later classes will not be known for another year or more, it would be both interesting and worthwhile to investigate the relationship between premedical predictors and measures of medical school performance, on the one hand, and specialty choice when it is known, on the other. The "conventional wisdom" among the COM faculty is that students with higher premedical GPA's and MCAT scores will choose specialty practice over general practice. The variable weights given to these measures in admitting students during the past ten years have conveniently "set the stage" for a longitudinal test of this hypothesis.

Suggestions for Further Research

Based upon the limitations of the study discussed in the last section, the following suggestions for further research can be offered: Data on clinical clerkship performance from the revised rating instruments and, if possible, ratings of students' clinical work with simulated patients should be included in a future study. The relationship between premedical predictors and medical school performance, on the one hand, and specialty choice, on the other, could also be profitably studied.

Using the conservative method of estimating the common variance between the set of predictor and the set of criterion variables described in Equation 4.2 (cf. Stewart & Love, 1968), it was found that the predictor variables accounted for 28% of the variation in the criterion variables in the 1974 sample and only 11% of the variance in the performance variables in the 1975 sample. (A computational sketch of this index appears at the end of this section.) Consequently, it would be useful to investigate the predictive validity of non-academic predictors, such as study habit inventories, problem solving instruments, reading tests, and social-psychological instruments designed to measure the applicant's ability to relate to others (e.g., the Krupka et al. empathy scale, the Myers-Briggs). In this vein, MSU-COM, in cooperation with OMERAD, has begun the development of a problem solving skills assessment for COM applicants. Of specific use to colleges of osteopathic medicine would be an instrument designed to measure the applicant's interest in and commitment toward a career in osteopathically-oriented medical care in a family practice setting. This scale would, however, require considerable development and validation work so as to avoid "socially desirable" responses on the part of applicants who simply wish to get into a medical school regardless of its philosophy of medical practice.

An investigation of unidirectional links among performance in the areas investigated in this study would be an interesting adjunct to the present relational analyses, which used confirmatory factor analysis. This investigation could be carried out by using the LISREL IV computer program developed by Jöreskog and Sörbom (1978).
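For the reader who wishes to reproduce the Equation 4.2 computation, a minimal sketch follows. The array names are assumptions, and the sketch relies on the fact that the Stewart and Love (1968) redundancy index reduces to the average of the squared multiple correlations of each criterion variable with the full predictor set.

```python
# Minimal sketch of the Stewart & Love (1968) redundancy index: the
# mean of the squared multiple correlations of each criterion variable
# with the complete set of predictors. Array names are assumptions.
import numpy as np

def redundancy_index(X, Y):
    """X: (n, p) predictor matrix; Y: (n, q) criterion matrix.
    Returns the proportion of criterion variance accounted for
    by the predictor set."""
    X1 = np.column_stack([np.ones(len(X)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # least squares fit
    residuals = Y - X1 @ beta
    r_squared = 1.0 - residuals.var(axis=0) / Y.var(axis=0)  # R^2 per criterion
    return r_squared.mean()  # e.g., the .28 (1974) and .11 (1975) values above
```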
Implications

The most important findings of this study were the lack of support for the hypotheses of (a) little or no relationship between clinical and basic science performance, and (b) little or no relationship between objective predictors and subsequent medical school performance in populations of students with wide ranges of scores on these measures. As mentioned above, these findings hold few if any implications for medical schools which select applicants from the upper ranges of the distributions of applicant talent. The findings, however, hold wide-ranging implications for virtually all medical schools which are attempting to broaden the ethnic, socioeconomic, and student background compositions of their classes.

The explanation for this is not often discussed outside faculty and administrative conclaves but must be made explicit for a clear understanding of the issue: Prestigious medical schools, such as Harvard, Michigan, and the University of California Medical School (which was studied by Gough et al., 1964), virtually have their choice of unquestionably academically qualified minority applicants. Other, less prestigious schools, whether nationally or within the applicant's own state, must draw from lower areas of the distributions of scores on objective measures in order to recruit more than a token number of minority applicants. This, in combination with the admission of the other types of "non-traditional" applicants discussed above, is why there is now a wider range on both predictor and criterion measures than existed 10 years ago (cf. McGuire, 1977). To say that degree of skin pigmentation "causes" lower academic performance is to misspecify the causal model. However, variables which may indeed contribute to lower academic performance are clearly confounded with race. That is, poorer academic preparation in high school and college, poorer academic self-concept, poorer study habits, and a number of other variables have been shown to be related to both race and later academic performance.

The recent Bakke decision of the U.S. Supreme Court has allowed medical and other professional schools to explicitly consider race as an admissions criterion. In terms of the results of this study, what recommendations can be made for doing this? As discussed in Chapters I and II, medical schools typically have a two-screen admissions procedure. The first screen is usually based upon some combination of objective predictors of medical school performance. Applicants whose combined scores exceed the school's cutoff point pass the first screen and are invited for a personal interview. Based upon privileged communications with admissions officers and faculty of other medical schools, the author learned that some medical schools consider race as part of the "first screen" by assigning "admissions points" to minority applicants in such a way as to compensate for lower scores on the objective predictors. The results of this study imply, however, that in doing this, a school which admits a wide range of applicants may be admitting minority students whose GPA's and MCAT scores have been "overcompensated for" and, hence, students whose scores are predictive of low performance and even failure in medical school. In admitting an applicant with scores low enough to predict probable low performance or failure (the applicable cutting scores would have to be empirically determined for each school individually), the school is doing neither itself nor the student a favor. Perhaps a better and fairer way (for all applicants) of explicitly considering ethnicity is to lower the cutting scores on the first screen for all applicants and to assign the "ethnicity points" to minority applicants after they have passed the first screen.
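Stated procedurally, the difference between the two orderings is simply where the adjustment enters the sequence. The sketch below is an illustration only; the composite weights and the notion of standardized "points" are invented for the example, and no particular cutoff values are being recommended.

```python
# Illustrative sketch of the two orderings discussed above; the
# composite weights and cutoffs are invented for illustration only.
def combined_score(gpa_z: float, mcat_z: float) -> float:
    """Equally weighted composite of standardized objective predictors."""
    return 0.5 * gpa_z + 0.5 * mcat_z

def points_before_screen(score: float, points: float, cutoff: float) -> bool:
    """Ordering questioned above: adjustment points are added before the
    first screen, which can pass applicants whose objective scores alone
    predict low performance."""
    return score + points >= cutoff

def points_after_screen(score: float, lowered_cutoff: float) -> bool:
    """Ordering suggested above: every applicant faces the same, lower
    first-screen cutoff; adjustment points enter only at the second
    (interview) stage for those who pass."""
    return score >= lowered_cutoff
```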
The results of this study also suggest an additional revision of the admissions procedure which would benefit all applicants regardless of ethnic background. The results of the regression analyses in this study showed that while a representative set of typically used predictor variables accounted for substantially higher percentages of variance in criterion measures of performance than had previously been reported, the predictors by no means accounted for most of the variance in the criterion measures. This result implies that other predictors (such as those discussed above), while correlated with the typically used predictors, have the potential to offer additional independent information about criterion performance.

REFERENCE NOTES

1. Hunter, J. E., & Gerbing, D. W. Unidimensional measurement and confirmatory factor analysis. Paper No. 20. East Lansing, Mich.: Institute for Research on Teaching, Michigan State University, 1979.

2. Tinning, F. C. An experimental study investigating the effects of real and simulated clinical training on psychomotor, affective and cognitive variables during real clinical performance of first year osteopathic medical students. Unpublished doctoral dissertation, Michigan State University, 1973.

3. Tinning, F. C., Taylor, J. L., & West, D. B. Analysis of the clinical education program--College of Osteopathic Medicine. Unpublished manuscript, College of Osteopathic Medicine, Michigan State University, 1975.

4. West, D. B., Markert, R. J., & Bernier, F. A. Predictors of academic achievement at Michigan State University's College of Osteopathic Medicine. Occasional Paper No. 11. East Lansing, Mich.: College of Osteopathic Medicine, Michigan State University, 1979.

BIBLIOGRAPHY

AAMC. 1977-78 AAMC curriculum directory. Washington, D.C.: Association of American Medical Colleges, 1977.

Bartlett, J. W. Medical school and career performances of medical school students with low Medical College Admission Test scores. Journal of Medical Education, 1967, 42, 231-237.

Best, W. R., Diekema, A. J., Fisher, L. A., & Smith, N. E. Multivariate predictors in selecting medical students. Journal of Medical Education, 1971, 46, 42-50.

Blalock, H. M. Causal models in the social sciences. Chicago: Aldine-Atherton, 1971.

Bloom, B. S. (Ed.) Taxonomy of educational objectives. Handbook 1: Cognitive domain. New York: Longmans, Green, 1956.

Boldt, R. F. Factor analysis of law school grades (ETS RB-73-42). Princeton, N.J.: Educational Testing Service, 1973.

Buehler, J. A., & Trainer, J. B. Prediction of medical school performance and its relationship to achievement. Journal of Medical Education, 1962, 37, 10-18.

Burt, C. The relations of educational abilities. British Journal of Educational Psychology, 1939, 9, 45-71.

Burt, C. Three reports on the distribution and relations of educational abilities. London: King, 1917.

Burt, R. S. Confirmatory factor-analytic structures and the theory construction process. Sociological Methods and Research, 1973, 2, 131-190.

Carroll, J. B. How shall we study individual differences in cognitive abilities?--Methodological and theoretical perspectives. Intelligence, 1978, 2, 87-115.

Cooley, W. W. Who needs general intelligence? In L. B. Resnick (Ed.), The nature of intelligence. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1976, 57-61.

Crowder, D. G. Prediction of first-year grades in a medical college. Educational and Psychological Measurement, 1959, 19, 637-639.

Dawes, R. M., & Corrigan, B. Linear models in decision making. Psychological Bulletin, 1974, 81, 95-106.
Ebel, R. L. Knowledge vs. ability in achievement testing. In Proceedings of the 1969 invitational conference on testing problems: Toward a theory of achievement measurement. Princeton, N.J.: Educational Testing Service, 1969, 66-76.

Ebel, R. L. Measuring educational achievement. Englewood Cliffs, N.J.: Prentice-Hall, 1965.

Elstein, A., Shulman, L. S., Sprafka, S., et al. Medical problem solving: An analysis of clinical reasoning. Cambridge, Mass.: Harvard University Press, 1978.

Erdmann, J. B., Mattson, D. E., Hutton, J. G., & Wallace, W. L. The Medical College Admission Test: Past, present, and future. Journal of Medical Education, 1971, 46, 937-946.

Finn, J. D. A general model for multivariate analysis. New York: Holt, Rinehart and Winston, 1974.

Flexner, A. Medical education in the United States and Canada. A report of the Carnegie Foundation for the Advancement of Teaching, 1910.

Gagné, R. M. Learning and instructional sequence. In F. N. Kerlinger (Ed.), Review of research in education: I. Itasca, Ill.: Peacock, 1974, 1-33.

Gaier, E. L. The criterion problem in the prediction of medical school success. Journal of Applied Psychology, 1952, 36, 316-322.

Goldberger, A. S., & Duncan, O. D. (Eds.) Structural equation models in the social sciences. New York: Seminar Press, 1973.

Goodman, L. A. A modified multiple regression approach to the analysis of dichotomous variables. American Sociological Review, 1972, 37, 28-46.

Gorsuch, R. L. Factor analysis. Philadelphia: W. B. Saunders, 1974.

Gough, H. G. How to select medical students. Medical Teacher, 1979, 1, 17-20.

Gough, H. G. The recruitment and selection of medical students. In R. H. Coombs & C. H. Vincent (Eds.), Psychological aspects of medical training. Springfield, Ill.: Charles C. Thomas, 1971.

Gough, H. G. Some predictive implications of premedical scientific competence and preferences. Journal of Medical Education, 1978, 53, 291-300.

Gough, H. G., Hall, W. B., & Harris, R. E. Admissions procedures as forecasters of performance in medical training. Journal of Medical Education, 1963, 38, 983-998.

Gough, H. G., Hall, W. B., & Harris, R. E. Evaluation of performance in medical training. Journal of Medical Education, 1964.

Harman, H. Modern factor analysis (2nd ed.). Chicago: University of Chicago Press, 1967.

Hauser, R. M., & Goldberger, A. S. The treatment of unobservable variables in path analysis. In H. L. Costner (Ed.), Sociological methodology 1971. San Francisco: Jossey-Bass, 1971, 81-117.

Hoffman, E. L., Wing, C. W., & Lief, H. I. Short and long term predictions about medical students. Journal of Medical Education, 1963, 38, 852-857.

Howell, M. A. Personal effectiveness of physicians in a federal health organization. Journal of Applied Psychology, 1966, 50, 451-459.

Howell, M. A., & Vincent, J. W. The Medical College Admission Test as related to achievement tests in medicine and to supervisory evaluations of clinical physicians. Journal of Medical Education, 1967, 42, 1037-1044.

Hyman, H. Survey design and analysis. Glencoe, Ill.: Free Press, 1955.

Institute of Medicine. Cost of education in the health professions (Interim Report). Washington, D.C.: Institute of Medicine, National Academy of Sciences, 1973.

Jensen, A. R. How much can we boost IQ and scholastic achievement? Harvard Educational Review, 1969, 39, 1-123.

Johnson, D. G. A multi-factor method of evaluating medical school applicants. Journal of Medical Education, 1962, 37, 656-665.
Jöreskog, K. G. Analyzing psychological data by structural analysis of covariance matrices. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology (Vol. 1). San Francisco: W. H. Freeman & Company, 1974.

Jöreskog, K. G. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 1969, 34, 183-202.

Jöreskog, K. G. Simultaneous factor analysis in several populations. Psychometrika, 1971, 36, 409-426.

Jöreskog, K. G., & Lawley, D. N. New methods in maximum likelihood factor analysis. British Journal of Mathematical and Statistical Psychology, 1968, 21, 85-96.

Kegel-Flom, P. Predicting supervisor, peer, and self ratings of intern performance. Journal of Medical Education, 1975, 50, 812-815.

Kerlinger, F. N., & Pedhazur, E. J. Multiple regression in behavioral research. New York: Holt, Rinehart & Winston, 1973.

Korman, M., & Stubblefield, R. L. Medical school evaluation and internship performance. Journal of Medical Education, 1971, 46, 670-673.

Kropp, R. P., & Stoker, H. W. The construction and validation of tests of the cognitive processes described in the "Taxonomy of educational objectives" (Cooperative Research Project No. 2117). U.S. Office of Education, 1966.

Krupka, J. W., Elstein, A. S., Molidor, J. B., King, L., Parsons, M., & Son, L. Assessment of empathy skills and problem solving skills as a screen for admission to medical school. Final report to the National Fund for Medical Education. East Lansing, Mich.: Office of Medical Education Research and Development, Michigan State University, 1977.

Lawley, D. N., & Maxwell, A. E. Factor analysis as a statistical method (2nd ed.). London: Butterworths, 1972.

Maatsch, J. L., Downing, S., Sprafka, S., & Holmes, T. Toward a testable theory of physician competence: An experimental analysis of a criterion-referenced specialty certification test library. In Proceedings of the seventh annual conference on research in medical education. Washington, D.C.: AAMC, 1978, 399-408.

Maatsch, J. L., Elstein, A. S., et al. Model for criterion-referenced medical specialty test (Progress Report on Grant No. HS 02038). East Lansing, Mich.: Office of Medical Education Research and Development, Michigan State University, 1979.

Madaus, G. F., Woods, E. M., & Nuttall, R. L. A causal model of Bloom's taxonomy. American Educational Research Journal, 1973, 10, 253-262.

Markert, R. J. The relationship between grades and clinical competence among first year osteopathic medical students. Medical Education, 1978, 12, 282-286.

McGuire, F. L. Fifteen years of predicting medical student performance. Journal of Medical Education, 1977, 52, 416-417.

Nunnally, J. Psychometric theory. New York: McGraw-Hill, 1967.

Reynolds, H. T. Analysis of nominal data. Beverly Hills, Calif.: Sage Publications, 1977.

Rhoads, J. M., Gallemore, J. L., Gianturco, D. T., & Osterhout, S. Motivation, medical school admissions, and student performance. Journal of Medical Education, 1974, 49, 1119-1127.

Richards, J. H., & Taylor, C. W. Predicting academic achievement in a college of medicine from grades, test scores, interviews, and ratings. Educational and Psychological Measurement, 1961, 21, 987-994.

Richards, J. M., Taylor, C. W., & Price, P. B. The prediction of medical intern performance. Journal of Applied Psychology, 1962, 46, 142-146.

Schmidt, F. L. The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement, 1971, 31, 699-705.
Schoenfeldt, L. F., & Brush, D. H. Patterns of college grades across curricular areas: Some implications for GPA as a criterion. American Educational Research Journal, 1975, 12, 313-321.

Schumacher, C. F. A factor-analytic study of various criteria of medical student accomplishment. Journal of Medical Education, 1964, 39, 192-195.

Simon, H. A. Models of man. New York: Wiley, 1957.

Sirotkin, R. A., & Whitten, C. F. Relationships between the preclinical and clinical years of medical school: A study of the interrelatedness of several performance measures. In Proceedings of the seventeenth annual conference on research in medical education. Washington, D.C.: AAMC, 1978, 197-201.

Sörbom, D., & Jöreskog, K. G. COFAMM: Confirmatory factor analysis with model modification. User's guide. Chicago: National Educational Resources, 1976.

Sörbom, D., & Jöreskog, K. G. LISREL: Analysis of linear structural relationships by the method of maximum likelihood. User's guide. Chicago: National Educational Resources, 1978.

Spearman, C. The abilities of man. London: Macmillan, 1927.

Spearman, C. General intelligence objectively determined and measured. American Journal of Psychology, 1904, 15, 201-293.

Stewart, D., & Love, W. A general canonical correlation index. Psychological Bulletin, 1968, 70, 160-163.

Thurstone, L. L. Multiple-factor analysis. Chicago: University of Chicago Press, 1947.

Thurstone, L. L. Primary mental abilities. Psychometric Monographs, 1938, 1.

Tucker, L. R., & Lewis, C. A reliability coefficient for maximum-likelihood factor analysis. Psychometrika, 1973, 38, 1-10.

Vernon, P. E. The structure of human abilities (2nd ed.). New York: Wiley, 1961.

Wang, M. C., & Stanley, J. C. Differential weighting: A review of methods and empirical studies. Review of Educational Research, 1970, 40, 663-705.

Wiley, D. E., Schmidt, W. H., & Bramble, W. J. Studies of a class of covariance structure models. Journal of the American Statistical Association, 1973, 68, 317-323.

Wingard, J. R., & Williamson, J. W. Grades as predictors of physicians' career performance: An evaluative literature review. Journal of Medical Education, 1973, 48, 311-322.

Wolfle, D. Factor analysis to 1940. Psychometric Monographs, 1940, 3.