This is to certify that the dissertation entitled "Cross Validity of Authentic- and Proxy-Criterion Regression Methods in the Selection of Veterinary School Applicants," presented by Ivan Alexander Stuck, has been accepted towards fulfillment of the requirements for the Doctor of Philosophy degree in Educational Measurement, Evaluation, and Research Design.

Major professor

Date

CROSS VALIDITY OF AUTHENTIC- AND PROXY-CRITERION REGRESSION METHODS IN THE SELECTION OF VETERINARY SCHOOL APPLICANTS

By

Ivan Alexander Stuck

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

1989

ABSTRACT

CROSS VALIDITY OF AUTHENTIC- AND PROXY-CRITERION REGRESSION METHODS IN THE SELECTION OF VETERINARY SCHOOL APPLICANTS

By

Ivan A. Stuck

Several advantages might be gained by admissions departments from the use of a proxy criterion in the development of a predictive multiple regression equation for selecting among candidate characteristics: (1) moderator error might be avoided by restricting prediction to only the immediate group of applicants; (2) predecessor data might not be necessary to develop a precise predictive equation; (3) novel criteria or predictors might be entered into the predictive equation for immediate use; (4) an interval-style multiple regression procedure might be used despite graduate-level pass/fail grading; (5) the range restriction problem could be avoided; and (6) the data from all applicants could contribute to the reliability of the predictive equation.

Admissions and performance data for five cohorts of veterinary applicants were used to compare four proxy-criterion methods with the conventional multiple regression approach to the development of a predictive selection equation. The authentic criterion was the graduate grade-point average, while the proxy criterion was an undergraduate pre-veterinary studies GPA. Predictors were undergraduate GPA, admissions test scores, employment ratings, and biographical and other data. Prediction factors were developed by selecting on college origin and performance level, by varying calibration sample sizes, and by restricting the use of intercorrelated predictors. T-tests and MANOVAs were used to evaluate mean differences in prediction error among the conditions. When prediction error (in ranks) was transformed to emphasize the error for cases near the cut score, no differences were significant among any prediction methods, and neither was there a year effect. It was evident also that no methods differed from prediction by UGPA alone.
When transformed prediction error (actual) for a proxy-criterion method was observed across five years, a year effect appeared among proxy conditions. The significant effect for year, nevertheless, was attributable to exceptional interactions among prediction factors for two of the five annual cohorts. Several improvements to proxy-criterion calibration are suggested, and the potential for proxy-criterion use at other sites and for other graduate programs is discussed.

TO THE STUDENTS WHO CHALLENGE CONVENTION

ACKNOWLEDGMENTS

I have deeply appreciated those persons and organizations who helped me in various ways to complete this dissertation. Former colleagues at the Ingham County Department of Social Services, St. Vincents Childrens Home, John Richer, the College of Veterinary Medicine, Bob Simpson of B-O-C Powertrain, and Dr. Juan Olivarez of Grand Rapids Public Schools provided a means of flexible employment during my doctoral study. Dr. Richard Houang, formerly with our department, was particularly helpful in guiding the development of my initial dissertation proposal. Dr. John Tasker and Pat Lowrie of the College of Veterinary Medicine provided the research environment from which my thesis arose, and further made available the data for my research and the continuing use of their college facilities. Aleta Zamel provided invaluable assistance in illuminating several subtle features of the data set. My friend Dr. Jeff Mayer helped to get me started in programming SPSS, and the consultants at the Computer Center assisted me often in the later data analyses. I thank my committee: Dr. Bill Mehrens for his tireless editing and suggestions, Dr. Steve Raudenbush for his statistical expertise, and Doctors Fred Ignatovich and Jim Haf for their insightful advice. I thank Rosalind Goodman for correcting my spelling and other shortcomings.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter

I. Introduction
   Need
   Purposes
   Research Hypotheses
   Rationale for Research Hypotheses
      Hypothesis A
      Hypothesis B
      Hypothesis C
      Hypothesis D
      Hypothesis E
   Overview

II. LITERATURE REVIEW
   Part 1: Substantive Review
      Summary
   Part 2: Theoretical Review
      Sampling
      Measurement
      The Reliability of Validity Coefficients
      Multiple Regression
      Multivariate and Univariate Analysis of Variance and the T-Test

III. PROPOSED THEORY
   General-criterion (GC) Regression Approach
   Using Multiple Regression to Shrink Error
      A. The case where the school variable is absent
      B. The case where the sample has a predominate rule
      C. The case where rules vary within the school
      D. Where prediction is improved for extreme cases
      E. The case where a proxy criterion is used
   Proxy-criterion Alternatives to Conventional Prediction
      Local-proxy (LP)
      General-proxy (GP)
      General-criterion-mixed or General-mixed (GM)
   Summary

IV. DESIGN
   Population
   Sample
   Predictors
   Criteria
   Analyses
   Test of Methods (MANOVA-TOM)
      Sample
      Conditions
      Dependent Variable
      Satisfaction of Assumptions
      Application to Hypotheses
   Test of Factor and Year Effects (MANOVA-TOF)
      Sample
      Conditions
      Dependent Variable
      Application to Hypotheses

V. RESULTS
   Test of Methods across Two Years
   Contrasts Between UGPA and Local-proxy Estimation
   Comparisons Against the General-criterion Method
   Test of Years and Other Factors under LP (Local-proxy)
   Independent Variable Contrasts
   Relative Validity of Non-MSU and MSU UGPAs

VI. DISCUSSION
   High UGPA Validity
   Potential Usefulness of Proxy Methods
   Potential Improvements to Estimation with a Proxy Criterion
   How Intercorrelation May Remain Benign
   Extreme Outcomes in Figures 5 and 6
   Cautions Regarding Study Realism
   Why R2 Wasn't Used as a Measure of Validity
   Regression Equations Have Superfluous Predictors
   Practical Implications of the Study

VII. SUMMARY

APPENDICES
   Substitute Merit Values
   Error Weight by GPA Plot
   Correlations: LEWAR-transformed Error by Predictors and GGPA

REFERENCES

LIST OF TABLES

Table
01  Cross-institutional disparities in predictors, predictor weights, and prediction validities
02  Comparative validity: Ordinary Least Squares, Local-proxy, General-criterion, and college GPA
03  Regression bias factors based on Richards (1982)
04  Channels by which error may affect calibration methods
05  Potential systematic error for each calibration method
06  Potential random error for each of four calibration methods
07  Potential advantages for each of four calibration methods
08  Disadvantages of each of four calibration methods
09  Optimal conditions for use of each calibration method
10  The study sample: Five cohorts of veterinary applicants
11  Predictors used in levels of the intercorrelation factor
12  Confirmation of expected outcomes
13  Contrasts of UGPA prediction against LP prediction
14  Method efficiency in logs of absolute rank-error
15  Estimate of average contrast between general-criterion and general-mixed transformed prediction error means
16  Test of main effect for methods (MANOVA-TOM)
17  Test of main effect for years (MANOVA-TOM)
18  Cross-validities for prediction methods
19  Test of main effects for years (MANOVA-TOF)
20  T-test of mean error between MSU and non-MSU applicants
21  A chart for identifying ideal method prediction factors
22  Method betas under optimal prediction conditions
23  Predictor names and their definitions
24  LP betas by year: full sample
25  LP betas by year: 75% sample
AI  Values substituted for missing merit values
AIII  LEWAR-transformed error by predictors and GGPA correlations

LIST OF FIGURES

Figure
1  Relative levels of transformed prediction error among GGPA estimation methods (1984)
2  Relative levels of transformed prediction error among GGPA estimation methods (1985)
3  Prediction error under local-proxy prediction where the proxy-criterion has been restricted
4  When measured by transformed prediction error, local-proxy prediction appears little affected by level of intercorrelation
5  When measured by transformed prediction error, restriction of the proxy-criterion range only appears to affect prediction for the 1984 cohort
6  When measured by transformed prediction error, source of UGPA only appears to affect prediction for the 1984 and 1985 cohorts
7  When measured by transformed prediction error, local-proxy prediction appears affected by level of intercorrelation for the 1984 cohort
8  When measured by transformed prediction error, sample size appears to have little effect on prediction
AIIa  LEWAR/LEWA weights plotted by GGPA rank (n = 88)
AIIb  LEWAR/LEWA weights plotted by GGPA (n = 88)

CHAPTER I

INTRODUCTION

Need

The efficiency of any process and the quality of its output improve as the selection of its input becomes more purposeful. Concurrent refinement of both (1) the criteria identifying output quality and (2) the selection of input is additionally beneficial as an approach to quality development. When the process is graduate-level education and the outcomes are licensed practitioners, the selection process is administered by a college department of admissions, while the criteria are largely determined both by (1) a host of university instructors and by (2) professional examination boards. In such a context, selection and criteria tend to become estranged. However, because the refinement of selection requires that knowledge of the identity and weights of valid predictors (which reside in the admissions department data) be linked to quality criteria (which remain under the domain of the student records authority), a cross-departmental flow of information is desirable. For many years the logistical difficulty of bridging the departmental offices was an authentic barrier to their reciprocation.

Today, however, inexpensive and adequate technology exists for the necessary data storage, integration, and analysis to allow admissions policy to be shaped by data on student performance. An equation which weights and combines admissions data (predictors) to estimate a performance outcome (or criterion) becomes a useful link between pre-program credentials (admissions data) and program performance (student records). Such an equation is known as a linear model.
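To make the linear model concrete, here is a minimal sketch of calibrating such an equation by least squares and applying it to a new applicant. The sketch is illustrative only: the figures, the choice of three predictors, and the use of Python with numpy (rather than the SPSS used in this study) are all assumptions, not the study's data.

    import numpy as np

    # Hypothetical records for previously admitted students; columns are
    # undergraduate GPA, admissions test score, and employment rating.
    X = np.array([[3.6, 62.0, 4.0],
                  [3.2, 55.0, 3.0],
                  [3.9, 70.0, 5.0],
                  [3.4, 58.0, 2.0],
                  [3.7, 66.0, 4.0]])
    y = np.array([3.5, 3.0, 3.8, 3.1, 3.6])      # graduate GPA (the criterion)

    X1 = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares predictor weights

    # The calibrated equation then links admissions data to an estimated
    # program outcome for a new applicant:
    new_applicant = np.array([1.0, 3.5, 60.0, 3.0])
    print(new_applicant @ b)                     # estimated graduate GPA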
Perhaps the most useful tool in developing a linear model (or "selection formula" for present purposes) is the multiple regression procedure. Multiple regression allows the use of past experience to inform present decisions. Given a criterion of quality (e.g., graduate school grades) and a set of application scores, the procedure can select the most predictive variables and weight them to maximize prediction of the criterion. This linear model, which was optimal as a selection formula for the original data set, may still remain useful for predicting the future grades of present students.

Unfortunately, research in this area has failed to demonstrate consistent outcomes. Typically, selection formulas will differ by location or by year of data studied (see Niedzwiedz & Friedman, 1976). For the most part, discrepancies are unsurprising, due to the limited sampling and sample sizes involved (single graduating classes of fewer than 100 students are typically used). Often, reports mention only a limited number of predictors which were considered significant, leaving the reader to guess what additional predictors may or may not have been tried (see Niedzwiedz & Friedman, 1976; Hart, Payne, & Lewis, 1981; Markert, 1983; and Jones & Thomae-Forgues, 1984).

In the past, multiple regression research would demand substantial resources. Collection of admissions data would require increasing administrative costs and organization as the set of predictive variables was expanded. Additionally, prior to the advent of the computerized office, many hours of clerical labor were required for the transfer of both admissions and student performance data from office documents to a usable medium for data analysis. In addition to these obstacles, selection formulas obtained from one year could be notoriously unreliable for predicting performance for a subsequent year. The resource drain projected for a multiple-year regression study was considerable, and few admissions officers could be confident that the advantage would compensate for the loss entailed.

Today, fortunately, many of these previous costs have diminished due to the advent of the microcomputer. The evolution of methodological innovations may also provide cost-effective improvements in the use of available data. One such potential multiple regression innovation may be the use of proxy criteria (variables which measure a set of factors similar to those measured by the ideal criterion; e.g., undergraduate grade-point average [UGPA] may serve as a proxy for graduate grade-point average [GGPA]). The implementation of proxy criteria in multiple regression calibration studies may provide an alternate means of estimating a selection formula and extend the use of multiple regression selection. If the error between a proxy and the real criterion is of less importance than the error associated with the potential confounding due to years, location, or other factors, a selection formula calibrated with the use of a proxy criterion may be preferable for decision making. This author found no evidence that a proxy approach had been studied prior to the author's own pilot study (Stuck, 1986). In that case, the proxy criterion (prerequisite veterinary GPA) allowed substantially higher predictive validity than a selection formula that was calibrated on one year's data from one location.
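The local-proxy idea behind that pilot study can be expressed compactly. The following sketch (again Python with numpy; the arrays, coefficients, and variable names are invented for illustration) calibrates weights against a proxy criterion that is available for the whole current pool, then scores that same pool:

    import numpy as np

    def local_proxy_scores(predictors, proxy_criterion):
        """Calibrate on the current pool's proxy criterion and return
        selection scores for that very same pool of applicants."""
        X1 = np.column_stack([np.ones(len(predictors)), predictors])
        b, *_ = np.linalg.lstsq(X1, proxy_criterion, rcond=None)
        return X1 @ b

    rng = np.random.default_rng(0)
    apps = rng.normal(size=(120, 4))      # one row per applicant, four variables
    pv_ugpa = apps @ np.array([0.4, 0.3, 0.2, 0.1]) \
              + rng.normal(scale=0.5, size=120)   # stand-in pre-veterinary GPA
    scores = local_proxy_scores(apps, pv_ugpa)    # rank applicants by these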
In addition to using a proxy criterion with a sample limited to a single year and location (referred to as the "local-proxy method" or "LP"), the proxy may substitute for the criterion in a multiple-year and/or multiple-location sample calibration (the "general-proxy method" or "GP"), or the proxy criterion may substitute conditionally, only where real criterion data are absent (the "general-criterion and proxy method" or "GM"), for multiple-year and/or multiple-location samples.

To summarize, proxy-criterion estimation could use one of the following forms:

(LP) single-site, single-year proxy-criterion multiple regression estimates of betas (predictor weights)
(GP) multiple-site and/or multiple-year proxy-criterion estimates of betas
(GM) identical to GP except that a proxy substitutes for the criterion only where the criterion measure is absent from the case.

Besides allowing the use of additional cases to increase the sample size, use of an adequate proxy criterion in the LP, GP, and GM methods would allow the inclusion of the normal range of applicant ability, thus eliminating the need to correct subsequent weightings for restriction of range. For the LP method (using solitary year and location data), additional variables can be added to the set of predictors in any year. This could allow, for instance, the use of two optional admission tests (e.g., GRE: Graduate Record Exam, and MCAT: Medical College Admissions Test) with confidence that they would both be appropriately weighted in the selection formula. The LP method would also be useful where the necessary data from previous years are unavailable.

For neither the conventional calibration approach (GC) nor the proxy-inclusive approaches (LP, GP, GM) is there literature addressing the nature of admissions prediction error. It is likely that such knowledge could be useful in several ways: (1) if error in predicted scores is random, then the quest for additional predictors might be ill-advised; (2) knowledge of relative levels of random error among methods would be of value for the analyst in choosing a calibration approach; and (3) where moderated prediction error is evident, evidence of its characteristics may assist in controlling or reducing such error by the introduction of new variables or transformations. Of particular concern is the nature of variation by years and by locations. Should prediction error appear to be moderated by these factors, then the LP approach may be advantageous due to its year-specific calibration and its option of selecting a different set of predictors. If the error appears to be both random and unilevel (identically distributed) across factors, then the LP approach may provide no advantage over the conventional calibration.

Purposes

The purposes of this research are (1) to create from veterinary school admissions data these four selection formulas:

LP = a single-year and -location formula using a proxy criterion,
GC = a conventional generalized formula using an authentic criterion,
GP = a generalized formula using a proxy criterion,
GM = a generalized formula substituting a proxy criterion only for cases which lack an authentic criterion;

(2) to compare them in terms of predictiveness on new applicant cases, and (3) to examine the nature of their prediction error.

Research Hypotheses

This research is designed to test the following list of propositions (which precedes a subsequent commentary):

Hypothesis A.
For students falling within a cut score zone, the local-proxy (LP) formula will be more predictive of graduate GPA (GGPA) than will undergraduate GPA (UGPA).

Hypothesis B. For students falling within a cut score zone, the general-mixed (GM) formula will be more predictive of graduate GPA (GGPA) than will the conventional prediction model (the general-criterion formula, GC) as corrected for range restriction.

Hypothesis C. With prediction error as the dependent variable, and with variation controlled with respect to years, methods, sample size, academic origin, and intercorrelation of predictors, prediction differences among years and methods will be obtained.

Hypothesis D. With prediction error as the dependent variable, with methods limited to the local-proxy (LP) approach, and with variation controlled with respect to years, sample size, academic origin, intercorrelation of predictors, and level of ability, prediction will vacillate across years.

Hypothesis E. Non-MSU undergraduates will be associated with greater prediction error.

Rationale for Research Hypotheses

Hypothesis A: It is widely held that a previous grade-point average (UGPA, for undergraduate study) is the best single predictor of future GPA (Mehrens and Lehmann, 1984). Therefore, given predictor cases limited to a single year and location, the UGPA would be expected to be the most reliable predictor of GGPA (graduate academic performance). Any proposed alternative (such as a formula developed from the LP regression approach), therefore, must be able to outperform UGPA. Hence, Hypothesis A is to be evaluated by the relative validity of LP prediction against UGPA prediction. In addition, because predictive precision only matters where it may alter the conventional outcome, validity difference (between methods) must be granted more importance as it falls within the cut-score zone (the lower bound of the veterinary doctor achievement distribution). In this instance the weighting is done by a non-linear transformation of the errors of prediction which results in greater importance for the prediction errors for marginal veterinary students.

Hypothesis B: Where data extend across years or locations, the conventional GC (general-criterion) regression-formula validity is the standard (where corrected for restriction of range). Hence, for the proxy-criterion alternative to prove itself useful, Hypothesis B requires that the GM (general-mixed) regression-formula validity must predict cut-score proximity cases with less error than does the conventional GC approach (corrected for range restriction).

Hypothesis C: Prediction by regression equation may vary according to specific factors (such as size or selection of the sample used in estimation of the regression equation). To determine the relative importance of such factors, it is necessary to control the influence of each factor. Control of prediction factors can be achieved by the deliberate selective sampling of calibration cases to bias selection formulas in a controlled manner (to deliberately exaggerate error effects), or by other deliberate means. Measurement of this bias is possible by applying the biased formulas to new data and by estimating the prediction error (between predicted and actual criterion scores). By entering these condition-specific prediction error values into a repeated-measures MANOVA procedure, the statistical and relative importance of these factors may be evaluated.
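As a hedged illustration of that last step, the sketch below generates one placeholder transformed prediction-error value per applicant under each method-by-year condition and submits the table to a univariate repeated-measures analysis. The study itself used MANOVA; statsmodels' AnovaRM is assumed here merely as an accessible univariate stand-in, and the data are random:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(0)
    methods, years = ["GC", "LP", "GP", "GM"], ["1984", "1985"]
    rows = [(i, m, yr, rng.normal())          # transformed prediction error
            for i in range(30) for m in methods for yr in years]
    data = pd.DataFrame(rows, columns=["applicant", "method", "year", "error"])

    # Each applicant contributes one error score per condition, so method
    # and year are within-subject (repeated-measures) factors.
    print(AnovaRM(data, depvar="error", subject="applicant",
                  within=["method", "year"]).fit())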
Where such a MANOVA procedure is controlled for year and/or method moderators, the emergence of effects for the LP (local-proxy) and GP (general-proxy) methods should correspond with concurrent effects for years: if the dependent variable is not moderated, then the practice of cross-year and cross-location generalization will be unimpaired, and, hence, the conventional GC (general-criterion) approach may be preferable for use. Assuming that calibration factors will not exhibit random influence, MANOVA effects for years and for methods are expected.

Hypothesis D: The following is consistent with the case of prediction parameters which vary across time: with a MANOVA procedure (1) limited to the local-proxy calibration method, and (2) controlled for year, sample size, academic origin, redundancy of predictors, and level of achievement, an effect for years is expected. (Under the LP [local-proxy] approach, a larger sample size and achievement range is possible, thus allowing the inclusion of an additional variable, achievement level, into the study.) The likelihood of finding year effects is enhanced (1) due to the larger number of years which may be included, and (2) due to the additional variance available with the use of prediction error reported on an interval scale. A moderator effect (such as a year effect) would suggest that (1) additional predictors are required in the regression equation or that (2) blocking is required on years. Blocking (e.g., local-proxy calibration) is a less precise means of control (than the addition of missing predictors), and therefore could be expected to only partly account for variance caused by changes in predictor validities.

Hypothesis E: The importance of the variation in UGPA standards across institutions may be confirmed by obtaining a significant difference in mean absolute error between subgroups which differ on UGPA origin (MSU vs. non-MSU). Another indicator is the contrast between calibration conditions which differ only on the N factor, but this would be a weaker test (M = MSU, N = all).

Overview

Chapter II will present a two-part literature review: Part 1 will review the use of multiple regression in selection for health science and graduate school admissions, and Part 2 will review theoretical issues underlying the methods used in this study.

Chapter III presents the theory being examined by the present study. The relative effects of proxy-criterion use are hypothesized for three multiple regression approaches (LP, GP, and GM) in relation to the conventional multiple regression approach (GC). Also, the potential for a systematic (vs. random) nature of prediction error is discussed.

Chapter IV outlines the designs for the two main analyses (repeated-measures MANOVAs) conducted within the research study: (1) the test of methods (TOM), a validity test comparing four multiple regression methods (GC, LP, GP, and GM) with respect to prediction error as the dependent variable, the methods each being controlled on three prediction factors (sample size, source of UGPA, and intercorrelation of predictors); and (2) the test of factors (TOF), a validity test comparing prediction conditions and controlling an additional prediction factor, past academic performance in pre-veterinary courses, while holding methods constant (using the local-proxy, or LP, method).

Chapter V presents the results of the study.
From the methods test MANOVA (MANOVA-TOM), the plausibility of five hypotheses will be judged: Hypothesis A, proposing the greater predictiveness of the LP model relative to that of the UGPA for marginal students; Hypothesis B, proposing the greater predictiveness of the GM model (which conditionally substitutes a proxy criterion value where the authentic criterion value is lacking) relative to that of the conventional multiple regression model (GC); Hypothesis C, proposing effects for years and methods; and Hypothesis E, proposing a greater association of prediction error for students claiming a non-MSU UGPA. In addition, the rank order of method effectiveness relative to prediction error will be observed. From the test of factors and years (MANOVA-TOF), Hypothesis D will be tested (again) to confirm the existence of year effects.

Chapter VI offers a discussion of the findings. Chapter VII offers a summary of the research.

CHAPTER II

LITERATURE REVIEW

Part 1: Substantive Review

Health sciences candidate selection (including selection for human, veterinary, and dental medicine) provides an ideal domain for the study of academic selection because (1) the demand for medical education remains fairly consistent, and (2) medical education tends to remain uniform over time. Surprisingly, there have been few multiple regression studies of academic selection in this area, and none that this author has seen report any efforts to validate selection formulas longitudinally. Niedzwiedz and Friedman (1976) did study academic selection across schools, however. Table 1 shows disparity in the magnitude of correlations between predictors and the four-year veterinary school grades (ranging from r = non-significant to r = .55). More important are the differences among sets of predictors. Assuming that similar scores and ratings are available to each institution for the evaluation of applicants, and assuming that the most predictive variables were reported, prediction appears to be inconsistent across schools. Additional studies (Hart, Payne, & Lewis, 1981; Markert, 1983; and Jones & Thomae-Forgues, 1984) found comparable correlation magnitudes (all near r = .40) but nevertheless failed to demonstrate a reliable set of predictors of medical school performance.

Table 1
Cross-institutional disparities in predictors, predictor weights, and prediction validities (criterion: GPA)

Schools   Predictors                               r
[Niedzwiedz and Friedman, 1976]
A         Physics GPA, Physics hours               .31
          Chemistry GPA, Extra-Curricular
          Rating, VAT Total Score                  .50
C         Science GPA, Academic Rating,
          VAT Science Score                        .55
D         (not reported)                           NS
[Hart, Payne, and Lewis, 1981]
E         College Science (w/ biochem. mem.)       .40
E         College Science (w/ biochem. intp.)      .43
E         College Science (w/ biochem. p. lrn.)    .43
E         College Science (physiology)             .39
[Markert, 1983]
F         College GPA, MCAT                        .39
[Jones and Thomae-Forgues, 1984]
s=25      College GPA                              .41
s=22      College GPA                              .37
s=25      MCAT                                     .41, .37

Note: s = number of schools in study.

Using Class of 1985 data as the regression formula calibration sample (to predict veterinary school performance), and attempting to validate the formula on the 1986 and 1987 cohorts, the author obtained estimates correlating 0.49 with veterinary school GGPA for 1986, but for 1987 the correlation dropped to a validity of 0.20. For 1987, however, there was a single variable whose correlation with GGPA was as high as 0.41. Clearly, predictors selected for one class via multiple regression procedures may appear to be unreliable across subsequent classes.
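This instability is easy to reproduce in simulation. In the sketch below (synthetic data, not the study's; Python with numpy assumed), predictor validities drift between two cohorts, and a formula calibrated on the first cohort loses much of its apparent validity when applied to the second:

    import numpy as np

    rng = np.random.default_rng(1)

    def cohort(n, weights):
        X = rng.normal(size=(n, 3))                 # three standardized predictors
        return X, X @ weights + rng.normal(size=n)  # criterion plus noise

    X_a, y_a = cohort(90, np.array([0.6, 0.3, 0.1]))  # calibration class
    X_b, y_b = cohort(90, np.array([0.1, 0.3, 0.6]))  # later class, drifted weights

    Xa1 = np.column_stack([np.ones(90), X_a])
    b, *_ = np.linalg.lstsq(Xa1, y_a, rcond=None)

    r_fit = np.corrcoef(Xa1 @ b, y_a)[0, 1]           # optimistic in-sample validity
    Xb1 = np.column_stack([np.ones(90), X_b])
    r_cross = np.corrcoef(Xb1 @ b, y_b)[0, 1]         # shrunken cross-year validity
    print(round(r_fit, 2), round(r_cross, 2))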
As is evident under these conditions, the regression formula may appear sufficiently unreliable that admission directors will feel justified in imposing subjective hunches or even prejudice into their selection processes, subsequently resulting in yet weaker and more prejudiced selection formulas.

Some efforts have been made to correct for error which contributes to unreliable selection formulas. In particular, attention has been directed towards error that occurs across locations. We know that considerable variation in academic standards exists from college to college. There are also many opportunities for deliberate and accidental transcription errors in the assessment of applicant credentials. Clapp and Reid (1976) improved prediction of medical student performance by weighting UGPA by an index of undergraduate admissions standards. Linn (1966) reviewed research attempting to re-scale multi-standard applicant high school GPAs to a single, standard scale (HSGPA). Although some prediction gains were observed for zero-order HSGPA x UGPA correlations by the use of a specific-school-adjusted HSGPA (HSGPAs), adjustments had no effect on multiple correlation coefficients where admissions test data were among the predictors. In all cases where prediction gains were obtained for the validation sample, the prediction advantage shrank substantially upon cross-validity testing.

The use of the proxy criterion should increase sample size at the expense of criterion precision; this may be preferable to accumulating potential year or location error from using additional years and/or locations as a means of increasing sample size. Wilson (1982) used UGPA as a proxy criterion in estimating the validity of GRE (graduate admissions test) scores. For chemistry majors (the reported major most relevant to the health sciences), he observed a correlation between UGPA and first-year graduate GPA of 0.30 (pooled data for years 1974, 1975, and 1978; n = 574). Stuck (1986) observed that the use of a proxy criterion (a pre-veterinary UGPA) might provide better estimation of veterinary school GGPA than may the use of a conventional multiple regression approach, because it can control for the potentially confounding effects of year and location. This approach to developing a selection formula is referred to as the LP (local-proxy) approach. An LP selection procedure was carried out retrospectively for a set of veterinary school applicants. Table 2 compares LP (local-proxy) prediction results with outcomes from a GC (general-criterion) model and the (optimal) ordinary least squares (OLS) correlation. The OLS equation predicts at R_OLS = .66. Because it is an original calibration, its predictors are uniquely selected and their weights are uniquely computed to minimize the squared error for that particular sample.

Table 2
Comparative validity: Ordinary Least Squares, Local-proxy, General-criterion, and college GPA

RUN     APPLIC. COHORT   CALIBRATION CRITERION   PREDICTORS    WEIGHTS   r (SCORE x GGPA)
OLS     1984             GGPA                    1984          1984      .66
LPPVS   1984             PVUGPA                  UGPA+VARS     1984      .58
LPCUM   1984             UGPA                    PVUGPA+VARS   1984      .51
GC81    1984             GGPA                    1981          1981      .20
UGPA    1984             -                       UGPA          100%      .20

OLS = optimal equation validity for data set
LP = Local-proxy prediction method
LPPVS = LP approach using PVUGPA (veterinary prerequisite UGPA) as a proxy criterion
LPCUM = LP approach using UGPA as a proxy criterion
GC = General-criterion (conventional) prediction method
GC81 = GC approach applying 1981 regression equation to 1984 data
GC81 uses a selection formula calibrated with a GC approach (having fixed predictors and fixed weights) computed from 1981 data, which predicts at r = .20 when applied to the 1984 data. The correlation for UGPA, which predicts with UGPA alone, is the same as that for GC81: 0.20. The LP formulas, LPPVS and LPCUM, predicting at 0.58 and 0.51, provide a better level of prediction.

The potential advantage of an LP (local-proxy) approach lies in its avoidance of moderator error from uncontrolled year, location, and other confounding effects. Where data are sampled in a non-random fashion, as is the case with admissions data, the presence of such effects must be expected unless there is substantial evidence to the contrary. Of course, year and location effects are, more precisely, artifacts of the changes in selection as it varies across years or locations.

The importance of such moderation has been suggested by previous research. Gender, ethnicity, socioeconomic status, personality, sites, years, and high school rank are variables which have been found to moderate prediction coefficients. Doolittle and Cleary (1987) found that women do worse on math items. Hogrebe, Ervin, Dwinell, and Newmann (1983) reported differential validity among performance prediction models for gender for white (but not for black) ethnic subgroups. McCornack (1983) found white-ethnic subgroup differences for blacks and Asians. Goldman and Hewitt (1976) found that minority performance predictiveness differed even after controlling for specific program category. Wright and Bean (1974) found socioeconomic status to moderate prediction for a sample of white urban male college students. Heiner and Owens (1985) observed an association between vocational choice and personality factors, whereas Gough and Lanning (1986) obtained male and female cross-validity coefficients of r = .38 and r = .36 with the California Psychological Inventory in predicting academic performance. Hakstian and Woolsey (1985), in turn, found validity coefficients for males and females of r = .39 and r = .37 with the California Aptitude Battery in predicting an introductory psychology course grade.

Outside the health sciences area, Linn, Harnisch, and Dunbar (1981) observed differences in LSAT validity for sites and years, additionally concluding that one cause appeared to be variation in grading as opposed to variable aptitudes. Goldman and Hewitt (1975) likewise found evidence of grading variability. Particularly, they observed an adaptation of grading standards relative to the ability range of the lower two-thirds of the class. Humphreys and Taber (1973) concluded from their postdictive studies that variation in grading standards best explained non-linear semester grade by GRE relationships. In part, this may be explained by differential attrition from academic disciplines (Loeb & Bowers, 1973) due, in turn, to
Although Sawyer and Maxey (1979) found stable prediction over a four-year span in the prediction of UGPA from .ACT (American College Testing) scores, Sawyer (1986) later found UGPA variation a major source of prediction bias, accompanied by the lesser sources of age, gender, and race. Wood and Langerin (1972) found that high school rank moderated prediction for high ability students. It is recognized that where differential validity is inferred from discrepant correlation coefficients, the cause may often be artifactual due to (1) sampling error, (2) measurement error, or (3) the variability of the sample studied relative to the variability of the sample to which the equation is to be applied (commonly called the restriction-of- range problem, see Mehrens and Lehmann, 1984). Thus some of the preceding findings must be interpreted with caution, due to the uncertainty regarding control of artifactual effects. 21 Summary Mederation of selection formula validity by sites finds some confirming evidence in the health science education literature. Moderation across years, however, is more difficult to evaluate through literature review due to a dearth of longitudinal study of regression equation validity. The author's longitudinal study of the generalizability of a single-year, single-site, equation over years, found the validity to be poor. For one year, the use of a portion of the UGPA as a proxy variable allowed the calibration of a more valid regression equation for selection. Such an outcome may have been possible because of the presence of moderating factors associated with years. The literature reporting moderating factors is quite extensive. If use of proxy criterion regression avoids moderation by years, it nevertheless remains somewhat less valid. Part 2: Theoretical Review Sampling Sampling theory provides justification for drawing inferences from samples under certain conditions. Suppose that a population exists ‘who share some independently acquired mutual attributes and characteristics but who differ on other attributes and characteristics. If samples are drawn in large enough numbers and in a random manner, we are confident that: ( 1) randomly sampled, independently 22 acquired characteristics of the sample can be inferred to the population as a whole, and (2) randomly sampled, independently acquired characteristics of the sample can be inferred to any other large random sample of that particular population (these follow from the central limit theorem, see Huntsberger and Billingsley, 1973, pp. 131-134). Ross (1988) cites Kish fer classifying samples as (1) experimental, (2) survey, or (3) investigative, based largely on the quality of the sampling. An experiment provides deliberate treatment with control of extraneous variables by randomization or other' means. A survey selects randomly from a defined population in which each member has a specific probability of being studied. In the investigation, however, control is the least. Sampling is by convenience with neither randomization nor probability sampling. The study of admissions data falls under this latter category. Where cases are not sampled randomly, but are selected according to their value on a particular variable (let's say “selected on IQ" [scholastic aptitudej), observed correlations between ‘that. selected variable (IQ) and another (say, academic performance) may be lower than would have been the case if the sample had been sampled randomly (an additional instance of the restriction-of-range problem). 
If therefore, selection is on the dependent variable, or if the regression uses standardized variables, 23 regression /validity coefficients for selected data will be artificially low (Richards, 1982). This is the case in sampling to produce a multiple regression selection formula, wherein only selected applicant cases include a GGPA/ criterion. Because the validity coefficient reflects the proportion of true variance to error variance: rxy - sZT / (szos s2T+sZE), a reduction in true variance resulting from selection of a restricted variable range leaves the error variance intact thus reducing the proportion of true variance to error variance. This is an instance of artifactual error, because the proportion of error is inflated due to the improper sampling procedures used. Other error is due to the vagaries of the sampling process. Nuisance variables (Kirk, 1982), confounding variables, and moderator variables (Allen & Yen, 1979) are common labels referring to another factor which may reduce prediction validity during the sample selection phase of an investigation. Inasmuch as all members of a population will not equally share access to, nor interest in, graduate admissions: certain papulation traits and characteristics may be over-represented in a non-random sample of applicants. When such unspecified and uncontrolled-for variables affect performance on the dependent variable, an additional source of error is imposed on the investigation. 24 Two hazards accompanying the use of non-randomly selected admissions data are therefore: (1) restriction-of- range artifacts and (2) confounded variables. Measurement Every measurement can be best regarded as an estimate which includes unknown components of two types of error: (1) unsystematic and (2) systematic. Unsystematic error randomly increases and decreases the measurement ‘value which is observed (relative to the true value of the object or process being measured). Given a large number of measurements, however, the positive and negative errors tend to cancel, leaving a mean value that is virtually the true mean for that set of measurements. Systematic error affects the recorded measurement value in a consistent way (such as always mistakenly using a meterstick instead of a yardstick): regardless of the number of measurements taken, the error remains in the computed mean as well as in the individual measures. Nevertheless, if the nature of the systematic error comes to light, individual measurements and group statistics may be corrected. Measurement error generally refers to the random kind of error, whereas, systematic error in the measurement is an unaccounted for factor which has a nonrandom influence on the observed scores. If the systematic error is due to instrumentation or procedures, the factor may be called an "artifactual factor". Otherwise, the systematic factors 25 will be attributed to uncontrolled variables in the real data. Of course, errors also differ in level (or magnitude) 2 error that doesn't differ in magnitude across samples is known as identically distributed error (unilevel), whereas error that does differ across samples (is multilevel) is known as moderated error. In the context of a distribution, level of error is known as error variance. If error variance is multilevel (or heterogeneous: variance differs across factor levels), it is said to be moderated by that factor. Where error variance is multilevel and, in fact, correlated with the levels of the factor, the error is systematic-- a special case of moderated error. 
Where it can be determined that error has systematic or random qualities, the possibility of controlling the error becomes more feasible. The Reliability of Validity Coefficients The effect of error on correlation coefficients is more complex than is its effect on observed scores. With no error, the bivariate correlation is a consistent maximum: the coefficient of the two latent traits. To use a biological analogue, error may be likened to a parasite that invades a "host" variable. From a maximum, latent- trait correlation value, coefficients decline in value as greater levels of error affect the observed scores. This is always true for random error and is virtually always 26 true for systematic error (the exceptional cases being (1) where error adds a constant value to its host variable, or (2) where systematic error is perfectly correlated with its host variable). Where the error in the observed scores is random, or where the error is systematic relative to its host variable, an unbiased estimate of the expected population coefficient can be computed. It can be computed with precision, moreover, if from a large sample: of course, the resulting correlation coefficient will be attenuated from the latent trait coefficient value due to the random error. Where error 'varies systematically relative to external influences, however, the computed estimate of the expected population coefficient may be inaccurate in some consistent fashion (biased). The biased estimate of the correlation of the latent traits would, therefore, require a correction of the observed-score correlation. It is common for error to have attributes of both systematic and random error. It may appear to be normally distributed as in the case of random error, yet also prove to be reducible by the addition of variance controls. This would be the case of a moderating variable (such as year or location) where error may vary across units of the variable (e.g. times or sites) in either a systematic or a random fashion. By blocking on potential moderator variables (see Neter, Wasserman, & Rutner, 1985) or by using other 27 statistical controls, the moderated error can be removed or reduced. The restriction-of—range problem is analogous to the problem of unreliability. Both can be accounted for in terms of the proportion of true variance to error variance. Bouh unreliability and range restriction are reflected in coefficients which are reduced when the proportion of error variance increases. By reducing the range of the variation in the sample of scores (as a consequence of selecting candidates via a cut-score criterion), the proportion of true variance is decreased, and, as a consequence, the proportion of error variance is directly increased (even with no change in the amount of error variance). If the error variance is substantially eliminated, the coefficient approaches the latent-trait value (in the case of parallel tests, that value should be one, though in the case of latent traits, the value could range between positive one and negative one). Linn and Hastings, (1984, p.166) provide a good discussion of the range restriction issue. Variation in range only affects raw-score regression coefficients when the dependent variable range is subject to variation between the calibration and the application samples (Richards, 1982), although the precision of this unbiased estimate depends heavily on a large sample size. 
Richards (1982) discusses the data characteristics displayed in Table 3, which result in error artifacts under (1) raw-score and (2) standardized regression/ 28 correlation. Most notably, raw-score regression coefficient estimates are unbiased by measurement error in the dependent variable, whereas. standardized regression coefficients are unbiased by variation in scale units. Neither type of regression is immune from bias due to measurement error in the independent variable. Although artifactual error in computed statistics may often be reduced through the use of various correction formulas (e.g. for unreliability or for range restriction), these corrections, nevertheless, are limited by the analyst's ability (1) to identify the affected variables (2) to determine levels of variance or reliability under other circumstances. Furthermore, a ”corrected" coefficient cannot be assumed to be completely accurate, and may be expected to be conservative (see Linn, Harnisch, & Dunbar, 1981). Table 3 Regression bias factors based on Richards (1982) BIAS FACTOR TYPE OF REGRESSION Raw-score Standardized Units of measure differ XX Dispersion of independent variable xx Unreliability of dependent variable xx Dispersion of dependent variable XX xx Unreliability of independent variable xx xx Norm referent scale xx xx Change in test length xx xx Selection on meditating variable xx xx 29 Correction of range restriction for a standardized regression equation requires the correction of each partial coefficient. Correction of the raw-score regression model is difficult. because it requires the generation of a constant. in addition to the transformation of partial coefficients into "b" weights (raw score coefficients). Some important implications of this theory for selection need to be considered: (1) If a sample is not randomly drawn its statistics will, nevertheless, represent its population as a whole if all of its relevant characteristics are invariant from member to member (for example, all Girl Scouts are invariant with respect to gender and relatively invariant with respect to age). (2) If a sample is not randomly drawn, its statistics will also represent its population if the sample is large enough and if the subjects happen to be representative: automobile drivers are random relative to gender and political party preference: ten cars in a line that are picked from a public parking lot may not reflect population composition accurately, but a few hundred cars picked as a block from a parking lot may represent population composition quite accurately relative to gender or political preference. However, sample correlations are apt to be biased due to moderated error (non-random samples tend to systematically select certain subgroups). 30 (3) In practice, artifactual differences can be anticipated. Because many factors may be predictive of a particular kind of human performance, one must assume that (a) people may perform at a similar level even though they differ with respect to particular attributes (abilities on several factors may compensate for deficiencies on other factors): and (b) for a given year or location, non-random pressures (self- selection or other non-random selection) must be expected to favor particular factors/attributes resulting in samples which are systematically different from the population as a whole. For instance, a change in requirements for admission to human medicine programs may affect the rate and quality of applications to veterinary medicine. 
Sample subgroups may differ in quality and level of preselection prior to inclusion in the sample, due to either self or institutional selection. Aggressive students may be over-represented due to self-selection, and range widely on required aptitudes while students with high verbal skills may be over-represented due to institutional selection and they may range ‘very little on required aptitudes. The effects of these disparities are (a) to create the appearance of a differential validity of any fixed selection formula for the various applicant subgroupings (e.g. verbal, aggressive) and (b) to create the appearance of a differential selection formula validity across samples (e.g. applicants of different years or 31 locations may differ in their subgroup structure, see Linn, 1983: and Linn 8 Hastings, 1984). Nevertheless, the 'validity difference would be largely artifactual, a consequence of range- restriction due to selection. Differential validity may actually exist independent of the artifactual manifestation, of course; however, a more insightful conceptualization is to attribute this particular validity problem to model misspecification (i.e., a selection formulation lacking in one. or :more important variables, such as origin of UGPA). In short, under the uncontrolled conditions of an investigation-level study such as the analysis of graduate admissions data, the likelihood of moderated/ confounded prediction across years or locations is substantial. The result of moderated prediction may be biased regression and validity coefficients. Multiple Regression The usual mathematical procedure used in multiple regression yields what is called the ordinary least squares (OLS) estimate of the criterion (or simply the least squares estimate). This term means that the sum of the squared prediction errors is minimized for the data used (see Neter, Wasserman, & Kutner, 1985) . The OLS linear model that is calibrated, however, is truly OLS only with respect to the specific combination of the calibration predictors and the calibration criterion. The OLS 32 correlation (coefficient- ROLS) is the optimal correlation of predicted and actual scores because of the following: (1) scores are calculated by applying the calibrated linear model back onto the calibration predictors, (2) the formula (the linear model) was specifically developed to predict the same criterion data: the subsequent correlation of ”ideal” criterion and "ideal" estimated criterion scores is optimal, (3) the OLS correlation is apt to be inflated, to some extent, due to the chance correlation of error with the criterion: when this chance correlation melts away during the application of the linear model to a new sample, the decrease in the coefficient upon cross- validity measurement is called shrinkage. Multiple regression assumes that (1) responses on the dependent variable are independent, that (2) the variance is constant across cases, that (3) the errors are normally distributed, that (4) the system being modeled is in a steady-state equilibrium, and that (5) errors are uncorrelated (Kenny, 1979, pp.50, 51). If all of the predictive factors are perfectly represented by the set of predictors and the criterion, the calibrated model will be optimal (although only a perfect correlation if in a totally determined situation where the latent trait correlation is one). 
Otherwise, as Deegan (1976) and Pedhazur (1982) explain, the following will be true: (1) where superfluous factors are included among the predictors (Deegan's overspecified model), unsystematic 33 error will be added to the predicted scores when the model is applied, thus causing an underestimate of the validity coefficient of the prediction scores (attenuation due to unreliability); (2) where some factors are omitted from the set of predictors (Deegan's underspecified. model which includes the case of omitted independent factors: Deegan, p. 238) , systematic error will be added to the predicted scores (this problem can be overcome only by providing the missing predictor data): (3) when a combination of these two situations exists, the model is said to be misspecified (misspecified models have biased parameter estimates which exhibit an interactive character, Deegan, p.238; obviously, without evidence that all predictive factors are appropriately represented, all practical models must be assumed to be somewhat misspecified); (4) where important predictor variables are highly correlated, multicollinearity is said to exist. Pedhazur (1982) points out that there is differential use of the term, multicollinearity, but its problematic manifestation is biased predictor weights. This problem results in systematic error being added to predicted scores when the calibrated model is applied to non-calibration data (new data). Much work has been done to develop alternate regression techniques to cope with the problem of multicollinearity. Unfortunately, most of these techniques are helpful only in the most severe circumstances (Huberty 8 Mourad, 1980: Morris, 1986: Cattin, 1981). Kenny (1979) 34 lists three characteristics associated with low predictor error of measurements These characteristics are also associated with reduced problems of multicollinearity: (1) a high reliability, (2) a low regression coefficient, and ( 3) a low predictor intercorrelation. He also adds that multicollinearity decreases as the number of predictors decreases relative to the number of cases. If, therefore, certain data characteristics exist, the regression procedure will yield an equation which will specify an efficient means of weighting several variables in order to estimate a criterion. Regression equations may suffer from either too few or too many predictor variables. A lack of predictors results in biased regression coefficient estimates, while too many predictors makes the coefficient estimates less reliable. High predictor variable intercorrelation may also bias coefficient estimates although this problem is less severe for predictors with high reliability and/or a moderate to low correlation with the criterion. Multivariate and Univariate Analysis of Variance and the T-Test The t-test is a special case of the more general ANOVA; therefore they share similar theoretical assumptions. The ANOVA procedure yields a ratio of the variation of means to the variation of simple scores. Under the null hypothesis of no effects for the levels of a factor studied, the ratio 35 will be approximately one (1:1) . Otherwise the variation of means will result in a ratio greater than one thereby suggesting the implausibility of the null hypothesis. The analysis of variance requires an interval-level dependent variable and a nominal-level independent variable with at least two levels (a one-way ANOVA) . Where there is more than one independent variable, the ANOVA may be two-way, three-way, or etc. 
If two or more multi-level independent variables are in the ANOVA design, it is classified as a factorial design. The ANOVA procedure makes certain statistical assumptions about the data being analyzed; where these assumptions are violated, ANOVA findings may be less valid. Under all circumstances the dependent variable scores must be independent of each other. And where the sample sizes differ per condition (cell sizes), the variances must be equivalent. Violations of other assumptions tend to be less important (see Kirk, 1982, pp. 74-79). Another way of looking at the issue of the independence of the dependent variable scores is in terms of accounting for variance. When sampling is less than random, dependent variable scores may not be independent. If dependent variable scores are correlated, there is apt to be a variable whose control would result in independent scores. The question becomes, therefore, whether or not the important factors have been controlled in the study's design. The answer to this question requires a rational analysis of the potential causes of score variance and the adequacy of the study's design to differentiate such variation. For example, where the dependent variable is a performance score, it is critical that there be no overlap in respondents' between-condition samples. However, where the dependent variable is error-due-to-method, as in the present study, variance in the dependent variable will not be substantially affected by randomly overlapping samples of respondents whose scores are fixed prior to the experiment; the systematic error will be virtually a consequence of the mathematical transformations attributable to the methods. If, however, the sampling is restricted from certain levels of population variation (e.g., particular years, locations, ability levels), the restricted variables need to be included in the experimental design as independent variables; otherwise the dependent variable is likely to be dependent on one or more unspecified moderators and ANOVA validity will suffer. Where cell variances are unequal, it is important that cell sizes be equal. If cell variances and sizes are approximately equivalent, the ANOVA validity should be acceptable, particularly where the sample sizes are large. Extending single-dependent-variable analysis of variance to the multivariate case (MANOVA), additional assumptions must be met in order to make valid statistical inferences. Tabachnick and Fidell (1983, pp. 231-235) include the following assumptions and requirements: (1) homogeneity of covariances replaces its ANOVA analogue, equality of cell variances; (2) the number of cases per cell must exceed the number of dependent variables; (3) the dependent variables should exhibit a multivariate normal distribution; (4) there should be no outlier cases; (5) all dependent variables and covariates should share linear relations; and (6) dependent variables should exhibit an absence of multicollinearity. In short, the t-test, ANOVA, and MANOVA test between-group variation against the variation of simple scores in order to conclude whether variation between groups exceeds limits acceptable for the null hypothesis. Independence of responses and the equality of either cell sizes or variances are the critical assumptions. As this procedure is extended to the multivariate situation, some additional requirements become important.
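For illustration, a minimal one-way ANOVA along the lines described above (invented data; scipy's f_oneway computes the same F ratio of between-group to within-group variation):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # three groups; under the null the means would be equal
    groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (0.0, 0.0, 0.3)]

    f, p = stats.f_oneway(*groups)  # F near 1 under the null; larger otherwise
    print(f"F = {f:.2f}, p = {p:.3f}")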
CHAPTER III
PROPOSED THEORY

The term Y^(GGPA)i in Equation 1 represents the estimated GGPA (graduate school GPA) for student i. For a conventional GC (general-criterion) multiple regression calibration of pooled cross-year and/or cross-location samples, the selection formula is identical to Equation (1), where GGPA has been regressed on mutual predictors (cross-year/cross-site calibration can utilize only those predictors which are mutually available; exceptional predictors must be discarded):

    Y^(GGPA)i = B0 + B(UGPA)*X(UGPA)i + B2*X2i + ... + Bn*Xni     (1)

where
    Y^(GGPA)i   = estimate of graduate program GGPA
    X(UGPA)i    = undergraduate UGPA
    X2i ... Xni = other application variables

In practice, a formula is often calibrated with the GGPA of only the first year. For the GC (general-criterion) approach, the predictor data of non-admitted applicants is ignored, while only the predictors of accepted students are saved. The predictors lie idle from the freshman through junior years, until the end of the senior year, when the final GGPA is available as a criterion. The formula is thus calibrated on a selected range of applicants in year four, yet applied to the full range of applicants in year five.

In the LP (local-proxy) regression approach, PVUGPA, a subset of UGPA (specifically, the college pre-veterinary UGPA), becomes the proxy criterion for the solitary-year-and-location sample. Assuming that PVUGPAi = GGPAi + error, Equation (1) also applies to the LP calibration when the UGPA subset serves as a proxy criterion and when raw-score regression is used with an adequate sample size (error in the dependent variable does not bias the raw-score regression coefficient; Richards, 1982). In contrast to the conventional approach, however, the LP calibration cases are also the cases to which the subsequent selection formula is applied. A regression model using all applicant data for year "Y" is calibrated at the time of application, using a subset of UGPA as the criterion (PVUGPA, the UGPA for the veterinary prerequisites) and UGPA as one of several predictors. The resulting regression equation is used as the selection formula for the same set of year "Y" applicants: the selection formula is applied to the year "Y" applicant data to compute selection scores for each applicant. A sketch of this calibrate-and-apply-back procedure follows.
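The sketch below is a hedged illustration of the LP procedure, not the study's code; column names such as PVUGPA, UGPA, and MCAT_BIO are illustrative placeholders:

    import numpy as np
    import pandas as pd

    def lp_selection_scores(applicants: pd.DataFrame, proxy: str,
                            predictors: list) -> pd.Series:
        # raw-score OLS calibration on the current applicant pool ...
        X = np.column_stack([np.ones(len(applicants)),
                             applicants[predictors].to_numpy(dtype=float)])
        y = applicants[proxy].to_numpy(dtype=float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        # ... applied back to the same pool to yield selection scores
        return pd.Series(X @ beta, index=applicants.index,
                         name="selection_score")

    # e.g. lp_selection_scores(pool, "PVUGPA", ["UGPA", "MCAT_BIO", "MCAT_CHEM"])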
General-criterion (GC) Regression Approach

The conventional strategy for implementing multiple regression in the development of a selection formula will be identified as the general-criterion approach, or GC. For this model, a large pool of cases is cumulated across years and/or across locations, adding error where years and/or locations are moderators. An authentic though restricted criterion is used (e.g., GGPA; it represents mostly the higher performing applicants). The optimal formula for predicting the criterion is limited by (1) the availability of mutual predictors among all the cases and (2) the predictiveness of those variables for that particular pool of cases. Once a selection formula is calibrated, the permissible predictor variables and their accompanying weights are set until the next calibration. Depending on the range of years and locations represented in the calibration sample, the formula may be generalizable across time and locations. If the selection-rating system changes over time or location, however, the potential validity of the formula may decline. A large number of assumptions are required to support this approach due to potential moderator variables, multicollinearity, and restricted-range problems. Moderators and multicollinearity become important concerns because the formula is being generalized to cases outside the calibration pool (usually, across years and/or locations). Because calibration-case UGPAs are range-restricted relative to the applicant pool, corrections for restriction in range are required to adjust the calibrated selection formula.

With this approach, error may enter by way of the following channels:

Ca. criterion variables
Cb. predictor variables
Cc. statistical artifact (connected with the multiple regression procedure or correction specifications)
Cd. individual effects
Ce. moderators (e.g., year or location effects)

Potential sources of systematic error include the following:

Sa. individual aptitude/motivation variation
Sb. halo and other individual error
Sc. qualitative/quantitative metric variation
Sd. restricted content domain
Se. unspecified predictors
Sf. multicollinearity among predictors
Sg. individual effects
Sh. moderator effects (e.g., years and locations)

The notion of an individual effect being systematic may seem dubious to some. Nevertheless, it is possible for individual error (such as a halo effect) to (1) occur across graders in a consistent fashion or (2) affect graders in a random fashion.

Potential sources of random error include the following:

Ra. measurement error
Rb. sampling error
Rc. individual effects
Rd. moderator effects (e.g., years and locations)
Re. superfluous predictors

Some advantages of the general approach include the following:

Aa. criterion accuracy: use of the authentic criterion
Ab. potential generalizability across years and/or locations (diminishing the need for frequent recalibrations)
Ac. the accumulation of a large calibration pool will diminish the problems of sampling error relative to the estimation of a population observed correlation, given that (1) sampling is equivalent to random across years and locations and (2) moderator effects are largely absent

Disadvantages of the general approach include the following:

Da. the need for previous cohort data
Db. a fairly complicated analysis procedure is required
Dc. the selection formula is fixed (closed to new predictors)
Dd. the criterion exists only in a selected sample, thus requiring corrections of the calibrated selection formula due to restrictions of range
De. the dangers of systematic error due to individual effects, year effects, and/or location effects (compounded since years and locations are seldom drawn randomly or even in large numbers)
Df. the predictor pool is diminished because some locations or years don't have conforming variables, thus increasing the underspecification-of-predictors problem (and systematic error)

Optimum conditions for the use of the general approach are as follows:

Oa. a large calibration sample
Ob. a large application sample
Oc. the stability of qualitative/quantitative metrics of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Of. a minimum drift of the population model over time
Og. a high validity of the population model over locations
Oh. stable demographic characteristics
Oi. a stable applicant pool (despite recruitment variation)

Using Multiple Regression to Shrink Error

If a predictor such as UGPA has been measured with a variety of attribution rules (for rating performance) across applicants, the pooled predictor values will include error moderated by locations and/or years. Linn (1966) demonstrated how raw-score multiple regression can be used to "shrink" (reduce in magnitude) the moderated error. This can be done where the following appropriate conditions exist: (1) there are many cases sharing a given rule, (2) at least two mutual measures of performance are known to be standard for all of the applicants, and (3) these mutual measures of performance are similar in nature to the uncorrected predictor (measures share common factors). For example, to correct (partially) UGPAs, one would like to have a set of cohort data where, in addition to (1) the uncorrected UGPA, there is (2) a mutual GGPA (graduate program GPA), (3) a mutual admissions test score, and (4) a dummy variable for each rule (or school of origin). By using the graduate program GGPA as the calibration criterion and the test score and uncorrected college UGPA as predictors, an adjustment weight can be obtained for correcting cases of a similar rule. If college UGPA, test score, and graduate program GGPA are parallel measures, then the predicted scores resulting from the application of the calibrated raw-score multiple regression model should be estimates of graduate GGPA. This constitutes a particular case of improving an underspecified model, since school variables were correlated with the criterion and accounted for a certain type of variation which otherwise would have been regarded as error. A sketch of such a dummy-variable augmentation follows.
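The sketch below assumes a school-of-origin code is available for each case; the column names (school, UGPA, test_score) are illustrative, not the study's variables:

    import numpy as np
    import pandas as pd

    def school_augmented_design(df: pd.DataFrame) -> pd.DataFrame:
        # one dummy column per school of origin (one dropped as baseline)
        dummies = pd.get_dummies(df["school"], prefix="school",
                                 drop_first=True, dtype=float)
        return pd.concat([df[["UGPA", "test_score"]], dummies], axis=1)

    def calibrate(design: pd.DataFrame, ggpa: pd.Series) -> np.ndarray:
        X = np.column_stack([np.ones(len(design)),
                             design.to_numpy(dtype=float)])
        beta, *_ = np.linalg.lstsq(X, ggpa.to_numpy(dtype=float), rcond=None)
        return beta  # includes an adjustment weight per school rule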
The following special cases are possible modifications of Linn's model augmentation approach:

A. The case where the school variable is absent

If, however, the raw-score multiple regression is done without variables identifying the school of origin, the predicted scores will be estimates of graduate GGPA plus some moderated error due to the absence of the missing predictors (school variables). Nevertheless, to the extent that the variation in rule (of UGPA standards) is random and the number of cases is sufficiently large, the predicted scores may be partly corrected (errors of estimate would be unbiased).

B. The case where the sample has a predominate rule

If a substantial proportion of the cases already have a similar rule, the correction problem, obviously, is diminished, and the precision of prediction for that particular rule-subgroup will increase. In contrast, subjects with non-conforming rules will be predicted with less accuracy. To the extent that the non-conforming rules differ randomly from the typical rule, quantitatively, and to the extent that qualitative variation is random in nature relative to the typical rule, the calibrated selection formula will be optimal for the whole of the assorted rule-subgroups despite its inferior prediction for individuals having non-conforming rules. Where a predictor variable's rules vary greatly (lacking a predominate rule-subgroup), general precision will suffer and the calibrated selection formula will tend to select more for general as opposed to specific ability. This is because the predictor variable will be less reflective of specific ability due to varying qualitative and quantitative standards/rules; hence, only general ability will tend to remain intact as a common factor.

C. The case where rules vary within the school

It may be assumed that cross-institutional variation in academic standards is an important factor in criterion integrity. However, variation at other levels may be of equal or greater importance, such as at the curriculum or major level (Elliot & Strenta, 1988; McCornack & McLeod, 1988). On the other hand, academic-major variation may be due to factors independent of subject matter, such as specific course content or specific class instructor.
Error at this level cannot be reduced by merely controlling for location.

D. Where prediction is improved for extreme cases

Should a predictor differentiate unequally along some important dimension of a sample of cases, the resulting improvement in prediction would affect only a restricted range of cases (i.e., interview ratings may only be valid when augmenting prediction for the highest-ability students). In this instance an improvement in predictiveness does not improve prediction in the cut-score region. Therefore, an increase in a coefficient value may not correspond to any real improvement in selection formula validity.

E. The case where a proxy criterion is used

If the necessary conditions are obtained for the shrinking of variable error (as outlined above), except that the (graduate) GGPA criterion is replaced with a proxy (UGPA), the consequent model will estimate a GGPA with prediction error shrunken (relative to the accuracy of the proxy variable and the randomness of the school rules in which the proxy variable is measured). The greater the potential year or location influence on the criterion, the greater the potential for reducing prediction error. In addition to allowing control of year or location influence, a proxy criterion may also be useful as a means of extending the size and variability of the calibration sample (the sample used to calibrate a selection formula). If the criterion (GGPA) represents the same measurement factors as a proxy (UGPA), then the proxy may be regarded as c + e (criterion + error). If a predictor variable then correlates with the criterion, then r_cp > r_(c+e)p, since attenuation of correlation results from unreliability (or error) in a measure. However, if raw-score regression coefficients are used, then the criterion with error will combine with a predictor to yield an unbiased estimate of the raw-score regression coefficient (Richards, 1982). Hence, to the extent that UGPA may approximate GGPA plus a random error component, and to the extent that prediction factors vacillate annually or locally, a proxy criterion may improve prediction and selection.
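The contrast can be made explicit. Assuming the proxy equals the criterion c plus an error e uncorrelated with both c and the predictor p, the correlation attenuates while the raw-score slope does not:

    r_{(c+e)p} = \frac{\operatorname{Cov}(c+e,\,p)}{\sigma_{c+e}\,\sigma_p}
               = r_{cp}\,\frac{\sigma_c}{\sqrt{\sigma_c^{2}+\sigma_e^{2}}} < r_{cp},
    \qquad
    b_{(c+e)p} = \frac{\operatorname{Cov}(c+e,\,p)}{\sigma_p^{2}}
               = \frac{\operatorname{Cov}(c,\,p)}{\sigma_p^{2}} = b_{cp}.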
Proxy-criterion Alternatives to Conventional Prediction

The use of a proxy criterion allows additional alternatives to the conventional procedure for calibrating a selection formula. The availability of a suitable proxy criterion may, potentially, extend the number of cases available to analysis in addition to extending the variability of the cases available. Depending on the quality of the proxy criterion, the sample may be controlled for year or location variability by restricting the sample on such confounding/moderating variables. If the proxy criterion allows an increase in the usable sample size per year, the loss of other-year or other-location cases may not be critical. Where number of cases seems to be a more critical factor than moderator problems, the sample may be increased by adding criterion-absent cases to those having a criterion, because a suitable proxy can substitute for the criterion. Should no ideal criterion be available, use of the above procedure with all cases while using an inferior criterion may still be beneficial. Tables 4 through 9 contrast four calibration methods on the following features: potential channels for error, potential sources of systematic error, potential sources of random error, potential advantages, potential disadvantages, and optimal conditions for use of each of four proxy methods. A discussion of the attributes of three potential calibration approaches precedes presentation of the tables (items are ordered according to the tables):

Local-proxy (LP)

The local-proxy calibration uses a single-year, single-site sample to estimate predictor equation parameters. If a proxy variable is appropriate and available for replacing an unavailable criterion (e.g., future GGPA), a selection formula can be calibrated and applied back to the same data to yield prediction scores having a validity approaching optimal validity. Using raw-score regression and assuming that the proxy approximates c + e (criterion + random error), the linear model generated will approximate the OLS regression model. Validity here depends greatly on the quality of the proxy criterion, although a second important asset would be a rich assortment of reliable predictors.

Error, therefore, may enter by way of the following channels:

Ca. criterion variables (particularly, the proxy variable)
Cb. predictor variables (present or absent)
Cc. a statistical artifact (connected with the multiple regression procedure)
Ce. moderators (e.g., gender, social class)

Potential sources of systematic error would be the following:

Sa. individual aptitude variation
Sb. halo or other individual error
Sc. qualitative/quantitative rule variation
Sd. a restricted content domain
Se. unspecified predictors
Sf. the level of multicollinearity among predictors
Sg. individual motivation variation
Sh. moderators (e.g., gender, social class)

There is little reason to anticipate substantial changes in individual aptitudes over the course of a graduate program, although such change is possible (e.g., brain disease or injury). More likely is the possibility of halo effects, which consist of systematic increases or decreases in the criterion score due to subjective bias on the part of the instructor who assigns the predictor or criterion score. Although differences in grade-attribution rules among schools offer the potential for systematic error in predicting criterion scores using school UGPA as a predictor, Linn (1966) observed that this problem was insignificant when admissions-test data was included among the predictors. Where some cases contain predictor scores representing performance on a narrower content domain, those cases are likely to be overpredicted on their criterion performance. Of course, a change in level of motivation may affect criterion performance. Where important predictors are excluded from the prediction model, systematic bias is added to the estimated criterion. Although multicollinearity is most problematic where a regression model is being generalized to additional samples, it nevertheless can play a minor role in LP estimation. In particular, multicollinearity may distort variable weights so that when the calibrated selection weight is applied to a parallel variable (i.e., a weight calibrated mostly on 1983 MCAT (Medical College Admissions Test) scores gets applied to a 1987 MCAT score), systematic bias may be added. This problem is exacerbated by the level of measurement error present. The absence of important predictor variables from the selection model will also distort the calibrated selection formula. Lastly, the accumulation of moderators is likely to be a consequence of non-random sampling. Moderators, in turn, may have a systematic influence on prediction error. Sources of unsystematic error in the predictors would likely bias the selection formula.
Random error affecting the proxy criterion may distort regression weights for low sample-size-to-measurement-error ratios, while unreliable predictors violate the regression assumption of perfectly measured independent variables.

Sources of random error would be the following:

Ra. measurement error
Rb. sampling error
Rd. moderators (e.g., gender, social class)

Like measurement error, sampling error is defined by statisticians as random error, although statistical differences between random samples may be partly systematic (Pedhazur, 1982, is an exception who includes systematic error as a type of measurement error). The fact that statisticians prefer to attribute the systematic component of sampling error to unspecified predictors rather than to the pool of sampling error does not alter the practical fact that differences between sample statistics will always be partly due to the problem of unspecified predictors. Moderator error can be expected to be random (i.e., the level of measurement error in MCAT scores may vacillate randomly across years), except where a theoretical basis exists for a systematic nature.

Some advantages of the LP approach include the following:

Ad. a simple analysis procedure
Ae. freedom from the need for previous cohort data
Af. the proxy criterion provides the desirable feature of an interval-level scale where some graduate programs have only dichotomous grading (pass/fail)
Ag. the option of adding predictor variables for any new set of applicants (e.g., alternate admission test scores can be specified and weighted)
Ah. freedom from some potential systematic or random error due to individual (e.g., ability) change, year effects, and location effects
Ai. range restriction problems are largely eliminated by the implementation of all applicant cases with proxy criterions
Aj. non-admitted cases can be used

Disadvantages of the LP approach include the following:

Dg. the need to recalibrate a new selection formula for each set of new applicants
Dh. the danger of individual aptitude or motivation change during the course of the program in question
Di. potential proxy criterion inadequacies

Although any change in individual aptitude or motivation would decrease the validity of an aptitude measurement (lower the regression coefficient corresponding to the aptitude measure), where an authentic criterion is used, the lower validity would be accurate. With use of a proxy criterion, however, the validity would remain inflated because the proxy would not reflect the problematic trait variation.

Conditions under which the LP approach will be optimum:

Oa/b. a large calibration/application sample
Oc. the stability of qualitative/quantitative metrics of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Oj. the stability of aptitudes and motivation

General-proxy (GP)

The general-proxy (GP) approach uses a multi-year sample and a proxy criterion. With the pooled sample fixed relative to location, the proxy-criterion approach can be used to calibrate a selection formula from a pool of cases accumulated across several years. The advantage of this approach is the potential for compensating for cases lost while controlling for a moderator, because it allows a greater number of usable cases within each applicant-year sample. It may also generalize across years, thus reducing the frequency of the need to recalibrate a formula.
Relative to the local approach, a potential liability is the possible systematic error due to year effects. Because only a small range of years of data is likely to be accessible, the external validity of the selection formula may be poor (the local approach does not attempt to generalize). Other variations of this compromise approach are also possible, such as using a pooled sample fixed relative to year but not to location.

Error may enter by way of the following channels:

Ca. criterion variables (particularly, the proxy variable)
Cb. predictor variables
Cc. statistical artifact (connected with the multiple regression procedure)
Ce. moderators (e.g., year or location)

Potential sources of systematic error include the following:

Sa. individual aptitude variation
Sb. halo and other individual error
Sc. qualitative/quantitative rule variation
Sd. a restricted content domain
Se. unspecified predictors
Sf. the level of multicollinearity among predictors
Sg. individual effects
Sh. moderator effects (e.g., year or location effects)

Potential sources of random error would include the following:

Ra. measurement error
Rb. sampling error
Rc. individual effects
Rd. moderator effects (e.g., year or location effects)
Re. superfluous predictors

Some advantages of the general-proxy approach include the following:

Ab. generalizability to other locations and/or years
Ac. sample size can be increased by pooling
Ad. a simple analysis procedure
Af. the proxy criterion provides the desirable feature of an interval-level scale where some graduate programs have only dichotomous grading (pass/fail)
Ah. freedom from potential moderated error due to location effects (or alternatively, freedom from moderated error due to year effects)
Ai. range restriction problems are largely eliminated by the implementation of all applicant cases with proxy criterions
Aj. allows use of non-admitted cases

Disadvantages of the general-proxy approach include the following:

Da. the need for previous cohort data
Dc. the selection formula is fixed
De. error due to years, sites, and individuals
Df. restrictive tendency in the predictor pool
Dh. the danger of individual aptitude or motivation change during the course of the program in question
Di. proxy criterion inadequacies

Conditions under which the general-proxy approach will be optimum:

Oa/b. a large calibration/application sample
Oc. the stability of qualitative/quantitative rules of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Of. the stability of the population model across years
Oj. the stability of aptitudes and motivation

General-mixed (GM)

The general-mixed (GM) calibration approach includes a multi-year sample and a conditional criterion: an authentic or a proxy criterion. This is the conventional strategy for implementing multiple regression in the development of a selection formula, except with the inclusion of non-admitted graduate applicant cases. Non-admitted graduate cases utilize PVUGPA (prerequisite veterinary course UGPA) as a proxy for the authentic graduate GGPA criterion. A large pool of cases is cumulated across years and/or across locations. The optimal formula for predicting the criterion is limited by the predictiveness of those variables for that particular pool of cases. Once a selection formula is calibrated, the permissible predictor variables and accompanying weights are set until the next calibration.
Depending on the range of years and locations represented in the calibration sample, the formula may be generalizable across time and locations. If the system changes over time or location, the potential validity of the formula may decline. As in the conventional GC (general-criterion) model, a greater number of assumptions are required to support this approach, but the range restriction problems are effectively resolved, so that corrections for selection may be unnecessary. Multicollinearity remains a concern, since the formula is being generalized to cases outside the calibration pool and, usually, across years and/or locations.

With this approach, error may enter by way of the following channels:

Ca. criterion variables
Cb. predictor variables
Cc. statistical artifact (connected with the multiple regression procedure or correction specifications)
Cd. individual effects
Ce. moderators (e.g., year and location)

Potential sources of systematic error include the following:

Sa. individual aptitude/motivation variation
Sb. halo and other individual error
Sc. qualitative/quantitative metric variation
Sd. a restricted content domain
Se. unspecified predictors
Sf. multicollinearity among predictors
Sg. individual motivation variation
Sh. moderator effects (e.g., year or location effects)

Important sources of potential random error include the following:

Ra. measurement error
Rb. sampling error
Rc. individual effects
Rd. moderator effects (e.g., year or location effects)
Re. superfluous predictors

Some advantages of the GM approach include the following:

Aa. criterion accuracy: use of the authentic criterion
Ab. potential generalizability across years and/or locations (diminishing the need for frequent calibrations)
Ac. the accumulation of a large calibration pool will largely diminish the bias effect of sampling error, given that sampling is equivalent to random across years (or locations)
Ad. simplicity of analysis
Ai. range restriction problems are largely eliminated by the implementation of all applicant cases with proxy criterions
Aj. increased sample size due to added rejectee cases

Disadvantages of the general-mixed approach include the following:

Da. the need for previous cohort data
Dc. the formula is fixed (closed to new predictors)
De. dangers of systematic error due to individual effects, year effects, and/or location effects (compounded since years and locations are seldom drawn randomly or even in large numbers)
Df. the predictor pool is diminished as some locations don't have conforming variables, thus increasing the underspecification-of-predictors problem and its systematic error
Dh. the danger of individual aptitude or motivation change during the course of the program in question
Di. proxy criterion inadequacies

Optimum conditions for the use of the GM approach:

Oa. a large calibration sample
Ob. a large application sample
Oc. the stability of qualitative/quantitative rules of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Of. a minimum drift of the population model over time
Og. a high validity of the population model over locations
Oh. stable demographic characteristics
Oi. a stable applicant pool (despite recruitment variation)
Oj. stability of aptitudes and motivation
It is expected (in any measurement situation) that unspecified factors will add a random distribution of error to the scores of the cases being measured. Where scores are estimates of future ratings (predicted scores), it is possible to actually obtain a measure of these errors in order to study the nature of the error. This is done by subtracting subsequent outcome scores (the criterion) from the prediction scores. If these errors are partly correlated with one or more potential predictors, they are systematic, and it is possible that the score prediction formula may be improved by modifying predictors or their weights. If the errors appear to be non-randomly distributed for a large sample, but they fail to correlate with conceivable predictors, there nevertheless is likely to be an unspecified factor of predictive importance; therefore the selection/prediction formula will be inaccurate. If the distribution of errors is random, there may nevertheless be a moderating variable (e.g., years, locations) for which the level of error changes in an unsystematic way. This also results in a selection/prediction formula which is inaccurate. Only where error is randomly distributed and apparently irreducible by (1) the addition of predictors or (2) controlling for potential moderator variables can it be concluded that the selection/prediction formula is precise. A sketch of such a residual screen follows.
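The sketch below is a minimal, hedged illustration (pandas names are hypothetical): compute the errors and rank candidate predictors by the magnitude of their correlation with those errors:

    import pandas as pd

    def residual_screen(predicted: pd.Series, actual: pd.Series,
                        candidates: pd.DataFrame) -> pd.Series:
        errors = predicted - actual
        # correlation of each candidate predictor with the errors,
        # largest magnitudes first; reliable correlations flag
        # systematic (reducible) error
        return candidates.corrwith(errors).sort_values(key=abs,
                                                       ascending=False)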
Table 4
Channels by which error may affect calibration methods

CHANNELS FOR ERROR                          GC   LP   GP   GM
Ca Criterion variables . . . . . . . . . .  xx   xx   xx   xx
Cb Predictor variables . . . . . . . . . .  xx   xx   xx   xx
Cc Statistical artifacts . . . . . . . . .  xx   xx   xx   xx
Cd Individual effects  . . . . . . . . . .  xx             xx
Ce Moderators  . . . . . . . . . . . . . .  xx   xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 5
Potential systematic error for each calibration method

SOURCES OF SYSTEMATIC ERROR                 GC   LP   GP   GM
Sa Aptitude/motivation variation . . . . .  xx   xx   xx   xx
Sb Halo and other individual error . . . .  xx   xx   xx   xx
Sc Scale irregularity  . . . . . . . . . .  xx   xx   xx   xx
Sd Restricted content domain . . . . . . .  xx   xx   xx   xx
Se Unspecified predictors  . . . . . . . .  xx   xx   xx   xx
Sf Multicollinearity among predictors  . .  xx   xx   xx   xx
Sg Individual effects  . . . . . . . . . .  xx   xx   xx   xx
Sh Moderator effects . . . . . . . . . . .  xx   xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 6
Potential random error for each of four calibration methods

SOURCES OF RANDOM ERROR                     GC   LP   GP   GM
Ra Measurement error . . . . . . . . . . .  xx   xx   xx   xx
Rb Sampling error  . . . . . . . . . . . .  xx   xx   xx   xx
Rc Individual effects  . . . . . . . . . .  xx        xx   xx
Rd Moderator effects . . . . . . . . . . .  xx   xx   xx   xx
Re Superfluous predictors  . . . . . . . .  xx        xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 7
Potential advantages for each of four calibration methods

ADVANTAGES                                  GC   LP   GP   GM
Aa Criterion accuracy  . . . . . . . . . .  xx             xx
Ab Generalizability  . . . . . . . . . . .  xx        xx   xx
Ac Increase sample by pooling  . . . . . .  xx        xx   xx
Ad Simplicity of analysis  . . . . . . . .       xx   xx   xx
Ae Needs only one year's admission data  .       xx
Af Interval scale despite pass/fail GGPA .       xx   xx
Ag Admits new predictors . . . . . . . . .       xx
Ah Reduced year, site, and individual error      xx   xx
Ai Avoids range restriction  . . . . . . .       xx   xx   xx
Aj Uses non-admitted cases . . . . . . . .       xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 8
Disadvantages of each of four calibration methods

DISADVANTAGES                               GC   LP   GP   GM
Da Need for previous cohort data . . . . .  xx        xx   xx
Db Complicated analysis  . . . . . . . . .  xx
Dc Selection formula is fixed  . . . . . .  xx        xx   xx
Dd Selection on the criterion  . . . . . .  xx
De Error due to years, sites, individuals   xx        xx   xx
Df Restrictive tendency in predictor pool   xx        xx   xx
Dg Need to recalibrate each year . . . . .       xx
Dh Risks aptitude or motivation change . .       xx   xx   xx
Di Potential proxy criterion inadequacies        xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 9
Optimal conditions for use of each calibration method

OPTIMAL CONDITIONS FOR METHOD               GC   LP   GP   GM
Oa Large calibration sample  . . . . . . .  xx   xx   xx   xx
Ob Large application sample  . . . . . . .  xx   xx   xx   xx
Oc Stability of cross-site scales  . . . .  xx   xx   xx   xx
Od Multiple, sound predictors  . . . . . .  xx   xx   xx   xx
Oe Cross-site predictors . . . . . . . . .  xx   xx   xx   xx
Of Temporal stability of model . . . . . .  xx        xx   xx
Og Model validity across samples . . . . .  xx             xx
Oh Stable demographic characteristics  . .  xx             xx
Oi Stable applicant pool . . . . . . . . .  xx             xx
Oj Stability of aptitudes and motivation .       xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Summary

The conventional regression approach (GC) to calibration of selection formulas may incorporate several usages of a proxy criterion in the estimation process, resulting in alternative regression procedures which are labeled LP, GP, and GM (LP= formula calibration from a single site and time; GP= calibration across sites or times using a proxy criterion; and GM= calibration across sites or times mixing both authentic and proxy criterion use). Tables 4 through 9 compare and contrast the methods with respect to their corruptibility, strengths, and conditions for optimal performance.

CHAPTER IV
DESIGN

Population Sample

The non-random sample consisted of five cohorts: four graduation cohorts and an additional three-year cohort from the College of Veterinary Medicine at Michigan State University. The cohorts included both accepted and rejected applicants. Applicants who obtained a vet school GGPA were considered to be GRADS (this includes the three-year cohort); all others were considered to be non-graduates, or NGRADS. Student attrition was estimated at less than two students per cohort. Some overlap existed among the cohorts, as some cases were repeat applicants. While the annual number of applicants changed notably over the years, the number of candidates in-program was relatively constant (see Table 10). All applicants in this sample received scores on the New Medical College Admissions Test (MCAT) and were ranked for admissibility by a selection formula decided upon by the Veterinary College's admission committee. The high admissibility of an applicant could often mean that such an applicant might accept another candidacy elsewhere; therefore, some NGRADS were of this caliber.
Table 10
The study sample: Five cohorts of veterinary applicants

APPLICATION   GRADUATION   NUMBER OF    NUMBER OF
   YEAR          YEAR      APPLICANTS   GRADUATES
   1981          1985         327          90
   1982          1986         327          89
   1983          1987         310         103
   1984          1988         273          88
   1985          1989         248         101

Three graduation cohorts (1985, 1986, 1987) were used in the calibration sample for the GC, GP, and GM conditions, and the other two cohorts (1988, 1989) were used for validity testing of the GC, GP, GM, and LP conditions. For the LP condition, the same cohorts (1988, 1989) were used for both calibration and validity testing. The validity test gave greater weight to a portion of the graduating subset of the two validity-test cohorts; specifically, the validity test emphasized cases found in the lower third of the graduate program GPA range. These cases were selected for their likely proximity to the admissions cut-score region. The higher performing students would be less affected by a different admissions cutting criterion.

Predictors

Predictor variables were of two types: (1) within-unit predictors, which were the following ordinary predictor variables: CUMGPA (cumulative UGPA), PVUGPA (veterinary prerequisites UGPA), honor points, prerequisite course honor points, credits without pass/fail credits, average credits per term, summer credits, total credits, number of terms, pass/fail credits, the Medical College Admissions Test subtest scores, age, sex, interview scores I and II, work experience, veterinary experience, narrative work sample, and source of UGPA; and (2) between-unit variation (year). Although most of the variables were taken directly from admissions documents, the source-of-UGPA variable is defined especially for this research. This variable is defined from a retrospective analysis of student numbers together with consultation with an admissions staff member to determine which applicants clearly were not MSU undergraduates. Only a small proportion of the applicants actually fell into the non-MSU category; this was particularly true for applicants who graduated from the program. Where applicant cases had missing values, default scores were assigned either by entering an average score or by entering a minimum score (a sketch of this default-fill rule appears at the end of this section). This was similar to the practice of admission departments in determining an applicant's merit (see the default score list in Appendix I). Predictors were available from admissions data variables. Because many of these variables were not of interval scale, and because some of them were composites of several variables, such variables were deleted. Composites were sacrificed in favor of single variables. The twenty-two variables included in the predictor levels are listed in Table 11.

Table 11
Predictors used in levels of the intercorrelation factor (levels > .45 and < .45)

cumulative UGPA; veterinary prerequisites UGPA; veterinary prerequisites points; honor points; credits; average term credits; sum of pre-veterinary credits; total credits; number of terms; pass/fail credits; MCAT Biology; MCAT Chemistry; MCAT Physics; MCAT Quantitative; MCAT Reading; MCAT Science; age; sex; veterinary experience; work experience; activities and achievements rating; narrative writing sample rating

Criteria

Program GGPA served as the authentic criterion. The proxy criterion (used with the LP, GP, and GM approaches) was the PVUGPA (veterinary prerequisite course UGPA).
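A hedged sketch of the default-score practice described above; which variable receives which rule is study-specific and is not reconstructed here:

    import pandas as pd

    def fill_defaults(df: pd.DataFrame, mean_vars: list,
                      min_vars: list) -> pd.DataFrame:
        out = df.copy()
        for v in mean_vars:   # e.g., test scores defaulted to the average
            out[v] = out[v].fillna(out[v].mean())
        for v in min_vars:    # e.g., ratings defaulted to the minimum
            out[v] = out[v].fillna(out[v].min())
        return out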
Analyses

All research hypotheses were addressed by one of the following two analyses: one primarily used to test calibration methods, and another primarily used to test for calibration factor and year effects.

Test of Methods (MANOVATOM)

Sample. Three graduation cohorts were used for the calibration of GC, GP, and GM formulas, with two additional cohorts reserved for the purpose of validity testing. The calibration of the LP formula used one of the reserved cohorts in each of its two calibrations. Validity testing for all calibrated formulas was on the two reserved cohorts. Two years' data (1981 and 1983) were drawn for the small-sample GC, GP, and GM calibrations, while three years' data were used for their large-sample calibrations; the same two years were used as the source for all two-year condition samples. For the LP method (which used a single year's data), the small-sample condition used a random two-thirds of the year's data. Subsamples differed relative to (1) the inclusion of non-MSU applicants and (2) the size of the calibration pool.

Conditions. A repeated-measures MANOVA design was used (MANOVATOM), using C (methods), A (sample size), N (source of UGPA), and P (predictor intercorrelation) as within-subjects factors. Y (year) was the sole between-groups factor. Because all factors but the four-level methods factor were dichotomous, there were 64 conditions overall. The dichotomous independent variables were as follows:

A (sample size), where A= a two-year sample, and B= a three-year sample.
N (source), where M= MSU, and N= all academic origins.
P (predictor intercorrelation), where P= uncorrelated predictors, and Q= correlated predictors.

Because, in actual practice, sample-size variation would be differentially affected by the recruitment approach used, sample-size control was imposed by reducing a proportion of usable cases (as opposed to reduction to an absolute numerical limit). Thus, from the subsample of cases qualifying for a particular combination of attributes, one condition used the full set of cases, while for a second condition the sample was reduced to (roughly) two-thirds of the full subsample size. For the conditions using the GC method, the calibrated regression formulas (which used only higher-performing applicant cases) were range-restricted relative to the populations to which they were to be applied (namely, the full annual applicant roster). Because regression coefficients were biased by restriction of the dependent variable, correction for range restriction was appropriate for each standardized partial in the calibrated regression formulas of the general-method conditions. Alexander, Carson, Alliger, and Carr (1987) provided a formula for correcting doubly truncated correlations (range restriction in both the criterion and the predictor):

    rho-hat = [ -Ux*Uy*(1 - rho'^2) + { Ux^2*Uy^2*(1 - rho'^2)^2 + 4*rho'^2 }^(1/2) ] / (2*rho')

where
    rho' = the range-restricted correlation,
    Ux   = the ratio of restricted to unrestricted standard deviations for variable X,
    Uy   = the ratio of restricted to unrestricted standard deviations for variable Y,

and the sign of the corrected coefficient corresponds to the sign of rho'. To correct the GC regression coefficients, a standardized correlation equation must be computed and the partials corrected individually for range restriction.
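A small helper implementing the correction in the form reconstructed above; the printed formula in the source is damaged, so the exact algebra should be verified against Alexander, Carson, Alliger, and Carr (1987) before use:

    import math

    def correct_double_truncation(r: float, ux: float, uy: float) -> float:
        # r: range-restricted correlation; ux, uy: restricted/unrestricted
        # SD ratios. With ux = uy = 1 the function returns r unchanged,
        # and the sign of the result follows the sign of r.
        if r == 0.0:
            return 0.0
        k = ux * uy
        q = 1.0 - r * r
        return (-k * q + math.sqrt(k * k * q * q + 4.0 * r * r)) / (2.0 * r)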
Dependent variable. The difference score/dependent variable was a transformed, rank-order difference between rank-of-estimated GGPA and rank-of-actual GGPA (the transformation is referred to as LEWAR: Log of prediction Error, Weighted, Absolute, and in Rank units). Because the magnitude of the rank difference was of concern, the absolute value of the difference was used. Where weighting was desired to reflect the proximity of the error to the cut-score region, values were weighted to reflect proximity to the lowest GGPA rank. To normalize the distribution of these absolute values, the modified scores were transformed into their natural logarithms.

Satisfaction of Assumptions. Two critical assumptions for MANOVA are that (1) observations be independent and that (2) variance be constant (Keeves, 1988). The largely objective nature of admissions and program data allowed some confidence in the proposition that the admissions and program data both had high levels of independence. Differences in condition means were completely determined by the treatment (application of potentially biased, calibrated formulas) and, therefore, there was no treatment-related error to correlate among subgroups. Constant variance was expected due to the lack of differential sampling: a solitary two-year sample received all treatments within a repeated-measures format.

Application to Hypotheses. Error variables from these calibration conditions were used to test Hypotheses A, B, C, and E.

Test of Factor and Year Effects (MANOVATOF)

Sample. Five graduation cohorts (including the 1985 cohort with only a three-year UGPA) were used for the calibration and validity testing of LP models. Random samples were drawn from subsamples of each calibration cohort to obtain calibration subsamples of n= 80 and n= 135. Subsamples differed relative to (1) the inclusion of non-MSU undergraduates, (2) the size of the calibration pool, and (3) ability level as measured by PVUGPA.

Conditions. A repeated-measures MANOVA was used (MANOVATOF), including A (sample size), N (source of UGPA), O (selection on PVUGPA), and P (intercorrelation of predictors) as within-subjects factors. Y (year) was again the sole between-groups factor. Altogether, there were 80 conditions. The dichotomous independent variables were as follows:

A (sample size), where A= 80 random cases of the selected sample, and B= 135 random cases of the selected sample. (For the high-ability condition for 1984, B= 117; and for 1985, B= 111, due to the smaller pools of cases.)
N (source), where M= MSU undergraduate origin, and N= otherwise.
P (predictor intercorrelation), where P= uncorrelated predictors, and Q= correlated predictors.
O (achievement level), where G= PVUGPA > 3.0, and O= otherwise.

There were five years for the between-subjects variable, year. Because methods were not being compared in MANOVATOF, sample-size variation was introduced in absolute numbers of cases. Non-MSU cases were qualified as in MANOVATOM (the methods MANOVA), and predictors were selected as in the methods MANOVA.

Dependent Variable. LEWA-transformed prediction error served as the dependent variable (LEWA= Log of prediction Error: Weighted and in Absolute values). The transformation into logarithms normalized the distribution of the deviations. Prediction-error residuals were weighted according to their proximity to the GGPA rank of one (1).

Application to Hypotheses. Error variables from these calibration conditions were used to test Hypothesis D.

CHAPTER V
RESULTS

Table 12 summarizes the research findings relative to the outcomes expected. In a local-proxy (LP) analysis, prediction varied significantly by year, confirming Hypothesis D:

Hypothesis D.
With prediction error as the dependent variable, with methods limited to the local-proxy (LP) approach, and with variation controlled with respect to years, sample size, academic origin, intercorrelation of predictors, and level of ability, prediction will vacillate across years.

Unconfirmed were Hypotheses A, B, C, and E. Hypotheses A and B follow:

Hypothesis A. For students falling within a cut-score zone, the local-proxy (LP) formula will be more predictive of graduate GPA (GGPA) than will undergraduate GPA (UGPA).

Hypothesis B. For students falling within a cut-score zone, the general-mixed (GM) formula will be more predictive of graduate GPA (GGPA) than will the conventional prediction model (the general-criterion formula, GC) as corrected for range restriction.

The failure to find method and year differences contradicted Hypothesis C. The year effect represents differing annual error means in logs of weighted, absolute, prediction-error residuals (alpha= .05). For the following Hypothesis C, prediction (measured by prediction-error means) did not vary across years, although the probability of the observed outcome under the null hypothesis was only .069:

Hypothesis C. With prediction error as the dependent variable, and with variation controlled with respect to years, methods, sample size, academic origin, and intercorrelation of predictors, prediction differences among years and methods will be obtained.

For the following Hypothesis E, the numerical outcome was in the right direction; nevertheless, the difference was not significant:

Hypothesis E. Non-MSU undergraduates will be associated with greater prediction error.

Table 12
Confirmation of expected outcomes

HYPOTHESIS                                   YES   NO
A. LP more valid than UGPA ...............          X
B. GM more valid than GC .................          X
C. Method and Year effects (TOM)
      Method (TOM) .......................          X
      Years (TOM) ........................          X
D. Year effect (TOF) .....................    X
E. Non-MSU GGPA less validly predicted ...          X

TOM= Test of Methods; TOF= Test of Factors and Years
LP= local-proxy estimation formula; UGPA= undergraduate grade point average
GM= general-mixed estimation formula; GC= general-criterion estimation formula
MSU= source of UGPA is Michigan State University; GGPA= graduate grade point average

Tests of Methods (TOM) across Two Years

This analysis addressed Hypothesis A: that the LP (local-proxy) approach would out-predict UGPA as an estimator of GGPA. For the dependent variable, a prediction-error residual transformation was used. The transformation was given the acronym LEWAR, representing Log of prediction Error: Weighted, Absolute, and in Rank units. The original prediction-error residual was the difference in rank between the estimated GGPA and the actual GGPA. It was transformed by the following steps: (1) take the absolute value of the difference between the rank of the estimated GGPA and the rank of the actual GGPA (magnitude and not direction of the error was important); (2) weight the absolute value by a non-linear function which is biased towards low GGPA rank (estimation precision is most critical for cases in the cut-score region, i.e., for cases which are marginally acceptable); and (3) obtain the natural log of this weighted error (this serves to normalize the distribution of the variable). The LEWAR weighting function is the following:

    LEWAR = ln{ (1 + |RKGPA - RKEST|) * ( (RKGPA + 1)^3 / RKGPA^3 + 2 ) }

where ln is the natural log, | | denotes absolute value, RKGPA is the rank of the GGPA score, and RKEST is the rank of the GGPA estimate (see Appendix II, Figure IIA, for a graph of the weighting used for the LEWAR error transformation).
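The transformation is a direct computation; a transcription of the formula above:

    import math

    def lewar(rk_gpa: int, rk_est: int) -> float:
        # rk_gpa: rank of actual GGPA (>= 1); rk_est: rank of estimated GGPA
        weight = (rk_gpa + 1) ** 3 / rk_gpa ** 3 + 2  # emphasizes low ranks
        return math.log((1 + abs(rk_gpa - rk_est)) * weight)

    # a 10-rank error near the cut score outweighs one far above it:
    # lewar(1, 11) is about 4.70, versus about 3.51 for lewar(100, 110)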
The LEWAR transformation eliminated the negativity of error residuals, inasmuch as error magnitude (and not direction of error) was the vital concern. The logarithm transformation was then needed, however, to restore the normal-shaped distribution lost with the transformation of the negative side of the scale. Also included in the transformation was a non-linear weighting (graphed in Figure IIA) which served to increase the importance of prediction error for cases in the lower range of the GGPA ranking. It was assumed that these cases were most likely to correspond to applicants considered marginal by admission committees. This weighting, which reduced the importance of cases having higher merit rank, served to reduce the actual degrees of freedom to an unknown extent; the systematic deletion of cases is a special case of variation in case weighting. Subsequent significance tests, therefore, must be regarded as somewhat liberal. With respect to assumptions of normality and independence of variables for these conditions, dependent-variable frequencies approximated normal distributions, and the LEWAR-transformed prediction-error by predictor correlations were mostly non-significant. LEWAR-transformed prediction error was explicitly and negatively biased according to GGPA rank due to the weighting; however, other significant correlations appeared for variables representing activities and achievements, work experience, narrative writing, MCAT-Chemistry, and average credits carried per term. Only activities and achievements and average credits carried per term ever obtained positive correlations, and only activities and achievements was consistently positive. GGPA obtained the highest correlation with a LEWAR-transformed variable at -.32 for 1984 data. For 1985, the highest correlations were about ten points less (see the correlation matrix in Appendix III). Except for the intentional bias due to GGPA weighting, there was no reason to expect differential subgroup prediction. Contrasts using LEWAR-transformed prediction error as the dependent variable were conducted between UGPA and LP estimation by subtracting the LP discrepancy score (GGPA minus estimated GGPA) from the UGPA discrepancy score. Table 13 displays four contrasts, not one of which attains statistical significance. Contrasts were conducted separately between 1984 and 1985 data for two (optimum) LP conditions differing only on the source of UGPA allowed into the calibration of the formula. The variables contrasted with UGPA were (1) MSU, where M= only MSU UGPAs were allowed into the calibration, B= the calibration used the total year's cases, and P= intercorrelated predictors (> .45) were not permitted into the calibration; and (2) ALL, where N= applicants were not selected on the source of UGPA, B= the calibration used the total year's cases, and P= intercorrelated predictors (> .45) were not permitted into the calibration.
Table 13
Contrasts of UGPA prediction against LP prediction*

YEAR   CONTRAST    DIFFERENCE                  t-VALUE   DF   PROB
1984   eU - eMSU   2.7740 - 2.7831 = -.0091     -.15      87  .881
1985   eU - eMSU   2.9655 - 2.9758 = -.0103     -.18     100  .855
1984   eU - eALL   2.7740 - 2.7345 =  .0395      .59      87  .558
1985   eU - eALL   2.9655 - 2.9503 =  .0152      .25     100  .805

* Dependent variable expressed as LEWAR-transformed error= Log of Error: Weighted, Absolute, and in units of Rank.
UGPA= undergraduate grade point average; GGPA= graduate grade point average
eU= mean log of weighted absolute prediction error in ranks, using UGPA estimation
eMSU= mean log of weighted absolute prediction error in ranks, using local-proxy estimation with an all-MSU calibration sample
eALL= mean log of weighted absolute prediction error in ranks, using local-proxy estimation with a calibration sample unselected on UGPA

Not one of the contrasts attained significance, thus ruling out the possibility of significance familywise. The test of Hypothesis A, therefore, did not confirm greater validity for the LP estimate over UGPA for cases in the cut-score region.

These analyses addressed (1) Hypothesis B: that the GM (general-mixed) approach would out-predict the GC (general-criterion) approach, and (2) Hypothesis C: that prediction would differ by method used and by year's data estimated. Methods of selection-formula calibration were evaluated with respect to mean level of LEWAR (the dependent variable). Thus significant effects (MANOVA/ANOVA) would indicate less valid estimation. GC equations were calibrated using standardized regression in order to allow corrections of partials (standardized regression coefficients) for restrictions of range. The proxy-method equations used raw-score regression. Precision between the two types of method equations was compared in terms of LEWAR-transformed residuals of rank (between estimated GGPAs and actual veterinary school GGPAs). Method difference in estimation across the whole range of GGPAs is of some interest, although differences in estimation would be most relevant to the cases close to the cut score. For instance, the college's selection ratio may expand, invalidating the previous selection procedures for a particular cut score. As a general comparison of methods, Table 14 presents average predictive rank among the four calibration methods, where error is reported unweighted in its natural log (log of absolute rank error). No differences were significant.

Table 14
Method efficiency in logs of absolute rank error*

                GC     LP     GP     GM
MEAN           2.50   2.43   2.71   2.09
S.D.           1.31    .71   1.10   1.08
S.E.            .34    .18    .28    .28

* Rank error is the absolute value of a discrepancy: estimate rank minus rank of GGPA.
GC= General-criterion; LP= Local-proxy; GP= General-proxy; GM= General-mixed (authentic and proxy criterion)

Table 15 presents between-subject by within-subject effects for the test of methods using a repeated-measures MANOVA (MANOVATOM) with the following within-subjects factors: C= calibration methods, Q= intercorrelation of calibration predictors, N= source of calibration applicants, and A= size of the calibration sample. Year (Y) was the only between-subjects factor. The dependent variables were prediction rank-error residuals for a factorial matrix of 64 calibration conditions.
The average absolute prediction error in ranks across conditions ranged from 17 to 34 positions. These residuals received the LEWAR transformation (described on pages 78-79). The plots of the dependent variable frequencies approximated normal distributions, and the predictor by dependent variable correlations were often marginally significant for predictors similar to GGPA (due to the explicit weighting).

As presented in Table 15, the a priori contrast between the GM (general-mixed) and GC (general-criterion) conditions was not significant, indicating comparable prediction.

Table 15
Estimate of average contrast between general-criterion and general-mixed transformed prediction-error means*

SOURCE OF VARIATION   COEFF   ST.ER   T-VALUE   SIG.T   L-BND   H-BND
GM minus GC           -.092    .107    -.863     .389   -.304    .119

* Measured in transformed prediction-error residuals (LEWAR = Log Error: Weighted, Absolute, and in Rank units)
GM = general-mixed: multi-year calibration sample with authentic and proxy criterion
GC = general-criterion: multi-year calibration sample with authentic criterion

Table 16 reports no differences among methods; Table 17 likewise reports no differences among years (although for years, the probability of the results under the null hypothesis is only .069).

Table 16
Test of main effect for methods* (MANOVATOM)

                        M U L T I V A R I A T E
SOURCE OF VARIATION   WILKS-LAMBDA   MULT-F   HYP-DF   ERR-DF   SIG
Methods                  .99160      .52258      3      185    .667

* Measured in transformed prediction-error residuals (LEWAR = Log Error: Weighted, Absolute, and in Rank units)
Methods = local-proxy, general-criterion, general-proxy, and general-mixed
MANOVATOM = test-of-methods repeated-measures MANOVA

Table 17
Test of main effect for years* (MANOVATOM)

SOURCE OF VARIATION     SS    DF     MS      F    PROB
Year                  66.09    1   66.09   3.34   .069

* Measured in transformed prediction-error residuals (LEWAR = Log Error: Weighted, Absolute, and in Rank units)
MANOVATOM = test-of-methods repeated-measures MANOVA
Years = 1984, 1985

Plots of the means in Figures 1 and 2 (for the 1984 and 1985 cohorts, respectively) illustrate the contrasting interactiveness between the GC (general-criterion) and the GM (general-mixed) methods. Under the following conditions the GC method is notably less predictive: (1) the sample is small, (2) the calibration sample is highly selected on a single source of applicants (relative to the graduating cohort), and (3) the calibration does not screen out correlated predictors (note condition AMQ for 1984 and 1985, where A = sample size small, M = source of UGPA MSU only, and Q = intercorrelation of predictors > .45). It should be noted that the GC method uses a much smaller calibration sample than do the competing proxy methods, although all small-sample conditions use two-cohort calibration samples. For the GC method, this amounted to about 200 applicants; for the proxy methods, over 400 applicants were available from the two cohorts.

Table 18
Cross-validities for prediction methods

METHOD               CONDITION   COHORT 1984   COHORT 1985
                                   (n = 87)      (n = 101)
General Criterion       NQ            .60           .49
General Mixed           NQ            .58           .34
General Proxy           NP            .54           .38
Local Proxy             MP            .53           .40
PVUGPA                                .49           .37
UGPA                                  .55           .41

Method condition acronyms represent the following:
N = all UGPA sources
M = MSU UGPAs only
Q = intercorrelation > .45
P = intercorrelation < .45
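Computationally, each Table 18 entry is a correlation between frozen-equation estimates and observed GGPA on a later cohort. A minimal sketch follows; the function and argument names are illustrative, not from the original analysis:

    import numpy as np

    def cross_validity(weights, intercept, X_holdout, ggpa_holdout):
        # Apply the frozen calibration equation to the holdout cohort...
        est = intercept + np.asarray(X_holdout) @ np.asarray(weights)
        # ...and report the Pearson correlation with observed GGPA.
        return np.corrcoef(est, np.asarray(ggpa_holdout))[0, 1]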
Inclusion of (1) UGPA error in the calibration sample (a consequence of multiple UGPA sources in the calibration sample) and (2) the reduction of correlated predictors appears to reduce the adverse impact of small calibration sample size (note condition ANP for 1984 and 1985, where A = sample size small, N = source of UGPA all, and P = intercorrelation of predictors < .45).

Table 18 (above) presents cross-validity coefficients for the presumed optimal condition for each of the four methods when applied to predict 1984 and 1985 GGPA from the appropriate admissions data. It is apparent from the table that, numerically, proxy-criterion GGPA estimates (from all but the general-criterion method) correlate somewhat below the correlation level of conventional (GC) multiple-predictor GGPA estimates. Because cross-validation was performed on a restricted sample (applicants with a subsequent GGPA), however, it cannot precisely index method validity for application to ordinary admissions selection (where the range of the criterion is unrestricted). To compensate for this problem, the regression weights for the GC (general-criterion) method were corrected for restriction prior to the computation of the cross-validity coefficients. This adjusted the GC range-restricted weights towards weight levels appropriate for a wide-range criterion. Problems with using correlation coefficients to evaluate selection validity are discussed later.

The test of methods addressed Hypotheses B and C as follows:

B. Hypothesis B, that GM (general-mixed) prediction would exceed that of the GC (general-criterion) approach, was not confirmed. Conversely, under the most normal conditions (e.g., larger calibration sample size and multiple-source UGPAs), the GC method generally appeared more predictive than other methods for cases in the cut-score region.

C. Hypothesis C, that main prediction differences among methods and years would emerge, was not confirmed, suggesting that these factors were unimportant to cases in the cut-score region. Nevertheless, the probability of the outcome for years (under the null hypothesis) was low (p = .07).

[FIGURE 1. Relative levels of transformed prediction error among GGPA estimation methods (1984); conditions vary sample size (A = 2/3 sample, B = full sample), UGPA source (M = MSU only, N = all sources), and predictor intercorrelation (P < .45, Q > .45) for the GC, LP, GP, and GM methods.]

[FIGURE 2. Relative levels of transformed prediction error among GGPA estimation methods (1985); same conditions as Figure 1.]
Test of Years and Other Factors under LP (Local-Proxy) Estimation

This analysis addresses Hypothesis D, which predicts an LP-method prediction difference among years. The dependent variable was a transformed prediction-error residual differing from LEWAR variables by its use of actual rather than rank error (see pages 78-79). Its transformation is referred to as LEWA, representing Log Error: Weighted and Absolute. The use of actual prediction error (rather than the predictive ranking error used in the LEWAR-transformed dependent variable) allowed more powerful significance testing than in the test-of-methods analysis. The original prediction-error residual was the difference between the estimated GGPA and the actual GGPA. Across conditions, the average absolute prediction error ranged from .29 to 3.87. The LEWA transformation is as follows: (1) take the absolute value of the difference between the estimated GGPA and the actual GGPA (magnitude but not direction of the difference is of interest); (2) weight the absolute value by a non-linear function which is biased towards low GGPA rank (see Appendix II for a graph of the weighting function used to give greater weight to the cut-score region); (3) obtain the natural log of this weighted error (to normalize the distribution of the error variable).

The LEWA weighting function is the following:

    LEWA = LN{ (1 + ABS[GGPA - ESTGGPA]) * ( [RKGPA + 1]^3 / RKGPA^3 + 2 ) }

where LN = a function providing the natural log of a term, ABS = a function providing the absolute value of a term, RKGPA = the rank of the GGPA score, and ESTGGPA = the GGPA estimate.

LEWA-transformed variable distributions were approximately normal. Due to the deliberate weighting by GGPA rank, LEWA variables correlated negatively with predictors which were similar to GGPA, and positively with predictors which were inversely related to GGPA. Negative correlations with GGPA were as large as -.75. (Correlations were as great as .66 for age, .70 for possessing a college degree, and -.45 for average credits taken per term. Generally, however, correlations were not significant.)

A test-of-factors-and-years MANOVA (MANOVATOF) was performed with five levels for the between variable, year. For MANOVATOF, it was possible to include an additional calibration factor, range restriction, defined as restricted where the PVUGPAs (undergraduate prerequisite veterinary course UGPAs) were above 3.0 and not restricted otherwise. Table 19 displays the ANOVA table confirming prediction differences among years among these conditions (and thereby confirming Hypothesis D). For cases in the cut-score region, LP (local-proxy) estimation differed across years.

Table 19
Test of main effects for years* (MANOVATOF)

SOURCE OF VARIATION       SS    DF        MS         F      PROB
Within Cells           643.78   462      1.39
Constant              1619.13     1   1619.13   1161.95    <.001
Years                   33.60     4      8.40      6.03    <.001

* Measured in LEWA-transformed prediction-error residuals (LEWA = Log Error: Weighted and Absolute)
MANOVATOF = test-of-factors-and-years repeated-measures MANOVA
Years = 1981, 1982, 1983, 1984, 1985
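The year main effect in Table 19 can be sketched, in simplified form, as a one-way ANOVA on the LEWA scores grouped by cohort. The reported analysis was a repeated-measures MANOVA that also carried within-subject calibration factors, so the sketch below parallels the "Years" row only approximately; lewa_by_year is a hypothetical mapping from cohort year to an array of LEWA-transformed errors.

    import numpy as np
    from scipy import stats

    def year_effect(lewa_by_year):
        """One-way ANOVA F-test across cohort years (simplified Table 19)."""
        groups = [np.asarray(v, dtype=float) for v in lewa_by_year.values()]
        return stats.f_oneway(*groups)   # returns (F statistic, p value)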
To further amplify the interaction among calibration factors, Figures 3 through 8 graph the dependent variable means (prediction error) for selected contrasting conditions. In Figure 3, the better predictiveness of the non-restricted condition is weakly suggested by the relative levels of error (compare unrestricted ANOP, where A = sample size small, N = source of UGPA all, O = range restriction off, P = intercorrelation < .45, against restricted ANGP, where A = sample size small, N = source of UGPA all, G = range restriction on, P = intercorrelation < .45).

[FIGURE 3. Prediction error under local-proxy prediction where the proxy-criterion calibration has been range restricted: ANGP vs. ANOP across the 1981-1985 cohorts.]

In Figure 4, the contrast of BMOQ (where B = sample size large, M = source of UGPA MSU only, O = range restriction off, Q = intercorrelation > .45) and BMOP (where B = sample size large, M = source of UGPA MSU only, O = range restriction off, P = intercorrelation < .45), differing on intercorrelation of predictors (BMOQ having greater intercorrelation), fails to indicate a consistent pattern of difference.

[FIGURE 4. Local-proxy prediction methods little affected by predictor intercorrelation when measured by transformed prediction error: BMOQ vs. BMOP.]

In Figure 5, BMOQ and BMGQ (where B = sample size large, M = source of UGPA MSU only, G = range restriction on, Q = intercorrelation > .45), differing by restriction of range (BMGQ being restricted), differ dramatically in 1984: a dramatic interaction for years by range restriction.

[FIGURE 5. Restriction of the proxy-criterion calibration appears to affect prediction only for the 1984 cohort: BMGQ vs. BMOQ.]

In Figure 6, BMOQ and BNOQ (where B = sample size large, N = source of UGPA all, O = range restriction off, Q = intercorrelation > .45), differing by the inclusion of non-MSU undergraduates (BNOQ being the all-sources condition), differ substantially in both 1984 and 1985, an interaction this time between years and non-selected UGPA source (in computing the formula calibration).

[FIGURE 6. UGPA-source effect over cohorts: BNOQ vs. BMOQ.]
In Figure 7, the ANGP-ANGQ contrast (where A = sample size small, N = source of UGPA all, G = range restriction on, and P/Q = intercorrelation of predictors below/above .45) also interacts with year in 1984.

[FIGURE 7. Local-proxy prediction methods affected by level of intercorrelation for the 1984 cohort: ANGQ vs. ANGP.]

In Figure 8, a difference in sample size between BMOQ and AMOQ (where A = sample size small, M = source of UGPA MSU only, O = range restriction off, Q = intercorrelation > .45) results in small differences in prediction error until 1985, when year interacts with sample size.

[FIGURE 8. Sample-size effect, full vs. 2/3 calibration sample, over cohorts: BMOQ vs. AMOQ.]

This graphical evidence is most noteworthy for its illumination of the interactive effect of years on all of the factors: (N) source, (A) sample size, (G) restriction of achievement level, and (Q) intercorrelation of predictors. (It may be recalled that in the calibration for this test of years and factors, the large sample size was n = 135, while the small size was n = 80.) It should be noted, therefore, that in the case of intercorrelated predictors (r > .45), the interactive effect of years only occurs where the sample is small (compare ANGP:ANGQ in Figure 7 with BMOP:BMOQ in Figure 4). Surprisingly, an interactive pattern appears under the large-sample restriction of PVUGPA which doesn't occur under the small-sample condition (in Figure 5, contrast BMOQ:BMGQ, then compare with ANGP:ANOP in Figure 3). It happened that for the years 1984 and 1985, the large-sample, restricted-PVUGPA conditions had low case counts (1984: n = 117; 1985: n = 111). Due, perhaps, to the greater sampling error for these two years, the selection for these large-sample conditions wasn't representative (and was, perhaps, greatly restricted for MSU applicants). By chance, the small-sample conditions for these two years select a much more representative sample.

Relative Validity of Non-MSU and MSU UGPAs

Contrary to Hypothesis E, Table 20 shows that non-MSU applicants were not associated with significantly greater prediction error (in 1981 and 1982). For these years, log-transformed absolute ranking error does appear to be slightly greater when the estimator of GGPA is UGPA as opposed to the GC (general-criterion) estimator.

Table 20
T-test of mean error* between MSU and non-MSU applicants

ERROR TYPE            N     MEAN      SD     SE      T    DF   PROB (2-TAIL)
LEAR GCbnq  N-MSU     26   2.7924   1.045   .205    .75   187      .449
            MSU      163   2.6307   1.003   .079
LEAR UGPA   N-MSU     26   2.9755    .866   .170   1.42   187      .189
            MSU      163   2.7108    .963   .075

* Measured in transformed, unweighted prediction-error residuals (log of absolute rank error)
GCbnq = general-criterion formula where (1) sample size is large, (2) intercorrelated predictors are used, and (3) non-MSU UGPAs are included
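The Table 20 comparison is an ordinary independent-samples t-test on the log absolute rank error. A sketch follows, with hypothetical array names:

    from scipy import stats

    def msu_contrast(lear_nonmsu, lear_msu):
        """Independent-samples t-test (Table 20): non-MSU vs. MSU error."""
        return stats.ttest_ind(lear_nonmsu, lear_msu)   # (t, two-tailed p)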
CHAPTER VI
DISCUSSION

This study enquires whether prediction (or estimation) using a proxy criterion might exhibit better precision than conventional methods (use of UGPA, and use of regression estimates from multiple predictors of previous annual samples). It is assumed that moderation of prediction error by years is a data characteristic favoring the practice of local-proxy estimation. An ancillary concern is the role of observable prediction factors which might influence the relative validity between proxy- and authentic-criterion prediction methods.

The experimental proxy-criterion methods do not significantly differ from the conventional methods when evaluated on the basis of logs of weighted absolute values of prediction-error residuals (weighted according to proximity to the cut-score). Thus, for these data and methods, method of estimation seems to have little impact on cases most vulnerable to rejection. Nevertheless, no predictive advantage is observed for any of the proxy-criterion methods over that of the simple UGPA (moreover, by the same standard, Figures 1 and 2 suggest no significant advantage over UGPA for the conventional multiple-predictor method either).

For the study and control of prediction factors (Q = level of intercorrelation of predictors, N = source of UGPA, and A = sample size) in the methods analysis (or MANOVATOM), two levels for each factor were represented in the condition matrix. In addition, a between-groups factor, years, also had two categories, 1984 and 1985. For these years, factors did not significantly differ from each other in their influence on prediction.

Likewise, for the test-of-factors-and-years (or MANOVATOF), similar prediction factors were in the condition matrix plus another two-level factor, range restriction (O) on the veterinary prerequisites UGPA. The methods factor was eliminated: all conditions used the LP (local-proxy) approach. The sole between-groups factor, years, had five categories (five years of data). For these years, three of four factors differed from each other in their influence on prediction. Factor scores for predictor intercorrelation (Q) and for restriction of range on the veterinary prerequisites UGPA (O) differed from each other and from the factor scores for the sample size (A) and applicant source (N) factors. The latter factor scores, however, failed to differ significantly from each other.

While this study does not find a consistent predictive advantage for the proxy methods over the GC (general-criterion) method, or even over the UGPA, (1) some potential enhancements of the method remain untried, and (2) the theoretical viability of the concept is not clearly refuted. Nevertheless, from Figures 1 and 2, it appears that the UGPA acting alone is virtually as valid a predictor of GGPA as is multiple-variable prediction (particularly when (1) ranking error is the error variable, and (2) cut-score region cases are emphasized).

High UGPA Validity

Standardized graduate admissions examinations provide consistent scales of evaluation across schools, and may be indirectly responsible for the apparent reliability of UGPA that was present in the data studied: local grading policies may be shaped by national admissions test performance. The same may also be true for other admissions variables: effective variables may provide little independent prediction where their effects are mediated through the UGPA. It is probably more likely the case, however, that the sounder psychometric indicators (the UGPA and the admissions tests) are more valid as predictors due to (1) superior reliability and (2) a factor structure similar to the GGPA.
The adequacy of UGPA prediction can be expected to diminish, however, where a larger proportion of multiple-source UGPAs must be handled. This would be the case in prediction across sites and in non-veterinary graduate programs which may be less influenced by geography (veterinary programs serve primary geographic regions and number, on average, less than one per state). Likewise, GC (general-criterion) prediction would be less precise due to measurement error in the independent variables during formula calibration, and due to the subsequent lack of precision in appropriately applying formula weights to larger numbers of ambiguous UGPA scores.

The localizing of prediction limits the entry of uncontrolled moderators into the statistical analysis, potentially improving internal validity, but it nevertheless diminishes external validity. The present study is local with respect to the dependent variable (100% MSU GGPAs), and it happens to be mostly local with respect to the UGPA predictor. The generalizable prediction methods in this study (GC, GP, and GM) are largely limited to generalizing across years. For these data there is only weak evidence that years moderate prediction (plots of means indicated only one of five years where prediction error differed notably). Moderation due to years at other sites, and moderation by sites, remain to be investigated.

Potential Usefulness of Proxy Methods

Although it is true that problems due to error-laden UGPAs also impair prediction (and selection) with proxy-criterion methods, the potential exists for situations where proxy methods may be optimal. One such occasion may exist where changes are being made in the outcome variable. Use of the GGPA as a prediction criterion is questionable because it can fail to adequately represent practical competence. Ultimately, however, the problem is not the limitations of GGPA, per se, but of the measurement design and procedures which succeed or fail to measure appropriate performance factors. Typically, this measurement issue is ignored while attention is fixed on the issue of selecting and weighting adequate predictors. Researchers in this field often conclude that the prospects of predicting beyond a given level of precision may be futile. Such a conclusion, however, neglects to address the dependence of prediction on the quality of measurement which determines the criterion variable. The measurement of factors which distinguish adequately between levels of professional competence must be the most important component of the improvement of academic selection. As institutions depart, nevertheless, from measurement conventions (across years or locations), the need for localized prediction will emerge. It may be in such a context that proxy-criterion prediction methods find greater practicality.
Potential Improvements to Estimation with a Proxy Criterion

Figures 1 and 2 indicate that for the two years represented, the GC method appears to consistently out-predict UGPA when the calibration condition is BNQ (where B = sample size large, N = source of UGPA all, Q = predictor intercorrelation > .45), AMP (where A = sample size small, M = source of UGPA MSU only, P = predictor intercorrelation < .45), BMP (where B = large, M = MSU only, P = intercorrelation < .45), ANP (where A = small, N = all sources, P = intercorrelation < .45), or BNP (where B = large, N = all sources, P = intercorrelation < .45); the LP method predicts similarly to the UGPA condition for the two years under the calibration condition BMP; and the GM method is consistently as predictive as UGPA under calibration conditions BNQ, ANQ (where A = small, N = all sources, Q = intercorrelation > .45), and ANP. Hypotheses are hereby provided to account for these patterns.

In Table 21, factors are specified (IDEAL FACTORS) under each method-sample size combination. Each ideal factor is based on a logical or empirical expectation relating (1) intercorrelation of predictors (Q = r > .45, P = r < .45) to sample size (large > 270, small = 180), (2) source of UGPA (N = all sources, M = MSU) to type of criterion used, and (3) source of UGPA to type of UGPA predictor used.

Table 21
A chart for identifying ideal method prediction factors

                        APPLICATION                     CALIBRATION
METHOD            Variable    Predictor     N      Criterion     Predictor
                  Estimated   Used                 Used          Used
GC-small smple    GGPA4       UGPA4        180     GGPAx         UGPAx
  IDEAL FACTORS . . . . . .
GC-large smple    GGPA4       UGPA4        270     GGPAx         UGPAx
  IDEAL FACTORS . . . . . .
LP-small smple    GGPA4       UGPA4        200     PVS4          UGPA4
  IDEAL FACTORS . . . . . .
LP-large smple    GGPA4       UGPA4        300     PVS4          UGPA4
  IDEAL FACTORS . . . . . .
GP-small smple    GGPA4       UGPA4        600     PVSx          UGPAx
  IDEAL FACTORS . . . . . .
GP-large smple    GGPA4       UGPA4        900     PVSx          UGPAx
  IDEAL FACTORS . . . . . .
GM-small smple    GGPA4       UGPA4        600     GGPAx / PVSx  UGPAx
  IDEAL FACTORS . . . . . .
GM-large smple    GGPA4       UGPA4        900     GGPAx / PVSx  UGPAx
  IDEAL FACTORS . . . . . .

GGPA4 = estimated graduate GPA of applicant
UGPA4 = applicant's undergraduate GPA
PVS4 = applicant's veterinary prerequisites UGPA
GGPAx = includes GGPAs of applicants of other years
UGPAx = includes UGPAs of applicants of other years
PVSx = includes PVS4s of applicants of other years
P = predictor intercorrelation < .45
Q = predictor intercorrelation > .45
N = UGPA not selected by source
M = only MSU UGPAs in sample

For method GC with a small sample size, P (predictor intercorrelation < .45) is recommended due to the small sample size, while N (a calibration sample containing some non-MSU applicants) is recommended because the UGPA4 used in the application stage will be error-laden in proportion to the frequency of non-MSU applicant UGPAs (if factor M were used, only MSU UGPAs would be in the calibration, thus potentially inflating the beta weight for the UGPA predictor). For all conditions, N is generally appropriate because all of the applicant data contain non-MSU UGPAs which impose scaling error on the data. Nevertheless, for methods LP and GP, M (specifying a calibration sample containing only MSU applicants) provides a calibration criterion having less error than one which includes non-MSU applicants. Because it is impossible to use UGPA from all sources as a predictor and to simultaneously use a PVUGPA (proxy criterion) from only MSU cases, a choice must be made between two advantages: (1) a realistic predictor or (2) a less fallible proxy criterion. The advantage from M (a less fallible proxy criterion) is arbitrarily granted more importance than that from N, and thus M is recommended for LP and GP. Where the sample size is sufficiently large and the calibration criterion adequately precise, Q (predictor intercorrelation > .45) seems to improve prediction; otherwise, P (predictor intercorrelation < .45) seems more efficient.

Having used the present data to inform this set of expectancy rules, the hit rate for these rules on the same two years of data is 75%. The factor that is difficult to specify with confidence is Q (predictor intercorrelation), because (1) the calibration criterion is imprecise and (2) the sample size is only moderately large. Study of the condition means, therefore, gives rise to three hypotheses: (1) LP prediction may be improved by the use of single-source selected cases (to improve the reliability of the proxy criterion when calibrating the selection formula); (2) use of an overly reliable UGPA predictor (in the calibration) may contribute to prediction error, hence the addition of random error to the UGPA predictor in the calibration may reduce the inflation of the UGPA predictor weight; and (3) multicollinearity may be less of a problem with these variables than has been presumed: given an adequate sample size and precise measures, moderately correlated multiple prediction may be more valid than prediction with correlated variables selected out.

How Intercorrelation May Remain Benign

As Pedhazur (1982) points out, there is no agreement on the breadth of meaning in the term multicollinearity, although its existence is unambiguous where predictor intercorrelation biases regression coefficients. It is clear that parameter bias is more likely as (1) the number of predictors approaches the number of cases in the calibration sample, (2) the predictors share mutual factors, (3) intercorrelated predictors are highly correlated with the criterion, and (4) predictors lack reliability (Kenny, 1982). Although for this study GC (general-criterion) performance is limited to only two years, for both of those years GC prediction using correlated predictors (having intercorrelations > .45; see Appendix III) was numerically (though not significantly) better than GC prediction with uncorrelated predictors (when the sample size was at its maximum; review Figures 1 and 2).
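Whether a given calibration sample can tolerate that level of intercorrelation can be checked with a standard collinearity diagnostic. The sketch below computes variance inflation factors; VIF is not part of the reported analysis, and the predictor matrix X is hypothetical.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    def predictor_vifs(X):
        """One VIF per predictor column of X (values well above ~10 are
        the usual warning sign of problematic collinearity)."""
        Xc = np.asarray(sm.add_constant(X))
        # Column 0 is the added constant and is skipped.
        return [variance_inflation_factor(Xc, i) for i in range(1, Xc.shape[1])]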
It may be that, given (1) the unique factor structure of the intercorrelated predictors used and (2) the reliability of the intercorrelated predictors used, prediction using moderately intercorrelated predictors may be a desirable method.

Extreme Outcomes in Figures 5 and 6

As can be seen from Figures 5 and 6, respectively, two LP (local-proxy) calibration conditions produce substantially higher error means for 1984 and somewhat higher means for 1985 data: BMGQ (where B = sample size large, M = source of UGPA MSU only, G = range restriction on, Q = predictor intercorrelation > .45) and BNOQ (where B = sample size large, N = source of UGPA all, O = range restriction off, Q = predictor intercorrelation > .45). This underscores the previous recommendation that local-proxy calibration be performed with uncorrelated predictors, due to the combination of (1) unreliability of the proxy criterion and (2) the moderate sample size. Beyond these observations, the question remains: what affected prediction for these two years that was not evident for the preceding years? Multicollinearity might be offered as an explanation for the unusual increase in error because, in both conditions, correlated predictors are employed in the calibration. For the three years prior to 1984, however, the error level is uniformly low. Although the BMGQ calibration conditions were more restricted than were the data subsequently estimated by its selection formula, BNOQ (using unrestricted calibration conditions) exhibits greater error than BMGQ.

One explanation might be sampling error. Although the LP calibration would ordinarily include all cases in the calibration and in the estimation process, for this experimental study calibration samples are, in fact, limited artificially to condition (a) of 80 cases or to condition (b) of 135 cases, while the application sample is unrestricted. The 1984 and 1985 data entered the study midstream (they were added after the 1981-1983 condition samples had already been drawn), and because the case counts for these years were low, the restricted calibration samples for these years were somewhat undersized (1984: B = 117, and 1985: B = 111, while for 1981-1983: B = 135). Although the calibration with the restricted sample may be expected to be less reliable than a calibration with a larger sample, the opposite outcome is also quite possible. Although sampling error might account for this erratic prediction, such an explanation seems inconsistent with Figure 8, for instance, which depicts large- and small-sample prediction (BMOQ vs. AMOQ) equal and constant across five years.

Interactions occur dramatically only for 1984 data, suggesting that a year effect is present for 1984 and perhaps (to a lesser extent) for 1985. The nature of this potential year effect is not evident. Because this apparent year effect is not systematically associated with any particular prediction factor, it can be suspected that it results from general unreliability in the selection equation due, perhaps, to some particularly incongruent non-MSU veterinary prerequisite UGPAs, or non-MSU ordinary UGPAs, which appear for these two years. Only these two variables are expected to be both important and yet potentially unreliable (due to possible differences in grading standards) to the degree sufficient to cause such a year effect.
Probably the most feasible explanation for the resulting interactions is that some unusually incongruent non-MSU UGPAs in the 1984 (and, to a lesser extent, 1985) data, when combined with two other prediction-invalidating conditions (prediction factors), created notably unreliable regression equations. When the unreliable equations were used to create GGPA estimates, they again drew upon a UGPA predictor which remained quite unreliable. Therefore, in using a proxy criterion, it must be remembered that unreliability makes its mark both in the calibration of the equation and in the estimation of the criterion.

Cautions Regarding Study Realism

The small number (5) of suitable cohorts available for analysis limits the emergence and range of prediction factors in the data. To compensate for such limitation, this study exaggerated potential sources of variation in prediction error by deliberately biasing the selection formulas. This was achieved by (1) selecting on certain variables (e.g., PVUGPA and MSU UGPA), (2) controlling the sample size, and (3) controlling the admission of correlated variables into the formula calibration. Unfortunately, some of the resulting conditions depart from realism. For that reason, omnibus tests of factor effects that test for general effects (over a large number of conditions) may have less practical validity than certain realistic, specific contrasts.

Some calibration conditions in this study are either not realistic to practice, or they might otherwise mislead the interpreter of the study. In the method contrasts, for example, the sample sizes of the proxy methods are always greater than those of the GC (general-criterion) method, which accounts for the more erratic variation of GC error. Small sample size here was defined as data from only two of the three years of the cases available. Roughly 270 veterinary graduate cases were available for GC method calibration within three years of data, while about 900 cases were available to the GP (general-proxy) and to the GM (general-mixed) methods within the three years of data.

For the tests of methods, error was defined as error in ranking the estimated GGPAs against actual GGPA (LEWAR, where L = natural log value, E = error, W = weighted, A = absolute value, R = in ranks). Although the use of error in ranks was necessary in order to compare raw-score regression error (for the proxy-criterion methods) with standardized regression error (for the GC method, which required standardized regression in order to correct partial coefficients), rank error was also more valid with regard to selection error. This weighted LEWAR and LEWA error (LEWA, where L = natural log value, E = error, W = weighted, A = absolute estimation-error residual) should be more appropriate than ordinary unweighted estimation error, as ordinary estimation error has little importance for cases above the cut-score. In addition, the use of log-transformed error values gave greater weight to errors of large magnitude. Larger errors were believed most likely to be due to (1) errors near the cut score, (2) errors from non-MSU UGPAs, or (3) criterion errors. Among generalizable calibrations, and among LP calibrations, (2) and (3) should remain constant, leaving cut-score region errors to explain the differences between these conditions.
A problem linked to the use of weighted scores is the loss of degrees of freedom and the resulting liberalizing effect on significance tests: systematically weighting some cases more than others is the same (with respect to degrees of freedom) as deleting some cases.

In the LP (local-proxy) method contrasts, the calibration sample sizes (either 80 or 135) were less authentic due to limits in absolute number (it became impossible to provide even these numbers of cases for all conditions for the years 1984 and 1985). Error measures also were less realistic for the test across LP (local-proxy) methods, because they were in LEWA error (logs of weighted absolute differences between the estimated and actual criterion) rather than in ranking error. This sacrifice was made in the hope of improving the power of the statistical tests.

In practice, selection formulas would be applied to wide ranges of applicant scores; therefore, selection formulas might ideally be calibrated on samples having the same wide ranges of scores. According to Table 3 (adapted from Richards, 1982), except for measurement error in the independent variable, raw-score regression coefficients could remain correct despite selection on the independent variable. Thus restriction of range might not be a problem if one wished to use a raw-score regression model as a selection formula. Such reliability in scores, however, is difficult to substantiate. Richards also demonstrates that (1) variation in the dependent variable range between application and calibration samples and (2) variation in scale-interval meaning affect the validity of the raw-score regression coefficient. The alternative regression procedure, standardized regression, is not affected by the problem of scale variation, and in addition it allows for correction (actually, only a shrinking of error) of its betas (or partial correlation coefficients) for restriction of range on both the independent and dependent variables. Alexander et al. (1987) provide a formula for correcting correlations for both types of range restriction, and the author has extended the application to the betas (partial coefficients) in a standardized regression model. Since correction, in fact, is likely to be conservative (Linn, Harnisch, and Dunbar, 1981) and may ignore significant predictors (resulting in an under-specified model), 'adjustment for dispersion' may be a more appropriate expression. In addition, such adjustment is also required for expansion of range, where this occurs. In the practical application of selection formulas, the proxy-criterion methods would ordinarily not require any adjustment for dispersion differences between calibration and application samples, whereas adjustment would be required for the conventional GC (general-criterion) method. Ordinarily, therefore, the error due to dispersion differences would be greater for the GC approach, since the adjustment could only be approximate.

In this study, however, the validity-test sample happens to be ideal for the GC method, because the application sample is range restricted (to upper-distribution cases) in the same way as the GC calibration sample is restricted. The (unrestricted-sample) proxy-criterion methods are faced with an uncharacteristic range-restriction problem relative to the (range-restricted) validity-test sample.
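For reference, the familiar bivariate correction for direct range restriction (often called Thorndike's Case 2) can be sketched as below. The Alexander et al. (1987) two-variable correction and the author's extension to standardized betas are more involved and are not reproduced here.

    import math

    def correct_for_restriction(r, sd_unrestricted, sd_restricted):
        """Adjust a correlation observed in a range-restricted sample."""
        u = sd_unrestricted / sd_restricted   # ratio of selection-variable SDs
        return (r * u) / math.sqrt(1.0 - r * r + (r * u) ** 2)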
In the test of methods, this has been partly countered by adjusting the GC (general-criterion) selection formulas for application to the total ranges of applicant scores. As such an adjustment is expected to be conservative, however, the GC method is expected to retain a slight predictive advantage attributable to the characteristics of the application sample of this particular study. On the other hand, the potential exists for the adjustment to accrue extraneous errors if the standardized regression is subject to the effects of sampling error and therefore imprecise (standardized-score regression is required in order to estimate the partial correlations to be adjusted for dispersion differences for each predictor). The author's adjustment technique, though extrapolated from a respected bivariate correction, has no literature to support its use with multiple-coefficient correlation. This author did test the procedure in a small-scale simulation with anticipated results, and therefore uses it with some justified confidence. Of course, an argument can be offered for not adjusting the GC selection formula, inasmuch as the full-range selection formula will not be optimal for cut-score situations where the selection ratio may be small. Such a course, nevertheless, risks the likelihood of misspecifying the selection model by erroneous predictor inclusions or exclusions, or by weightings associated with the use of a restricted calibration sample.

Why R2 Wasn't Used as a Measure of Validity

R2 is the square of a multiple correlation coefficient, a conventional measure of validity for regression equations. R2 was not used in this study as an index of validity for several reasons. Not only does R2 not give greater weight to cases near the cut-score, but it gives greater weight to cases the farthest from the cut-score. Because R2 is a variance statistic, it follows that cases farther from the mean will contribute a disproportionate share of the variance: variance = SUM (score - mean)^2 / N. Note that the difference from the mean is squared; thus larger discrepancies from the mean can disproportionately influence the magnitude of the variance (e.g., the outlier problem). Because a graduate candidate distribution represents the upper tail of a distribution (higher-ability college students), this distribution will be strongly negatively skewed, with the distribution mean near the cut score. Under these circumstances, the most likely means of improving an R2 would be to improve prediction at the place in the distribution most extreme from the mean: among the highest-ability candidates. Such an 'improvement' in prediction may, in fact, decrease the level of discrimination in the cut-score region. It is also possible for the R2 to increase significantly without any corresponding change in the ordering of scores. In such circumstances, selection would remain unchanged despite better absolute prediction. Although the alternative index of validity, transformed prediction error, leads (as mentioned before) to liberal significance tests, it nevertheless can measure the change of interest to the admissions office.
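A toy numerical illustration of the first objection (simulated data, not from the study): improving prediction only for cases far from the mean raises R2 even though every prediction in the cut-score region is untouched.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(size=200)                      # criterion scores
    yhat = y + rng.normal(scale=0.8, size=200)    # noisy predictions
    far = np.abs(y - y.mean()) > 1.0              # cases far from the mean

    better = yhat.copy()
    better[far] = y[far] + 0.1 * (yhat[far] - y[far])   # shrink only those errors

    def r2(y, f):
        return 1.0 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)

    print(r2(y, yhat), r2(y, better))   # the second R2 is larger, yet the
                                        # near-mean (cut-score) cases are unchanged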
Regression Equations Have Superfluous Predictors

To control for confounding, calibration (regression) runs were executed without deleting ineffective predictors by means of conventional statistical tests. Statistical significance tests for coefficients are based partly on the number of predictors and partly on the size of the sample. Significance tests thus potentially confound with prediction factors such as (1) sample size and (2) intercorrelation of predictors. The consequence of superfluous predictors in the regression equation, however, is the addition of random error to the predicted scores, thereby attenuating actual predictor validities (Deegan, 1976). Equations without superfluous predictors follow below.

Practical Implications of the Study

Presented in Tables 22, 24, and 25 are regression models developed through a stepwise multiple regression procedure in which conventional methods of coefficient significance testing (alpha = .05) have been used to refine the set of active predictor variables.
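A sketch of this kind of significance-driven refinement follows; a backward-elimination variant is shown, whereas the study used a stepwise procedure, and the data frame df, criterion name, and predictor names are hypothetical.

    import statsmodels.api as sm

    def refine_predictors(df, criterion, predictors, alpha=0.05):
        """Drop predictors until every remaining coefficient passes alpha."""
        active = list(predictors)
        while active:
            fit = sm.OLS(df[criterion], sm.add_constant(df[active])).fit()
            pvals = fit.pvalues.drop("const")
            if pvals.max() <= alpha:
                return fit               # all retained coefficients significant
            active.remove(pvals.idxmax())   # discard the weakest predictor
        return None                      # no predictor survived the test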
In Table 22 (method betas by condition), three estimation methods, GC (general-criterion), GP (general-proxy), and GM (general-mixed), use pooled data (1981 through 1983) to calibrate a single (estimation) regression equation for each method. For the LP method, 1984 and 1985 data are used to calibrate a regression equation for each of the two years. The GC and GM methods (using GGPA and PVUGPA [veterinary prerequisites GPA] as criteria, respectively) produce highly similar regression equations, and they also give the greatest weight (.94, virtually all the predictive weight) to the UGPA variable. The GP method differs from the previous two methods in its weighting scheme, and weights UGPA at .77. The LP method runs (1984 and 1985) utilize fewer predictors and weight UGPA at .67 and .63, respectively. The similarity between GC (general-criterion) and GM (general-mixed) calibrations further underscores the similarity of UGPA and GGPA at the site under study (although for GM the alternate criterion is, in fact, a proxy, PVUGPA).

As displayed in Table 22, for the GC method, the selection formula validity (without cross-validation) with five years of data was .91 (R2 = .81, or about 80% of criterion variance accounted for by the predictors). The best average error using the GC formula to predict subsequent GGPA ranking of approximately 90 veterinary school graduates was a discrepancy of 17 positions. Also in Table 22, proxy-criterion methods obtained similar (non-cross-validated) validity coefficients. Of course, these formulas predicted the proxy rather than the real criterion. Measured in terms of actual prediction error, the best average (absolute) proxy-criterion (GM) prediction error was .29 from the actual GGPA.

Tables 24 and 25 provide additional LP runs for years 1981 through 1983; in Table 25, regression equations were calibrated on only 75 percent of available cases. Across the five years the weighting scheme varies somewhat, although UGPA maintains its dominance in prediction (ranging from .63 to .99 for the full samples). Sample-size differences (100% vs. 75%) have a minor effect on the regression coefficients estimated, but little effect on the predictors selected.

Because this study is site specific, generalization from these findings to other sites must be regarded as tenuous. One important feature of this data set is its homogeneity with regard to origin of UGPA (undergraduate GPA is predominately MSU). Admission of students with non-MSU UGPAs may require a relatively higher level of ability: graduate school faculty may be more willing to accept equivalent credentials from students whom they know than from less familiar candidates, thereby ruling out all but the highest-performing non-local candidates. Should such be the case, prediction error in ranks would be minimized (rank error decreases at the extremes of the distribution). This could explain (1) why UGPA is so effective as a predictor, and (2) why non-MSU UGPAs are not associated with significantly greater prediction error (in ranks). At other sites or in other selection situations, the UGPA and GGPA may be less dependent, opening the potential for (1) alternative predictors with competitive validity, and (2) substantially better relative prediction for multiple regression prediction methods.

Table 22
Method betas under optimal prediction conditions* (standard errors beneath each beta)

Predictor        GC       GM       GP      LP84     LP85
01 actach      -.032    -.031    -.043      .        .
              (.013)   (.013)   (.019)
04 cred         .447     .460     .346      .        .
              (.119)   (.120)   (.164)
05 ugpa         .938     .940     .767     .674     .626
              (.037)   (.037)   (.053)   (.031)   (.038)
06 intlsc        .        .       .149      .        .
                                (.019)
07 int2sc        .      -.027      .        .        .
                       (.013)
09 mcatc        .048     .051     .084      .        .
              (.018)   (.018)   (.022)
10 mcatp        .038     .057      .        .        .
              (.018)   (.018)
12 mcatr       -.039    -.046      .        .        .
              (.015)   (.015)
15 numterms     .122     .128      .        .        .
              (.039)   (.039)
16 pfcred        .        .       .062      .        .
                                (.019)
17 pts         -.678    -.692    -.445    -.110    -.125
              (.114)   (.114)   (.164)   (.023)   (.034)
18 pvspts       .217     .213     .170     .341     .383
              (.018)   (.018)   (.025)   (.031)   (.039)
19 sex         -.049      .        .        .        .
              (.014)
R             .91672   .91599   .81681   .93142   .91284

* Predictors 03 (avgcred), 08 (mcatb), 11 (mcatq), 13 (mcats), 14 (narr), 20 (sumcred), 21 (totcred), 22 (vetexp), and 23 (workexp) entered no equation.
Note: See Table 23 for predictor variable definitions.
GC = general-criterion, GM = general-mixed, GP = general-proxy, LP84 = local-proxy for 1984, LP85 = local-proxy for 1985

Table 23
Predictor names and their definitions

01 actach: activities and achievements (non-academic)
02 age: age in years
03 avgcred: average term credits
04 cred: total (grade-point) credits
05 ugpa: undergraduate grade point average
06 intlsc: interviewer rating number 1
07 int2sc: interviewer rating number 2
08 mcatb: Medical College Admissions Test: Biology
09 mcatc: Medical College Admissions Test: Chemistry
10 mcatp: Medical College Admissions Test: Physics
11 mcatq: Medical College Admissions Test: Quantitative
12 mcatr: Medical College Admissions Test: Reading
13 mcats: Medical College Admissions Test: Science Reasoning
14 narr: narrative writing sample
15 numterms: number of terms enrolled in college
16 pfcred: credits on a pass/fail basis
17 pts: total honor points
18 pvspts: total honor points in veterinary prerequisites
19 sex: gender
20 sumcred: total veterinary prerequisites credits
21 totcred: total grade-point and pass/fail credits
22 vetexp: veterinary experience rating
23 workexp: work experience rating

Table 24
LP betas by year: full sample* (standard errors beneath each beta)

Predictor       1981     1982     1983     1984     1985
02 age          .138      .        .        .       .093
              (.047)                              (.034)
04 cred          .       .640     .591      .        .
                       (.190)   (.186)
05 ugpa         .823     .994     .914     .674     .626
              (.031)   (.064)   (.064)   (.031)   (.038)
06 intlsc      -.068      .        .        .        .
              (.023)
08 mcatb         .       .076      .        .        .
                       (.030)
09 mcatc         .       .058     .082      .        .
                       (.028)   (.025)
10 mcatp        .103      .        .        .        .
              (.026)
12 mcatr         .      -.101      .        .        .
                       (.027)
15 numterms     .170      .        .        .        .
              (.078)
17 pts         -.388    -.797    -.719    -.110    -.125
              (.068)   (.189)   (.192)   (.022)   (.034)
18 pvspts       .220     .196     .261     .341     .383
              (.031)   (.029)   (.029)   (.031)   (.039)
20 sumcred       .        .       .067      .        .
                                (.023)
22 vetexp        .        .        .        .      -.056
                                                  (.026)
R             .91595   .91893   .92455   .93142   .91284

* Predictors not listed entered no equation.
Note: See Table 23 for predictor variable definitions.

Table 25
LP betas by year: 75% sample* (standard errors beneath each beta)

Predictor       1981     1982     1983     1984     1985
02 age           .        .        .        .        .
                                                  (.036)
04 cred          .        .      1.027      .        .
                                (.226)
05 ugpa         .852     .814    1.003     .735     .662
              (.035)   (.031)   (.072)   (.050)   (.040)
06 intlsc      -.053      .        .        .        .
              (.026)
08 mcatb         .       .069      .        .        .
                       (.033)
09 mcatc         .       .085     .078      .        .
                       (.031)   (.030)
10 mcatp        .105      .        .        .        .
              (.028)
12 mcatr         .      -.107      .        .        .
                       (.030)
15 numterms     .332      .        .        .        .
              (.080)
17 pts         -.435    -.189   -1.171     .350    -.150
              (.080)   (.028)   (.235)   (.113)   (.037)
18 pvspts       .201     .189     .281     .344     .364
              (.034)   (.031)   (.034)   (.037)   (.040)
20 sumcred       .        .       .102      .        .
                                (.027)
21 totcred       .        .        .       .237      .
                                         (.114)
22 vetexp        .        .        .        .      -.064
                                                  (.029)
R             .91853   .92245   .91723   .92782   .92002

* Predictors not listed entered no equation.
Note: See Table 23 for predictor variable definitions.

For selection situations similar to that in this study, the use of admission test scores is open to challenge. Admission test predictors obtain only marginally significant (alpha = .05) regression weights (see the mcatc, mcatp, and mcatr weights in the GC column of Table 22) when the UGPA predictor variable is already in the selection equation. Nevertheless, if the test score information is used by undergraduate institutions (as a secondary purpose for the data) to evaluate and modify internal curricular and grading standards, abandonment of admission test requirements might well result in a decline in UGPA validity. The potential influence of the admissions test on UGPA validity is probably sufficient reason to retain admissions test scores in the selection equation even though the immediate consequence may be marginal. Over the course of several years, the continuing inclusion of the admissions test predictors in the selection formula may preserve or even improve the validity of UGPA. One strategy would be to use the admission test scores to correct all UGPAs, or to correct just the outside (e.g., non-MSU) UGPAs. The work of Linn (1966) in adjusting GPA was cited earlier. He found that GPA adjustment by admissions test scores effectively eliminated the GPA error due to source of GPA for high school GPAs.
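A hedged sketch of what such an adjustment might look like follows. This is an illustration of the idea, not Linn's published procedure or the author's; the names and the blending weight are invented for the example.

    import numpy as np

    def adjust_ugpa(ugpa, test_composite, blend=0.5):
        """Pull each UGPA toward the value its admissions-test scores predict."""
        slope, intercept = np.polyfit(test_composite, ugpa, 1)
        predicted = intercept + slope * np.asarray(test_composite)
        # blend = 0.5 is an arbitrary illustrative choice; in practice the
        # weight would reflect the relative reliabilities of the two scores.
        return blend * np.asarray(ugpa) + (1.0 - blend) * predicted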
After correction of the appropriate UGPAs, admission-test predictors would be withheld from the selection formula. In such a procedure, UGPAs would be adjusted up or down depending on the UGPA-admission test score discrepancy. Such a process, nevertheless, might provide little benefit to selection where few outside UGPAs reside near the cut score.

The inclusion of demographic variables in the regression equation may be quite informative for purposes of research. For application, however, the use of demographic variable predictors cannot be recommended. This caution must be exercised because data samples are voluntary: hence, they are non-random data, plagued with systematic selection effects related to candidate recruitment and personal motivation. Effects attributed to categorical variables (e.g., race, gender) may be completely spurious. For example, due to a shortage of black candidates in human medicine, all minimally qualified blacks may be intensely recruited to human medicine, greatly depleting the remaining pool of blacks who would consider application to veterinary study. Blacks from this depleted pool would not provide a sample of black characteristics which could be validly generalized. Where data are not random, demographic variables must be regarded as moderator variables which may control variance to a certain extent, but which do not account for that variance in a literal sense (e.g., a gender effect not being due to one's gender, per se). Ultimately, the predictive advantage gained by blocking cases according to (moderator) categories (e.g., gender, race) may be retained by identifying other variables which account for the qualitative differences between category levels. For example, Table 22 shows a significant GC weight for the predictor variable "sex". In reality, the sex predictor variable may be moderating "level of affection for animals" or "temerity towards human patients". By including measures of these traits in the selection equation, the advantage of including the gender variable may disappear. Use of the categorical (moderator) predictor (e.g., gender) in the actual admissions selection formula merely on the basis of its predictive value is a questionable policy which is difficult to justify.

Beyond the consideration of merit, certain categorical variables neutral to race, religion, or gender (e.g., economic disadvantage) may be chosen and weighted by the admitting institution to create a non-merit criterion for admissions as an exception to the usual merit criteria (see Roos, 1978, for relevant information on non-merit admissions selection conforming to the Bakke judicial decision). The rating of such non-merit attributes, nevertheless, should not interfere with the evaluation of conventional applicant academic merit. The balance between merit and non-merit considerations should be specified as a consistent policy prior to the application of non-merit considerations to particular cases.

CHAPTER VII
SUMMARY

Refinement of the selection process is an essential component of any serious effort to enhance public benefit from educational programs. In the introductory chapter it is noted that only recently has ease of data entry and retrieval made the prospects of a scientific graduate candidate selection process feasible. A review of research on candidate selection within the health sciences reveals a lack of consistent findings, likely due in part to limited sampling and sample sizes.
CHAPTER VII

SUMMARY

Refinement of the selection process is an essential component of any serious effort to enhance public benefit from educational programs. In the introductory chapter it is noted that only recently has ease of data entry and retrieval made the prospect of a scientific graduate candidate selection process feasible. A review of research on candidate selection within the health sciences reveals a lack of consistent findings, likely due in part to limited sampling and sample sizes. Additionally, inconsistencies may arise from unspecified factors associated with particular years and locations which influence the composition of the non-random pool of applicants. One potential solution to such year and location effects would be to "localize" estimation by attempting to estimate future performance based on a sample restricted to a single site and year. This prediction strategy was dubbed the LP (local-proxy) approach because it specified that the sample be local and that the regression criterion be a proxy for the GGPA (graduate GPA); a minimal sketch of this calibration follows the hypotheses below. The proxy criterion would be the grade point average for the undergraduate veterinary prerequisite courses (PVUGPA), which would allow the UGPA (undergraduate GPA) to serve as one of the multiple predictors. Three additional variants of the LP approach would include (1) the GP (general-proxy), where admissions data would be multi-year, (2) the GM (general-mixed), where admissions data would be multi-year (or multi-site) but either authentic or proxy criteria would be used, and (3) the lone variable UGPA as an estimator of GGPA. Advantages to be gained by these methods might include (1) an increase in the sample size of local cases, (2) an expansion to the full applicant range of local cases (no cases need be deleted from the analysis for lack of a subsequent GGPA), (3) the ability to add or delete predictors in any year (alternate admissions test scores could be accepted to a limited extent), and (4) no requirement for previous GGPA data in order to estimate GGPA.

The research hypotheses called for some estimation methods to exceed the predictive validity of others:

A: LP (local-proxy) to be more valid than UGPA
B: GM (general-mixed) to be more valid than the GC (general-criterion)
E: MSU applicant prediction to be more valid than prediction for non-MSU applicants

or for the appearance of year or method effects:

C: Year and method effects to appear in an analysis of methods
D: Year effects to appear in an analysis of years and other factors
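The local-proxy calibration itself can be sketched as follows; the array names are placeholders for the admissions variables listed in Tables 24 and 25, and the ranking convention (1 = strongest candidate) is an assumption.

    import numpy as np

    def local_proxy_ranks(predictors, pvugpa):
        # predictors : (n, k) admissions data for ALL of this year's applicants
        # pvugpa     : (n,)  proxy criterion (pre-veterinary prerequisite GPA)
        X = np.column_stack([np.ones(len(pvugpa)), predictors])
        # Calibrate the equation on the proxy criterion within the local pool,
        # so no predecessor GGPA data are needed.
        b, *_ = np.linalg.lstsq(X, pvugpa, rcond=None)
        scores = X @ b
        # Convert estimated proxy values to candidate ranks (1 = best).
        return scores.argsort()[::-1].argsort() + 1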
The review of literature in Chapter II concludes that research on predictor validities for the health sciences is not consistent from year to year or across sites. Linn's (1966) work to equate high school GPAs indicated that for these data the introduction of admissions test scores into the multiple regression was as effective as other, more elaborate equating methods. Although some work has been done using UGPA as a proxy, no literature was found (other than the author's) where the context is academic selection. A large number of studies reported evidence of factors which moderate prediction among educational samples. In the theoretical review of Chapter II, topics of sampling, measurement, coefficient validity, multiple regression, and multivariate and univariate analysis of variance and the t-test are discussed.

In Chapter III conventional prediction is identified as general-criterion, or GC (general, because the regression equation is generalized across years and sites; criterion, because the criterion used is an authentic criterion). This prediction is compared and contrasted with the following three experimental multiple-predictor procedures:

(1) local-proxy, or LP (local, because the equation is not generalized beyond year or site; proxy, because the criterion is a "stand-in" [veterinary prerequisites UGPA] for an "authentic" criterion [graduate GPA]),

(2) general-proxy, or GP (general, because the equation can be generalized; proxy, because a proxy criterion is used),

(3) general-mixed, or GM (general, because the equation may be generalized; mixed, because the criterion will be an authentic criterion when available; otherwise, the criterion will be a proxy).

Channels for the entry of error into the prediction process, potential for systematic and random error, advantages and disadvantages, and optimal conditions for the conventional and experimental approaches are discussed and compared.

Chapter IV describes the design of the study. Five admissions cohorts were used from the Michigan State University College of Veterinary Medicine. Some of these applicants were subsequently admitted to the veterinary program and later received a graduate GPA. For all applicants, parallel admissions data were available. These took the form of grades, admissions test scores, ratings, and some demographic variables. Admissions data constituted the source of regression predictors, the graduate GPA was the authentic criterion, and the veterinary prerequisites GPA served as the proxy criterion. Two major analyses were performed; both were repeated-measures MANOVAs using dependent variables that were transformed prediction-error residuals. A test-of-methods MANOVA looked for method and year effects while controlling for the following factors: (Q) predictor intercorrelation, (A) sample size, and (N) source of UGPA. The test-of-factors (and years) MANOVA used one estimation method, local-proxy, across five years of data. It looked for a year effect and it controlled for these factors: (Q) predictor intercorrelation, (A) sample size, (N) source of UGPA, and (O) restriction of range. For the test-of-methods MANOVA, general-criterion estimates were obtained with a standardized regression procedure, the betas being corrected for restriction of range. For this MANOVA, prediction rank minus GGPA rank was the prediction error used (a sketch of this measure follows this chapter's summary of results). For the test-of-years MANOVA, actual prediction error from raw-score regression estimates was used.

Results are reported in Chapter V. As summarized in Table 12, Hypothesis D was confirmed: cut-score prediction error differed across the five years. A view of Figures 5, 6, and 7 again reveals substantial interactions for 1984 data and modest interactions for 1985 data. Because dramatic interactions occur specific to only 1984, with lesser interactions for 1985, it is likely that data for these two years were of lower reliability (perhaps due to greater diversity in UGPA standards). Unconfirmed are Hypotheses A, B, C, and E. For Hypothesis A, local-proxy estimation does not differ from estimation using UGPA as a sole predictor for cases in the cut-score region. For Hypothesis B, the general-criterion and the general-mixed method prediction validities are about the same. For Hypothesis C, neither method nor year effects occur under the test-of-methods for cases in the cut-score region. Also failing to differ is prediction error for MSU and non-MSU applicants, contrary to Hypothesis E. Chapter VI provides a discussion of the findings.
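The rank-error dependent variable can be sketched as follows. The exponential decay used here is a stand-in for the LEWA/LEWAR weighting plotted in Appendix II, whose exact form is not reproduced; cut and width are assumed parameters.

    import numpy as np

    def cut_score_weighted_error(pred_rank, ggpa_rank, cut, width=10.0):
        # Prediction rank minus GGPA rank, emphasized near the cut score.
        error = pred_rank - ggpa_rank
        # Weight peaks at the cut rank and decays with distance from it.
        weight = np.exp(-np.abs(ggpa_rank - cut) / width)
        return weight * error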
While it is acknowledged that this study does not find a consistent predictive advantage for the proxy methods with the present data, some potential enhancements remain to be tried: (1) calibration of local-proxy equations using a single source of UGPAs (e.g., MSU), and (2) addition of random error to the UGPA predictor prior to calibration (sketched at the end of this chapter). Also, potential for the methods may remain, albeit under different circumstances. It is noted that the UGPA validity level was high for these data. In view of the geography of veterinary education, which finds fewer than one school per state, this is not surprising. Where UGPA sources are more diverse, or where local grading standards are changing, the validity of the UGPA is bound to decline, and competing estimation procedures (such as proxy criterion methods) may find practical use.

Intercorrelation of predictors did not appear to constitute a multicollinearity problem for general-criterion prediction. Where the intercorrelated predictors did pose a problem for local-proxy estimation, it may have been due to the greater number of predictors tolerated in the high-intercorrelation condition (and not to intercorrelation, per se). Kenny (1979) did note that multicollinearity was associated with predictor unreliability, overlapping factors, and with high correlation of the intercorrelated predictors with the criterion. Perhaps multicollinearity is not a serious problem for moderately intercorrelated admissions data which are sufficiently reliable and factor independent.

The reader is again cautioned regarding several aspects of the study which might be misleading: (1) general-criterion sample-size levels were numerically smaller than those for the proxy-criterion methods; this was true to life, but sample-size levels were nevertheless not identical; (2) the dependent variable in the test-of-methods differs from that in the test-of-factors (the first is reported in rank error, while the second is reported in actual error); the rank error was more relevant to effects on cut-score cases, although the actual error allowed more powerful testing; (3) the general-criterion regression equations were computed using standardized regression and adjusted for restriction of range, while the proxy-criterion estimation used (unadjusted) raw-score regression estimates; (4) the high predictive validity of UGPA for these data may not generalize to other sites or programs; UGPA validity is likely tied to the dominating proportion of MSU UGPAs in the applicant pool.

It is acknowledged that the regression equations used to estimate graduate grade point average in this study were not optimal, because they retained non-significant predictors. Non-optimal estimation allowed the testing of sample size and predictor intercorrelation, which would have been confounded with statistical testing had it also occurred. To provide accurate regression equations for the methods in this study, therefore, the regressions were run again with statistical testing of coefficients. The general-criterion and general-mixed equations were virtually identical, and all equations gave the dominant predictive role to the UGPA predictor.
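A minimal sketch of the second enhancement follows, assuming the noise standard deviation (here 0.10 grade point) would be tuned empirically rather than taken as given.

    import numpy as np

    def jitter_ugpa(ugpa, sd=0.10, seed=None):
        # Add random error to the UGPA predictor before calibrating the
        # local-proxy equation. Deliberately degrading the dominant
        # predictor's reliability shrinks its calibration weight, roughly
        # as cross-validation would (sd is an assumed tuning value).
        rng = np.random.default_rng(seed)
        return ugpa + rng.normal(0.0, sd, size=len(ugpa))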
APPENDICES

APPENDIX I

Substitute Merit Values

Table AI
Values substituted for missing merit values

VARIABLE                                              VALUE
activities and achievements (non-academic)            2.5
age in years                                          21.935
average term credits                                  14.612
total (grade point) credits                           58.
interviewer rating number 1                           20.483
interviewer rating number 2                           20.711
Medical College Admissions Test: Biology              2.
Medical College Admissions Test: Chemistry            2.
Medical College Admissions Test: Physics              2.
Medical College Admissions Test: Quantitative         2.
Medical College Admissions Test: Reading              1.
Medical College Admissions Test: Science              2.
narrative writing sample                              3.072
number of terms enrolled in college                   10.38
credits on a pass/fail basis                          7.2
total honor points                                    162.
gender                                                1.53
veterinary experience rating                          4.833
work experience rating                                2.26
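Table AI amounts to substituting a single typical value for each missing merit variable. A minimal sketch, assuming missing entries are coded as NaN and the substitute is the column mean:

    import numpy as np

    def substitute_merit_values(X):
        # X : (applicants, variables) matrix with missing entries as NaN.
        X = X.copy()
        fill = np.nanmean(X, axis=0)            # one substitute per variable
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = fill[cols]              # e.g., 21.935 for age in years
        return X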
APPENDIX II

Error Weight by GPA Plot

[Figure: CUT-SCORE WT BY RKGPA]
FIGURE IIA. LEWA (or LEWAR) weight plotted against RKGPA (n = 88).

[Figure: CUT-SCORE WT BY GPA]
FIGURE IIB. LEWA (or LEWAR) weight plotted against GPA (n = 88).

APPENDIX III

Correlations: LEWAR-transformed Error by Predictors and GGPA

[Table AIII: correlations (with N and probability) of each LEWAR-transformed error variable, by year, with ACTACH, AVGCRED, CUMGPA, MCATC, MCATR, NARR, PFCRED, PTS, PVSGPA, SEX, SUMCRED, WORKEXP, and GGPA.]

REFERENCES

Allen, M.J., Yen, W.M. (1979) Introduction to Measurement Theory. Monterey, California: Brooks/Cole.

Alexander, R.A., Carson, K.P., Alliger, G.M., Carr, L. (1987) Correcting doubly truncated correlations: An improved approximation for correcting the bivariate normal correlation when truncation has occurred on both variables. Educational and Psychological Measurement, 47, pp.309-315.

Cattin, P. (1981) The predictive power of ridge regression: Some quasi-simulation results. Journal of Applied Psychology, 66:3, pp.282-290.

Clapp, T.T., Reid, J.C. (1976) Institutional selectivity as a predictor of applicant selection and success in medical school. Journal of Medical Education, 51, pp.851-852.

Deegan, J., Jr. (1976) The consequences of model misspecification in regression analysis. Multivariate Behavioral Research, April 1976, pp.237-248.

Doolittle, A.E., Cleary, T.A. (1987) Gender-based differential item performance in mathematics achievement items. Journal of Educational Measurement, 24:2, pp.157-166.

Elliot, R., Strenta, A.C. (1988) Effects of improving the reliability of the GPA on prediction generally and on comparative predictions for gender and race particularly. Journal of Educational Measurement, 25:4, pp.333-347.

Goldman, R.D., Hewitt, B.N. (1976) Predicting the success of Black, Chicano, Oriental, and White college students. Journal of Educational Measurement, 13:2, pp.107-117.

Goldman, R.D., Hewitt, B.N. (1975) Adaptation-level as an explanation for differential standards in college grading. Journal of Educational Measurement, 12:3, pp.149-161.

Gough, H.G., Lanning, K. (1986) Predicting grades in college from the California Psychological Inventory. Educational and Psychological Measurement, 46, pp.205-213.

Hakstian, A.R., Woolsey, L.K. (1985) Validity studies using the Comprehensive Ability Battery (CAB): Predicting achievement at the university level. Educational and Psychological Measurement, 45, pp.329-341.

Hart, M.E., Payne, D.A., Lewis, L.A. (1981) Prediction of basic science learning outcomes with cognitive style and traditional admissions criteria. Journal of Medical Education, 56:2, pp.137-139.

Hogrebe, M.C., Ervin, L., Dwinell, P.L., Newman, I. (1983) The moderating effects of gender and race in predicting the academic performance of college developmental students. Educational and Psychological Measurement, 43, pp.523-530.

Huberty, C.J., Mourad, S.A. (1980) Estimation in multiple correlation/prediction. Educational and Psychological Measurement, 40:1, pp.101-112.

Humphreys, L.G., Taber, T. (1973) Postdiction study of the Graduate Record Examination and eight semesters of college grades. Journal of Educational Measurement, 10:3, pp.179-184.

Huntsberger, D.V., Billingsley, P. (1973) Elements of Statistical Inference. Boston: Allyn and Bacon, pp.131-134.

Jones, R.F., Thomas-Forgues, M. (1984) Validity of the MCAT in predicting performance on the first two years of medical school. Journal of Medical Education, 59:6, pp.455-464.

Keeves, J.P. (1988) Multivariate analysis. In Keeves, J.P. (ed.), Educational Research, Methodology, and Measurement: An International Handbook. New York: Pergamon, pp.527-537.
Kenny, D.A. (1979) Correlation and Causality. New York: Wiley & Sons.

Kirk, R.E. (1982) Experimental Design (2nd ed.). Monterey, California: Brooks/Cole.

Linn, R.L. (1966) Grade adjustments for prediction of academic performance: a review. Journal of Educational Measurement, 3, pp.313-329.

Linn, R.L. (1983) Pearson selection formulas: Implications for studies of predictive bias and estimates of educational effects in selected samples. Journal of Educational Measurement, 20:1, pp.1-15.

Linn, R.L., Harnisch, D.L., Dunbar, S.B. (1981) Corrections for range restriction: An empirical investigation of conditions resulting in conservative corrections. Journal of Applied Psychology, 66:6, pp.655-663.

Linn, R.L., Hastings, C.N. (1984) Group differentiated prediction. 8:2.

Loeb, J., Bowers, J. (1973) Programs of study as a basis for selection, placement, and guidance of college students. Journal of Educational Measurement, 10:2, pp.131-139.

Markert (1983) Relationship of old and new MCAT scores to performance on the Part III examination of the NBME. Journal of Medical Education, 60:1, pp.53-55.

McCornack, R.L. (1983) Bias in the validity of predicted college grades in four ethnic minority groups. Educational and Psychological Measurement, 43, pp.517-522.

McCornack, R.L., McLeod, M.M. (1988) Gender bias in the prediction of college course performance. Journal of Educational Measurement, 25:4, pp.321-331.

Mehrens, W., Lehmann, I. (1984) Measurement and Evaluation in Education and Psychology. Chicago: Holt, Rinehart, & Winston.

Morris, J.D. (1986) Selecting a predictor weighting method by PRESS. Educational and Psychological Measurement, 46, pp.853-869.

Niedzwiedz, E.R., Friedman, B.A. (1976) A comparative analysis of the validity of pre-admissions information at four colleges of veterinary medicine. Journal of Veterinary Medical Education, 3:2, pp.32-38.

Neiner, A.G., Owens, W.A. (1985) Using biodata to predict job choice among college graduates. Journal of Applied Psychology, 70:1, pp.127-136.

Neter, J., Wasserman, W., Kutner, M.H. (1985) Applied Linear Statistical Models (2nd ed.). Homewood, Illinois: R.D. Irwin, Inc., p.10.

Pedhazur, E.J. (1982) Multiple Regression in Behavioral Research: Explanation and Prediction (2nd ed.). Chicago: Holt, Rinehart, and Winston.

Richards, J.M. (1982) Standardized versus unstandardized regression weights. Applied Psychological Measurement, 5:2, pp.201-212.

Roos, P.D. (1978) The implications of the Bakke decision on affirmative action admissions and related programs. In: Connolly, W.B., Dilworth, E.J., and Leach, D.E. (chairmen). New York: Harcourt Brace Jovanovich, p.209.

Ross, K.N. (1988) Sampling. In Keeves, J.P. (ed.), Educational Research, Methodology, and Measurement: An International Handbook. New York: Pergamon, pp.527-537.

Sawyer, R. (1986) Using demographic subgroup and dummy variable equations to predict college freshman grade average. Journal of Educational Measurement, 23:2, pp.131-145.

Sawyer, R., Maxey, J. (1979) The validity of college grade prediction equations over time. Journal of Educational Measurement, 16:4, pp.279-284.

Stuck, I.A. (1986) Selection by concurrent prediction: an alternative to the validity generalization of selection models. Michigan State University: Author.

Tabachnick, B.G., Fidell, L.S. (1983) Using Multivariate Statistics. New York: Harper & Row.

Thornell, J.G., McCoy (1985) The predictive validity of the Graduate Record Examinations for subgroups of students in different academic disciplines. Educational and Psychological Measurement, 45, pp.415-419.
Wilson, K.M. (1982) A study of the validity of the restructured GRE Aptitude Test for predicting first-year performance in graduate study. Educational Testing Service Research Report 82-34, p.60.

Wood, D.A., Langerin, M.J. (1972) Moderating the prediction of grades in freshman engineering. Journal of Educational Measurement, 9:4, pp.311-320.

Wright, R.J., Bean, A.G. (1974) The influence of socioeconomic status on the predictability of college performance. Journal of Educational Measurement, 11:4, pp.277-284.