This is to certify that the dissertation entitled "Cross Validity of Authentic- and Proxy-Criterion Regression Methods in the Selection of Veterinary School Applicants," presented by Ivan Alexander Stuck, has been accepted towards fulfillment of the requirements for the Doctor of Philosophy degree in Educational Measurement, Evaluation, and Research Design.

Major professor

Date

CROSS VALIDITY OF AUTHENTIC- AND PROXY-CRITERION REGRESSION METHODS IN THE SELECTION OF VETERINARY SCHOOL APPLICANTS

By

Ivan Alexander Stuck

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

1989

ABSTRACT

CROSS VALIDITY OF AUTHENTIC- AND PROXY-CRITERION REGRESSION METHODS IN THE SELECTION OF VETERINARY SCHOOL APPLICANTS

By

Ivan A. Stuck

Several advantages might be gained by admissions departments from the use of a proxy criterion in the development of a predictive multiple regression equation for selecting among candidate characteristics: (1) moderator error might be avoided by restricting prediction to only the immediate group of applicants; (2) predecessor data might not be necessary to develop a precise predictive equation; (3) novel criteria or predictors might be entered into the predictive equation for immediate use; (4) an interval-style multiple regression procedure might be used despite graduate-level pass/fail grading; (5) the range restriction problem could be avoided; and (6) the data from all applicants could contribute to the reliability of the predictive equation.

Admissions and performance data for five cohorts of veterinary applicants were used to compare four proxy-criterion methods with the conventional multiple regression approach to the development of a predictive selection equation. The authentic criterion was the graduate grade-point average, while the proxy criterion was an undergraduate pre-veterinary studies GPA. Predictors were undergraduate GPA, admissions test scores, employment ratings, and biographical and other data. Prediction factors were developed by selecting on college origin and performance level, by varying calibration sample sizes, and by restricting the use of intercorrelated predictors. T-tests and MANOVAs were used to evaluate mean differences in prediction error among the conditions. When prediction error (in ranks) was transformed to emphasize the error for cases near the cut score, no differences were significant among any prediction methods, and neither was there a year effect. It was evident also that no methods differed from prediction by UGPA alone.
When transformed prediction error (actual) for a proxy-criterion method was observed across five years, a year effect appeared among proxy conditions. The significant effect for year, nevertheless, was attributable to exceptional interactions among prediction factors for two of the five annual cohorts. Several improvements to proxy-criterion calibration are suggested, and the potential for proxy-criterion use at other sites and for other graduate programs is discussed.

TO THE STUDENTS WHO CHALLENGE CONVENTION

ACKNOWLEDGMENTS

I have deeply appreciated those persons and organizations who helped me in various ways to complete this dissertation. Former colleagues at the Ingham County Department of Social Services, St. Vincents Childrens Home, John Richer, the College of Veterinary Medicine, Bob Simpson of B-O-C Powertrain, and Dr. Juan Olivarez of Grand Rapids Public Schools provided a means of flexible employment during my doctoral study. Dr. Richard Houang, formerly with our department, was particularly helpful in guiding the development of my initial dissertation proposal. Dr. John Tasker and Pat Lowrie of the College of Veterinary Medicine provided the research environment from which my thesis arose, and further made available the data for my research and the continuing use of their college facilities. Aleta Zamel provided invaluable assistance in illuminating several subtle features of the data set. My friend Dr. Jeff Mayer helped to get me started in programming SPSS, and the consultants at the Computer Center assisted me often in the later data analyses. I thank my committee: Dr. Bill Mehrens for his tireless editing and suggestions, Dr. Steve Raudenbush for his statistical expertise, and Doctors Fred Ignatovich and Jim Haf for their insightful advice. I thank Rosalind Goodman for correcting my spelling and other shortcomings.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter

I. Introduction
   Need
   Purposes
   Research Hypotheses
   Rationale for Research Hypotheses
      Hypothesis A
      Hypothesis B
      Hypothesis C
      Hypothesis D
      Hypothesis E
   Overview

II. LITERATURE REVIEW
   Part 1: Substantive Review
      Summary
   Part 2: Theoretical Review
      Sampling
      Measurement
      The Reliability of Validity Coefficients
      Multiple Regression
      Multivariate and Univariate Analysis of Variance and the T-Test

III. PROPOSED THEORY
   General-criterion (GC) Regression Approach
   Using Multiple Regression to Shrink Error
      A. The case where the school variable is absent
      B. The case where the sample has a predominate rule
      C. The case where rules vary within the school
      D. Where prediction is improved for extreme cases
      E. The case where a proxy criterion is used
   Proxy-criterion Alternatives to Conventional Prediction
      Local-proxy (LP)
      General-proxy (GP)
      General-criterion-mixed or General-mixed (GM)
   Summary

IV. DESIGN
   Population
   Sample
   Predictors
   Criteria
   Analyses
   Test of Methods (MANOVA-TOM)
      Sample
      Conditions
      Dependent Variable
      Satisfaction of Assumptions
      Application to Hypotheses
   Test of Factor and Year Effects (MANOVA-TOF)
      Sample
      Conditions
      Dependent Variable
      Application to Hypotheses

V. RESULTS
   Test of Methods across Two Years
   Contrasts Between UGPA and Local-proxy Estimation
   Comparisons Against the General-criterion Method
   Test of Years and Other Factors under LP (Local-proxy)
   Independent Variable Contrasts
   Relative Validity of Non-MSU and MSU UGPAs

VI. DISCUSSION
   High UGPA Validity
   Potential Usefulness of Proxy Methods
   Potential Improvements to Estimation with a Proxy Criterion
   How Intercorrelation May Remain Benign
   Extreme Outcomes in Figures 5 and 6
   Cautions Regarding Study Realism
   Why R2 Wasn't Used as a Measure of Validity
   Regression Equations Have Superfluous Predictors
   Practical Implications of the Study

VII. SUMMARY

APPENDICES
   Substitute Merit Values
   Error Weight by GPA Plot
   Correlations: LEWAR-transformed Error by Predictors and GGPA

REFERENCES

LIST OF TABLES

Table
01  Cross-institutional disparities in predictors, predictor weights, and prediction validities
02  Comparative validity: Ordinary Least Squares, Local-proxy, General-criterion, and college GPA
03  Regression bias factors based on Richards (1982)
04  Channels by which error may affect calibration methods
05  Potential systematic error for each calibration method
06  Potential random error for each of four calibration methods
07  Potential advantages for each of four calibration methods
08  Disadvantages of each of four calibration methods
09  Optimal conditions for use of each calibration method
10  The study sample: Five cohorts of veterinary applicants
11  Predictors used in levels of the intercorrelation factor
12  Confirmation of expected outcomes
13  Contrasts of UGPA prediction against LP prediction
14  Method efficiency in logs of absolute rank-error
15  Estimate of average contrast between general-criterion and general-mixed transformed prediction error means
16  Test of main effect for methods (MANOVA-TOM)
17  Test of main effect for years (MANOVA-TOM)
18  Cross-validities for prediction methods
19  Test of main effects for years (MANOVA-TOF)
20  T-test of mean error between MSU and non-MSU applicants
21  A chart for identifying ideal method prediction factors
22  Method betas under optimal prediction conditions
23  Predictor names and their definitions
24  LP betas by year: full sample
25  LP betas by year: 75% sample
AI  Values substituted for missing merit values
AIII  LEWAR-transformed error by predictors and GGPA correlations

LIST OF FIGURES

Figure
1  Relative levels of transformed prediction error among GGPA estimation methods (1984)
2  Relative levels of transformed prediction error among GGPA estimation methods (1985)
3  Prediction error under local-proxy prediction where the proxy-criterion has been restricted
4  When measured by transformed prediction error, local-proxy prediction appears little affected by level of intercorrelation
5  When measured by transformed prediction error, restriction of the proxy-criterion range only appears to affect prediction for the 1984 cohort
6  When measured by transformed prediction error, source of UGPA only appears to affect prediction for the 1984 and 1985 cohorts
7  When measured by transformed prediction error, local-proxy prediction appears affected by level of intercorrelation for the 1984 cohort
8  When measured by transformed prediction error, sample size appears to have little effect on prediction
AIIa  LEWAR/LEWA weights plotted by GGPA rank (n = 88)
AIIb  LEWAR/LEWA weights plotted by GGPA (n = 88)

CHAPTER I

INTRODUCTION

Need

The efficiency of any process and the quality of its output improve as the selection of its input becomes more purposeful. Concurrent refinement of both (1) the criteria identifying output quality and (2) the selection of input is additionally beneficial as an approach to quality development. When the process is graduate-level education and the outcomes are licensed practitioners, the selection process is administered by a college department of admissions, while the criteria are largely determined both by (1) a host of university instructors and by (2) professional examination boards. In such a context, selection and criteria tend to become estranged. However, because the refinement of selection requires that knowledge of the identity and weights of valid predictors (which reside in the admissions department data) be linked to quality criteria (which remain under the domain of the student records authority), a cross-departmental flow of information is desirable. For many years the logistical difficulty of bridging the departmental offices was an authentic barrier to their reciprocation.

Today, however, inexpensive and adequate technology exists for the necessary data storage, integration, and analysis to allow admissions policy to be shaped by data on student performance. An equation which weights and combines admissions data (predictors) to estimate a performance outcome (or criterion) becomes a useful link between pre-program credentials (admissions data) and program performance (student records). Such an equation is known as a linear model.
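To make the linear model concrete, here is a minimal sketch of calibrating such an equation by least squares and applying it to a new applicant. The sketch is illustrative only: the figures, the choice of three predictors, and the use of Python with numpy (rather than the SPSS used in this study) are all assumptions, not the study's data.

    import numpy as np

    # Hypothetical records for previously admitted students; columns are
    # undergraduate GPA, admissions test score, and employment rating.
    X = np.array([[3.6, 62.0, 4.0],
                  [3.2, 55.0, 3.0],
                  [3.9, 70.0, 5.0],
                  [3.4, 58.0, 2.0],
                  [3.7, 66.0, 4.0]])
    y = np.array([3.5, 3.0, 3.8, 3.1, 3.6])      # graduate GPA (the criterion)

    X1 = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares predictor weights

    # The calibrated equation then links admissions data to an estimated
    # program outcome for a new applicant:
    new_applicant = np.array([1.0, 3.5, 60.0, 3.0])
    print(new_applicant @ b)                     # estimated graduate GPA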
Perhaps the most useful tool in developing a linear model (or "selection formula" for present purposes) is the multiple regression procedure. Multiple regression allows the use of past experience to inform present decisions. Given a criterion of quality (e.g., graduate school grades) and a set of application scores, the procedure can select the most predictive variables and weight them to maximize prediction of the criterion. This linear model, which was optimal as a selection formula for the original data set, may still remain useful for predicting the future grades of present students.

Unfortunately, research in this area has failed to demonstrate consistent outcomes. Typically, selection formulas will differ by location or by year of data studied (see Niedzwiedz & Friedman, 1976). For the most part, discrepancies are unsurprising, due to the limited sampling and sample sizes involved (single graduating classes of fewer than 100 students are typically used). Often, reports mention only a limited number of predictors which were considered significant, leaving the reader to guess what additional predictors may or may not have been tried (see Niedzwiedz & Friedman, 1976; Hart, Payne, & Lewis, 1981; Markert, 1983; and Jones & Thomae-Forgues, 1984).

In the past, multiple regression research would demand substantial resources. Collection of admissions data would require increasing administrative costs and organization as the set of predictive variables was expanded. Additionally, prior to the advent of the computerized office, many hours of clerical labor were required for the transfer of both admissions and student performance data from office documents to a usable medium for data analysis. In addition to these obstacles, selection formulas obtained from one year could be notoriously unreliable for predicting performance for a subsequent year. The resource drain projected for a multiple-year regression study was considerable, and few admissions officers could be confident that the advantage would compensate for the loss entailed.

Today, fortunately, many of these previous costs have diminished due to the advent of the microcomputer. The evolution of methodological innovations may also provide cost-effective improvements in the use of available data. One such potential multiple regression innovation may be the use of proxy criteria (variables which measure a set of factors similar to those measured by the ideal criterion; e.g., undergraduate grade-point average [UGPA] may serve as a proxy for graduate grade-point average [GGPA]). The implementation of proxy criteria in multiple regression calibration studies may provide an alternate means of estimating a selection formula and extend the use of multiple regression selection. If the error between a proxy and the real criterion is of less importance than the error associated with the potential confounding due to years, location, or other factors, a selection formula calibrated with the use of a proxy criterion may be preferable for decision making. This author found no evidence that a proxy approach had been studied prior to the author's own pilot study (Stuck, 1986). In that case, the proxy criterion (prerequisite veterinary GPA) allowed substantially higher predictive validity than a selection formula that was calibrated on one year's data from one location.
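The local-proxy idea behind that pilot study can be expressed compactly. The following sketch (again Python with numpy; the arrays, coefficients, and variable names are invented for illustration) calibrates weights against a proxy criterion that is available for the whole current pool, then scores that same pool:

    import numpy as np

    def local_proxy_scores(predictors, proxy_criterion):
        """Calibrate on the current pool's proxy criterion and return
        selection scores for that very same pool of applicants."""
        X1 = np.column_stack([np.ones(len(predictors)), predictors])
        b, *_ = np.linalg.lstsq(X1, proxy_criterion, rcond=None)
        return X1 @ b

    rng = np.random.default_rng(0)
    apps = rng.normal(size=(120, 4))      # one row per applicant, four variables
    pv_ugpa = apps @ np.array([0.4, 0.3, 0.2, 0.1]) \
              + rng.normal(scale=0.5, size=120)   # stand-in pre-veterinary GPA
    scores = local_proxy_scores(apps, pv_ugpa)    # rank applicants by these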
In addition to using a proxy criterion with a sample limited to a single year and location (referred to as the "local-proxy method" or "LP"), the proxy may substitute for the criterion in a multiple-year and/or multiple-location sample calibration (the "general-proxy method" or "GP"), or the proxy criterion may substitute conditionally, only where real criterion data are absent (the "general-criterion and proxy method" or "GM"), for multiple-year and/or multiple-location samples.

To summarize, proxy-criterion estimation could use one of the following forms:

(LP) single-site, single-year proxy-criterion multiple regression estimates of betas (predictor weights)
(GP) multiple-site and/or multiple-year proxy-criterion estimates of betas
(GM) identical to GP except that a proxy substitutes for the criterion only where the criterion measure is absent from the case.

Besides allowing the use of additional cases to increase the sample size, use of an adequate proxy criterion in the LP, GP, and GM methods would allow the inclusion of the normal range of applicant ability, thus eliminating the need to correct subsequent weightings for restriction of range. For the LP method (using solitary year and location data), additional variables can be added to the set of predictors in any year. This could allow, for instance, the use of two optional admission tests (e.g., GRE: Graduate Record Exam, and MCAT: Medical College Admissions Test) with confidence that they would both be appropriately weighted in the selection formula. The LP method would also be useful where the necessary data from previous years are unavailable.

For neither the conventional calibration approach (GC) nor the proxy-inclusive approaches (LP, GP, GM) is there literature addressing the nature of admissions prediction error. It is likely that such knowledge could be useful in several ways: (1) if error in predicted scores is random, then the quest for additional predictors might be ill-advised; (2) knowledge of relative levels of random error among methods would be of value for the analyst in choosing a calibration approach; and (3) where moderated prediction error is evident, evidence of its characteristics may assist in controlling or reducing such error by the introduction of new variables or transformations. Of particular concern is the nature of variation by years and by locations. Should prediction error appear to be moderated by these factors, then the LP approach may be advantageous due to its year-specific calibration and its option of selecting a different set of predictors. If the error appears to be both random and unilevel (identically distributed) across factors, then the LP approach may provide no advantage over the conventional calibration.

Purposes

The purposes of this research are (1) to create from veterinary school admissions data these four selection formulas:

LP = a single-year and -location formula using a proxy criterion,
GC = a conventional generalized formula using an authentic criterion,
GP = a generalized formula using a proxy criterion,
GM = a generalized formula substituting a proxy criterion only for cases which lack an authentic criterion;

(2) to compare them in terms of predictiveness on new applicant cases, and (3) to examine the nature of their prediction error.

Research Hypotheses

This research is designed to test the following list of propositions (which precedes a subsequent commentary):

Hypothesis A.
For students falling within a cut score zone, the local-proxy (LP) formula will be more predictive of graduate GPA (GGPA) than will undergraduate GPA (UGPA).

Hypothesis B. For students falling within a cut score zone, the general-mixed (GM) formula will be more predictive of graduate GPA (GGPA) than will the conventional prediction model (the general-criterion formula, GC) as corrected for range restriction.

Hypothesis C. With prediction error as the dependent variable, and with variation controlled with respect to years, methods, sample size, academic origin, and intercorrelation of predictors, prediction differences among years and methods will be obtained.

Hypothesis D. With prediction error as the dependent variable, with methods limited to the local-proxy (LP) approach, and with variation controlled with respect to years, sample size, academic origin, intercorrelation of predictors, and level of ability, prediction will vacillate across years.

Hypothesis E. Non-MSU undergraduates will be associated with greater prediction error.

Rationale for Research Hypotheses

Hypothesis A: It is widely held that a previous grade-point average (UGPA, for undergraduate study) is the best single predictor of future GPA (Mehrens and Lehmann, 1984). Therefore, given predictor cases limited to a single year and location, the UGPA would be expected to be the most reliable predictor of GGPA (graduate academic performance). Any proposed alternative (such as a formula developed from the LP regression approach), therefore, must be able to outperform UGPA. Hence, Hypothesis A is to be evaluated by the relative validity of LP prediction against UGPA prediction. In addition, because predictive precision only matters where it may alter the conventional outcome, validity difference (between methods) must be granted more importance as it falls within the cut-score zone (the lower bound of the veterinary doctor achievement distribution). In this instance the weighting is done by a non-linear transformation of the errors of prediction which results in greater importance for the prediction errors for marginal veterinary students.

Hypothesis B: Where data extend across years or locations, the conventional GC (general-criterion) regression-formula validity is the standard (where corrected for restriction of range). Hence, for the proxy-criterion alternative to prove itself useful, Hypothesis B requires that the GM (general-mixed) regression-formula validity must predict cut-score proximity cases with less error than does the conventional GC approach (corrected for range restriction).

Hypothesis C: Prediction by regression equation may vary according to specific factors (such as size or selection of the sample used in estimation of the regression equation). To determine the relative importance of such factors, it is necessary to control the influence of each factor. Control of prediction factors can be achieved by the deliberate selective sampling of calibration cases to bias selection formulas in a controlled manner (to deliberately exaggerate error effects), or by other deliberate means. Measurement of this bias is possible by applying the biased formulas to new data and by estimating the prediction error (between predicted and actual criterion scores). By entering these condition-specific prediction error values into a repeated-measures MANOVA procedure, the statistical and relative importance of these factors may be evaluated.
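As a hedged illustration of that last step, the sketch below generates one placeholder transformed prediction-error value per applicant under each method-by-year condition and submits the table to a univariate repeated-measures analysis. The study itself used MANOVA; statsmodels' AnovaRM is assumed here merely as an accessible univariate stand-in, and the data are random:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(0)
    methods, years = ["GC", "LP", "GP", "GM"], ["1984", "1985"]
    rows = [(i, m, yr, rng.normal())          # transformed prediction error
            for i in range(30) for m in methods for yr in years]
    data = pd.DataFrame(rows, columns=["applicant", "method", "year", "error"])

    # Each applicant contributes one error score per condition, so method
    # and year are within-subject (repeated-measures) factors.
    print(AnovaRM(data, depvar="error", subject="applicant",
                  within=["method", "year"]).fit())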
Where such a MANOVA procedure is controlled for year and/or method moderators, the emergence of effects for the LP (local-proxy) and GP (general-proxy) methods should correspond with concurrent effects for years: if the dependent variable is not moderated, then the practice of cross-year and cross-location generalization will be unimpaired, and, hence, the conventional GC (general-criterion) approach may be preferable for use. Assuming that calibration factors will not exhibit random influence, MANOVA effects for years and for methods are expected.

Hypothesis D: The following is consistent with the case of prediction parameters which vary across time: with a MANOVA procedure (1) limited to the local-proxy calibration method, and (2) controlled for year, sample size, academic origin, redundancy of predictors, and level of achievement, an effect for years is expected. (Under the LP [local-proxy] approach, a larger sample size and achievement range is possible, thus allowing the inclusion of an additional variable, achievement level, into the study.) The likelihood of finding year effects is enhanced (1) due to the larger number of years which may be included, and (2) due to the additional variance available with the use of prediction error reported on an interval scale. A moderator effect (such as a year effect) would suggest that (1) additional predictors are required in the regression equation or that (2) blocking is required on years. Blocking (e.g., local-proxy calibration) is a less precise means of control (than the addition of missing predictors), and therefore could be expected to only partly account for variance caused by changes in predictor validities.

Hypothesis E: The importance of the variation in UGPA standards across institutions may be confirmed by obtaining a significant difference in mean absolute error between subgroups which differ on UGPA origin (MSU vs. non-MSU). Another indicator is the contrast between calibration conditions which differ only on the N factor, but this would be a weaker test (M = MSU, N = all).

Overview

Chapter II will present a two-part literature review: Part 1 will review the use of multiple regression in selection for health science and graduate school admissions, and Part 2 will review theoretical issues underlying the methods used in this study.

Chapter III presents the theory being examined by the present study. The relative effects of proxy-criterion use are hypothesized for three multiple regression approaches (LP, GP, and GM) in relation to the conventional multiple regression approach (GC). Also, the potential for a systematic (vs. random) nature of prediction error is discussed.

Chapter IV outlines the designs for the two main analyses (repeated-measures MANOVAs) conducted within the research study: (1) the test of methods (TOM), a validity test comparing four multiple regression methods (GC, LP, GP, and GM) with respect to prediction error as the dependent variable, the methods each being controlled on three prediction factors (sample size, source of UGPA, and intercorrelation of predictors); and (2) the test of factors (TOF), a validity test comparing prediction conditions and controlling an additional prediction factor, past academic performance in pre-veterinary courses, while holding methods constant (using the local-proxy, or LP, method).

Chapter V presents the results of the study.
From the methods test MANOVA (MANOVA-TOM), the plausibility of five hypotheses will be judged: Hypothesis A, proposing the greater predictiveness of the LP model relative to that of the UGPA for marginal students; Hypothesis B, proposing the greater predictiveness of the GM model (which conditionally substitutes a proxy criterion value where the authentic criterion value is lacking) relative to that of the conventional multiple regression model (GC); Hypothesis C, proposing effects for years and methods; and Hypothesis E, proposing a greater association of prediction error for students claiming a non-MSU UGPA. In addition, the rank order of method effectiveness relative to prediction error will be observed. From the test of factors and years (MANOVA-TOF), Hypothesis D will be tested (again) to confirm the existence of year effects.

Chapter VI offers a discussion of the findings. Chapter VII offers a summary of the research.

CHAPTER II

LITERATURE REVIEW

Part 1: Substantive Review

Health sciences candidate selection (including selection for human, veterinary, and dental medicine) provides an ideal domain for the study of academic selection because (1) the demand for medical education remains fairly consistent, and (2) medical education tends to remain uniform over time. Surprisingly, there have been few multiple regression studies of academic selection in this area, and none that this author has seen report any efforts to validate selection formulas longitudinally. Niedzwiedz and Friedman (1976) did study academic selection across schools, however. Table 1 shows disparity in the magnitude of correlations between predictors and the four-year veterinary school grades (ranging from r = non-significant to r = .55). More important are the differences among sets of predictors. Assuming that similar scores and ratings are available to each institution for the evaluation of applicants, and assuming that the most predictive variables were reported, prediction appears to be inconsistent across schools. Additional studies (Hart, Payne, & Lewis, 1981; Markert, 1983; and Jones & Thomae-Forgues, 1984) found comparable correlation magnitudes (all near r = .40) but nevertheless failed to demonstrate a reliable set of predictors of medical school performance.

Table 1
Cross-institutional disparities in predictors, predictor weights, and prediction validities (criterion: GPA)

Schools   Predictors                               r
[Niedzwiedz and Friedman, 1976]
A         Physics GPA, Physics hours               .31
          Chemistry GPA, Extra-Curricular
          Rating, VAT Total Score                  .50
C         Science GPA, Academic Rating,
          VAT Science Score                        .55
D         (not reported)                           NS
[Hart, Payne, and Lewis, 1981]
E         College Science (w/ biochem. mem.)       .40
E         College Science (w/ biochem. intp.)      .43
E         College Science (w/ biochem. p. lrn.)    .43
E         College Science (physiology)             .39
[Markert, 1983]
F         College GPA, MCAT                        .39
[Jones and Thomae-Forgues, 1984]
s=25      College GPA                              .41
s=22      College GPA                              .37
s=25      MCAT                                     .41, .37

Note: s = number of schools in study.

Using Class of 1985 data as the regression formula calibration sample (to predict veterinary school performance), and attempting to validate the formula on the 1986 and 1987 cohorts, the author obtained estimates correlating 0.49 with veterinary school GGPA for 1986, but for 1987 the correlation dropped to a validity of 0.20. For 1987, however, there was a single variable whose correlation with GGPA was as high as 0.41. Clearly, predictors selected for one class via multiple regression procedures may appear to be unreliable across subsequent classes.
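This instability is easy to reproduce in simulation. In the sketch below (synthetic data, not the study's; Python with numpy assumed), predictor validities drift between two cohorts, and a formula calibrated on the first cohort loses much of its apparent validity when applied to the second:

    import numpy as np

    rng = np.random.default_rng(1)

    def cohort(n, weights):
        X = rng.normal(size=(n, 3))                 # three standardized predictors
        return X, X @ weights + rng.normal(size=n)  # criterion plus noise

    X_a, y_a = cohort(90, np.array([0.6, 0.3, 0.1]))  # calibration class
    X_b, y_b = cohort(90, np.array([0.1, 0.3, 0.6]))  # later class, drifted weights

    Xa1 = np.column_stack([np.ones(90), X_a])
    b, *_ = np.linalg.lstsq(Xa1, y_a, rcond=None)

    r_fit = np.corrcoef(Xa1 @ b, y_a)[0, 1]           # optimistic in-sample validity
    Xb1 = np.column_stack([np.ones(90), X_b])
    r_cross = np.corrcoef(Xb1 @ b, y_b)[0, 1]         # shrunken cross-year validity
    print(round(r_fit, 2), round(r_cross, 2))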
As is evident under these conditions, the regression formula may appear sufficiently unreliable that admission directors will feel justified in imposing subjective hunches or even prejudice into their selection processes, subsequently resulting in yet weaker and more prejudiced selection formulas.

Some efforts have been made to correct for error which contributes to unreliable selection formulas. In particular, attention has been directed towards error that occurs across locations. We know that considerable variation in academic standards exists from college to college. There are also many opportunities for deliberate and accidental transcription errors in the assessment of applicant credentials. Clapp and Reid (1976) improved prediction of medical student performance by weighting UGPA by an index of undergraduate admissions standards. Linn (1966) reviewed research attempting to re-scale multi-standard applicant high school GPAs to a single, standard scale (HSGPA). Although some prediction gains were observed for zero-order HSGPA x UGPA correlations by the use of a specific-school-adjusted HSGPA (HSGPAs), adjustments had no effect on multiple correlation coefficients where admissions test data were among the predictors. In all cases where prediction gains were obtained for the validation sample, the prediction advantage shrank substantially upon cross-validity testing.

The use of the proxy criterion should increase sample size at the expense of criterion precision; this may be preferable to accumulating potential year or location error from using additional years and/or locations as a means of increasing sample size. Wilson (1982) used UGPA as a proxy criterion in estimating the validity of GRE (graduate admissions test) scores. For chemistry majors (the reported major most relevant to the health sciences), he observed a correlation between UGPA and first-year graduate GPA of 0.30 (pooled data for years 1974, 1975, and 1978; n = 574). Stuck (1986) observed that the use of a proxy criterion (a pre-veterinary UGPA) might provide better estimation of veterinary school GGPA than may the use of a conventional multiple regression approach, because it can control for the potentially confounding effects of year and location. This approach to developing a selection formula is referred to as the LP (local-proxy) approach. An LP selection procedure was carried out retrospectively for a set of veterinary school applicants. Table 2 compares LP (local-proxy) prediction results with outcomes from a GC (general-criterion) model and the (optimal) ordinary least squares (OLS) correlation. The OLS equation predicts at R_OLS = .66. Because it is an original calibration, its predictors are uniquely selected and their weights are uniquely computed to minimize the squared error for that particular sample.

Table 2
Comparative validity: Ordinary Least Squares, Local-proxy, General-criterion, and college GPA

RUN     APPLIC. COHORT   CALIBRATION CRITERION   PREDICTORS    WEIGHTS   r (SCORE x GGPA)
OLS     1984             GGPA                    1984          1984      .66
LPPVS   1984             PVUGPA                  UGPA+VARS     1984      .58
LPCUM   1984             UGPA                    PVUGPA+VARS   1984      .51
GC81    1984             GGPA                    1981          1981      .20
UGPA    1984             -                       UGPA          100%      .20

OLS = optimal equation validity for data set
LP = Local-proxy prediction method
LPPVS = LP approach using PVUGPA (veterinary prerequisite UGPA) as a proxy criterion
LPCUM = LP approach using UGPA as a proxy criterion
GC = General-criterion (conventional) prediction method
GC81 = GC approach applying 1981 regression equation to 1984 data
GC81 uses a selection formula calibrated with a GC approach (having fixed predictors and fixed weights) computed from 1981 data, which predicts at r = .20 when applied to the 1984 data. The correlation for UGPA, which predicts with UGPA alone, is the same as that for GC81: 0.20. The LP formulas, LPPVS and LPCUM, predicting at 0.58 and 0.51, provide a better level of prediction.

The potential advantage of an LP (local-proxy) approach lies in its avoidance of moderator error from uncontrolled year, location, and other confounding effects. Where data are sampled in a non-random fashion, as is the case with admissions data, the presence of such effects must be expected unless there is substantial evidence to the contrary. Of course, year and location effects are, more precisely, artifacts of the changes in selection as it varies across years or locations.

The importance of such moderation has been suggested by previous research. Gender, ethnicity, socioeconomic status, personality, sites, years, and high school rank are variables which have been found to moderate prediction coefficients. Doolittle and Cleary (1987) found that women do worse on math items. Hogrebe, Ervin, Dwinell, and Newmann (1983) reported differential validity among performance prediction models for gender for white (but not for black) ethnic subgroups. McCornack (1983) found white-ethnic subgroup differences for blacks and Asians. Goldman and Hewitt (1976) found that minority performance predictiveness differed even after controlling for specific program category. Wright and Bean (1974) found socioeconomic status to moderate prediction for a sample of white urban male college students. Heiner and Owens (1985) observed an association between vocational choice and personality factors, whereas Gough and Lanning (1986) obtained male and female cross-validity coefficients of r = .38 and r = .36 with the California Psychological Inventory in predicting academic performance. Hakstian and Woolsey (1985), in turn, found validity coefficients for males and females of r = .39 and r = .37 with the California Aptitude Battery in predicting an introductory psychology course grade.

Outside the health sciences area, Linn, Harnisch, and Dunbar (1981) observed differences in LSAT validity for sites and years, additionally concluding that one cause appeared to be variation in grading as opposed to variable aptitudes. Goldman and Hewitt (1975) likewise found evidence of grading variability. Particularly, they observed an adaptation of grading standards relative to the ability range of the lower two-thirds of the class. Humphreys and Taber (1973) concluded from their postdictive studies that variation in grading standards best explained non-linear semester grade by GRE relationships. In part, this may be explained by differential attrition from academic disciplines (Loeb & Bowers, 1973) due, in turn, to
Although Sawyer and Maxey (1979) found stable prediction over a four-year span in the prediction of UGPA from .ACT (American College Testing) scores, Sawyer (1986) later found UGPA variation a major source of prediction bias, accompanied by the lesser sources of age, gender, and race. Wood and Langerin (1972) found that high school rank moderated prediction for high ability students. It is recognized that where differential validity is inferred from discrepant correlation coefficients, the cause may often be artifactual due to (1) sampling error, (2) measurement error, or (3) the variability of the sample studied relative to the variability of the sample to which the equation is to be applied (commonly called the restriction-of- range problem, see Mehrens and Lehmann, 1984). Thus some of the preceding findings must be interpreted with caution, due to the uncertainty regarding control of artifactual effects. 21 Summary Mederation of selection formula validity by sites finds some confirming evidence in the health science education literature. Moderation across years, however, is more difficult to evaluate through literature review due to a dearth of longitudinal study of regression equation validity. The author's longitudinal study of the generalizability of a single-year, single-site, equation over years, found the validity to be poor. For one year, the use of a portion of the UGPA as a proxy variable allowed the calibration of a more valid regression equation for selection. Such an outcome may have been possible because of the presence of moderating factors associated with years. The literature reporting moderating factors is quite extensive. If use of proxy criterion regression avoids moderation by years, it nevertheless remains somewhat less valid. Part 2: Theoretical Review Sampling Sampling theory provides justification for drawing inferences from samples under certain conditions. Suppose that a population exists ‘who share some independently acquired mutual attributes and characteristics but who differ on other attributes and characteristics. If samples are drawn in large enough numbers and in a random manner, we are confident that: ( 1) randomly sampled, independently 22 acquired characteristics of the sample can be inferred to the population as a whole, and (2) randomly sampled, independently acquired characteristics of the sample can be inferred to any other large random sample of that particular population (these follow from the central limit theorem, see Huntsberger and Billingsley, 1973, pp. 131-134). Ross (1988) cites Kish fer classifying samples as (1) experimental, (2) survey, or (3) investigative, based largely on the quality of the sampling. An experiment provides deliberate treatment with control of extraneous variables by randomization or other' means. A survey selects randomly from a defined population in which each member has a specific probability of being studied. In the investigation, however, control is the least. Sampling is by convenience with neither randomization nor probability sampling. The study of admissions data falls under this latter category. Where cases are not sampled randomly, but are selected according to their value on a particular variable (let's say “selected on IQ" [scholastic aptitudej), observed correlations between ‘that. selected variable (IQ) and another (say, academic performance) may be lower than would have been the case if the sample had been sampled randomly (an additional instance of the restriction-of-range problem). 
If therefore, selection is on the dependent variable, or if the regression uses standardized variables, 23 regression /validity coefficients for selected data will be artificially low (Richards, 1982). This is the case in sampling to produce a multiple regression selection formula, wherein only selected applicant cases include a GGPA/ criterion. Because the validity coefficient reflects the proportion of true variance to error variance: rxy - sZT / (szos s2T+sZE), a reduction in true variance resulting from selection of a restricted variable range leaves the error variance intact thus reducing the proportion of true variance to error variance. This is an instance of artifactual error, because the proportion of error is inflated due to the improper sampling procedures used. Other error is due to the vagaries of the sampling process. Nuisance variables (Kirk, 1982), confounding variables, and moderator variables (Allen & Yen, 1979) are common labels referring to another factor which may reduce prediction validity during the sample selection phase of an investigation. Inasmuch as all members of a population will not equally share access to, nor interest in, graduate admissions: certain papulation traits and characteristics may be over-represented in a non-random sample of applicants. When such unspecified and uncontrolled-for variables affect performance on the dependent variable, an additional source of error is imposed on the investigation. 24 Two hazards accompanying the use of non-randomly selected admissions data are therefore: (1) restriction-of- range artifacts and (2) confounded variables. Measurement Every measurement can be best regarded as an estimate which includes unknown components of two types of error: (1) unsystematic and (2) systematic. Unsystematic error randomly increases and decreases the measurement ‘value which is observed (relative to the true value of the object or process being measured). Given a large number of measurements, however, the positive and negative errors tend to cancel, leaving a mean value that is virtually the true mean for that set of measurements. Systematic error affects the recorded measurement value in a consistent way (such as always mistakenly using a meterstick instead of a yardstick): regardless of the number of measurements taken, the error remains in the computed mean as well as in the individual measures. Nevertheless, if the nature of the systematic error comes to light, individual measurements and group statistics may be corrected. Measurement error generally refers to the random kind of error, whereas, systematic error in the measurement is an unaccounted for factor which has a nonrandom influence on the observed scores. If the systematic error is due to instrumentation or procedures, the factor may be called an "artifactual factor". Otherwise, the systematic factors 25 will be attributed to uncontrolled variables in the real data. Of course, errors also differ in level (or magnitude) 2 error that doesn't differ in magnitude across samples is known as identically distributed error (unilevel), whereas error that does differ across samples (is multilevel) is known as moderated error. In the context of a distribution, level of error is known as error variance. If error variance is multilevel (or heterogeneous: variance differs across factor levels), it is said to be moderated by that factor. Where error variance is multilevel and, in fact, correlated with the levels of the factor, the error is systematic-- a special case of moderated error. 
Where it can be determined that error has systematic or random qualities, the possibility of controlling the error becomes more feasible. The Reliability of Validity Coefficients The effect of error on correlation coefficients is more complex than is its effect on observed scores. With no error, the bivariate correlation is a consistent maximum: the coefficient of the two latent traits. To use a biological analogue, error may be likened to a parasite that invades a "host" variable. From a maximum, latent- trait correlation value, coefficients decline in value as greater levels of error affect the observed scores. This is always true for random error and is virtually always 26 true for systematic error (the exceptional cases being (1) where error adds a constant value to its host variable, or (2) where systematic error is perfectly correlated with its host variable). Where the error in the observed scores is random, or where the error is systematic relative to its host variable, an unbiased estimate of the expected population coefficient can be computed. It can be computed with precision, moreover, if from a large sample: of course, the resulting correlation coefficient will be attenuated from the latent trait coefficient value due to the random error. Where error 'varies systematically relative to external influences, however, the computed estimate of the expected population coefficient may be inaccurate in some consistent fashion (biased). The biased estimate of the correlation of the latent traits would, therefore, require a correction of the observed-score correlation. It is common for error to have attributes of both systematic and random error. It may appear to be normally distributed as in the case of random error, yet also prove to be reducible by the addition of variance controls. This would be the case of a moderating variable (such as year or location) where error may vary across units of the variable (e.g. times or sites) in either a systematic or a random fashion. By blocking on potential moderator variables (see Neter, Wasserman, & Rutner, 1985) or by using other 27 statistical controls, the moderated error can be removed or reduced. The restriction-of—range problem is analogous to the problem of unreliability. Both can be accounted for in terms of the proportion of true variance to error variance. Bouh unreliability and range restriction are reflected in coefficients which are reduced when the proportion of error variance increases. By reducing the range of the variation in the sample of scores (as a consequence of selecting candidates via a cut-score criterion), the proportion of true variance is decreased, and, as a consequence, the proportion of error variance is directly increased (even with no change in the amount of error variance). If the error variance is substantially eliminated, the coefficient approaches the latent-trait value (in the case of parallel tests, that value should be one, though in the case of latent traits, the value could range between positive one and negative one). Linn and Hastings, (1984, p.166) provide a good discussion of the range restriction issue. Variation in range only affects raw-score regression coefficients when the dependent variable range is subject to variation between the calibration and the application samples (Richards, 1982), although the precision of this unbiased estimate depends heavily on a large sample size. 
Richards (1982) discusses the data characteristics displayed in Table 3, which result in error artifacts under (1) raw-score and (2) standardized regression/ 28 correlation. Most notably, raw-score regression coefficient estimates are unbiased by measurement error in the dependent variable, whereas. standardized regression coefficients are unbiased by variation in scale units. Neither type of regression is immune from bias due to measurement error in the independent variable. Although artifactual error in computed statistics may often be reduced through the use of various correction formulas (e.g. for unreliability or for range restriction), these corrections, nevertheless, are limited by the analyst's ability (1) to identify the affected variables (2) to determine levels of variance or reliability under other circumstances. Furthermore, a ”corrected" coefficient cannot be assumed to be completely accurate, and may be expected to be conservative (see Linn, Harnisch, & Dunbar, 1981). Table 3 Regression bias factors based on Richards (1982) BIAS FACTOR TYPE OF REGRESSION Raw-score Standardized Units of measure differ XX Dispersion of independent variable xx Unreliability of dependent variable xx Dispersion of dependent variable XX xx Unreliability of independent variable xx xx Norm referent scale xx xx Change in test length xx xx Selection on meditating variable xx xx 29 Correction of range restriction for a standardized regression equation requires the correction of each partial coefficient. Correction of the raw-score regression model is difficult. because it requires the generation of a constant. in addition to the transformation of partial coefficients into "b" weights (raw score coefficients). Some important implications of this theory for selection need to be considered: (1) If a sample is not randomly drawn its statistics will, nevertheless, represent its population as a whole if all of its relevant characteristics are invariant from member to member (for example, all Girl Scouts are invariant with respect to gender and relatively invariant with respect to age). (2) If a sample is not randomly drawn, its statistics will also represent its population if the sample is large enough and if the subjects happen to be representative: automobile drivers are random relative to gender and political party preference: ten cars in a line that are picked from a public parking lot may not reflect population composition accurately, but a few hundred cars picked as a block from a parking lot may represent population composition quite accurately relative to gender or political preference. However, sample correlations are apt to be biased due to moderated error (non-random samples tend to systematically select certain subgroups). 30 (3) In practice, artifactual differences can be anticipated. Because many factors may be predictive of a particular kind of human performance, one must assume that (a) people may perform at a similar level even though they differ with respect to particular attributes (abilities on several factors may compensate for deficiencies on other factors): and (b) for a given year or location, non-random pressures (self- selection or other non-random selection) must be expected to favor particular factors/attributes resulting in samples which are systematically different from the population as a whole. For instance, a change in requirements for admission to human medicine programs may affect the rate and quality of applications to veterinary medicine. 
Sample subgroups may differ in quality and level of preselection prior to inclusion in the sample, due to either self or institutional selection. Aggressive students may be over-represented due to self-selection, and range widely on required aptitudes while students with high verbal skills may be over-represented due to institutional selection and they may range ‘very little on required aptitudes. The effects of these disparities are (a) to create the appearance of a differential validity of any fixed selection formula for the various applicant subgroupings (e.g. verbal, aggressive) and (b) to create the appearance of a differential selection formula validity across samples (e.g. applicants of different years or 31 locations may differ in their subgroup structure, see Linn, 1983: and Linn 8 Hastings, 1984). Nevertheless, the 'validity difference would be largely artifactual, a consequence of range- restriction due to selection. Differential validity may actually exist independent of the artifactual manifestation, of course; however, a more insightful conceptualization is to attribute this particular validity problem to model misspecification (i.e., a selection formulation lacking in one. or :more important variables, such as origin of UGPA). In short, under the uncontrolled conditions of an investigation-level study such as the analysis of graduate admissions data, the likelihood of moderated/ confounded prediction across years or locations is substantial. The result of moderated prediction may be biased regression and validity coefficients. Multiple Regression The usual mathematical procedure used in multiple regression yields what is called the ordinary least squares (OLS) estimate of the criterion (or simply the least squares estimate). This term means that the sum of the squared prediction errors is minimized for the data used (see Neter, Wasserman, & Kutner, 1985) . The OLS linear model that is calibrated, however, is truly OLS only with respect to the specific combination of the calibration predictors and the calibration criterion. The OLS 32 correlation (coefficient- ROLS) is the optimal correlation of predicted and actual scores because of the following: (1) scores are calculated by applying the calibrated linear model back onto the calibration predictors, (2) the formula (the linear model) was specifically developed to predict the same criterion data: the subsequent correlation of ”ideal” criterion and "ideal" estimated criterion scores is optimal, (3) the OLS correlation is apt to be inflated, to some extent, due to the chance correlation of error with the criterion: when this chance correlation melts away during the application of the linear model to a new sample, the decrease in the coefficient upon cross- validity measurement is called shrinkage. Multiple regression assumes that (1) responses on the dependent variable are independent, that (2) the variance is constant across cases, that (3) the errors are normally distributed, that (4) the system being modeled is in a steady-state equilibrium, and that (5) errors are uncorrelated (Kenny, 1979, pp.50, 51). If all of the predictive factors are perfectly represented by the set of predictors and the criterion, the calibrated model will be optimal (although only a perfect correlation if in a totally determined situation where the latent trait correlation is one). 
Otherwise, as Deegan (1976) and Pedhazur (1982) explain, the following will be true: (1) where superfluous factors are included among the predictors (Deegan's overspecified model), unsystematic 33 error will be added to the predicted scores when the model is applied, thus causing an underestimate of the validity coefficient of the prediction scores (attenuation due to unreliability); (2) where some factors are omitted from the set of predictors (Deegan's underspecified. model which includes the case of omitted independent factors: Deegan, p. 238) , systematic error will be added to the predicted scores (this problem can be overcome only by providing the missing predictor data): (3) when a combination of these two situations exists, the model is said to be misspecified (misspecified models have biased parameter estimates which exhibit an interactive character, Deegan, p.238; obviously, without evidence that all predictive factors are appropriately represented, all practical models must be assumed to be somewhat misspecified); (4) where important predictor variables are highly correlated, multicollinearity is said to exist. Pedhazur (1982) points out that there is differential use of the term, multicollinearity, but its problematic manifestation is biased predictor weights. This problem results in systematic error being added to predicted scores when the calibrated model is applied to non-calibration data (new data). Much work has been done to develop alternate regression techniques to cope with the problem of multicollinearity. Unfortunately, most of these techniques are helpful only in the most severe circumstances (Huberty 8 Mourad, 1980: Morris, 1986: Cattin, 1981). Kenny (1979) 34 lists three characteristics associated with low predictor error of measurements These characteristics are also associated with reduced problems of multicollinearity: (1) a high reliability, (2) a low regression coefficient, and ( 3) a low predictor intercorrelation. He also adds that multicollinearity decreases as the number of predictors decreases relative to the number of cases. If, therefore, certain data characteristics exist, the regression procedure will yield an equation which will specify an efficient means of weighting several variables in order to estimate a criterion. Regression equations may suffer from either too few or too many predictor variables. A lack of predictors results in biased regression coefficient estimates, while too many predictors makes the coefficient estimates less reliable. High predictor variable intercorrelation may also bias coefficient estimates although this problem is less severe for predictors with high reliability and/or a moderate to low correlation with the criterion. Multivariate and Univariate Analysis of Variance and the T-Test The t-test is a special case of the more general ANOVA; therefore they share similar theoretical assumptions. The ANOVA procedure yields a ratio of the variation of means to the variation of simple scores. Under the null hypothesis of no effects for the levels of a factor studied, the ratio 35 will be approximately one (1:1) . Otherwise the variation of means will result in a ratio greater than one thereby suggesting the implausibility of the null hypothesis. The analysis of variance requires an interval-level dependent variable and a nominal-level independent variable with at least two levels (a one-way ANOVA) . Where there is more than one independent variable, the ANOVA may be two-way, three-way, or etc. 
If two or more multi-level independent variables are in the ANOVA design, it is classified as a factorial design. The ANOVA procedure makes certain statistical assumptions about the data being analyzed; where these assumptions are violated, ANOVA findings may be less valid. Under all circumstances the dependent variable scores must be independent of each other. And where the sample sizes differ per condition (cell sizes), the variances must be equivalent. Violations of other assumptions tend to be less important (see Kirk, 1982, pp. 74-79). Another way of looking at the issue of the independence of the dependent variable scores is in terms of accounting for variance. When sampling is less than random, dependent variable scores may not be independent. If dependent variable scores are correlated, there is apt to be a variable whose control would result in independent scores. The question becomes, therefore, whether or not the important factors have been controlled in the study's design. The answer to this question requires a rational analysis of the potential causes of score variance and the adequacy of the study's design to differentiate such variation. For example, where the dependent variable is a performance score, it is critical that there be no overlap in respondents' between-condition samples. However, where the dependent variable is error-due-to-method, as in the present study, variance in the dependent variable will not be substantially affected by randomly overlapping samples of respondents whose scores are fixed prior to the experiment; the systematic error will be virtually a consequence of the mathematical transformations attributable to the methods. If, however, the sampling is restricted from certain levels of population variation (e.g., particular years, locations, ability levels), the restricted variables need to be included in the experimental design as independent variables; otherwise the dependent variable is likely to be dependent on one or more unspecified moderators and ANOVA validity will suffer. Where cell variances are unequal, it is important that cell sizes be equal. If cell variances and sizes are approximately equivalent, the ANOVA validity should be acceptable, particularly where the sample sizes are large. Extending single-dependent-variable analysis of variance to the multivariate case (MANOVA), additional assumptions must be met in order to make valid statistical inferences. Tabachnick and Fidell (1983, pp. 231-235) include the following assumptions and requirements: (1) homogeneity of covariances replaces its ANOVA analogue, equality of cell variances; (2) the number of cases per cell must exceed the number of dependent variables; (3) the dependent variables should exhibit a multivariate normal distribution; (4) there should be no outlier cases; (5) all dependent variables and covariates should share linear relations; and (6) dependent variables should exhibit an absence of multicollinearity. In short, the t-test, ANOVA, and MANOVA test between-group variation against the variation of simple scores in order to conclude whether variation between groups exceeds limits acceptable for the null hypothesis. Independence of responses and the equality of either cell sizes or variances are the critical assumptions. As this procedure is extended to the multivariate situation, some additional requirements become important.
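For illustration, a minimal one-way ANOVA along the lines described above (invented data; scipy's f_oneway computes the same F ratio of between-group to within-group variation):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    # three groups; under the null the means would be equal
    groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (0.0, 0.0, 0.3)]

    f, p = stats.f_oneway(*groups)  # F near 1 under the null; larger otherwise
    print(f"F = {f:.2f}, p = {p:.3f}")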
CHAPTER III
PROPOSED THEORY

The term Y^(GGPA)i in Equation 1 represents the estimated GGPA (graduate school GPA) for student i. For a conventional GC (general-criterion) multiple regression calibration of pooled cross-year and/or cross-location samples, the selection formula is identical to Equation (1), where GGPA has been regressed on mutual predictors (cross-year/cross-site calibration can utilize only those predictors which are mutually available; exceptional predictors must be discarded):

    Y^(GGPA)i = B0 + B(UGPA)*X(UGPA)i + B2*X2i + ... + Bn*Xni     (1)

where
    Y^(GGPA)i   = estimate of graduate program GGPA
    X(UGPA)i    = undergraduate UGPA
    X2i ... Xni = other application variables

In practice, a formula is often calibrated with the GGPA of only the first year. For the GC (general-criterion) approach, the predictor data of non-admitted applicants is ignored, while only the predictors of accepted students are saved. The predictors lie idle from the freshman through junior years, until the end of the senior year, when the final GGPA is available as a criterion. The formula is thus calibrated on a selected range of applicants in year four, yet applied to the full range of applicants in year five.

In the LP (local-proxy) regression approach, PVUGPA, a subset of UGPA (specifically, the college pre-veterinary UGPA), becomes the proxy criterion for the solitary-year-and-location sample. Assuming that PVUGPAi = GGPAi + error, Equation (1) also applies to the LP calibration when the UGPA subset serves as a proxy criterion and when raw-score regression is used with an adequate sample size (error in the dependent variable does not bias the raw-score regression coefficient; Richards, 1982). In contrast to the conventional approach, however, the LP calibration cases are also the cases to which the subsequent selection formula is applied. A regression model using all applicant data for year "Y" is calibrated at the time of application, using a subset of UGPA as the criterion (PVUGPA, the UGPA for the veterinary prerequisites) and UGPA as one of several predictors. The resulting regression equation is used as the selection formula for the same set of year "Y" applicants: the selection formula is applied to the year "Y" applicant data to compute selection scores for each applicant. A sketch of this calibrate-and-apply-back procedure follows.
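The sketch below is a hedged illustration of the LP procedure, not the study's code; column names such as PVUGPA, UGPA, and MCAT_BIO are illustrative placeholders:

    import numpy as np
    import pandas as pd

    def lp_selection_scores(applicants: pd.DataFrame, proxy: str,
                            predictors: list) -> pd.Series:
        # raw-score OLS calibration on the current applicant pool ...
        X = np.column_stack([np.ones(len(applicants)),
                             applicants[predictors].to_numpy(dtype=float)])
        y = applicants[proxy].to_numpy(dtype=float)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        # ... applied back to the same pool to yield selection scores
        return pd.Series(X @ beta, index=applicants.index,
                         name="selection_score")

    # e.g. lp_selection_scores(pool, "PVUGPA", ["UGPA", "MCAT_BIO", "MCAT_CHEM"])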
General-criterion (GC) Regression Approach

The conventional strategy for implementing multiple regression in the development of a selection formula will be identified as the general-criterion approach, or GC. For this model, a large pool of cases is cumulated across years and/or across locations, adding error where years and/or locations are moderators. An authentic though restricted criterion is used (e.g., GGPA; it represents mostly the higher performing applicants). The optimal formula for predicting the criterion is limited by (1) the availability of mutual predictors among all the cases and (2) the predictiveness of those variables for that particular pool of cases. Once a selection formula is calibrated, the permissible predictor variables and their accompanying weights are set until the next calibration. Depending on the range of years and locations represented in the calibration sample, the formula may be generalizable across time and locations. If the selection-rating system changes over time or location, however, the potential validity of the formula may decline. A large number of assumptions are required to support this approach due to potential moderator variables, multicollinearity, and restricted-range problems. Moderators and multicollinearity become important concerns because the formula is being generalized to cases outside the calibration pool (usually, across years and/or locations). Because calibration-case UGPAs are range-restricted relative to the applicant pool, corrections for restriction in range are required to adjust the calibrated selection formula.

With this approach, error may enter by way of the following channels:

Ca. criterion variables
Cb. predictor variables
Cc. statistical artifact (connected with the multiple regression procedure or correction specifications)
Cd. individual effects
Ce. moderators (e.g., year or location effects)

Potential sources of systematic error include the following:

Sa. individual aptitude/motivation variation
Sb. halo and other individual error
Sc. qualitative/quantitative metric variation
Sd. restricted content domain
Se. unspecified predictors
Sf. multicollinearity among predictors
Sg. individual effects
Sh. moderator effects (e.g., years and locations)

The notion of an individual effect being systematic may seem dubious to some. Nevertheless, it is possible for individual error (such as a halo effect) to (1) occur across graders in a consistent fashion or (2) affect graders in a random fashion.

Potential sources of random error include the following:

Ra. measurement error
Rb. sampling error
Rc. individual effects
Rd. moderator effects (e.g., years and locations)
Re. superfluous predictors

Some advantages of the general approach include the following:

Aa. criterion accuracy: use of the authentic criterion
Ab. potential generalizability across years and/or locations (diminishing the need for frequent recalibrations)
Ac. the accumulation of a large calibration pool will diminish the problems of sampling error relative to the estimation of a population observed correlation, given that (1) sampling is equivalent to random across years and locations and (2) moderator effects are largely absent

Disadvantages of the general approach include the following:

Da. the need for previous cohort data
Db. a fairly complicated analysis procedure is required
Dc. the selection formula is fixed (closed to new predictors)
Dd. the criterion exists only in a selected sample, thus requiring corrections of the calibrated selection formula due to restrictions of range
De. the dangers of systematic error due to individual effects, year effects, and/or location effects (compounded since years and locations are seldom drawn randomly or even in large numbers)
Df. the predictor pool is diminished because some locations or years don't have conforming variables, thus increasing the underspecification-of-predictors problem (and systematic error)

Optimum conditions for the use of the general approach are as follows:

Oa. a large calibration sample
Ob. a large application sample
Oc. the stability of qualitative/quantitative metrics of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Of. a minimum drift of the population model over time
Og. a high validity of the population model over locations
Oh. stable demographic characteristics
Oi. a stable applicant pool (despite recruitment variation)

Using Multiple Regression to Shrink Error

If a predictor such as UGPA has been measured with a variety of attribution rules (for rating performance) across applicants, the pooled predictor values will include error moderated by locations and/or years. Linn (1966) demonstrated how raw-score multiple regression can be used to "shrink" (reduce in magnitude) the moderated error. This can be done where the following appropriate conditions exist: (1) there are many cases sharing a given rule, (2) at least two mutual measures of performance are known to be standard for all of the applicants, and (3) these mutual measures of performance are similar in nature to the uncorrected predictor (measures share common factors). For example, to correct (partially) UGPAs, one would like to have a set of cohort data where, in addition to (1) the uncorrected UGPA, there is (2) a mutual GGPA (graduate program GPA), (3) a mutual admissions test score, and (4) a dummy variable for each rule (or school of origin). By using the graduate program GGPA as the calibration criterion and the test score and uncorrected college UGPA as predictors, an adjustment weight can be obtained for correcting cases of a similar rule. If college UGPA, test score, and graduate program GGPA are parallel measures, then the predicted scores resulting from the application of the calibrated raw-score multiple regression model should be estimates of graduate GGPA. This constitutes a particular case of improving an underspecified model, since school variables were correlated with the criterion and accounted for a certain type of variation which otherwise would have been regarded as error. A sketch of such a dummy-variable augmentation follows.
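The sketch below assumes a school-of-origin code is available for each case; the column names (school, UGPA, test_score) are illustrative, not the study's variables:

    import numpy as np
    import pandas as pd

    def school_augmented_design(df: pd.DataFrame) -> pd.DataFrame:
        # one dummy column per school of origin (one dropped as baseline)
        dummies = pd.get_dummies(df["school"], prefix="school",
                                 drop_first=True, dtype=float)
        return pd.concat([df[["UGPA", "test_score"]], dummies], axis=1)

    def calibrate(design: pd.DataFrame, ggpa: pd.Series) -> np.ndarray:
        X = np.column_stack([np.ones(len(design)),
                             design.to_numpy(dtype=float)])
        beta, *_ = np.linalg.lstsq(X, ggpa.to_numpy(dtype=float), rcond=None)
        return beta  # includes an adjustment weight per school rule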
The following special cases are possible modifications of Linn's model augmentation approach:

A. The case where the school variable is absent

If, however, the raw-score multiple regression is done without variables identifying the school of origin, the predicted scores will be estimates of graduate GGPA plus some moderated error due to the absence of the missing predictors (school variables). Nevertheless, to the extent that the variation in rule (of UGPA standards) is random and the number of cases is sufficiently large, the predicted scores may be partly corrected (errors of estimate would be unbiased).

B. The case where the sample has a predominate rule

If a substantial proportion of the cases already have a similar rule, the correction problem, obviously, is diminished, and the precision of prediction for that particular rule-subgroup will increase. In contrast, subjects with non-conforming rules will be predicted with less accuracy. To the extent that the non-conforming rules differ randomly from the typical rule, quantitatively, and to the extent that qualitative variation is random in nature relative to the typical rule, the calibrated selection formula will be optimal for the whole of the assorted rule-subgroups despite its inferior prediction for individuals having non-conforming rules. Where a predictor variable's rules vary greatly (lacking a predominate rule-subgroup), general precision will suffer and the calibrated selection formula will tend to select more for general as opposed to specific ability. This is because the predictor variable will be less reflective of specific ability due to varying qualitative and quantitative standards/rules; hence, only general ability will tend to remain intact as a common factor.

C. The case where rules vary within the school

It may be assumed that cross-institutional variation in academic standards is an important factor in criterion integrity. However, variation at other levels may be of equal or greater importance, such as at the curriculum or major level (Elliot & Strenta, 1988; McCornack & McLeod, 1988). On the other hand, academic-major variation may be due to factors independent of subject matter, such as specific course content or specific class instructor.
Error at this level cannot be reduced by merely controlling for location.

D. Where prediction is improved for extreme cases

Should a predictor differentiate unequally along some important dimension of a sample of cases, the resulting improvement in prediction would affect only a restricted range of cases (i.e., interview ratings may only be valid when augmenting prediction for the highest-ability students). In this instance an improvement in predictiveness does not improve prediction in the cut-score region. Therefore, an increase in a coefficient value may not correspond to any real improvement in selection formula validity.

E. The case where a proxy criterion is used

If the necessary conditions are obtained for the shrinking of variable error (as outlined above), except that the (graduate) GGPA criterion is replaced with a proxy (UGPA), the consequent model will estimate a GGPA with prediction error shrunken (relative to the accuracy of the proxy variable and the randomness of the school rules in which the proxy variable is measured). The greater the potential year or location influence on the criterion, the greater the potential for reducing prediction error. In addition to allowing control of year or location influence, a proxy criterion may also be useful as a means of extending the size and variability of the calibration sample (the sample used to calibrate a selection formula). If the criterion (GGPA) represents the same measurement factors as a proxy (UGPA), then the proxy may be regarded as c + e (criterion + error). If a predictor variable then correlates with the criterion, then r_cp > r_(c+e)p, since attenuation of correlation results from unreliability (or error) in a measure. However, if raw-score regression coefficients are used, then the criterion with error will combine with a predictor to yield an unbiased estimate of the raw-score regression coefficient (Richards, 1982). Hence, to the extent that UGPA may approximate GGPA plus a random error component, and to the extent that prediction factors vacillate annually or locally, a proxy criterion may improve prediction and selection.
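The contrast can be made explicit. Assuming the proxy equals the criterion c plus an error e uncorrelated with both c and the predictor p, the correlation attenuates while the raw-score slope does not:

    r_{(c+e)p} = \frac{\operatorname{Cov}(c+e,\,p)}{\sigma_{c+e}\,\sigma_p}
               = r_{cp}\,\frac{\sigma_c}{\sqrt{\sigma_c^{2}+\sigma_e^{2}}} < r_{cp},
    \qquad
    b_{(c+e)p} = \frac{\operatorname{Cov}(c+e,\,p)}{\sigma_p^{2}}
               = \frac{\operatorname{Cov}(c,\,p)}{\sigma_p^{2}} = b_{cp}.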
Proxy-criterion Alternatives to Conventional Prediction

The use of a proxy criterion allows additional alternatives to the conventional procedure for calibrating a selection formula. The availability of a suitable proxy criterion may, potentially, extend the number of cases available to analysis in addition to extending the variability of the cases available. Depending on the quality of the proxy criterion, the sample may be controlled for year or location variability by restricting the sample on such confounding/moderating variables. If the proxy criterion allows an increase in the usable sample size per year, the loss of other-year or other-location cases may not be critical. Where number of cases seems to be a more critical factor than moderator problems, the sample may be increased by adding criterion-absent cases to those having a criterion, because a suitable proxy can substitute for the criterion. Should no ideal criterion be available, use of the above procedure with all cases while using an inferior criterion may still be beneficial. Tables 4 through 9 contrast four calibration methods on the following features: potential channels for error, potential sources of systematic error, potential sources of random error, potential advantages, potential disadvantages, and optimal conditions for use of each of four proxy methods. A discussion of the attributes of three potential calibration approaches precedes presentation of the tables (items are ordered according to the tables):

Local-proxy (LP)

The local-proxy calibration uses a single-year, single-site sample to estimate predictor equation parameters. If a proxy variable is appropriate and available for replacing an unavailable criterion (e.g., future GGPA), a selection formula can be calibrated and applied back to the same data to yield prediction scores having a validity approaching optimal validity. Using raw-score regression and assuming that the proxy approximates c + e (criterion + random error), the linear model generated will approximate the OLS regression model. Validity here depends greatly on the quality of the proxy criterion, although a second important asset would be a rich assortment of reliable predictors.

Error, therefore, may enter by way of the following channels:

Ca. criterion variables (particularly, the proxy variable)
Cb. predictor variables (present or absent)
Cc. a statistical artifact (connected with the multiple regression procedure)
Ce. moderators (e.g., gender, social class)

Potential sources of systematic error would be the following:

Sa. individual aptitude variation
Sb. halo or other individual error
Sc. qualitative/quantitative rule variation
Sd. a restricted content domain
Se. unspecified predictors
Sf. the level of multicollinearity among predictors
Sg. individual motivation variation
Sh. moderators (e.g., gender, social class)

There is little reason to anticipate substantial changes in individual aptitudes over the course of a graduate program, although such change is possible (e.g., brain disease or injury). More likely is the possibility of halo effects, which consist of systematic increases or decreases in the criterion score due to subjective bias on the part of the instructor who assigns the predictor or criterion score. Although differences in grade-attribution rules among schools offer the potential for systematic error in predicting criterion scores using school UGPA as a predictor, Linn (1966) observed that this problem was insignificant when admissions-test data was included among the predictors. Where some cases contain predictor scores representing performance on a narrower content domain, those cases are likely to be overpredicted on their criterion performance. Of course, a change in level of motivation may affect criterion performance. Where important predictors are excluded from the prediction model, systematic bias is added to the estimated criterion. Although multicollinearity is most problematic where a regression model is being generalized to additional samples, it nevertheless can play a minor role in LP estimation. In particular, multicollinearity may distort variable weights so that when the calibrated selection weight is applied to a parallel variable (i.e., a weight calibrated mostly on 1983 MCAT (Medical College Admissions Test) scores gets applied to a 1987 MCAT score), systematic bias may be added. This problem is exacerbated by the level of measurement error present. The absence of important predictor variables from the selection model will also distort the calibrated selection formula. Lastly, the accumulation of moderators is likely to be a consequence of non-random sampling. Moderators, in turn, may have a systematic influence on prediction error. Sources of unsystematic error in the predictors would likely bias the selection formula.
Random error affecting the proxy criterion may distort regression weights for low sample-size-to-measurement-error ratios, while unreliable predictors violate the regression assumption of perfectly measured independent variables.

Sources of random error would be the following:

Ra. measurement error
Rb. sampling error
Rd. moderators (e.g., gender, social class)

Like measurement error, sampling error is defined by statisticians as random error, although statistical differences between random samples may be partly systematic (Pedhazur, 1982, is an exception who includes systematic error as a type of measurement error). The fact that statisticians prefer to attribute the systematic component of sampling error to unspecified predictors rather than to the pool of sampling error does not alter the practical fact that differences between sample statistics will always be partly due to the problem of unspecified predictors. Moderator error can be expected to be random (i.e., the level of measurement error in MCAT scores may vacillate randomly across years), except where a theoretical basis exists for a systematic nature.

Some advantages of the LP approach include the following:

Ad. a simple analysis procedure
Ae. freedom from the need for previous cohort data
Af. the proxy criterion provides the desirable feature of an interval-level scale where some graduate programs have only dichotomous grading (pass/fail)
Ag. the option of adding predictor variables for any new set of applicants (e.g., alternate admission test scores can be specified and weighted)
Ah. freedom from some potential systematic or random error due to individual (e.g., ability) change, year effects, and location effects
Ai. range restriction problems are largely eliminated by the implementation of all applicant cases with proxy criterions
Aj. non-admitted cases can be used

Disadvantages of the LP approach include the following:

Dg. the need to recalibrate a new selection formula for each set of new applicants
Dh. the danger of individual aptitude or motivation change during the course of the program in question
Di. potential proxy criterion inadequacies

Although any change in individual aptitude or motivation would decrease the validity of an aptitude measurement (lower the regression coefficient corresponding to the aptitude measure), where an authentic criterion is used, the lower validity would be accurate. With use of a proxy criterion, however, the validity would remain inflated because the proxy would not reflect the problematic trait variation.

Conditions under which the LP approach will be optimum:

Oa/b. a large calibration/application sample
Oc. the stability of qualitative/quantitative metrics of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Oj. the stability of aptitudes and motivation

General-proxy (GP)

The general-proxy (GP) approach uses a multi-year sample and a proxy criterion. With the pooled sample fixed relative to location, the proxy-criterion approach can be used to calibrate a selection formula from a pool of cases accumulated across several years. The advantage of this approach is the potential for compensating for cases lost while controlling for a moderator, because it allows a greater number of usable cases within each applicant-year sample. It may also generalize across years, thus reducing the frequency of the need to recalibrate a formula.
Relative to the local approach, a potential liability is the possible systematic error due to year effects. Because only a small range of years of data is likely to be accessible, the external validity of the selection formula may be poor (the local approach does not attempt to generalize). Other variations of this compromise approach are also possible, such as using a pooled sample fixed relative to year but not to location.

Error may enter by way of the following channels:

Ca. criterion variables (particularly, the proxy variable)
Cb. predictor variables
Cc. statistical artifact (connected with the multiple regression procedure)
Ce. moderators (e.g., year or location)

Potential sources of systematic error include the following:

Sa. individual aptitude variation
Sb. halo and other individual error
Sc. qualitative/quantitative rule variation
Sd. a restricted content domain
Se. unspecified predictors
Sf. the level of multicollinearity among predictors
Sg. individual effects
Sh. moderator effects (e.g., year or location effects)

Potential sources of random error would include the following:

Ra. measurement error
Rb. sampling error
Rc. individual effects
Rd. moderator effects (e.g., year or location effects)
Re. superfluous predictors

Some advantages of the general-proxy approach include the following:

Ab. generalizability to other locations and/or years
Ac. sample size can be increased by pooling
Ad. a simple analysis procedure
Af. the proxy criterion provides the desirable feature of an interval-level scale where some graduate programs have only dichotomous grading (pass/fail)
Ah. freedom from potential moderated error due to location effects (or alternatively, freedom from moderated error due to year effects)
Ai. range restriction problems are largely eliminated by the implementation of all applicant cases with proxy criterions
Aj. allows use of non-admitted cases

Disadvantages of the general-proxy approach include the following:

Da. the need for previous cohort data
Dc. the selection formula is fixed
De. error due to years, sites, and individuals
Df. restrictive tendency in the predictor pool
Dh. the danger of individual aptitude or motivation change during the course of the program in question
Di. proxy criterion inadequacies

Conditions under which the general-proxy approach will be optimum:

Oa/b. a large calibration/application sample
Oc. the stability of qualitative/quantitative rules of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Of. the stability of the population model across years
Oj. the stability of aptitudes and motivation

General-mixed (GM)

The general-mixed (GM) calibration approach includes a multi-year sample and a conditional criterion: an authentic or a proxy criterion. This is the conventional strategy for implementing multiple regression in the development of a selection formula, except with the inclusion of non-admitted graduate applicant cases. Non-admitted graduate cases utilize PVUGPA (prerequisite veterinary course UGPA) as a proxy for the authentic graduate GGPA criterion. A large pool of cases is cumulated across years and/or across locations. The optimal formula for predicting the criterion is limited by the predictiveness of those variables for that particular pool of cases. Once a selection formula is calibrated, the permissible predictor variables and accompanying weights are set until the next calibration.
Depending on the range of years and locations represented in the calibration sample, the formula may be generalizable across time and locations. If the system changes over time or location, the potential validity of the formula may decline. As in the conventional GC (general-criterion) model, a greater number of assumptions are required to support this approach, but the range restriction problems are effectively resolved, so that corrections for selection may be unnecessary. Multicollinearity remains a concern, since the formula is being generalized to cases outside the calibration pool and, usually, across years and/or locations.

With this approach, error may enter by way of the following channels:

Ca. criterion variables
Cb. predictor variables
Cc. statistical artifact (connected with the multiple regression procedure or correction specifications)
Cd. individual effects
Ce. moderators (e.g., year and location)

Potential sources of systematic error include the following:

Sa. individual aptitude/motivation variation
Sb. halo and other individual error
Sc. qualitative/quantitative metric variation
Sd. a restricted content domain
Se. unspecified predictors
Sf. multicollinearity among predictors
Sg. individual motivation variation
Sh. moderator effects (e.g., year or location effects)

Important sources of potential random error include the following:

Ra. measurement error
Rb. sampling error
Rc. individual effects
Rd. moderator effects (e.g., year or location effects)
Re. superfluous predictors

Some advantages of the GM approach include the following:

Aa. criterion accuracy: use of the authentic criterion
Ab. potential generalizability across years and/or locations (diminishing the need for frequent calibrations)
Ac. the accumulation of a large calibration pool will largely diminish the bias effect of sampling error, given that sampling is equivalent to random across years (or locations)
Ad. simplicity of analysis
Ai. range restriction problems are largely eliminated by the implementation of all applicant cases with proxy criterions
Aj. increased sample size due to added rejectee cases

Disadvantages of the general-mixed approach include the following:

Da. the need for previous cohort data
Dc. the formula is fixed (closed to new predictors)
De. dangers of systematic error due to individual effects, year effects, and/or location effects (compounded since years and locations are seldom drawn randomly or even in large numbers)
Df. the predictor pool is diminished as some locations don't have conforming variables, thus increasing the underspecification-of-predictors problem and its systematic error
Dh. the danger of individual aptitude or motivation change during the course of the program in question
Di. proxy criterion inadequacies

Optimum conditions for the use of the GM approach:

Oa. a large calibration sample
Ob. a large application sample
Oc. the stability of qualitative/quantitative rules of selection variables across locations
Od. multiple independent (orthogonal) variables which closely predict the criterion
Oe. a rich pool of parallel predictors which exist across locations
Of. a minimum drift of the population model over time
Og. a high validity of the population model over locations
Oh. stable demographic characteristics
Oi. a stable applicant pool (despite recruitment variation)
Oj. stability of aptitudes and motivation
It is expected (in any measurement situation) that unspecified factors will add a random distribution of error to the scores of the cases being measured. Where scores are estimates of future ratings (predicted scores), it is possible to actually obtain a measure of these errors in order to study the nature of the error. This is done by subtracting subsequent outcome scores (the criterion) from the prediction scores. If these errors are partly correlated with one or more potential predictors, they are systematic, and it is possible that the score prediction formula may be improved by modifying predictors or their weights. If the errors appear to be non-randomly distributed for a large sample, but they fail to correlate with conceivable predictors, there nevertheless is likely to be an unspecified factor of predictive importance; therefore the selection/prediction formula will be inaccurate. If the distribution of errors is random, there may nevertheless be a moderating variable (e.g., years, locations) for which the level of error changes in an unsystematic way. This also results in a selection/prediction formula which is inaccurate. Only where error is randomly distributed and apparently irreducible by (1) the addition of predictors or (2) controlling for potential moderator variables can it be concluded that the selection/prediction formula is precise. A sketch of such a residual screen follows.
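The sketch below is a minimal, hedged illustration (pandas names are hypothetical): compute the errors and rank candidate predictors by the magnitude of their correlation with those errors:

    import pandas as pd

    def residual_screen(predicted: pd.Series, actual: pd.Series,
                        candidates: pd.DataFrame) -> pd.Series:
        errors = predicted - actual
        # correlation of each candidate predictor with the errors,
        # largest magnitudes first; reliable correlations flag
        # systematic (reducible) error
        return candidates.corrwith(errors).sort_values(key=abs,
                                                       ascending=False)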
Table 4
Channels by which error may affect calibration methods

CHANNELS FOR ERROR                          GC   LP   GP   GM
Ca Criterion variables . . . . . . . . . .  xx   xx   xx   xx
Cb Predictor variables . . . . . . . . . .  xx   xx   xx   xx
Cc Statistical artifacts . . . . . . . . .  xx   xx   xx   xx
Cd Individual effects  . . . . . . . . . .  xx             xx
Ce Moderators  . . . . . . . . . . . . . .  xx   xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 5
Potential systematic error for each calibration method

SOURCES OF SYSTEMATIC ERROR                 GC   LP   GP   GM
Sa Aptitude/motivation variation . . . . .  xx   xx   xx   xx
Sb Halo and other individual error . . . .  xx   xx   xx   xx
Sc Scale irregularity  . . . . . . . . . .  xx   xx   xx   xx
Sd Restricted content domain . . . . . . .  xx   xx   xx   xx
Se Unspecified predictors  . . . . . . . .  xx   xx   xx   xx
Sf Multicollinearity among predictors  . .  xx   xx   xx   xx
Sg Individual effects  . . . . . . . . . .  xx   xx   xx   xx
Sh Moderator effects . . . . . . . . . . .  xx   xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 6
Potential random error for each of four calibration methods

SOURCES OF RANDOM ERROR                     GC   LP   GP   GM
Ra Measurement error . . . . . . . . . . .  xx   xx   xx   xx
Rb Sampling error  . . . . . . . . . . . .  xx   xx   xx   xx
Rc Individual effects  . . . . . . . . . .  xx        xx   xx
Rd Moderator effects . . . . . . . . . . .  xx   xx   xx   xx
Re Superfluous predictors  . . . . . . . .  xx        xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 7
Potential advantages for each of four calibration methods

ADVANTAGES                                  GC   LP   GP   GM
Aa Criterion accuracy  . . . . . . . . . .  xx             xx
Ab Generalizability  . . . . . . . . . . .  xx        xx   xx
Ac Increase sample by pooling  . . . . . .  xx        xx   xx
Ad Simplicity of analysis  . . . . . . . .       xx   xx   xx
Ae Needs only one year's admission data  .       xx
Af Interval scale despite pass/fail GGPA .       xx   xx
Ag Admits new predictors . . . . . . . . .       xx
Ah Reduced year, site, and individual error      xx   xx
Ai Avoids range restriction  . . . . . . .       xx   xx   xx
Aj Uses non-admitted cases . . . . . . . .       xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 8
Disadvantages of each of four calibration methods

DISADVANTAGES                               GC   LP   GP   GM
Da Need for previous cohort data . . . . .  xx        xx   xx
Db Complicated analysis  . . . . . . . . .  xx
Dc Selection formula is fixed  . . . . . .  xx        xx   xx
Dd Selection on the criterion  . . . . . .  xx
De Error due to years, sites, individuals   xx        xx   xx
Df Restrictive tendency in predictor pool   xx        xx   xx
Dg Need to recalibrate each year . . . . .       xx
Dh Risks aptitude or motivation change . .       xx   xx   xx
Di Potential proxy criterion inadequacies        xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Table 9
Optimal conditions for use of each calibration method

OPTIMAL CONDITIONS FOR METHOD               GC   LP   GP   GM
Oa Large calibration sample  . . . . . . .  xx   xx   xx   xx
Ob Large application sample  . . . . . . .  xx   xx   xx   xx
Oc Stability of cross-site scales  . . . .  xx   xx   xx   xx
Od Multiple, sound predictors  . . . . . .  xx   xx   xx   xx
Oe Cross-site predictors . . . . . . . . .  xx   xx   xx   xx
Of Temporal stability of model . . . . . .  xx        xx   xx
Og Model validity across samples . . . . .  xx             xx
Oh Stable demographic characteristics  . .  xx             xx
Oi Stable applicant pool . . . . . . . . .  xx             xx
Oj Stability of aptitudes and motivation .       xx   xx   xx

GC= General-Criterion prediction method
LP= Local-Proxy prediction method
GP= General-Proxy prediction method
GM= General-Mixed prediction method

Summary

The conventional regression approach (GC) to calibration of selection formulas may incorporate several usages of a proxy criterion in the estimation process, resulting in alternative regression procedures which are labeled LP, GP, and GM (LP= formula calibration from a single site and time; GP= calibration across sites or times using a proxy criterion; and GM= calibration across sites or times mixing both authentic and proxy criterion use). Tables 4 through 9 compare and contrast the methods with respect to their corruptibility, strengths, and conditions for optimal performance.

CHAPTER IV
DESIGN

Population Sample

The non-random sample consisted of five cohorts: four graduation cohorts and an additional three-year cohort from the College of Veterinary Medicine at Michigan State University. The cohorts included both accepted and rejected applicants. Applicants who obtained a vet school GGPA were considered to be GRADS (this includes the three-year cohort); all others were considered to be non-graduates, or NGRADS. Student attrition was estimated at less than two students per cohort. Some overlap existed among the cohorts, as some cases were repeat applicants. While the annual number of applicants changed notably over the years, the number of candidates in-program was relatively constant (see Table 10). All applicants in this sample received scores on the New Medical College Admissions Test (MCAT) and were ranked for admissibility by a selection formula decided upon by the Veterinary College's admission committee. The high admissibility of an applicant could often mean that such an applicant might accept another candidacy elsewhere; therefore, some NGRADS were of this caliber.
Table 10
The study sample: Five cohorts of veterinary applicants

APPLICATION   GRADUATION   NUMBER OF    NUMBER OF
   YEAR          YEAR      APPLICANTS   GRADUATES
   1981          1985         327          90
   1982          1986         327          89
   1983          1987         310         103
   1984          1988         273          88
   1985          1989         248         101

Three graduation cohorts (1985, 1986, 1987) were used in the calibration sample for the GC, GP, and GM conditions, and the other two cohorts (1988, 1989) were used for validity testing of the GC, GP, GM, and LP conditions. For the LP condition, the same cohorts (1988, 1989) were used for both calibration and validity testing. The validity test gave greater weight to a portion of the graduating subset of the two validity-test cohorts; specifically, the validity test emphasized cases found in the lower third of the graduate program GPA range. These cases were selected for their likely proximity to the admissions cut-score region. The higher performing students would be less affected by a different admissions cutting criterion.

Predictors

Predictor variables were of two types: (1) within-unit predictors, which were the following ordinary predictor variables: CUMGPA (cumulative UGPA), PVUGPA (veterinary prerequisites UGPA), honor points, prerequisite course honor points, credits without pass/fail credits, average credits per term, summer credits, total credits, number of terms, pass/fail credits, the Medical College Admissions Test subtest scores, age, sex, interview scores I and II, work experience, veterinary experience, narrative work sample, and source of UGPA; and (2) between-unit variation (year). Although most of the variables were taken directly from admissions documents, the source-of-UGPA variable is defined especially for this research. This variable is defined from a retrospective analysis of student numbers together with consultation with an admissions staff member to determine which applicants clearly were not MSU undergraduates. Only a small proportion of the applicants actually fell into the non-MSU category; this was particularly true for applicants who graduated from the program. Where applicant cases had missing values, default scores were assigned either by entering an average score or by entering a minimum score (a sketch of this default-fill rule appears at the end of this section). This was similar to the practice of admission departments in determining an applicant's merit (see the default score list in Appendix I). Predictors were available from admissions data variables. Because many of these variables were not of interval scale, and because some of them were composites of several variables, such variables were deleted. Composites were sacrificed in favor of single variables. The twenty-two variables included in the predictor levels are listed in Table 11.

Table 11
Predictors used in levels of the intercorrelation factor (levels > .45 and < .45)

cumulative UGPA; veterinary prerequisites UGPA; veterinary prerequisites points; honor points; credits; average term credits; sum of pre-veterinary credits; total credits; number of terms; pass/fail credits; MCAT Biology; MCAT Chemistry; MCAT Physics; MCAT Quantitative; MCAT Reading; MCAT Science; age; sex; veterinary experience; work experience; activities and achievements rating; narrative writing sample rating

Criteria

Program GGPA served as the authentic criterion. The proxy criterion (used with the LP, GP, and GM approaches) was the PVUGPA (veterinary prerequisite course UGPA).
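A hedged sketch of the default-score practice described above; which variable receives which rule is study-specific and is not reconstructed here:

    import pandas as pd

    def fill_defaults(df: pd.DataFrame, mean_vars: list,
                      min_vars: list) -> pd.DataFrame:
        out = df.copy()
        for v in mean_vars:   # e.g., test scores defaulted to the average
            out[v] = out[v].fillna(out[v].mean())
        for v in min_vars:    # e.g., ratings defaulted to the minimum
            out[v] = out[v].fillna(out[v].min())
        return out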
Analyses

All research hypotheses were addressed by one of the following two analyses: one primarily used to test calibration methods, and another primarily used to test for calibration factor and year effects.

Test of Methods (MANOVATOM)

Sample. Three graduation cohorts were used for the calibration of GC, GP, and GM formulas, with two additional cohorts reserved for the purpose of validity testing. The calibration of the LP formula used one of the reserved cohorts in each of its two calibrations. Validity testing for all calibrated formulas was on the two reserved cohorts. Two years' data (1981 and 1983) were drawn for the small-sample GC, GP, and GM calibrations, while three years' data were used for their large-sample calibrations; the same two years were used as the source for all two-year condition samples. For the LP method (which used a single year's data), the small-sample condition used a random two-thirds of the year's data. Subsamples differed relative to (1) the inclusion of non-MSU applicants and (2) the size of the calibration pool.

Conditions. A repeated-measures MANOVA design was used (MANOVATOM), using C (methods), A (sample size), N (source of UGPA), and P (predictor intercorrelation) as within-subjects factors. Y (year) was the sole between-groups factor. Because all factors but the four-level methods factor were dichotomous, there were 64 conditions overall. The dichotomous independent variables were as follows:

A (sample size), where A= a two-year sample, and B= a three-year sample.
N (source), where M= MSU, and N= all academic origins.
P (predictor intercorrelation), where P= uncorrelated predictors, and Q= correlated predictors.

Because, in actual practice, sample-size variation would be differentially affected by the recruitment approach used, sample-size control was imposed by reducing a proportion of usable cases (as opposed to reduction to an absolute numerical limit). Thus, from the subsample of cases qualifying for a particular combination of attributes, one condition used the full set of cases, while for a second condition the sample was reduced to (roughly) two-thirds of the full subsample size. For the conditions using the GC method, the calibrated regression formulas (which used only higher-performing applicant cases) were range-restricted relative to the populations to which they were to be applied (namely, the full annual applicant roster). Because regression coefficients were biased by restriction of the dependent variable, correction for range restriction was appropriate for each standardized partial in the calibrated regression formulas of the general-method conditions. Alexander, Carson, Alliger, and Carr (1987) provided a formula for correcting doubly truncated correlations (range restriction in both the criterion and the predictor):

    rho-hat = [ -Ux*Uy*(1 - rho'^2) + { Ux^2*Uy^2*(1 - rho'^2)^2 + 4*rho'^2 }^(1/2) ] / (2*rho')

where
    rho' = the range-restricted correlation,
    Ux   = the ratio of restricted to unrestricted standard deviations for variable X,
    Uy   = the ratio of restricted to unrestricted standard deviations for variable Y,

and the sign of the corrected coefficient corresponds to the sign of rho'. To correct the GC regression coefficients, a standardized correlation equation must be computed and the partials corrected individually for range restriction.
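A small helper implementing the correction in the form reconstructed above; the printed formula in the source is damaged, so the exact algebra should be verified against Alexander, Carson, Alliger, and Carr (1987) before use:

    import math

    def correct_double_truncation(r: float, ux: float, uy: float) -> float:
        # r: range-restricted correlation; ux, uy: restricted/unrestricted
        # SD ratios. With ux = uy = 1 the function returns r unchanged,
        # and the sign of the result follows the sign of r.
        if r == 0.0:
            return 0.0
        k = ux * uy
        q = 1.0 - r * r
        return (-k * q + math.sqrt(k * k * q * q + 4.0 * r * r)) / (2.0 * r)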
Dependent variable. The difference score/dependent variable was a transformed, rank-order difference between rank-of-estimated GGPA and rank-of-actual GGPA (the transformation is referred to as LEWAR: Log of prediction Error, Weighted, Absolute, and in Rank units). Because the magnitude of the rank difference was of concern, the absolute value of the difference was used. Where weighting was desired to reflect the proximity of the error to the cut-score region, values were weighted to reflect proximity to the lowest GGPA rank. To normalize the distribution of these absolute values, the modified scores were transformed into their natural logarithms.

Satisfaction of Assumptions. Two critical assumptions for MANOVA are that (1) observations be independent and that (2) variance be constant (Keeves, 1988). The largely objective nature of admissions and program data allowed some confidence in the proposition that the admissions and program data both had high levels of independence. Differences in condition means were completely determined by the treatment (application of potentially biased, calibrated formulas) and, therefore, there was no treatment-related error to correlate among subgroups. Constant variance was expected due to the lack of differential sampling: a solitary two-year sample received all treatments within a repeated-measures format.

Application to Hypotheses. Error variables from these calibration conditions were used to test Hypotheses A, B, C, and E.

Test of Factor and Year Effects (MANOVATOF)

Sample. Five graduation cohorts (including the 1985 cohort with only a three-year UGPA) were used for the calibration and validity testing of LP models. Random samples were drawn from subsamples of each calibration cohort to obtain calibration subsamples of n= 80 and n= 135. Subsamples differed relative to (1) the inclusion of non-MSU undergraduates, (2) the size of the calibration pool, and (3) ability level as measured by PVUGPA.

Conditions. A repeated-measures MANOVA was used (MANOVATOF), including A (sample size), N (source of UGPA), O (selection on PVUGPA), and P (intercorrelation of predictors) as within-subjects factors. Y (year) was again the sole between-groups factor. Altogether, there were 80 conditions. The dichotomous independent variables were as follows:

A (sample size), where A= 80 random cases of the selected sample, and B= 135 random cases of the selected sample. (For the high-ability condition for 1984, B= 117; and for 1985, B= 111, due to the smaller pools of cases.)
N (source), where M= MSU undergraduate origin, and N= otherwise.
P (predictor intercorrelation), where P= uncorrelated predictors, and Q= correlated predictors.
O (achievement level), where G= PVUGPA > 3.0, and O= otherwise.

There were five years for the between-subjects variable, year. Because methods were not being compared in MANOVATOF, sample-size variation was introduced in absolute numbers of cases. Non-MSU cases were qualified as in MANOVATOM (the methods MANOVA), and predictors were selected as in the methods MANOVA.

Dependent Variable. LEWA-transformed prediction error served as the dependent variable (LEWA= Log of prediction Error: Weighted and in Absolute values). The transformation into logarithms normalized the distribution of the deviations. Prediction-error residuals were weighted according to their proximity to the GGPA rank of one (1).

Application to Hypotheses. Error variables from these calibration conditions were used to test Hypothesis D.

CHAPTER V
RESULTS

Table 12 summarizes the research findings relative to the outcomes expected. In a local-proxy (LP) analysis, prediction varied significantly by year, confirming Hypothesis D:

Hypothesis D.
With prediction error as the dependent variable, with methods limited to the local-proxy (LP) approach, and with variation controlled with respect to years, sample size, academic origin, intercorrelation of predictors, and level of ability, prediction will vacillate across years.

Unconfirmed were Hypotheses A, B, C, and E. Hypotheses A and B follow:

Hypothesis A. For students falling within a cut-score zone, the local-proxy (LP) formula will be more predictive of graduate GPA (GGPA) than will undergraduate GPA (UGPA).

Hypothesis B. For students falling within a cut-score zone, the general-mixed (GM) formula will be more predictive of graduate GPA (GGPA) than will the conventional prediction model (the general-criterion formula, GC) as corrected for range restriction.

The failure to find method and year differences contradicted Hypothesis C. The year effect represents differing annual error means in logs of weighted, absolute, prediction-error residuals (alpha= .05). For the following Hypothesis C, prediction (measured by prediction-error means) did not vary across years, although the probability of the observed outcome under the null hypothesis was only .069:

Hypothesis C. With prediction error as the dependent variable, and with variation controlled with respect to years, methods, sample size, academic origin, and intercorrelation of predictors, prediction differences among years and methods will be obtained.

For the following Hypothesis E, the numerical outcome was in the right direction; nevertheless, the difference was not significant:

Hypothesis E. Non-MSU undergraduates will be associated with greater prediction error.

Table 12
Confirmation of expected outcomes

HYPOTHESIS                                   YES   NO
A. LP more valid than UGPA ...............          X
B. GM more valid than GC .................          X
C. Method and Year effects (TOM)
      Method (TOM) .......................          X
      Years (TOM) ........................          X
D. Year effect (TOF) .....................    X
E. Non-MSU GGPA less validly predicted ...          X

TOM= Test of Methods; TOF= Test of Factors and Years
LP= local-proxy estimation formula; UGPA= undergraduate grade point average
GM= general-mixed estimation formula; GC= general-criterion estimation formula
MSU= source of UGPA is Michigan State University; GGPA= graduate grade point average

Tests of Methods (TOM) across Two Years

This analysis addressed Hypothesis A: that the LP (local-proxy) approach would out-predict UGPA as an estimator of GGPA. For the dependent variable, a prediction-error residual transformation was used. The transformation was given the acronym LEWAR, representing Log of prediction Error: Weighted, Absolute, and in Rank units. The original prediction-error residual was the difference in rank between the estimated GGPA and the actual GGPA. It was transformed by the following steps: (1) take the absolute value of the difference between the rank of the estimated GGPA and the rank of the actual GGPA (magnitude and not direction of the error was important); (2) weight the absolute value by a non-linear function which is biased towards low GGPA rank (estimation precision is most critical for cases in the cut-score region, i.e., for cases which are marginally acceptable); and (3) obtain the natural log of this weighted error (this serves to normalize the distribution of the variable). The LEWAR weighting function is the following:

    LEWAR = ln{ (1 + |RKGPA - RKEST|) * ( (RKGPA + 1)^3 / RKGPA^3 + 2 ) }

where ln is the natural log, | | denotes absolute value, RKGPA is the rank of the GGPA score, and RKEST is the rank of the GGPA estimate (see Appendix II, Figure IIA, for a graph of the weighting used for the LEWAR error transformation).
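The transformation is a direct computation; a transcription of the formula above:

    import math

    def lewar(rk_gpa: int, rk_est: int) -> float:
        # rk_gpa: rank of actual GGPA (>= 1); rk_est: rank of estimated GGPA
        weight = (rk_gpa + 1) ** 3 / rk_gpa ** 3 + 2  # emphasizes low ranks
        return math.log((1 + abs(rk_gpa - rk_est)) * weight)

    # a 10-rank error near the cut score outweighs one far above it:
    # lewar(1, 11) is about 4.70, versus about 3.51 for lewar(100, 110)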
The LEWAR transformation eliminated the negativity of error residuals, inasmuch as error magnitude (and not direction of error) was the vital concern. The logarithm transformation was then needed, however, to restore the normal-shaped distribution lost with the transformation of the negative side of the scale. Also included in the transformation was a non-linear weighting (graphed in Figure IIA) which served to increase the importance of prediction error for cases in the lower range of the GGPA ranking. It was assumed that these cases were most likely to correspond to applicants considered marginal by admission committees. This weighting, which reduced the importance of cases having higher merit rank, served to reduce the actual degrees of freedom to an unknown extent; the systematic deletion of cases is a special case of variation in case weighting. Subsequent significance tests, therefore, must be regarded as somewhat liberal. With respect to assumptions of normality and independence of variables for these conditions, dependent-variable frequencies approximated normal distributions, and the LEWAR-transformed prediction-error by predictor correlations were mostly non-significant. LEWAR-transformed prediction error was explicitly and negatively biased according to GGPA rank due to the weighting; however, other significant correlations appeared for variables representing activities and achievements, work experience, narrative writing, MCAT-Chemistry, and average credits carried per term. Only activities and achievements and average credits carried per term ever obtained positive correlations, and only activities and achievements was consistently positive. GGPA obtained the highest correlation with a LEWAR-transformed variable at -.32 for 1984 data. For 1985, the highest correlations were about ten points less (see the correlation matrix in Appendix III). Except for the intentional bias due to GGPA weighting, there was no reason to expect differential subgroup prediction. Contrasts using LEWAR-transformed prediction error as the dependent variable were conducted between UGPA and LP estimation by subtracting the LP discrepancy score (GGPA minus estimated GGPA) from the UGPA discrepancy score. Table 13 displays four contrasts, not one of which attains statistical significance. Contrasts were conducted separately between 1984 and 1985 data for two (optimum) LP conditions differing only on the source of UGPA allowed into the calibration of the formula. The variables contrasted with UGPA were (1) MSU, where M= only MSU UGPAs were allowed into the calibration, B= the calibration used the total year's cases, and P= intercorrelated predictors (> .45) were not permitted into the calibration; and (2) ALL, where N= applicants were not selected on the source of UGPA, B= the calibration used the total year's cases, and P= intercorrelated predictors (> .45) were not permitted into the calibration.
Table 13
Contrasts of UGPA prediction against LP prediction*

YEAR   CONTRAST    DIFFERENCE                  t-VALUE   DF   PROB
1984   eU - eMSU   2.7740 - 2.7831 = -.0091     -.15      87  .881
1985   eU - eMSU   2.9655 - 2.9758 = -.0103     -.18     100  .855
1984   eU - eALL   2.7740 - 2.7345 =  .0395      .59      87  .558
1985   eU - eALL   2.9655 - 2.9503 =  .0152      .25     100  .805

* Dependent variable expressed as LEWAR-transformed error= Log of Error: Weighted, Absolute, and in units of Rank.
UGPA= undergraduate grade point average; GGPA= graduate grade point average
eU= mean log of weighted absolute prediction error in ranks, using UGPA estimation
eMSU= mean log of weighted absolute prediction error in ranks, using local-proxy estimation with an all-MSU calibration sample
eALL= mean log of weighted absolute prediction error in ranks, using local-proxy estimation with a calibration sample unselected on UGPA

Not one of the contrasts attained significance, thus ruling out the possibility of significance familywise. The test of Hypothesis A, therefore, did not confirm greater validity for the LP estimate over UGPA for cases in the cut-score region.

These analyses addressed (1) Hypothesis B: that the GM (general-mixed) approach would out-predict the GC (general-criterion) approach, and (2) Hypothesis C: that prediction would differ by method used and by year's data estimated. Methods of selection-formula calibration were evaluated with respect to mean level of LEWAR (the dependent variable). Thus significant effects (MANOVA/ANOVA) would indicate less valid estimation. GC equations were calibrated using standardized regression in order to allow corrections of partials (standardized regression coefficients) for restrictions of range. The proxy-method equations used raw-score regression. Precision between the two types of method equations was compared in terms of LEWAR-transformed residuals of rank (between estimated GGPAs and actual veterinary school GGPAs). Method difference in estimation across the whole range of GGPAs is of some interest, although differences in estimation would be most relevant to the cases close to the cut score. For instance, the college's selection ratio may expand, invalidating the previous selection procedures for a particular cut score. As a general comparison of methods, Table 14 presents average predictive rank among the four calibration methods, where error is reported unweighted in its natural log (log of absolute rank error). No differences were significant.

Table 14
Method efficiency in logs of absolute rank error*

                GC     LP     GP     GM
MEAN           2.50   2.43   2.71   2.09
S.D.           1.31    .71   1.10   1.08
S.E.            .34    .18    .28    .28

* Rank error is the absolute value of a discrepancy: estimate rank minus rank of GGPA.
GC= General-criterion; LP= Local-proxy; GP= General-proxy; GM= General-mixed (authentic and proxy criterion)

Table 15 presents between-subject by within-subject effects for the test of methods using a repeated-measures MANOVA (MANOVATOM) with the following within-subjects factors: C= calibration methods, Q= intercorrelation of calibration predictors, N= source of calibration applicants, and A= size of the calibration sample. Year (Y) was the only between-subjects factor. The dependent variables were prediction rank-error residuals for a factorial matrix of 64 calibration conditions.
The average absolute prediction error in ranks across conditions ranged from 17 to 34 positions. These residuals received the LEWAR transformation (described on pages 78-79). The plots of the dependent variable frequencies approximated normal distributions, and the predictor by dependent variable correlations were often marginally significant for predictors similar to GGPA (due to the explicit weighting).

As presented in Table 15, the a priori contrast between the GM (general-mixed) and GC (general-criterion) conditions was not significant, indicating comparable prediction.

Table 15
Estimate of average contrast between general-criterion and general-mixed transformed prediction-error means*

SOURCE OF VARIATION   COEFF   ST.ER   T-VALUE   SIG.T   L-BND   H-BND
GM minus GC           -.092    .107    -.863     .389   -.304    .119

* Measured in transformed prediction-error residuals (LEWAR = Log Error: Weighted, Absolute, and in Rank units)
GM = general-mixed: multi-year calibration sample with authentic and proxy criterion
GC = general-criterion: multi-year calibration sample with authentic criterion

Table 16 reports no differences among methods; Table 17 likewise reports no differences among years (although for years, the probability of the results under the null hypothesis is only .069).

Table 16
Test of main effect for methods* (MANOVATOM)

                        M U L T I V A R I A T E
SOURCE OF VARIATION   WILKS-LAMBDA   MULT-F   HYP-DF   ERR-DF   SIG
Methods                  .99160      .52258      3      185    .667

* Measured in transformed prediction-error residuals (LEWAR = Log Error: Weighted, Absolute, and in Rank units)
Methods = local-proxy, general-criterion, general-proxy, and general-mixed
MANOVATOM = test-of-methods repeated-measures MANOVA

Table 17
Test of main effect for years* (MANOVATOM)

SOURCE OF VARIATION     SS    DF     MS      F    PROB
Year                  66.09    1   66.09   3.34   .069

* Measured in transformed prediction-error residuals (LEWAR = Log Error: Weighted, Absolute, and in Rank units)
MANOVATOM = test-of-methods repeated-measures MANOVA
Years = 1984, 1985

Plots of the means in Figures 1 and 2 (for the 1984 and 1985 cohorts, respectively) illustrate the contrasting interactiveness between the GC (general-criterion) and the GM (general-mixed) methods. Under the following conditions the GC method is notably less predictive: (1) the sample is small, (2) the calibration sample is highly selected on a single source of applicants (relative to the graduating cohort), and (3) the calibration does not screen out correlated predictors (note condition AMQ for 1984 and 1985, where A = sample size small, M = source of UGPA MSU only, and Q = intercorrelation of predictors > .45). It should be noted that the GC method uses a much smaller calibration sample than do the competing proxy methods, although all small-sample conditions use two-cohort calibration samples. For the GC method, this amounted to about 200 applicants; for the proxy methods, over 400 applicants were available from the two cohorts.

Table 18
Cross-validities for prediction methods

METHOD               CONDITION   COHORT 1984   COHORT 1985
                                   (n = 87)      (n = 101)
General Criterion       NQ            .60           .49
General Mixed           NQ            .58           .34
General Proxy           NP            .54           .38
Local Proxy             MP            .53           .40
PVUGPA                                .49           .37
UGPA                                  .55           .41

Method condition acronyms represent the following:
N = all UGPA sources
M = MSU UGPAs only
Q = intercorrelation > .45
P = intercorrelation < .45
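Computationally, each Table 18 entry is a correlation between frozen-equation estimates and observed GGPA on a later cohort. A minimal sketch follows; the function and argument names are illustrative, not from the original analysis:

    import numpy as np

    def cross_validity(weights, intercept, X_holdout, ggpa_holdout):
        # Apply the frozen calibration equation to the holdout cohort...
        est = intercept + np.asarray(X_holdout) @ np.asarray(weights)
        # ...and report the Pearson correlation with observed GGPA.
        return np.corrcoef(est, np.asarray(ggpa_holdout))[0, 1]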
Inclusion of (1) UGPA error in the calibration sample (a consequence of multiple UGPA sources in the calibration sample) and (2) the reduction of correlated predictors appears to reduce the adverse impact of small calibration sample size (note condition ANP for 1984 and 1985, where A = sample size small, N = source of UGPA all, and P = intercorrelation of predictors < .45).

Table 18 (above) presents cross-validity coefficients for the presumed optimal condition for each of the four methods when applied to predict 1984 and 1985 GGPA from the appropriate admissions data. It is apparent from the table that, numerically, proxy-criterion GGPA estimates (from all but the general-criterion method) correlate somewhat below the correlation level of conventional (GC) multiple-predictor GGPA estimates. Because cross-validation was performed on a restricted sample (applicants with a subsequent GGPA), however, it cannot precisely index method validity for application to ordinary admissions selection (where the range of the criterion is unrestricted). To compensate for this problem, the regression weights for the GC (general-criterion) method were corrected for restriction prior to the computation of the cross-validity coefficients. This adjusted the GC range-restricted weights towards weight levels appropriate for a wide-range criterion. Problems with using correlation coefficients to evaluate selection validity are discussed later.

The test of methods addressed Hypotheses B and C as follows:

B. Hypothesis B, that GM (general-mixed) prediction would exceed that of the GC (general-criterion) approach, was not confirmed. Conversely, under the most normal conditions (e.g., larger calibration sample size and multiple-source UGPAs), the GC method generally appeared more predictive than other methods for cases in the cut-score region.

C. Hypothesis C, that main prediction differences among methods and years would emerge, was not confirmed, suggesting that these factors were unimportant to cases in the cut-score region. Nevertheless, the probability of the outcome for years (under the null hypothesis) was low (p = .07).

[FIGURE 1. Relative levels of transformed prediction error among GGPA estimation methods (1984); conditions vary sample size (A = 2/3 sample, B = full sample), UGPA source (M = MSU only, N = all sources), and predictor intercorrelation (P < .45, Q > .45) for the GC, LP, GP, and GM methods.]

[FIGURE 2. Relative levels of transformed prediction error among GGPA estimation methods (1985); same conditions as Figure 1.]
Test of Years and Other Factors under LP (Local-Proxy) Estimation

This analysis addresses Hypothesis D, which predicts an LP-method prediction difference among years. The dependent variable was a transformed prediction-error residual differing from LEWAR variables by its use of actual rather than rank error (see pages 78-79). Its transformation is referred to as LEWA, representing Log Error: Weighted and Absolute. The use of actual prediction error (rather than the predictive ranking error used in the LEWAR-transformed dependent variable) allowed more powerful significance testing than in the test-of-methods analysis. The original prediction-error residual was the difference between the estimated GGPA and the actual GGPA. Across conditions, the average absolute prediction error ranged from .29 to 3.87. The LEWA transformation is as follows: (1) take the absolute value of the difference between the estimated GGPA and the actual GGPA (magnitude but not direction of the difference is of interest); (2) weight the absolute value by a non-linear function which is biased towards low GGPA rank (see Appendix II for a graph of the weighting function used to give greater weight to the cut-score region); (3) obtain the natural log of this weighted error (to normalize the distribution of the error variable).

The LEWA weighting function is the following:

    LEWA = LN{ (1 + ABS[GGPA - ESTGGPA]) * ( [RKGPA + 1]^3 / RKGPA^3 + 2 ) }

where LN = a function providing the natural log of a term, ABS = a function providing the absolute value of a term, RKGPA = the rank of the GGPA score, and ESTGGPA = the GGPA estimate.

LEWA-transformed variable distributions were approximately normal. Due to the deliberate weighting by GGPA rank, LEWA variables correlated negatively with predictors which were similar to GGPA, and positively with predictors which were inversely related to GGPA. Negative correlations with GGPA were as large as -.75. (Correlations were as great as .66 for age, .70 for possessing a college degree, and -.45 for average credits taken per term. Generally, however, correlations were not significant.)

A test-of-factors-and-years MANOVA (MANOVATOF) was performed with five levels for the between variable, year. For MANOVATOF, it was possible to include an additional calibration factor, range restriction, defined as restricted where the PVUGPAs (undergraduate prerequisite veterinary course UGPAs) were above 3.0 and not restricted otherwise. Table 19 displays the ANOVA table confirming prediction differences among years among these conditions (and thereby confirming Hypothesis D). For cases in the cut-score region, LP (local-proxy) estimation differed across years.

Table 19
Test of main effects for years* (MANOVATOF)

SOURCE OF VARIATION       SS    DF        MS         F      PROB
Within Cells           643.78   462      1.39
Constant              1619.13     1   1619.13   1161.95    <.001
Years                   33.60     4      8.40      6.03    <.001

* Measured in LEWA-transformed prediction-error residuals (LEWA = Log Error: Weighted and Absolute)
MANOVATOF = test-of-factors-and-years repeated-measures MANOVA
Years = 1981, 1982, 1983, 1984, 1985
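The year main effect in Table 19 can be sketched, in simplified form, as a one-way ANOVA on the LEWA scores grouped by cohort. The reported analysis was a repeated-measures MANOVA that also carried within-subject calibration factors, so the sketch below parallels the "Years" row only approximately; lewa_by_year is a hypothetical mapping from cohort year to an array of LEWA-transformed errors.

    import numpy as np
    from scipy import stats

    def year_effect(lewa_by_year):
        """One-way ANOVA F-test across cohort years (simplified Table 19)."""
        groups = [np.asarray(v, dtype=float) for v in lewa_by_year.values()]
        return stats.f_oneway(*groups)   # returns (F statistic, p value)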
To further amplify the interaction among calibration factors, Figures 3 through 8 graph the dependent variable means (prediction error) for selected contrasting conditions. In Figure 3, the better predictiveness of the non-restricted condition is weakly suggested by the relative levels of error (compare unrestricted ANOP, where A = sample size small, N = source of UGPA all, O = range restriction off, P = intercorrelation < .45, against restricted ANGP, where A = sample size small, N = source of UGPA all, G = range restriction on, P = intercorrelation < .45).

[FIGURE 3. Prediction error under local-proxy prediction where the proxy-criterion calibration has been range restricted: ANGP vs. ANOP across the 1981-1985 cohorts.]

In Figure 4, the contrast of BMOQ (where B = sample size large, M = source of UGPA MSU only, O = range restriction off, Q = intercorrelation > .45) and BMOP (where B = sample size large, M = source of UGPA MSU only, O = range restriction off, P = intercorrelation < .45), differing on intercorrelation of predictors (BMOQ having greater intercorrelation), fails to indicate a consistent pattern of difference.

[FIGURE 4. Local-proxy prediction methods little affected by predictor intercorrelation when measured by transformed prediction error: BMOQ vs. BMOP.]

In Figure 5, BMOQ and BMGQ (where B = sample size large, M = source of UGPA MSU only, G = range restriction on, Q = intercorrelation > .45), differing by restriction of range (BMGQ being restricted), differ dramatically in 1984: a dramatic interaction for years by range restriction.

[FIGURE 5. Restriction of the proxy-criterion calibration appears to affect prediction only for the 1984 cohort: BMGQ vs. BMOQ.]

In Figure 6, BMOQ and BNOQ (where B = sample size large, N = source of UGPA all, O = range restriction off, Q = intercorrelation > .45), differing by the inclusion of non-MSU undergraduates (BNOQ being the all-sources condition), differ substantially in both 1984 and 1985, an interaction this time between years and non-selected UGPA source (in computing the formula calibration).

[FIGURE 6. UGPA-source effect over cohorts: BNOQ vs. BMOQ.]
In Figure 7, the ANGP-ANGQ contrast (where A = sample size small, N = source of UGPA all, G = range restriction on, and P/Q = intercorrelation of predictors below/above .45) also interacts with year in 1984.

[FIGURE 7. Local-proxy prediction methods affected by level of intercorrelation for the 1984 cohort: ANGQ vs. ANGP.]

In Figure 8, a difference in sample size between BMOQ and AMOQ (where A = sample size small, M = source of UGPA MSU only, O = range restriction off, Q = intercorrelation > .45) results in small differences in prediction error until 1985, when year interacts with sample size.

[FIGURE 8. Sample-size effect, full vs. 2/3 calibration sample, over cohorts: BMOQ vs. AMOQ.]

This graphical evidence is most noteworthy for its illumination of the interactive effect of years on all of the factors: (N) source, (A) sample size, (G) restriction of achievement level, and (Q) intercorrelation of predictors. (It may be recalled that in the calibration for this test of years and factors, the large sample size was n = 135, while the small size was n = 80.) It should be noted, therefore, that in the case of intercorrelated predictors (r > .45), the interactive effect of years only occurs where the sample is small (compare ANGP:ANGQ in Figure 7 with BMOP:BMOQ in Figure 4). Surprisingly, an interactive pattern appears under the large-sample restriction of PVUGPA which doesn't occur under the small-sample condition (in Figure 5, contrast BMOQ:BMGQ, then compare with ANGP:ANOP in Figure 3). It happened that for the years 1984 and 1985, the large-sample, restricted-PVUGPA conditions had low case counts (1984: n = 117; 1985: n = 111). Due, perhaps, to the greater sampling error for these two years, the selection for these large-sample conditions wasn't representative (and was, perhaps, greatly restricted for MSU applicants). By chance, the small-sample conditions for these two years select a much more representative sample.

Relative Validity of Non-MSU and MSU UGPAs

Contrary to Hypothesis E, Table 20 shows that non-MSU applicants were not associated with significantly greater prediction error (in 1981 and 1982). For these years, log-transformed absolute ranking error does appear to be slightly greater when the estimator of GGPA is UGPA as opposed to the GC (general-criterion) estimator.

Table 20
T-test of mean error* between MSU and non-MSU applicants

ERROR TYPE            N     MEAN      SD     SE      T    DF   PROB (2-TAIL)
LEAR GCbnq  N-MSU     26   2.7924   1.045   .205    .75   187      .449
            MSU      163   2.6307   1.003   .079
LEAR UGPA   N-MSU     26   2.9755    .866   .170   1.42   187      .189
            MSU      163   2.7108    .963   .075

* Measured in transformed, unweighted prediction-error residuals (log of absolute rank error)
GCbnq = general-criterion formula where (1) sample size is large, (2) intercorrelated predictors are used, and (3) non-MSU UGPAs are included
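The Table 20 comparison is an ordinary independent-samples t-test on the log absolute rank error. A sketch follows, with hypothetical array names:

    from scipy import stats

    def msu_contrast(lear_nonmsu, lear_msu):
        """Independent-samples t-test (Table 20): non-MSU vs. MSU error."""
        return stats.ttest_ind(lear_nonmsu, lear_msu)   # (t, two-tailed p)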
CHAPTER VI
DISCUSSION

This study enquires whether prediction (or estimation) using a proxy criterion might exhibit better precision than conventional methods (use of UGPA, and use of regression estimates from multiple predictors of previous annual samples). It is assumed that moderation of prediction error by years is a data characteristic favoring the practice of local-proxy estimation. An ancillary concern is the role of observable prediction factors which might influence the relative validity between proxy- and authentic-criterion prediction methods.

The experimental proxy-criterion methods do not significantly differ from the conventional methods when evaluated on the basis of logs of weighted absolute values of prediction-error residuals (weighted according to proximity to the cut-score). Thus, for these data and methods, method of estimation seems to have little impact on cases most vulnerable to rejection. Nevertheless, no predictive advantage is observed for any of the proxy-criterion methods over that of the simple UGPA (moreover, by the same standard, Figures 1 and 2 suggest no significant advantage over UGPA for the conventional multiple-predictor method either).

For the study and control of prediction factors (Q = level of intercorrelation of predictors, N = source of UGPA, and A = sample size) in the methods analysis (or MANOVATOM), two levels for each factor were represented in the condition matrix. In addition, a between-groups factor, years, also had two categories, 1984 and 1985. For these years, factors did not significantly differ from each other in their influence on prediction.

Likewise, for the test-of-factors-and-years (or MANOVATOF), similar prediction factors were in the condition matrix plus another two-level factor, range restriction (O) on the veterinary prerequisites UGPA. The methods factor was eliminated: all conditions used the LP (local-proxy) approach. The sole between-groups factor, years, had five categories (five years of data). For these years, three of four factors differed from each other in their influence on prediction. Factor scores for predictor intercorrelation (Q) and for restriction of range on the veterinary prerequisites UGPA (O) differed from each other and from the factor scores for the sample size (A) and applicant source (N) factors. The latter factor scores, however, failed to differ significantly from each other.

While this study does not find a consistent predictive advantage for the proxy methods over the GC (general-criterion) method, or even over the UGPA, (1) some potential enhancements of the method remain untried, and (2) the theoretical viability of the concept is not clearly refuted. Nevertheless, from Figures 1 and 2, it appears that the UGPA acting alone is virtually as valid a predictor of GGPA as is multiple-variable prediction (particularly when (1) ranking error is the error variable, and (2) cut-score region cases are emphasized).

High UGPA Validity

Standardized graduate admissions examinations provide consistent scales of evaluation across schools, and may be indirectly responsible for the apparent reliability of UGPA that was present in the data studied: local grading policies may be shaped by national admissions test performance. The same may also be true for other admissions variables: effective variables may provide little independent prediction where their effects are mediated through the UGPA. It is probably more likely the case, however, that the sounder psychometric indicators (the UGPA and the admissions tests) are more valid as predictors due to (1) superior reliability and (2) a factor structure similar to the GGPA.
The adequacy of UGPA prediction can be expected to diminish, however, where a larger proportion of multiple-source UGPAs must be handled. This would be the case in prediction across sites and in non-veterinary graduate programs which may be less influenced by geography (veterinary programs serve primary geographic regions and number, on average, less than one per state). Likewise, GC (general-criterion) prediction would be less precise due to measurement error in the independent variables during formula calibration, and due to the subsequent lack of precision in appropriately applying formula weights to larger numbers of ambiguous UGPA scores.

The localizing of prediction limits the entry of uncontrolled moderators into the statistical analysis, potentially improving internal validity, but it nevertheless diminishes external validity. The present study is local with respect to the dependent variable (100% MSU GGPAs), and it happens to be mostly local with respect to the UGPA predictor. The generalizable prediction methods in this study (GC, GP, and GM) are largely limited to generalizing across years. For these data there is only weak evidence that years moderate prediction (plots of means indicated only one of five years where prediction error differed notably). Moderation due to years at other sites, and moderation by sites, remain to be investigated.

Potential Usefulness of Proxy Methods

Although it is true that problems due to error-laden UGPAs also impair prediction (and selection) with proxy-criterion methods, the potential exists for situations where proxy methods may be optimal. One such occasion may exist where changes are being made in the outcome variable. Use of the GGPA as a prediction criterion is questionable because it can fail to adequately represent practical competence. Ultimately, however, the problem is not the limitations of GGPA, per se, but of the measurement design and procedures which succeed or fail to measure appropriate performance factors. Typically, this measurement issue is ignored while attention is fixed on the issue of selecting and weighting adequate predictors. Researchers in this field often conclude that the prospects of predicting beyond a given level of precision may be futile. Such a conclusion, however, neglects to address the dependence of prediction on the quality of measurement which determines the criterion variable. The measurement of factors which distinguish adequately between levels of professional competence must be the most important component of the improvement of academic selection. As institutions depart, nevertheless, from measurement conventions (across years or locations), the need for localized prediction will emerge. It may be in such a context that proxy-criterion prediction methods find greater practicality.
Potential Improvements to Estimation with a Proxy Criterion

Figures 1 and 2 indicate that for the two years represented, the GC method appears to consistently out-predict UGPA when the calibration condition is BNQ (where B = sample size large, N = source of UGPA all, Q = predictor intercorrelation > .45), AMP (where A = sample size small, M = source of UGPA MSU only, P = predictor intercorrelation < .45), BMP (where B = large, M = MSU only, P = intercorrelation < .45), ANP (where A = small, N = all sources, P = intercorrelation < .45), or BNP (where B = large, N = all sources, P = intercorrelation < .45); the LP method predicts similarly to the UGPA condition for the two years under the calibration condition BMP; and the GM method is consistently as predictive as UGPA under calibration conditions BNQ, ANQ (where A = small, N = all sources, Q = intercorrelation > .45), and ANP. Hypotheses are hereby provided to account for these patterns.

In Table 21, factors are specified (IDEAL FACTORS) under each method-sample size combination. Each ideal factor is based on a logical or empirical expectation relating (1) intercorrelation of predictors (Q = r > .45, P = r < .45) to sample size (large > 270, small = 180), (2) source of UGPA (N = all sources, M = MSU) to type of criterion used, and (3) source of UGPA to type of UGPA predictor used.

Table 21
A chart for identifying ideal method prediction factors

                        APPLICATION                     CALIBRATION
METHOD            Variable    Predictor     N      Criterion     Predictor
                  Estimated   Used                 Used          Used
GC-small smple    GGPA4       UGPA4        180     GGPAx         UGPAx
  IDEAL FACTORS . . . . . .
GC-large smple    GGPA4       UGPA4        270     GGPAx         UGPAx
  IDEAL FACTORS . . . . . .
LP-small smple    GGPA4       UGPA4        200     PVS4          UGPA4
  IDEAL FACTORS . . . . . .
LP-large smple    GGPA4       UGPA4        300     PVS4          UGPA4
  IDEAL FACTORS . . . . . .
GP-small smple    GGPA4       UGPA4        600     PVSx          UGPAx
  IDEAL FACTORS . . . . . .
GP-large smple    GGPA4       UGPA4        900     PVSx          UGPAx
  IDEAL FACTORS . . . . . .
GM-small smple    GGPA4       UGPA4        600     GGPAx / PVSx  UGPAx
  IDEAL FACTORS . . . . . .
GM-large smple    GGPA4       UGPA4        900     GGPAx / PVSx  UGPAx
  IDEAL FACTORS . . . . . .

GGPA4 = estimated graduate GPA of applicant
UGPA4 = applicant's undergraduate GPA
PVS4 = applicant's veterinary prerequisites UGPA
GGPAx = includes GGPAs of applicants of other years
UGPAx = includes UGPAs of applicants of other years
PVSx = includes PVS4s of applicants of other years
P = predictor intercorrelation < .45
Q = predictor intercorrelation > .45
N = UGPA not selected by source
M = only MSU UGPAs in sample

For method GC with a small sample size, P (predictor intercorrelation < .45) is recommended due to the small sample size, while N (a calibration sample containing some non-MSU applicants) is recommended because the UGPA4 used in the application stage will be error-laden in proportion to the frequency of non-MSU applicant UGPAs (if factor M were used, only MSU UGPAs would be in the calibration, thus potentially inflating the beta weight for the UGPA predictor). For all conditions, N is generally appropriate because all of the applicant data contain non-MSU UGPAs which impose scaling error on the data. Nevertheless, for methods LP and GP, M (specifying a calibration sample containing only MSU applicants) provides a calibration criterion having less error than one which includes non-MSU applicants. Because it is impossible to use UGPA from all sources as a predictor and to simultaneously use a PVUGPA (proxy criterion) from only MSU cases, a choice must be made between two advantages: (1) a realistic predictor or (2) a less fallible proxy criterion. The advantage from M (a less fallible proxy criterion) is arbitrarily granted more importance than that from N, and thus M is recommended for LP and GP. Where the sample size is sufficiently large and the calibration criterion adequately precise, Q (predictor intercorrelation > .45) seems to improve prediction; otherwise, P (predictor intercorrelation < .45) seems more efficient.

Having used the present data to inform this set of expectancy rules, the hit rate for these rules on the same two years of data is 75%. The factor that is difficult to specify with confidence is Q (predictor intercorrelation), because (1) the calibration criterion is imprecise and (2) the sample size is only moderately large. Study of the condition means, therefore, gives rise to three hypotheses: (1) LP prediction may be improved by the use of single-source selected cases (to improve the reliability of the proxy criterion when calibrating the selection formula); (2) use of an overly reliable UGPA predictor (in the calibration) may contribute to prediction error, hence the addition of random error to the UGPA predictor in the calibration may reduce the inflation of the UGPA predictor weight; and (3) multicollinearity may be less of a problem with these variables than has been presumed: given an adequate sample size and precise measures, moderately correlated multiple prediction may be more valid than prediction with correlated variables selected out.

How Intercorrelation May Remain Benign

As Pedhazur (1982) points out, there is no agreement on the breadth of meaning in the term multicollinearity, although its existence is unambiguous where predictor intercorrelation biases regression coefficients. It is clear that parameter bias is more likely as (1) the number of predictors approaches the number of cases in the calibration sample, (2) the predictors share mutual factors, (3) intercorrelated predictors are highly correlated with the criterion, and (4) predictors lack reliability (Kenny, 1982). Although for this study GC (general-criterion) performance is limited to only two years, for both of those years GC prediction using correlated predictors (having intercorrelations > .45; see Appendix III) was numerically (though not significantly) better than GC prediction with uncorrelated predictors (when the sample size was at its maximum; review Figures 1 and 2).
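Whether a given calibration sample can tolerate that level of intercorrelation can be checked with a standard collinearity diagnostic. The sketch below computes variance inflation factors; VIF is not part of the reported analysis, and the predictor matrix X is hypothetical.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    def predictor_vifs(X):
        """One VIF per predictor column of X (values well above ~10 are
        the usual warning sign of problematic collinearity)."""
        Xc = np.asarray(sm.add_constant(X))
        # Column 0 is the added constant and is skipped.
        return [variance_inflation_factor(Xc, i) for i in range(1, Xc.shape[1])]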
It may be that, given (1) the unique factor structure of the intercorrelated predictors used and (2) the reliability of the intercorrelated predictors used, prediction using moderately intercorrelated predictors may be a desirable method.

Extreme Outcomes in Figures 5 and 6

As can be seen from Figures 5 and 6, respectively, two LP (local-proxy) calibration conditions produce substantially higher error means for 1984 and somewhat higher means for 1985 data: BMGQ (where B = sample size large, M = source of UGPA MSU only, G = range restriction on, Q = predictor intercorrelation > .45) and BNOQ (where B = sample size large, N = source of UGPA all, O = range restriction off, Q = predictor intercorrelation > .45). This underscores the previous recommendation that local-proxy calibration be performed with uncorrelated predictors, due to the combination of (1) unreliability of the proxy criterion and (2) the moderate sample size. Beyond these observations, the question remains: what affected prediction for these two years that was not evident for the preceding years? Multicollinearity might be offered as an explanation for the unusual increase in error because, in both conditions, correlated predictors are employed in the calibration. For the three years prior to 1984, however, the error level is uniformly low. Although the BMGQ calibration conditions were more restricted than were the data subsequently estimated by its selection formula, BNOQ (using unrestricted calibration conditions) exhibits greater error than BMGQ.

One explanation might be sampling error. Although the LP calibration would ordinarily include all cases in the calibration and in the estimation process, for this experimental study calibration samples are, in fact, limited artificially to condition (a) of 80 cases or to condition (b) of 135 cases, while the application sample is unrestricted. The 1984 and 1985 data entered the study midstream (they were added after the 1981-1983 condition samples had already been drawn), and because the case counts for these years were low, the restricted calibration samples for these years were somewhat undersized (1984: B = 117, and 1985: B = 111, while for 1981-1983: B = 135). Although the calibration with the restricted sample may be expected to be less reliable than a calibration with a larger sample, the opposite outcome is also quite possible. Although sampling error might account for this erratic prediction, such an explanation seems inconsistent with Figure 8, for instance, which depicts large- and small-sample prediction (BMOQ vs. AMOQ) equal and constant across five years.

Interactions occur dramatically only for 1984 data, suggesting that a year effect is present for 1984 and perhaps (to a lesser extent) for 1985. The nature of this potential year effect is not evident. Because this apparent year effect is not systematically associated with any particular prediction factor, it can be suspected that it results from general unreliability in the selection equation due, perhaps, to some particularly incongruent non-MSU veterinary prerequisite UGPAs, or non-MSU ordinary UGPAs, which appear for these two years. Only these two variables are expected to be both important and yet potentially unreliable (due to possible differences in grading standards) to the degree sufficient to cause such a year effect.
Probably the most feasible explanation for the resulting interactions is that some unusually incongruent non-MSU UGPAs in the 1984 (and, to a lesser extent, 1985) data, when combined with two other prediction-invalidating conditions (prediction factors), created notably unreliable regression equations. When the unreliable equations were used to create GGPA estimates, they again drew upon a UGPA predictor which remained quite unreliable. Therefore, in using a proxy criterion, it must be remembered that unreliability makes its mark both in the calibration of the equation and in the estimation of the criterion.

Cautions Regarding Study Realism

The small number (5) of suitable cohorts available for analysis limits the emergence and range of prediction factors in the data. To compensate for such limitation, this study exaggerated potential sources of variation in prediction error by deliberately biasing the selection formulas. This was achieved by (1) selecting on certain variables (e.g., PVUGPA and MSU UGPA), (2) controlling the sample size, and (3) controlling the admission of correlated variables into the formula calibration. Unfortunately, some of the resulting conditions depart from realism. For that reason, omnibus tests of factor effects that test for general effects (over a large number of conditions) may have less practical validity than certain realistic, specific contrasts.

Some calibration conditions in this study are either not realistic to practice, or they might otherwise mislead the interpreter of the study. In the method contrasts, for example, the sample sizes of the proxy methods are always greater than those of the GC (general-criterion) method, which accounts for the more erratic variation of GC error. Small sample size here was defined as data from only two of the three years of the cases available. Roughly 270 veterinary graduate cases were available for GC method calibration within three years of data, while about 900 cases were available to the GP (general-proxy) and to the GM (general-mixed) methods within the three years of data.

For the tests of methods, error was defined as error in ranking the estimated GGPAs against actual GGPA (LEWAR, where L = natural log value, E = error, W = weighted, A = absolute value, R = in ranks). Although the use of error in ranks was necessary in order to compare raw-score regression error (for the proxy-criterion methods) with standardized regression error (for the GC method, which required standardized regression in order to correct partial coefficients), rank error was also more valid with regard to selection error. This weighted LEWAR and LEWA error (LEWA, where L = natural log value, E = error, W = weighted, A = absolute estimation-error residual) should be more appropriate than ordinary unweighted estimation error, as ordinary estimation error has little importance for cases above the cut-score. In addition, the use of log-transformed error values gave greater weight to errors of large magnitude. Larger errors were believed most likely to be due to (1) errors near the cut score, (2) errors from non-MSU UGPAs, or (3) criterion errors. Among generalizable calibrations, and among LP calibrations, (2) and (3) should remain constant, leaving cut-score region errors to explain the differences between these conditions.
A problem linked to the use of weighted scores is the loss of degrees of freedom and the resulting liberalizing effect on significance tests: systematically weighting some cases more than others is the same (with respect to degrees of freedom) as deleting some cases.

In the LP (local-proxy) method contrasts, the calibration sample sizes (either 80 or 135) were less authentic due to limits in absolute number (it became impossible to provide even these numbers of cases for all conditions for the years 1984 and 1985). Error measures also were less realistic for the test across LP (local-proxy) methods, because they were in LEWA error (logs of weighted absolute differences between the estimated and actual criterion) rather than in ranking error. This sacrifice was made in the hope of improving the power of the statistical tests.

In practice, selection formulas would be applied to wide ranges of applicant scores; therefore, selection formulas might ideally be calibrated on samples having the same wide ranges of scores. According to Table 3 (adapted from Richards, 1982), except for measurement error in the independent variable, raw-score regression coefficients could remain correct despite selection on the independent variable. Thus restriction of range might not be a problem if one wished to use a raw-score regression model as a selection formula. Such reliability in scores, however, is difficult to substantiate. Richards also demonstrates that (1) variation in the dependent variable range between application and calibration samples and (2) variation in scale-interval meaning affect the validity of the raw-score regression coefficient. The alternative regression procedure, standardized regression, is not affected by the problem of scale variation, and in addition it allows for correction (actually, only a shrinking of error) of its betas (or partial correlation coefficients) for restriction of range on both the independent and dependent variables. Alexander et al. (1987) provide a formula for correcting correlations for both types of range restriction, and the author has extended the application to the betas (partial coefficients) in a standardized regression model. Since correction, in fact, is likely to be conservative (Linn, Harnisch, and Dunbar, 1981) and may ignore significant predictors (resulting in an under-specified model), 'adjustment for dispersion' may be a more appropriate expression. In addition, such adjustment is also required for expansion of range, where this occurs. In the practical application of selection formulas, the proxy-criterion methods would ordinarily not require any adjustment for dispersion differences between calibration and application samples, whereas adjustment would be required for the conventional GC (general-criterion) method. Ordinarily, therefore, the error due to dispersion differences would be greater for the GC approach, since the adjustment could only be approximate.

In this study, however, the validity-test sample happens to be ideal for the GC method, because the application sample is range restricted (to upper-distribution cases) in the same way as the GC calibration sample is restricted. The (unrestricted-sample) proxy-criterion methods are faced with an uncharacteristic range-restriction problem relative to the (range-restricted) validity-test sample.
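For reference, the familiar bivariate correction for direct range restriction (often called Thorndike's Case 2) can be sketched as below. The Alexander et al. (1987) two-variable correction and the author's extension to standardized betas are more involved and are not reproduced here.

    import math

    def correct_for_restriction(r, sd_unrestricted, sd_restricted):
        """Adjust a correlation observed in a range-restricted sample."""
        u = sd_unrestricted / sd_restricted   # ratio of selection-variable SDs
        return (r * u) / math.sqrt(1.0 - r * r + (r * u) ** 2)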
In the test of methods, this has been partly countered by adjusting the GC (general-criterion) selection formulas for application to the total ranges of applicant scores. As such an adjustment is expected to be conservative, however, the GC method is expected to retain a slight predictive advantage attributable to the characteristics of the application sample of this particular study. On the other hand, the potential exists for the adjustment to accrue extraneous errors if the standardized regression is subject to the effects of sampling error and therefore imprecise (standardized-score regression is required in order to estimate the partial correlations to be adjusted for dispersion differences for each predictor). The author's adjustment technique, though extrapolated from a respected bivariate correction, has no literature to support its use with multiple-coefficient correlation. This author did test the procedure in a small-scale simulation with anticipated results, and therefore uses it with some justified confidence. Of course, an argument can be offered for not adjusting the GC selection formula, inasmuch as the full-range selection formula will not be optimal for cut-score situations where the selection ratio may be small. Such a course, nevertheless, risks the likelihood of misspecifying the selection model by erroneous predictor inclusions or exclusions, or by weightings associated with the use of a restricted calibration sample.

Why R2 Wasn't Used as a Measure of Validity

R2 is the square of a multiple correlation coefficient, a conventional measure of validity for regression equations. R2 was not used in this study as an index of validity for several reasons. Not only does R2 not give greater weight to cases near the cut-score, but it gives greater weight to cases the farthest from the cut-score. Because R2 is a variance statistic, it follows that cases farther from the mean will contribute a disproportionate share of the variance: variance = SUM (score - mean)^2 / N. Note that the difference from the mean is squared; thus larger discrepancies from the mean can disproportionately influence the magnitude of the variance (e.g., the outlier problem). Because a graduate candidate distribution represents the upper tail of a distribution (higher-ability college students), this distribution will be strongly negatively skewed, with the distribution mean near the cut score. Under these circumstances, the most likely means of improving an R2 would be to improve prediction at the place in the distribution most extreme from the mean: among the highest-ability candidates. Such an 'improvement' in prediction may, in fact, decrease the level of discrimination in the cut-score region. It is also possible for the R2 to increase significantly without any corresponding change in the ordering of scores. In such circumstances, selection would remain unchanged despite better absolute prediction. Although the alternative index of validity, transformed prediction error, leads (as mentioned before) to liberal significance tests, it nevertheless can measure the change of interest to the admissions office.
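A toy numerical illustration of the first objection (simulated data, not from the study): improving prediction only for cases far from the mean raises R2 even though every prediction in the cut-score region is untouched.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.normal(size=200)                      # criterion scores
    yhat = y + rng.normal(scale=0.8, size=200)    # noisy predictions
    far = np.abs(y - y.mean()) > 1.0              # cases far from the mean

    better = yhat.copy()
    better[far] = y[far] + 0.1 * (yhat[far] - y[far])   # shrink only those errors

    def r2(y, f):
        return 1.0 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)

    print(r2(y, yhat), r2(y, better))   # the second R2 is larger, yet the
                                        # near-mean (cut-score) cases are unchanged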
Regression Equations Have Superfluous Predictors

To control for confounding, calibration (regression) runs were executed without deleting ineffective predictors by means of conventional statistical tests. Statistical significance tests for coefficients are based partly on the number of predictors and partly on the size of the sample. Significance tests thus potentially confound with prediction factors such as (1) sample size and (2) intercorrelation of predictors. The consequence of superfluous predictors in the regression equation, however, is the addition of random error to the predicted scores, thereby attenuating actual predictor validities (Deegan, 1976). Equations without superfluous predictors follow below.

Practical Implications of the Study

Presented in Tables 22, 24, and 25 are regression models developed through a stepwise multiple regression procedure in which conventional methods of coefficient significance testing (alpha = .05) have been used to refine the set of active predictor variables.
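A sketch of this kind of significance-driven refinement follows; a backward-elimination variant is shown, whereas the study used a stepwise procedure, and the data frame df, criterion name, and predictor names are hypothetical.

    import statsmodels.api as sm

    def refine_predictors(df, criterion, predictors, alpha=0.05):
        """Drop predictors until every remaining coefficient passes alpha."""
        active = list(predictors)
        while active:
            fit = sm.OLS(df[criterion], sm.add_constant(df[active])).fit()
            pvals = fit.pvalues.drop("const")
            if pvals.max() <= alpha:
                return fit               # all retained coefficients significant
            active.remove(pvals.idxmax())   # discard the weakest predictor
        return None                      # no predictor survived the test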
In Table 22 (method betas by condition), three estimation methods, GC (general-criterion), GP (general-proxy), and GM (general-mixed), use pooled data (1981 through 1983) to calibrate a single (estimation) regression equation for each method. For the LP method, 1984 and 1985 data are used to calibrate a regression equation for each of the two years. The GC and GM methods (using GGPA and PVUGPA [veterinary prerequisites GPA] as criteria, respectively) produce highly similar regression equations, and they also give the greatest weight (.94, virtually all the predictive weight) to the UGPA variable. The GP method differs from the previous two methods in its weighting scheme, and weights UGPA at .77. The LP method runs (1984 and 1985) utilize fewer predictors and weight UGPA at .67 and .63, respectively. The similarity between GC (general-criterion) and GM (general-mixed) calibrations further underscores the similarity of UGPA and GGPA at the site under study (although for GM the alternate criterion is, in fact, a proxy, PVUGPA).

As displayed in Table 22, for the GC method, the selection formula validity (without cross-validation) with five years of data was .91 (R2 = .81, or about 80% of criterion variance accounted for by the predictors). The best average error using the GC formula to predict subsequent GGPA ranking of approximately 90 veterinary school graduates was a discrepancy of 17 positions. Also in Table 22, proxy-criterion methods obtained similar (non-cross-validated) validity coefficients. Of course, these formulas predicted the proxy rather than the real criterion. Measured in terms of actual prediction error, the best average (absolute) proxy-criterion (GM) prediction error was .29 from the actual GGPA.

Tables 24 and 25 provide additional LP runs for years 1981 through 1983; in Table 25, regression equations were calibrated on only 75 percent of available cases. Across the five years the weighting scheme varies somewhat, although UGPA maintains its dominance in prediction (ranging from .63 to .99 for the full samples). Sample-size differences (100% vs. 75%) have a minor effect on the regression coefficients estimated, but little effect on the predictors selected.

Because this study is site specific, generalization from these findings to other sites must be regarded as tenuous. One important feature of this data set is its homogeneity with regard to origin of UGPA (undergraduate GPA is predominately MSU). Admission of students with non-MSU UGPAs may require a relatively higher level of ability: graduate school faculty may be more willing to accept equivalent credentials from students whom they know than from less familiar candidates, thereby ruling out all but the highest-performing non-local candidates. Should such be the case, prediction error in ranks would be minimized (rank error decreases at the extremes of the distribution). This could explain (1) why UGPA is so effective as a predictor, and (2) why non-MSU UGPAs are not associated with significantly greater prediction error (in ranks). At other sites or in other selection situations, the UGPA and GGPA may be less dependent, opening the potential for (1) alternative predictors with competitive validity, and (2) substantially better relative prediction for multiple regression prediction methods.

Table 22
Method betas under optimal prediction conditions* (standard errors beneath each beta)

Predictor        GC       GM       GP      LP84     LP85
01 actach      -.032    -.031    -.043      .        .
              (.013)   (.013)   (.019)
04 cred         .447     .460     .346      .        .
              (.119)   (.120)   (.164)
05 ugpa         .938     .940     .767     .674     .626
              (.037)   (.037)   (.053)   (.031)   (.038)
06 intlsc        .        .       .149      .        .
                                (.019)
07 int2sc        .      -.027      .        .        .
                       (.013)
09 mcatc        .048     .051     .084      .        .
              (.018)   (.018)   (.022)
10 mcatp        .038     .057      .        .        .
              (.018)   (.018)
12 mcatr       -.039    -.046      .        .        .
              (.015)   (.015)
15 numterms     .122     .128      .        .        .
              (.039)   (.039)
16 pfcred        .        .       .062      .        .
                                (.019)
17 pts         -.678    -.692    -.445    -.110    -.125
              (.114)   (.114)   (.164)   (.023)   (.034)
18 pvspts       .217     .213     .170     .341     .383
              (.018)   (.018)   (.025)   (.031)   (.039)
19 sex         -.049      .        .        .        .
              (.014)
R             .91672   .91599   .81681   .93142   .91284

* Predictors 03 (avgcred), 08 (mcatb), 11 (mcatq), 13 (mcats), 14 (narr), 20 (sumcred), 21 (totcred), 22 (vetexp), and 23 (workexp) entered no equation.
Note: See Table 23 for predictor variable definitions.
GC = general-criterion, GM = general-mixed, GP = general-proxy, LP84 = local-proxy for 1984, LP85 = local-proxy for 1985

Table 23
Predictor names and their definitions

01 actach: activities and achievements (non-academic)
02 age: age in years
03 avgcred: average term credits
04 cred: total (grade-point) credits
05 ugpa: undergraduate grade point average
06 intlsc: interviewer rating number 1
07 int2sc: interviewer rating number 2
08 mcatb: Medical College Admissions Test: Biology
09 mcatc: Medical College Admissions Test: Chemistry
10 mcatp: Medical College Admissions Test: Physics
11 mcatq: Medical College Admissions Test: Quantitative
12 mcatr: Medical College Admissions Test: Reading
13 mcats: Medical College Admissions Test: Science Reasoning
14 narr: narrative writing sample
15 numterms: number of terms enrolled in college
16 pfcred: credits on a pass/fail basis
17 pts: total honor points
18 pvspts: total honor points in veterinary prerequisites
19 sex: gender
20 sumcred: total veterinary prerequisites credits
21 totcred: total grade-point and pass/fail credits
22 vetexp: veterinary experience rating
23 workexp: work experience rating

Table 24
LP betas by year: full sample* (standard errors beneath each beta)

Predictor       1981     1982     1983     1984     1985
02 age          .138      .        .        .       .093
              (.047)                              (.034)
04 cred          .       .640     .591      .        .
                       (.190)   (.186)
05 ugpa         .823     .994     .914     .674     .626
              (.031)   (.064)   (.064)   (.031)   (.038)
06 intlsc      -.068      .        .        .        .
              (.023)
08 mcatb         .       .076      .        .        .
                       (.030)
09 mcatc         .       .058     .082      .        .
                       (.028)   (.025)
10 mcatp        .103      .        .        .        .
              (.026)
12 mcatr         .      -.101      .        .        .
                       (.027)
15 numterms     .170      .        .        .        .
              (.078)
17 pts         -.388    -.797    -.719    -.110    -.125
              (.068)   (.189)   (.192)   (.022)   (.034)
18 pvspts       .220     .196     .261     .341     .383
              (.031)   (.029)   (.029)   (.031)   (.039)
20 sumcred       .        .       .067      .        .
                                (.023)
22 vetexp        .        .        .        .      -.056
                                                  (.026)
R             .91595   .91893   .92455   .93142   .91284

* Predictors not listed entered no equation.
Note: See Table 23 for predictor variable definitions.

Table 25
LP betas by year: 75% sample* (standard errors beneath each beta)

Predictor       1981     1982     1983     1984     1985
02 age           .        .        .        .        .
                                                  (.036)
04 cred          .        .      1.027      .        .
                                (.226)
05 ugpa         .852     .814    1.003     .735     .662
              (.035)   (.031)   (.072)   (.050)   (.040)
06 intlsc      -.053      .        .        .        .
              (.026)
08 mcatb         .       .069      .        .        .
                       (.033)
09 mcatc         .       .085     .078      .        .
                       (.031)   (.030)
10 mcatp        .105      .        .        .        .
              (.028)
12 mcatr         .      -.107      .        .        .
                       (.030)
15 numterms     .332      .        .        .        .
              (.080)
17 pts         -.435    -.189   -1.171     .350    -.150
              (.080)   (.028)   (.235)   (.113)   (.037)
18 pvspts       .201     .189     .281     .344     .364
              (.034)   (.031)   (.034)   (.037)   (.040)
20 sumcred       .        .       .102      .        .
                                (.027)
21 totcred       .        .        .       .237      .
                                         (.114)
22 vetexp        .        .        .        .      -.064
                                                  (.029)
R             .91853   .92245   .91723   .92782   .92002

* Predictors not listed entered no equation.
Note: See Table 23 for predictor variable definitions.

For selection situations similar to that in this study, the use of admission test scores is open to challenge. Admission test predictors obtain only marginally significant (alpha = .05) regression weights (see the mcatc, mcatp, and mcatr weights in the GC column of Table 22) when the UGPA predictor variable is already in the selection equation. Nevertheless, if the test score information is used by undergraduate institutions (as a secondary purpose for the data) to evaluate and modify internal curricular and grading standards, abandonment of admission test requirements might well result in a decline in UGPA validity. The potential influence of the admissions test on UGPA validity is probably sufficient reason to retain admissions test scores in the selection equation even though the immediate consequence may be marginal. Over the course of several years, the continuing inclusion of the admissions test predictors in the selection formula may preserve or even improve the validity of UGPA. One strategy would be to use the admission test scores to correct all UGPAs, or to correct just the outside (e.g., non-MSU) UGPAs. The work of Linn (1966) in adjusting GPA was cited earlier. He found that GPA adjustment by admissions test scores effectively eliminated the GPA error due to source of GPA for high school GPAs.
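A hedged sketch of what such an adjustment might look like follows. This is an illustration of the idea, not Linn's published procedure or the author's; the names and the blending weight are invented for the example.

    import numpy as np

    def adjust_ugpa(ugpa, test_composite, blend=0.5):
        """Pull each UGPA toward the value its admissions-test scores predict."""
        slope, intercept = np.polyfit(test_composite, ugpa, 1)
        predicted = intercept + slope * np.asarray(test_composite)
        # blend = 0.5 is an arbitrary illustrative choice; in practice the
        # weight would reflect the relative reliabilities of the two scores.
        return blend * np.asarray(ugpa) + (1.0 - blend) * predicted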
After correction of the appropriate UGPAs, admission-test predictors would be withheld from the selection formula. In such a procedure, UGPAs would be adjusted up or down depending on the UGPA-admission test score discrepancy. Such a process, nevertheless, might provide little benefit to selection where few outside UGPAs reside near the cut score.

The inclusion of demographic variables in the regression equation may be quite informative for purposes of research. For application, however, the use of demographic variable predictors cannot be recommended. This caution must be exercised because data samples are voluntary: hence, they are non-random data, plagued with systematic selection effects related to candidate recruitment and personal motivation. Effects attributed to categorical variables (e.g., race, gender) may be completely spurious. For example, due to a shortage of black candidates in human medicine, all minimally qualified blacks may be intensely recruited to human medicine, greatly depleting the remaining pool of blacks who would consider application to veterinary study. Blacks from this depleted pool would not provide a sample of black characteristics which could be validly generalized. Where data are not random, demographic variables must be regarded as moderator variables which may control variance to a certain extent, but which do not account for that variance in a literal sense (e.g., a gender effect not being due to one's gender, per se). Ultimately, the predictive advantage gained by blocking cases according to (moderator) categories (e.g., gender, race) may be retained by identifying other variables which account for the qualitative differences between category levels. For example, Table 22 shows a significant GC weight for the predictor variable "sex". In reality, the sex predictor variable may be moderating "level of affection for animals" or "temerity towards human patients". By including measures of these traits in the selection equation, the advantage of including the gender variable may disappear. Use of the categorical (moderator) predictor (e.g., gender) in the actual admissions selection formula merely on the basis of its predictive value is a questionable policy which is difficult to justify.

Beyond the consideration of merit, certain categorical variables neutral to race, religion, or gender (e.g., economic disadvantage) may be chosen and weighted by the admitting institution to create a non-merit criterion for admissions as an exception to the usual merit criteria (see Roos, 1978, for relevant information on non-merit admissions selection conforming to the Bakke judicial decision). The rating of such non-merit attributes, nevertheless, should not interfere with the evaluation of conventional applicant academic merit. The balance between merit and non-merit considerations should be specified as a consistent policy prior to the application of non-merit considerations to particular cases.

CHAPTER VII
SUMMARY

Refinement of the selection process is an essential component of any serious effort to enhance public benefit from educational programs. In the introductory chapter it is noted that only recently has ease of data entry and retrieval made the prospects of a scientific graduate candidate selection process feasible. A review of research on candidate selection within the health sciences reveals a lack of consistent findings, likely due in part to limited sampling and sample sizes.
CHAPTER VII

SUMMARY

Refinement of the selection process is an essential component of any serious effort to enhance public benefit from educational programs. In the introductory chapter it is noted that only recently has ease of data entry and retrieval made the prospect of a scientific graduate candidate selection process feasible. A review of research on candidate selection within the health sciences reveals a lack of consistent findings, likely due in part to limited sampling and sample sizes. Additionally, inconsistencies may arise from unspecified factors associated with particular years and locations which influence the composition of the non-random pool of applicants. One potential solution to such year and location effects would be to "localize" estimation by attempting to estimate future performance based on a sample restricted to a single site and year. This prediction strategy was dubbed the LP (local-proxy) approach because it specified that the sample be local and that the regression criterion be a proxy for the GGPA (graduate GPA); a minimal sketch of this calibration follows the hypotheses below. The proxy criterion would be the grade point average for the undergraduate veterinary prerequisite courses (PVUGPA), which would allow the UGPA (undergraduate GPA) to serve as one of the multiple predictors. Three additional variants of the LP approach would include (1) the GP (general-proxy), where admissions data would be multi-year, (2) the GM (general-mixed), where admissions data would be multi-year (or multi-site) but either authentic or proxy criteria would be used, and (3) the lone variable UGPA as an estimator of GGPA. Advantages to be gained by these methods might include (1) an increase in the sample size of local cases, (2) an expansion to the full applicant range of local cases (no cases need be deleted from the analysis for lack of a subsequent GGPA), (3) the ability to add or delete predictors in any year (alternate admissions test scores could be accepted to a limited extent), and (4) no requirement for previous GGPA data in order to estimate GGPA.

The research hypotheses called for some estimation methods to exceed the predictive validity of others:

A: LP (local-proxy) to be more valid than UGPA
B: GM (general-mixed) to be more valid than the GC (general-criterion)
E: MSU applicant prediction to be more valid than prediction for non-MSU applicants

or for the appearance of year or method effects:

C: Year and method effects to appear in an analysis of methods
D: Year effects to appear in an analysis of years and other factors
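The local-proxy calibration itself can be sketched as follows; the array names are placeholders for the admissions variables listed in Tables 24 and 25, and the ranking convention (1 = strongest candidate) is an assumption.

    import numpy as np

    def local_proxy_ranks(predictors, pvugpa):
        # predictors : (n, k) admissions data for ALL of this year's applicants
        # pvugpa     : (n,)  proxy criterion (pre-veterinary prerequisite GPA)
        X = np.column_stack([np.ones(len(pvugpa)), predictors])
        # Calibrate the equation on the proxy criterion within the local pool,
        # so no predecessor GGPA data are needed.
        b, *_ = np.linalg.lstsq(X, pvugpa, rcond=None)
        scores = X @ b
        # Convert estimated proxy values to candidate ranks (1 = best).
        return scores.argsort()[::-1].argsort() + 1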
The review of literature in Chapter II concludes that research on predictor validities for the health sciences is not consistent from year to year or across sites. Linn's (1966) work to equate high school GPAs indicated that for these data the introduction of admissions test scores into the multiple regression was as effective as other, more elaborate equating methods. Although some work has been done using UGPA as a proxy, no literature was found (other than the author's) where the context is academic selection. A large number of studies reported evidence of factors which moderate prediction among educational samples. In the theoretical review of Chapter II, topics of sampling, measurement, coefficient validity, multiple regression, and multivariate and univariate analysis of variance and the t-test are discussed.

In Chapter III conventional prediction is identified as general-criterion, or GC (general, because the regression equation is generalized across years and sites; criterion, because the criterion used is an authentic criterion). This prediction is compared and contrasted with the following three experimental multiple-predictor procedures:

(1) local-proxy, or LP (local, because the equation is not generalized beyond year or site; proxy, because the criterion is a "stand-in" [veterinary prerequisites UGPA] for an "authentic" criterion [graduate GPA]),

(2) general-proxy, or GP (general, because the equation can be generalized; proxy, because a proxy criterion is used),

(3) general-mixed, or GM (general, because the equation may be generalized; mixed, because the criterion will be an authentic criterion when available; otherwise, the criterion will be a proxy).

Channels for the entry of error into the prediction process, potential for systematic and random error, advantages and disadvantages, and optimal conditions for the conventional and experimental approaches are discussed and compared.

Chapter IV describes the design of the study. Five admissions cohorts were used from the Michigan State University College of Veterinary Medicine. Some of these applicants were subsequently admitted to the veterinary program and later received a graduate GPA. For all applicants, parallel admissions data were available. These took the form of grades, admissions test scores, ratings, and some demographic variables. Admissions data constituted the source of regression predictors, the graduate GPA was the authentic criterion, and the veterinary prerequisites GPA served as the proxy criterion. Two major analyses were performed; both were repeated-measures MANOVAs using dependent variables that were transformed prediction-error residuals. A test-of-methods MANOVA looked for method and year effects while controlling for the following factors: (Q) predictor intercorrelation, (A) sample size, and (N) source of UGPA. The test-of-factors (and years) MANOVA used one estimation method, local-proxy, across five years of data. It looked for a year effect and it controlled for these factors: (Q) predictor intercorrelation, (A) sample size, (N) source of UGPA, and (O) restriction of range. For the test-of-methods MANOVA, general-criterion estimates were obtained with a standardized regression procedure, the betas being corrected for restriction of range. For this MANOVA, prediction rank minus GGPA rank was the prediction error used (a sketch of this measure follows this chapter's summary of results). For the test-of-years MANOVA, actual prediction error from raw-score regression estimates was used.

Results are reported in Chapter V. As summarized in Table 12, Hypothesis D was confirmed: cut-score prediction error differed across the five years. A view of Figures 5, 6, and 7 again reveals substantial interactions for 1984 data and modest interactions for 1985 data. Because dramatic interactions occur specific to only 1984, with lesser interactions for 1985, it is likely that data for these two years were of lower reliability (perhaps due to greater diversity in UGPA standards). Unconfirmed are Hypotheses A, B, C, and E. For Hypothesis A, local-proxy estimation does not differ from estimation using UGPA as a sole predictor for cases in the cut-score region. For Hypothesis B, the general-criterion and the general-mixed method prediction validities are about the same. For Hypothesis C, neither method nor year effects occur under the test-of-methods for cases in the cut-score region. Also failing to differ is prediction error for MSU and non-MSU applicants, contrary to Hypothesis E. Chapter VI provides a discussion of the findings.
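The rank-error dependent variable can be sketched as follows. The exponential decay used here is a stand-in for the LEWA/LEWAR weighting plotted in Appendix II, whose exact form is not reproduced; cut and width are assumed parameters.

    import numpy as np

    def cut_score_weighted_error(pred_rank, ggpa_rank, cut, width=10.0):
        # Prediction rank minus GGPA rank, emphasized near the cut score.
        error = pred_rank - ggpa_rank
        # Weight peaks at the cut rank and decays with distance from it.
        weight = np.exp(-np.abs(ggpa_rank - cut) / width)
        return weight * error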
While it is acknowledged that this study does not find a consistent predictive advantage for the proxy methods with the present data, some potential enhancements remain to be tried: (1) calibration of local-proxy equations using a single source of UGPAs (e.g., MSU), and (2) addition of random error to the UGPA predictor prior to calibration (sketched at the end of this chapter). Also, potential for the methods may remain, albeit under different circumstances. It is noted that the UGPA validity level was high for these data. In view of the geography of veterinary education, which finds fewer than one school per state, this is not surprising. Where UGPA sources are more diverse, or where local grading standards are changing, the validity of the UGPA is bound to decline, and competing estimation procedures (such as proxy criterion methods) may find practical use.

Intercorrelation of predictors did not appear to constitute a multicollinearity problem for general-criterion prediction. Where the intercorrelated predictors did pose a problem for local-proxy estimation, it may have been due to the greater number of predictors tolerated in the high-intercorrelation condition (and not to intercorrelation, per se). Kenny (1979) did note that multicollinearity was associated with predictor unreliability, overlapping factors, and with high correlation of the intercorrelated predictors with the criterion. Perhaps multicollinearity is not a serious problem for moderately intercorrelated admissions data which are sufficiently reliable and factor independent.

The reader is again cautioned regarding several aspects of the study which might be misleading: (1) general-criterion sample-size levels were numerically smaller than those for the proxy-criterion methods; this was true to life, but sample-size levels were nevertheless not identical; (2) the dependent variable in the test-of-methods differs from that in the test-of-factors (the first is reported in rank error, while the second is reported in actual error); the rank error was more relevant to effects on cut-score cases, although the actual error allowed more powerful testing; (3) the general-criterion regression equations were computed using standardized regression and adjusted for restriction of range, while the proxy-criterion estimation used (unadjusted) raw-score regression estimates; (4) the high predictive validity of UGPA for these data may not generalize to other sites or programs; UGPA validity is likely tied to the dominating proportion of MSU UGPAs in the applicant pool.

It is acknowledged that the regression equations used to estimate graduate grade point average in this study were not optimal, because they retained non-significant predictors. Non-optimal estimation allowed the testing of sample size and predictor intercorrelation, which would have been confounded with statistical testing had it also occurred. To provide accurate regression equations for the methods in this study, therefore, the regressions were run again with statistical testing of coefficients. The general-criterion and general-mixed equations were virtually identical, and all equations gave the dominant predictive role to the UGPA predictor.
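A minimal sketch of the second enhancement follows, assuming the noise standard deviation (here 0.10 grade point) would be tuned empirically rather than taken as given.

    import numpy as np

    def jitter_ugpa(ugpa, sd=0.10, seed=None):
        # Add random error to the UGPA predictor before calibrating the
        # local-proxy equation. Deliberately degrading the dominant
        # predictor's reliability shrinks its calibration weight, roughly
        # as cross-validation would (sd is an assumed tuning value).
        rng = np.random.default_rng(seed)
        return ugpa + rng.normal(0.0, sd, size=len(ugpa))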
APPENDICES

APPENDIX I

Substitute Merit Values

Table AI
Values substituted for missing merit values

VARIABLE                                              VALUE
activities and achievements (non-academic)            2.5
age in years                                          21.935
average term credits                                  14.612
total (grade point) credits                           58.
interviewer rating number 1                           20.483
interviewer rating number 2                           20.711
Medical College Admissions Test: Biology              2.
Medical College Admissions Test: Chemistry            2.
Medical College Admissions Test: Physics              2.
Medical College Admissions Test: Quantitative         2.
Medical College Admissions Test: Reading              1.
Medical College Admissions Test: Science              2.
narrative writing sample                              3.072
number of terms enrolled in college                   10.38
credits on a pass/fail basis                          7.2
total honor points                                    162.
gender                                                1.53
veterinary experience rating                          4.833
work experience rating                                2.26
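Table AI amounts to substituting a single typical value for each missing merit variable. A minimal sketch, assuming missing entries are coded as NaN and the substitute is the column mean:

    import numpy as np

    def substitute_merit_values(X):
        # X : (applicants, variables) matrix with missing entries as NaN.
        X = X.copy()
        fill = np.nanmean(X, axis=0)            # one substitute per variable
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = fill[cols]              # e.g., 21.935 for age in years
        return X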
APPENDIX II

Error Weight by GPA Plot

[Figure: CUT-SCORE WT BY RKGPA]
FIGURE IIA. LEWA (or LEWAR) weight plotted against RKGPA (n = 88).

[Figure: CUT-SCORE WT BY GPA]
FIGURE IIB. LEWA (or LEWAR) weight plotted against GPA (n = 88).

APPENDIX III

Correlations: LEWAR-transformed Error by Predictors and GGPA

[Table AIII: correlations (with N and probability) of each LEWAR-transformed error variable, by year, with ACTACH, AVGCRED, CUMGPA, MCATC, MCATR, NARR, PFCRED, PTS, PVSGPA, SEX, SUMCRED, WORKEXP, and GGPA.]

REFERENCES

Allen, M.J., Yen, W.M. (1979) Introduction to Measurement Theory. Monterey, California: Brooks/Cole.

Alexander, R.A., Carson, K.P., Alliger, G.M., Carr, L. (1987) Correcting doubly truncated correlations: An improved approximation for correcting the bivariate normal correlation when truncation has occurred on both variables. Educational and Psychological Measurement, 47, pp.309-315.

Cattin, P. (1981) The predictive power of ridge regression: Some quasi-simulation results. Journal of Applied Psychology, 66:3, pp.282-290.

Clapp, T.T., Reid, J.C. (1976) Institutional selectivity as a predictor of applicant selection and success in medical school. Journal of Medical Education, 51, pp.851-852.

Deegan, J., Jr. (1976) The consequences of model misspecification in regression analysis. Multivariate Behavioral Research, April 1976, pp.237-248.

Doolittle, A.E., Cleary, T.A. (1987) Gender-based differential item performance in mathematics achievement items. Journal of Educational Measurement, 24:2, pp.157-166.

Elliot, R., Strenta, A.C. (1988) Effects of improving the reliability of the GPA on prediction generally and on comparative predictions for gender and race particularly. Journal of Educational Measurement, 25:4, pp.333-347.

Goldman, R.D., Hewitt, B.N. (1976) Predicting the success of Black, Chicano, Oriental, and White college students. Journal of Educational Measurement, 13:2, pp.107-117.

Goldman, R.D., Hewitt, B.N. (1975) Adaptation-level as an explanation for differential standards in college grading. Journal of Educational Measurement, 12:3, pp.149-161.

Gough, H.G., Lanning, K. (1986) Predicting grades in college from the California Psychological Inventory. Educational and Psychological Measurement, 46, pp.205-213.

Hakstian, A.R., Woolsey, L.K. (1985) Validity studies using the Comprehensive Ability Battery (CAB): Predicting achievement at the university level. Educational and Psychological Measurement, 45, pp.329-341.

Hart, M.E., Payne, D.A., Lewis, L.A. (1981) Prediction of basic science learning outcomes with cognitive style and traditional admissions criteria. Journal of Medical Education, 56:2, pp.137-139.

Hogrebe, M.C., Ervin, L., Dwinell, P.L., Newman, I. (1983) The moderating effects of gender and race in predicting the academic performance of college developmental students. Educational and Psychological Measurement, 43, pp.523-530.

Huberty, C.J., Mourad, S.A. (1980) Estimation in multiple correlation/prediction. Educational and Psychological Measurement, 40:1, pp.101-112.

Humphreys, L.G., Taber, T. (1973) Postdiction study of the Graduate Record Examination and eight semesters of college grades. Journal of Educational Measurement, 10:3, pp.179-184.

Huntsberger, D.V., Billingsley, P. (1973) Elements of Statistical Inference. Boston: Allyn and Bacon, pp.131-134.

Jones, R.F., Thomas-Forgues, M. (1984) Validity of the MCAT in predicting performance on the first two years of medical school. Journal of Medical Education, 59:6, pp.455-464.

Keeves, J.P. (1988) Multivariate analysis. In Keeves, J.P. (ed.), Educational Research, Methodology, and Measurement: An International Handbook. New York: Pergamon, pp.527-537.
Kenny, D.A. (1979) Correlation and Causality. New York: Wiley & Sons.

Kirk, R.E. (1982) Experimental Design (2nd ed.). Monterey, California: Brooks/Cole.

Linn, R.L. (1966) Grade adjustments for prediction of academic performance: a review. Journal of Educational Measurement, 3, pp.313-329.

Linn, R.L. (1983) Pearson selection formulas: Implications for studies of predictive bias and estimates of educational effects in selected samples. Journal of Educational Measurement, 20:1, pp.1-15.

Linn, R.L., Harnisch, D.L., Dunbar, S.B. (1981) Corrections for range restriction: An empirical investigation of conditions resulting in conservative corrections. Journal of Applied Psychology, 66:6, pp.655-663.

Linn, R.L., Hastings, C.N. (1984) Group differentiated prediction. 8:2.

Loeb, J., Bowers, J. (1973) Programs of study as a basis for selection, placement, and guidance of college students. Journal of Educational Measurement, 10:2, pp.131-139.

Markert (1983) Relationship of old and new MCAT scores to performance on the Part III examination of the NBME. Journal of Medical Education, 60:1, pp.53-55.

McCornack, R.L. (1983) Bias in the validity of predicted college grades in four ethnic minority groups. Educational and Psychological Measurement, 43, pp.517-522.

McCornack, R.L., McLeod, M.M. (1988) Gender bias in the prediction of college course performance. Journal of Educational Measurement, 25:4, pp.321-331.

Mehrens, W., Lehmann, I. (1984) Measurement and Evaluation in Education and Psychology. Chicago: Holt, Rinehart, & Winston.

Morris, J.D. (1986) Selecting a predictor weighting method by PRESS. Educational and Psychological Measurement, 46, pp.853-869.

Niedzwiedz, E.R., Friedman, B.A. (1976) A comparative analysis of the validity of pre-admissions information at four colleges of veterinary medicine. Journal of Veterinary Medical Education, 3:2, pp.32-38.

Neiner, A.G., Owens, W.A. (1985) Using biodata to predict job choice among college graduates. Journal of Applied Psychology, 70:1, pp.127-136.

Neter, J., Wasserman, W., Kutner, M.H. (1985) Applied Linear Statistical Models (2nd ed.). Homewood, Illinois: R.D. Irwin, Inc., p.10.

Pedhazur, E.J. (1982) Multiple Regression in Behavioral Research: Explanation and Prediction (2nd ed.). Chicago: Holt, Rinehart, and Winston.

Richards, J.M. (1982) Standardized versus unstandardized regression weights. Applied Psychological Measurement, 5:2, pp.201-212.

Roos, P.D. (1978) The implications of the Bakke decision on affirmative action admissions and related programs. In: Connolly, W.B., Dilworth, E.J., and Leach, D.E. (chairmen). New York: Harcourt Brace Jovanovich, p.209.

Ross, K.N. (1988) Sampling. In Keeves, J.P. (ed.), Educational Research, Methodology, and Measurement: An International Handbook. New York: Pergamon, pp.527-537.

Sawyer, R. (1986) Using demographic subgroup and dummy variable equations to predict college freshman grade average. Journal of Educational Measurement, 23:2, pp.131-145.

Sawyer, R., Maxey, J. (1979) The validity of college grade prediction equations over time. Journal of Educational Measurement, 16:4, pp.279-284.

Stuck, I.A. (1986) Selection by concurrent prediction: an alternative to the validity generalization of selection models. Michigan State University: Author.

Tabachnick, B.G., Fidell, L.S. (1983) Using Multivariate Statistics. New York: Harper & Row.

Thornell, J.G., McCoy (1985) The predictive validity of the Graduate Record Examinations for subgroups of students in different academic disciplines. Educational and Psychological Measurement, 45, pp.415-419.
Wilson, K.M. (1982) A study of the validity of the restructured GRE Aptitude Test for predicting first-year performance in graduate study. Educational Testing Service Research Report 82-34, p.60.

Wood, D.A., Langerin, M.J. (1972) Moderating the prediction of grades in freshman engineering. Journal of Educational Measurement, 9:4, pp.311-320.

Wright, R.J., Bean, A.G. (1974) The influence of socioeconomic status on the predictability of college performance. Journal of Educational Measurement, 11:4, pp.277-284.