INVESTIGATION OF TWO PROPOSED
SOLUTIONS TO THE MULTIPLE FALLIBLE
COVARIABLE PROBLEM FOR
QUASI-EXPERIMENTS

Dissertation for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
KOWIT PRAVALPRUK
1974


This is to certify that the

thesis entitled

INVESTIGATION OF TWO PROPOSED SOLUTIONS TO
THE MULTIPLE FALLIBLE COVARIABLE PROBLEM
FOR QUASI-EXPERIMENTS

presented by

Kowit Pravalpruk

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Education

ABSTRACT
INVESTIGATION OF TWO PROPOSED SOLUTIONS TO
THE MULTIPLE FALLIBLE COVARIABLE PROBLEM
FOR QUASI-EXPERIMENTS
By

Kowit Pravalpruk

One of the assumptions underlying classical analysis of
covariance (ANCOVA) with random covariables is that the covariables
are observed free from errors of measurement. When random assign-
ment of experimental units to levels of the treatment independent
variable is an aspect of the experimental design, failure to meet
the perfectly reliable covariables assumption decreases the sta-
tistical power of ANCOVA but does not cause it to test biased
treatment effects. When random assignment is not an aspect of the
design, however, and ANCOVA is being used to correct for initial
differences among treatment groups on the covariables, use of
less than perfectly reliable covariables not only decreases the
power of ANCOVA, but also causes ANCOVA to test biased treatment
effects. Several correction procedures have been suggested for
the single fallible covariable design. The intent of this thesis
was to extend the earlier work by describing two alternative
correction procedures for the multiple fallible covariable design,
demonstrating their properties in terms of population parameters, and
empirically investigating the sampling distributions of their test
statistics, i.e., probability of Type I error and power.

First, a brief review of past work on the single fallible
covariable problem was presented. Next, the effects of errors of
measurement in multiple linear regression were incorporated into
the multiple covariable model. Finally, the two proposed solutions
to the multiple fallible covariable problem were described and
their properties investigated, first analytically and then via a
Monte Carlo study.

A solution to the multiple fallible covariable problem requires
a procedure that provides unbiased estimates of the regression
coefficients defined on the latent true variables. An existing
single covariable solution, the substitution of estimated true scores,
corrected the bivariate regression coefficients between the dependent
variable and each covariable, but left uncorrected the bivariate
regression coefficients among the covariables. Thus Method A
consisted of 1) substituting estimated true scores for each observed
covariable, and 2) correcting for attenuation the relationships
between the estimated true score covariables.

A second approach to the solution of the general problem,
Method B, was motivated by the simplified situation which exists
for uncorrelated covariables. Method B can be described for two
covariables as follows: 1) one covariable is transformed to make
it orthogonal to the other; 2) estimated true scores are substituted
for the two orthogonal covariables and computations proceed
as for regular ANCOVA.

The two correction procedures were investigated analytically
to determine whether they test the correct hypothesis when there are
two fallible covariables in a quasi-experiment. The conclusions
were: 1) if the latent true covariables are uncorrelated, estimated
true scores ANCOVA tests the desired hypothesis; 2) when the latent
true covariables are correlated but have equal reliability Method
A tests the correct hypothesis; and 3) Method B does not appear to
test the hypothesis of interest under any circumstances. The remain-
ing question to be answered was how do the small sample distributions
of the various test statistics behave?

A computer program for the CDC 6500 computer system at the
Michigan State University Computer Center was written to obtain
empirical F distributions of the estimated true scores ANCOVA when two
fallible covariables were independent of each other, and of the two
proposed correction methods when two fallible covariables were
related. All distributions were based on 1000 samples and empirical
α's and power were reported for nominal levels of .10, .05, and .01.
A pseudo-random unit normal deviate generator was used to generate
observations from a trivariate normal distribution with known
parameters.

The Monte Carlo investigation of estimated true scores ANCOVA
for two uncorrelated fallible covariables (a two-treatment
quasi-experimental design with 40 observations per treatment,
correlations of .7 between each latent true covariable and the
dependent variable, and reliabilities of .7 for each covariable)
yielded slightly liberal Type I error rates, but within two standard
errors for all three nominal values, .10, .05 and .01. As was
expected, the statistical power of estimated true scores ANCOVA
was substantially lower than that for latent true covariables.

When the number of treatments increased to four, the estimated
true scores ANCOVA empirical Type I errors were markedly discrep-
ant from the nominal values (.177, .109 and .037 at nominal values
of .10, .05 and .01 respectively). These discrepancies were found
even though average pooled within regression coefficients were
correct (.703 and .700).

The empirical Type I error rates from the Monte Carlo study
of Methods A and B for two correlated fallible covariables (.2
intercorrelation between the two latent true covariables) were not
within the range of practical utility for either method, even though
the analytic demonstration suggested that Method A tested the right
hypothesis. Several modifications of Method A were proposed in an
attempt to reduce the overly liberal Type I error rate, but none
was successful.

INVESTIGATION OF TWO PROPOSED SOLUTIONS TO
THE MULTIPLE FALLIBLE COVARIABLE PROBLEM

FOR QUASI-EXPERIMENTS

BY

Kowit Pravalpruk

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services and Educational Psychology

1974

ACKNOWLEDGEMENTS

I am especially grateful to my committee chairman, Professor
Andrew C. Porter, for his endless advice, time and encouragement
throughout all phases of my study. Special thanks goes to Pro-
fessor Maryellen McSweeney, Professor Robert L. Ebel and Professor
Kenneth J. Arnold for their help and valuable comments.

Working in the Office of Research Consultation provided me the
most valuable experience which I will never forget. Many thanks
to my friends and colleagues in the Office who were very kind to
me, to Professor Andrew C. Porter and Dr. John H. Schweitzer, who
gave me this job, and to Professor Maryellen McSweeney, who gave
me two years of research assistantship prior to the Office job.

I wish to thank my wife, Sor-Wasna, and my daughter,
Puttaporn, for giving me some time to work on my study. My
parents, sisters and brothers are appreciated for their support
and sympathy. Without funds from the Thai Government and skillful
typing from Ms. Janice Fuller, this research could not have been
completed.

TABLE OF CONTENTS
CHAPTER                                                          Page
I    INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . .   1
        Classical Analysis of Covariance (ANCOVA) . . . . . . . .   3
        Random Covariable Measured with Error . . . . . . . . . .   9
        Proposed Problem  . . . . . . . . . . . . . . . . . . . .  15
        Purpose of Study  . . . . . . . . . . . . . . . . . . . .  16
II   REVIEW OF ANCOVA WITH A SINGLE FALLIBLE COVARIABLE . . . . .  18
III  ANCOVA WITH MULTIPLE FALLIBLE COVARIABLES  . . . . . . . . .  29
        Method A  . . . . . . . . . . . . . . . . . . . . . . . .  32
        Method B  . . . . . . . . . . . . . . . . . . . . . . . .  33
        Monte Carlo Study . . . . . . . . . . . . . . . . . . . .  36
           Parameter Setting  . . . . . . . . . . . . . . . . . .  39
           Data Generating  . . . . . . . . . . . . . . . . . . .  43
           Estimating Reliabilities . . . . . . . . . . . . . . .  46
           ANCOVA . . . . . . . . . . . . . . . . . . . . . . . .  48
           Method A ANCOVA  . . . . . . . . . . . . . . . . . . .  48
           Method B ANCOVA
           Distribution Building
           Print Output . . . . . . . . . . . . . . . . . . . . .  50
IV   RESULTS  . . . . . . . . . . . . . . . . . . . . . . . . . .  51
        Estimated True Scores ANCOVA with
           Independent Covariables  . . . . . . . . . . . . . . .  51
        Methods A and B . . . . . . . . . . . . . . . . . . . . .  59
V    SUMMARY AND CONCLUSION . . . . . . . . . . . . . . . . . . .  65
BIBLIOGRAPHY  . . . . . . . . . . . . . . . . . . . . . . . . . .  70


LIST OF TABLES
Table                                                            Page

1    DeGracie's Analysis of Covariance Using Corrected
     Slope b3 . . . . . . . . . . . . . . . . . . . . . . . . . .  27

2    Sources of Variation for Analysis of Covariance
     with Two Covariables . . . . . . . . . . . . . . . . . . . .  38

3    Design of Study  . . . . . . . . . . . . . . . . . . . . . .  40

4    Example Distribution Based on 10,000 Trivariate
     Cases Generated by RANN  . . . . . . . . . . . . . . . . . .  45

5    Empirical Type I Error, Statistical Power, Average
     Mean Square and Average Adjusted Means for
     Estimated True Scores ANCOVA . . . . . . . . . . . . . . . .  52

6    Empirical Cumulative Distribution of Regression
     Coefficients from Estimated True Scores ANCOVA . . . . . . .  54

7    Empirical Type I Error, Statistical Power, Average
     Mean Squares and Average Adjusted Mean for
     Estimated True Scores ANCOVA . . . . . . . . . . . . . . . .  56

8    Empirical Cumulative Distribution of Regression
     Coefficients from Estimated True Scores ANCOVA . . . . . . .  57

9    Empirical Type I Error and Statistical Power for
     Estimated True Scores ANCOVA Using Population
     Reliabilities  . . . . . . . . . . . . . . . . . . . . . . .  58

10   Empirical Type I Error, Statistical Power, Average
     Mean Squares and Average Adjusted Means for
     Method A and Method B  . . . . . . . . . . . . . . . . . . .  60

11   Empirical Cumulative Distribution of Regression
     Coefficients for Method A and Method B . . . . . . . . . . .  61

12   Empirical Type I Error and Statistical Power of
     Estimated True Scores ANCOVA with Some Additional
     Correction Methods . . . . . . . . . . . . . . . . . . . . .  63


LIST OF FIGURES

Figure Page
1 Effect of errors of measurement, Lord's example . . . 12
2 Flow chart of the computer program. . . . . . . . . . 42

CHAPTER I

INTRODUCTION

Analysis of covariance (ANCOVA) has been employed widely in
educational research both in random and non-random assignment
designs. The latter is of particular interest here. A current
example of giant proportions is Follow Through "planned variation",
which has used ANCOVA to compare students' achievement between
those in Follow Through and non-Follow Through groups and among
various "sponsors" within Follow Through (Stallings, 1973). The
desired comparisons were whether, "other things being equal", there
were differences on dependent variables among various groups.

Since there was no random assignment of students to "sponsors"
nor to Follow Through/non-Follow Through, potential confounding
variables, e.g., age, sex, ethnic origin, months of Head Start
experience, days absent, and Wide Range Achievement Test score
(pre-test) needed to be "controlled" post hoc. Thus far analyses
of Follow Through data have relied heavily on ANCOVA procedures.

The heavy reliance on ANCOVA to control for initial differ-
ences in non-random assignment designs coupled with the fact that
ANCOVA was originally conceived for other purposes prompts careful
reconsideration of its assumptions. One of the assumptions is
that random covariables are observed free from errors of measurement.
When random assignment of experimental units to levels of the treat-
ment independent variable is an aspect of the experimental design,
failure to meet the perfectly reliable covariables assumption
decreases the statistical power of ANCOVA but does not cause it
to test biased treatment effects. When random assignment is not
an aspect of the design, however, and ANCOVA is being used to
correct for initial differences among treatment groups on the
covariables, use of less than perfectly reliable covariables not
only decreases the power of ANCOVA, but also causes ANCOVA to test
biased treatment effects. Several correction procedures have been
suggested for the single fallible covariable design. The purpose
of this thesis was to extend the earlier work by describing two
proposed correction procedures for the multiple fallible covariable
design, demonstrating their properties in terms of population
parameters, and empirically investigating the sampling distributions
of their test statistics, i.e., probability of Type I error and
power.

First, however, an important caveat is necessary. There is no
perfectly acceptable solution to the problem of estimating causal
relationships from quasi or naturally occurring experiments. Perhaps
ANCOVA is a useful procedure in some situations, but it is
clearly limited by the care and ingenuity used in selecting the
covariables. The problem here addressed is not how to select a
useful set of covariables or even whether that task can ever be
accomplished. Rather, the problem concerns the effects of errors
of measurement in situations where a useful set of covariables has
been identified.

The remainder of this chapter presents the linear model for a
one-way ANCOVA with one and two covariables, summarizes the full
set of assumptions, and discusses in detail the assumption of error
free covariates. Finally, the purpose of the present work is
stated formally. Chapter II provides a review of the literature
on ANCOVA with a single fallible covariable. Chapter III provides
a review of the literature on the multiple fallible covariable
problem, a description of the two proposed solutions, and an
analytic demonstration of the hypothesis tested by each. Chapter
III concludes by describing the Monte Carlo study designed to
investigate the distributional properties of each proposed solution.
Results of the Monte Carlo study are presented in Chapter IV and
conclusions of the thesis in Chapter V.

Classical Analysis of Covariance (ANCOVA)
For simplicity, discussion of ANCOVA will be based on the one-
way analysis of covariance model, although generalization to more
complex designs is not difficult. The one-way analysis of covariance

linear model for a single covariable is:

\[ Y_{ij} = \mu_{Y..} + \tilde{\alpha}_{Y_i} + \beta_{Y.X}(X_{ij} - \mu_{X..}) + e_{ij}, \]

where $Y_{ij}$ denotes the dependent variable for the ith treatment and
jth individual,
$\mu_{Y..}$ denotes the grand mean of the dependent variable,
$\tilde{\alpha}_{Y_i}$ denotes the adjusted ith treatment effect,
$\beta_{Y.X}$ denotes the pooled within bivariate regression slope of
Y on X,
$X_{ij}$ denotes the covariate for the ith treatment and jth individual,
$\mu_{X..}$ denotes the grand mean of the covariable, and
$e_{ij}$ denotes the error term.

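The null hypothesis to be tested is:

\[ H_0: \sum_{i=1}^{I} \tilde{\alpha}_{Y_i}^2 = 0. \]

For P covariables, the adjusted treatment effects are defined:

\[ \tilde{\alpha}_{Y_i} = \alpha_{Y_i} - \sum_{k=1}^{P} \beta_k \alpha_{X_{k}i}, \]

where α's are defined as previously and $\beta_k$ denotes the pooled within
multiple regression coefficient of the kth covariable.
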
When two covariables are included in ANCOVA, the linear model
has one more term, the explainable part of the dependent variable
which belongs to the second covariable,

\[ Y_{ij} = \mu_{Y..} + \tilde{\alpha}_{Y_i} + \beta_X (X_{ij} - \mu_{X..}) + \beta_Z (Z_{ij} - \mu_{Z..}) + e_{ij}, \]

where $Z_{ij}$ is the second covariable,
$\beta_X$ is the pooled within multiple regression coefficient for X,
$\beta_Z$ is the pooled within multiple regression coefficient for Z,
$\mu_{Z..}$ is the grand mean for the second covariable Z,
and other notations are as defined previously.
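The adjusted treatment effect can be expressed in terms of the
unadjusted dependent variable effects, the first covariable effects
and second covariable effects:

\[ \tilde{\alpha}_{Y_i} = \alpha_{Y_i} - \beta_X \alpha_{X_i} - \beta_Z \alpha_{Z_i}, \]

and the null hypothesis to be tested is:

\[ H_0: \sum_{i=1}^{I} (\alpha_{Y_i} - \beta_X \alpha_{X_i} - \beta_Z \alpha_{Z_i})^2 = 0. \]
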
The use of covariables in ANCOVA serves two basic purposes.
First, to the extent that the covariables are related to the
dependent variable, the error variance for hypothesis testing
and interval estimation is reduced. For example, if $\rho_{YX}$ denotes
the correlation of dependent variable Y and covariate X and $\sigma_e^2$
denotes the error variance when the covariable has not been used,
the error variance for ANCOVA is

\[ \sigma_e^2 (1 - \rho_{YX}^2) \left\{ 1 + \frac{1}{f_e - 2} \right\}, \]

where $f_e$ denotes the degrees of freedom for the estimate of $\sigma_e^2$.
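
As a rough numerical illustration of the error variance expression, the following Python sketch evaluates $\sigma_e^2(1-\rho_{YX}^2)\{1 + 1/(f_e - 2)\}$. The function name and the example values (unit error variance, a correlation of .7, 78 error degrees of freedom) are assumptions chosen for illustration and are not taken from the thesis.

```python
def ancova_error_variance(var_e, rho_yx, f_e):
    """Effective ANCOVA error variance: var_e * (1 - rho^2) * (1 + 1/(f_e - 2))."""
    if f_e <= 2:
        raise ValueError("f_e must exceed 2")
    return var_e * (1.0 - rho_yx ** 2) * (1.0 + 1.0 / (f_e - 2))


# Hypothetical values: unit error variance, correlation .7, 78 error df.
reduced = ancova_error_variance(1.0, 0.7, 78)
print(round(reduced, 4))
```

With these assumed values the covariable removes roughly half of the error variance, and the factor in braces adds back only a small penalty for estimating the slope.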

The second purpose of ANCOVA is to remove group differences
on the covariable. In other words, all treatment effects on the
dependent variable are adjusted by subtracting out a multiple of
the group differences due to the covariate. The adjusted treatment
effects are defined

\[ \tilde{\alpha}_{Y_i} = \alpha_{Y_i} - \beta_{Y.X}\alpha_{X_i}, \]

where α's denote treatment effects (treatment group mean minus the
grand mean), $\beta_{Y.X}$ is the pooled within regression coefficient for
predicting the dependent variable from the covariable, and a tilde (~)
denotes adjusted.
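
The adjustment just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: equal group sizes (so the grand mean is the mean of the group means), a pooled within slope supplied by the caller, and hypothetical group means that do not come from the thesis.

```python
def adjusted_treatment_effects(y_means, x_means, slope):
    """Return alpha_Y_i - slope * alpha_X_i for each group (equal group sizes assumed)."""
    grand_y = sum(y_means) / len(y_means)
    grand_x = sum(x_means) / len(x_means)
    return [(y - grand_y) - slope * (x - grand_x)
            for y, x in zip(y_means, x_means)]


# Hypothetical group means on Y and X, and a hypothetical pooled within slope.
effects = adjusted_treatment_effects([10.0, 14.0], [4.0, 6.0], slope=1.5)
print(effects)
```

Here the unadjusted effects of -2 and 2 shrink to -0.5 and 0.5 once the group difference on the covariable is subtracted out.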


Assumptions and conditions underlying the analysis of covariance
model are explained in three categories: assumptions on error in
the model, assumptions and side conditions on the treatment effects,
and assumptions on the covariable. Although the third category of
assumption provided the motivation for this thesis, a quick overview
of all assumptions is appropriate.

First consider assumptions on the error component:

1. Errors ($e_{ij}$) are normally distributed.

2. Variances of the distributions of errors are the same
across all treatment groups.

3. There is no relationship between errors and treatment
effects and among errors themselves.

Violation of the normality assumption only slightly affects
Type I error and statistical power given that the covariable is
normally distributed (Box and Anderson, 1962; Atiqullah, 1964). The
robustness of the F test against violations of the homogeneity of
variance assumption depends upon the degree of heterogeneity of
variance in the covariable (Potthoff, 1965). The effect of viola-
tion of the independence assumption is the same as for ANOVA.
Elashoff (1969), Glass, Peckham and Sanders (1972) have given
complete discussions of the robustness of the F test statistic
concerning violation of these assumptions in ANCOVA. Scheffé
(1959) and Atiqullah (1962, 1964) have also discussed ANCOVA
robustness both analytically and from empirical results.

Now consider assumptions and side conditions on the treatment
effects.

1. There is random assignment of subjects to treatment groups
(Elashoff, 1969, p. 386).

2. The sum of all weighted treatment effects equals zero
($\sum_i n_i \alpha_i = 0$).

3. For all subjects within a treatment group, the treatment
effects are the same, i.e., the dependent variable is a linear
combination of mutually independent components: grand mean,
treatment effect, a linear regression on covariable and error term.

There is some disagreement about the first assumption. Lord
(1969) and Cronbach and Furby (1970) both recommended against
using ANCOVA when randomization has not been used. In addition,
Evans and Anastasio (1968) warned that ANCOVA was likely to be
misleading when used in situations where intact groups and
treatments occur together naturally. In a letter to Porter, however,
Cronbach said that ANCOVA can be useful to adjust for bias due to
a covariate when randomization has not been done. Similarly,
Porter (1973), Elashoff (1969) and Harnqvist (1968) agreed upon
the utility of analysis of covariance even when randomization
was not possible. The second condition is a restriction on the
model, i.e., a side condition for the fixed model. The third assumption is
one of no treatment by subject interaction, i.e., an assumption of
additivity. This assumption is primarily useful for interpretation
of results (Porter, 1973) and violation of the assumption implies
the need for a more complex model.

The third set of assumptions concerns the covariable.


1. The covariable is fixed and measured without error.

2. There is no treatment effect on the covariable.

3. There is a linear* relationship between the dependent
variable and covariable.

4. The regression slope is the same across all treatment
groups. This is subsumed under the third assumption in the
second category of assumptions.

Violation of the first assumption will be discussed in more
detail later. The second assumption is needed when the covariable
is measured after the treatments have been operating. In situations
where marked departures from a linear relationship are likely,
blocking or matching procedures should be used or the appropriate
degree of relationship should be added to the model (Atiqullah,
1964; Finney, 1957; Kirk, 1968). Violation of the fourth assump-
tion indicates that there is an interaction between the covariable
and the treatment (Elashoff, 1969). In the presence of this treatment-
slope interaction the F-test may yield misleading results (Atiqullah,
1964). Peckham (1968) found that the F-test became conservative
as the differences among slopes were increased.

For more discussion of the effect of violating these assump-
tions, see Elashoff (1969), Glass, Peckham and Sanders (1972) and

Smith (1957). On additivity see Porter (1973).

 

* ANCOVA is also available for non-linear relationships when
the appropriate model is used.

Random Covariable Measured with Error

The assumption of primary interest here is that the covariable
be fixed and measured without error. Violations of the assumption
of a fixed covariable do not have serious effects on the validity
of testing hypotheses about means. The least squares method still
can be applied and unbiased estimates obtained. "The only difference
from the classical result is that the variance of the
estimates ($\alpha_i$ and $\beta$) are averaged over all values of X (covariable)"
(DeGracie, 1968, p. 48).

Throughout the discussion in this study, errors are assumed
to be random and to meet all classical assumptions. Let $u_j$
represent the error of measurement on the true covariable $X'_j$; then $u_j$
satisfies the following assumptions.

1. $u_j$ is normally distributed with mean of zero and variance
of $\sigma_u^2$, i.e., $E(u_j) = 0$ for all j.

2. All correlations among errors are zero, i.e., $\rho_{uu'} = 0$.

3. There is no relationship between errors of measurement and
their associated true scores, i.e., $\rho_{uX'} = 0$.

In educational research, the potential covariable is usually
measured by one of the standardized tests, e.g., IQ tests, achieve-
ment tests, and aptitude tests. Scores from these tests are
considered to be random and less than perfectly reliable. When
the covariable is random and measured with error, the least squares
regression slope is a biased estimator of the slope defined on the

latent true variable. Consider the following relationship in the
linear regression model:

\[ Y'_j = \beta'_0 + \beta' X'_j \qquad (j = 1, 2, \ldots, n) \]

where $Y_j = Y'_j + v_j$,
$X_j = X'_j + u_j$,
$X_j$, $Y_j$ denote observed scores,
$X'_j$, $Y'_j$ denote true scores,
$u_j$, $v_j$ denote errors, and
$\beta'_0$, $\beta'$ denote the true regression coefficients.

The least squares estimate becomes an unbiased estimator of the
attenuated slope rather than of $\beta'$:

\[ E(\hat{\beta}) = \frac{\beta'}{1 + \sigma_u^2 / \sigma_{X'}^2}, \]

where $\beta'$ is the slope defined on the latent true variables, i.e.,
the structural relationship,
$\hat{\beta}$ is the least squares slope defined on the observed variables, and
$\sigma_{X'}^2$ is the variance of latent true X.

Thus,

\[ E\left\{ \left( 1 + \frac{\sigma_u^2}{\sigma_{X'}^2} \right) \hat{\beta} \right\} = \beta'. \]

From classical measurement theory (Gulliksen, 1950),

\[ 1 + \frac{\sigma_u^2}{\sigma_{X'}^2} \]

is the reciprocal of the reliability of the covariate. In other
words, the least squares regression slope divided by the reliability
of the predictor is an unbiased estimator of the structural
relationship (Berkson, 1950; Porter, 1967; DeGracie, 1968).
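
The attenuation result can be checked with a small simulation. The sketch below is illustrative only: the sample size, the true slope of .7, the true and error variances (giving a reliability of .7), the residual standard deviation, and the seed are all assumed values, and the helper names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=50000, true_slope=0.7, var_true=0.7, var_err=0.3, seed=1):
    """Return (observed slope, reliability, slope divided by reliability)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
    y = [true_slope * xt + rng.gauss(0.0, 0.5) for xt in x_true]
    # Observed covariable = latent true score + independent error of measurement.
    x_obs = [xt + rng.gauss(0.0, var_err ** 0.5) for xt in x_true]
    reliability = var_true / (var_true + var_err)
    b_obs = ols_slope(x_obs, y)
    return b_obs, reliability, b_obs / reliability


b_obs, rel, b_corrected = simulate_attenuation()
print(b_obs, rel, b_corrected)
```

Under these assumptions the slope on the observed covariable should fall near the true slope times the reliability, and dividing by the reliability should bring it back near the structural slope.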

There are several ways to get an estimate of the reliability.
When one form of the test instrument is administered on only one
occasion, internal consistency reliability can be obtained in a
variety of ways, e.g., Kuder-Richardson methods, Hoyt's procedure
through ANOVA, or the split-half method. When more than one form of the same
instrument is used or the same form has been administered twice
to the same group of subjects, the correlation between the two
measures is an estimate of the reliability.

Choice of method for estimating reliability is an important
and complex topic in its own right and will not be discussed
further here. Given the definition of reliability and the
properties of errors of measurement, a good point estimate is needed.
Simulations in this thesis used a parallel forms method of
estimating reliability.
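
A minimal sketch of a parallel forms estimate follows, assuming two forms that share the same latent true score and carry independent errors of equal variance. It is not the thesis program: the function names, variances, sample size, and seed are hypothetical choices made for illustration.

```python
import random


def pearson_r(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def parallel_forms_reliability(n, var_true, var_err, seed=0):
    """Correlate two simulated parallel forms of the same latent true score."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, var_true ** 0.5) for _ in range(n)]
    form_1 = [t + rng.gauss(0.0, var_err ** 0.5) for t in true]
    form_2 = [t + rng.gauss(0.0, var_err ** 0.5) for t in true]
    return pearson_r(form_1, form_2)


# Assumed variances give a population reliability of 0.7 / (0.7 + 0.3) = 0.7.
estimate = parallel_forms_reliability(n=20000, var_true=0.7, var_err=0.3)
print(estimate)
```

With a large simulated sample the correlation between forms should land close to the population reliability; with samples as small as those in a typical study it would vary considerably more.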

Although the bias of the least squares regression coefficient
for estimating the structural relation has been recognized since
1878 (Adcock, 1878), the information was not considered in work on
ANCOVA until as late as 1960 (Lord, 1960). Lord (1960) described
a situation where there was no treatment effect on the dependent
variable, but errors of measurement on the covariable affected the
adjusted means resulting in a spurious treatment main effect.
Figure 1 illustrates the situation discussed by Lord (1960). If
there was no error on X, conditional variances were small as
represented by the narrower contours in Figure 1. When X was measured
with error, conditional variances were larger as represented by the
outer ellipses. When there was no error of measurement, the
intercepts of both groups A and B were the same. When there were
errors of measurement, intercepts were not the same. The difference
between intercepts $Y_A$ and $Y_B$ represents the spurious treatment
main effect produced by errors of measurement. Porter (1967) has
illustrated all possible situations of treatment effects resulting
from errors of measurement on a single random fallible covariable.

[Figure 1. Effect of errors of measurement, Lord's example.]
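
The kind of spurious effect Lord described can be reproduced in a short simulation. The sketch below is a hypothetical setup, not Lord's data or the thesis program: two groups differ by one unit on the latent true covariable, both follow the same regression line with slope .7 and no treatment effect, and the covariable has an assumed within-group reliability of .7.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def pooled_within_slope(groups):
    """groups: list of (x_list, y_list) pairs; pooled within-group slope of y on x."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx, my = _mean(xs), _mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def adjusted_mean_difference(groups):
    """Covariance-adjusted difference in Y means between the first two groups."""
    b = pooled_within_slope(groups)
    (xa, ya), (xb, yb) = groups[0], groups[1]
    return (_mean(ya) - _mean(yb)) - b * (_mean(xa) - _mean(xb))


def simulate(n=10000, slope=0.7, var_true=0.7, var_err=0.3, seed=3):
    rng = random.Random(seed)
    true_groups, fallible_groups = [], []
    for group_mean in (0.0, 1.0):  # groups differ on the true covariable only
        x_true = [rng.gauss(group_mean, var_true ** 0.5) for _ in range(n)]
        y = [slope * x + rng.gauss(0.0, 0.5) for x in x_true]  # no treatment effect
        x_obs = [x + rng.gauss(0.0, var_err ** 0.5) for x in x_true]
        true_groups.append((x_true, y))
        fallible_groups.append((x_obs, y))
    return (adjusted_mean_difference(true_groups),
            adjusted_mean_difference(fallible_groups))


diff_true, diff_fallible = simulate()
print(diff_true, diff_fallible)
```

Adjusting on the true covariable should leave a difference near zero, while adjusting on the fallible covariable should leave a difference near -.7(1 - .7) = -.21 under these assumed parameters, even though no treatment effect was built in.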

Lord proposed a large sample solution which is restricted to
a two group design with two observations on a single random
fallible covariable. Building on Lord's work, Porter proposed an
estimated true scores solution which at least in theory is not
limited by the complexity of the design to be analyzed (Porter,
1967; Porter and Chibucos, 1974), but is limited to a single
covariate and does require an estimate of the reliability of the
covariable. Briefly, estimated true scores ANCOVA is computationally
identical to traditional ANCOVA with the single exception that
estimated true scores are used as a substitute covariable (Porter
and Chibucos, 1974). A Monte Carlo investigation of the two pro-
cedures for the single covariate two group design indicated that
both were equally satisfactory (Porter, 1967). In the same study,
the utility of estimated true scores ANCOVA was also demonstrated
for a one way layout with four treatment groups.

More recently DeGracie (1968) has proposed a solution quite
similar to estimated true scores ANCOVA, and cited the above-mentioned
Monte Carlo investigation as support for the utility of
his test statistic. Stroud (1972) has also proposed a solution
for the two group case with a single fallible covariable and argued
that it is readily extendable to more complex designs. As yet,
however, no small sample distributional investigations have been
done on the Stroud statistic. Thus far work has been restricted
to a single random fallible covariable.

In conclusion, Lord's statistic is restricted by the number
of treatment groups, and computation of DeGracie's ANCOVA requires
knowledge about error of measurement variance. Porter's estimated
true score procedure is direct and simple. Once the covariable is
transformed, the computation can be performed by any classical
ANCOVA computer program. For one-way design, the transformation is

\[ \hat{X}_{ij} = \bar{X}_{i.} + \rho_{XX}(X_{ij} - \bar{X}_{i.}), \]

where $\hat{X}_{ij}$ represents the estimated true score of $X_{ij}$,
$\bar{X}_{i.}$ represents the ith group mean on X, and
$\rho_{XX}$ represents reliability of X.
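
A minimal sketch of the transformation for a one-way design is given below, assuming the usual estimated true score form in which each observed score is pulled toward its own group mean in proportion to the reliability. The function name and the small data set are hypothetical.

```python
def estimated_true_scores(groups, reliability):
    """groups: one list of observed covariable scores per treatment group.

    Each score is replaced by group_mean + reliability * (score - group_mean).
    """
    result = []
    for scores in groups:
        mean = sum(scores) / len(scores)
        result.append([mean + reliability * (x - mean) for x in scores])
    return result


# Hypothetical observed scores for two groups and an assumed reliability of .5.
observed = [[10.0, 12.0, 14.0], [20.0, 21.0, 25.0]]
shrunk = estimated_true_scores(observed, reliability=0.5)
print(shrunk)
```

Group means are unchanged by the transformation; only the within-group spread is reduced, which is why the pooled within slope computed on the transformed covariable equals the original slope divided by the reliability.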

Since Porter's procedure has been studied in a one-way design with
one covariable only, an attempt should be made to broaden the
procedure, both by complexity of design and number of covariables.
When Porter's estimated true score procedure is used in complex
designs, the question of which means should be used in calculating
estimated true scores is raised. Discussing this problem, Porter
suggested that for large sample size per cell, cell means produced
stable estimates of means and should be used. He demonstrated that
using cell means, one can test all hypotheses with correct F ratios.
However, when the sample size per cell is small, cell means are
probably not good. In this situation marginal means provide better
estimates when testing the hypothesis about main effects. When one
set of marginal means is used, however, other main effects and
interactions are not corrected. In a memorandum to Smith, Porter
concluded that

    "Estimated true scores have no effect on tests of dimensions
    summed over in calculating the means used in
    calculating the estimated true scores nor on the interactions
    of those dimensions with other dimensions in the
    design. Therefore, it is important to use means which
    do not involve summing over dimension for which controlling
    covariable differences are important or for
    which interactions with dimensions for which controlling
    covariable differences are important. Using cell
    means will work for any design having more than one
    observation per cell, however, the fewer the observations
    the less stable the procedure. Obviously using
    the cell mean for designs having one observation per
    cell is a do nothing operation."

Proposed Problem

In the multiple covariable case, Porter's estimated true score
ANCOVA is not directly applicable. If estimated true scores are
used, correction is made only for the effect of the first fallible
covariable on the dependent variable and for the effect of the second
fallible covariable on the dependent variable. There is no correction
for the combined effect of both fallible covariables. This combined effect
vanishes when there is no correlation among the covariables, or
reliabilities of both covariables are high (Cochran, 1968). For
the two covariable case, a possible solution is to correct the
bivariate regression slope between the two fallible covariables
before estimated true scores are substituted. The other possibility
is to make the two fallible covariables independent of each other
by transforming the second fallible covariable to be independent
of the first fallible covariable, and then to apply the estimated true
scores procedure.
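
Why simple substitution of estimated true scores falls short with correlated covariables can be seen from population algebra. The sketch below is an illustration under stated assumptions, not either of the proposed methods: both latent true covariables have unit variance, errors of measurement are independent of everything else, and the structural coefficients (.5 and .3), reliabilities (.7), and intercorrelation (.2) are hypothetical values. It computes the population partial slopes on the observed covariables and divides each by its reliability, which is the effect of substituting estimated true scores.

```python
def estimated_true_score_slopes(beta_x, beta_z, r_true, rel_x, rel_z):
    """Population partial slopes after dividing each observed slope by its reliability.

    Assumes unit-variance latent covariables with correlation r_true and
    independent errors of measurement.
    """
    # Covariances of Y with each covariable (errors do not change them).
    c_xy = beta_x + beta_z * r_true
    c_zy = beta_z + beta_x * r_true
    # Observed variances are inflated by error: true variance / reliability.
    v_x = 1.0 / rel_x
    v_z = 1.0 / rel_z
    det = v_x * v_z - r_true ** 2
    b_x = (c_xy * v_z - r_true * c_zy) / det
    b_z = (c_zy * v_x - r_true * c_xy) / det
    return b_x / rel_x, b_z / rel_z


uncorrelated = estimated_true_score_slopes(0.5, 0.3, 0.0, 0.7, 0.7)
correlated = estimated_true_score_slopes(0.5, 0.3, 0.2, 0.7, 0.7)
print(uncorrelated, correlated)
```

With uncorrelated latent covariables the structural coefficients are recovered exactly; with an intercorrelation of .2 a residual bias remains, which is the combined effect that an additional correction would need to address.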
The hypothesis tested and the sampling distribution for these
two proposed solutions are in question. Which of the two is better?
Are these two procedures as good as estimated true scores in the
one fallible covariable case?

Purpose of Study

The purpose of this study was to investigate whether Porter's
estimated true score ANCOVA yielded a good fit when applied with
additional corrections to the two fallible covariable situation
for one-way designs. Two additional correction methods were
proposed.

Method A. Correct for attenuation of correlation between the
two covariables before computing beta weights of the two estimated
true score covariables.

Method B. Transform second fallible covariable to be inde-
pendent from the first fallible covariable and then apply estimated
true scores procedure.

The first part of the study investigated analytically whether
the two proposed solutions test the right hypothesis. The second
part of the study was a Monte Carlo investigation of the distributional
properties of several F test statistics at three levels of
nominal α, i.e., .10, .05 and .01. Specifically, the investigations
were done in two separate situations: Situation I, estimated true
scores applied to ANCOVA with two independent fallible covariables;
Situation II, Method A and Method B applied to ANCOVA with two
intercorrelated fallible covariables.

In the empirical investigation of Situation I, F distributions
were generated for 1) error free covariates, 2) fallible covariates,
3) estimated true score covariates. The criteria of Type I error
rate and power for a single non-central case were used to compare
the three types of F distributions. In Situation II, F distributions
were generated for 1) error free covariates, 2) fallible covariates,
3) Method A, 4) Method B. The same criteria for comparing F
distributions were used for Situation II as for Situation I. Both
situations were also described by average mean squares, average
adjusted treatment effects and their variances.

Finally, empirical cumulative distributions of the two within
regression coefficients from all configurations described earlier
were collected. Averages and variances of these regression coefficients
were compared with each other and with the known desired
values.

CHAPTER II

REVIEW OF ANCOVA WITH A SINGLE FALLIBLE COVARIABLE

That ANCOVA tests biased treatment effects when there are
initial differences on a random fallible covariable can be seen
through inspection of the null hypothesis for a one-way ANCOVA.

The null hypothesis can be stated

$$\sum_{i=1}^{I} \{\alpha_{Y_i} - \beta_{Y.X}\,\alpha_{X_i}\}^2 = 0 ,$$

where $\alpha_{Y_i}$ is the ith treatment effect on the dependent variable Y,
$\alpha_{X_i}$ is the ith treatment effect on the random fallible
covariable X, and
$\beta_{Y.X}$ is the least squares pooled within i slope of the
regression of Y on X.
Although errors of measurement satisfying classical measurement
assumptions do not cause bias in the estimation of $\alpha_{Y_i}$ and $\alpha_{X_i}$,
they do cause a bias in using the least squares regression coefficient
as an estimate of the regression coefficient defined on the
latent true variables, i.e., the structural relationship of Y on X.
The bias of the least squares regression coefficient for estimating
the structural relationship is a function of the reliability of
the random covariate and can be stated as

$$\beta_{Y'.X'} = \frac{1}{\rho_{XX}}\,\beta_{Y.X} ,$$

where primes denote latent true variables and $\rho_{XX}$ denotes the reliability
of X.

For a fallible covariable the effect of using the least squares
regression coefficient in ANCOVA is a function of the values of $\alpha_{X_i}$.
If as in a random assignment design the $\alpha_{X_i}$'s are all zero, ANCOVA
will test the desired hypothesis. For quasi-experiments, however,
the $\alpha_{X_i}$'s are typically not zero, and so ANCOVA tests biased
estimates of the adjusted treatment effects. The bias can result
in either a spurious rejection of the null hypothesis or a spurious
retention of the null hypothesis, depending upon the values of the
$\alpha_{X_i}$'s (Porter, 1967).

Most studies of the effect of errors of measurement in regres-
sion (Madansky, 1959; Dorff, 1960; Lord, 1960; Porter, 1967; DeGracie,
1968) have given the honor to Adcock (1878) as the first person to
notice that the least squares procedure provides a biased estimate
of the structural relationship when the predictor is fallible.
Biasedness of the least squares estimate of the regression slope
defined on the latent true variable was also reported in Roos'
paper (1937). Corrado Gini was first to point out that the biased
regression slope from the least squares procedure was larger than
the regression slope for the latent true variables. Madansky
(1959), Dorff (1960), Cochran (1968) and Porter (1967, 1971) have
presented complete reviews of the effects of errors of measurement

in regression, structural and functional relationships.

Following DeGracie (1968), consider the linear equation

$$Y_j = \beta_0 + \beta' X'_j + e_j \qquad (j = 1, 2, \ldots, n)$$

where $X'_j$ denotes the true measure, $X'_j = X_j - u_j$, and
$u_j$ denotes the errors of measurement.

Assume that $u_j$ is normally distributed with

$$E(u_j) = 0, \qquad \rho_{u_j u_{j'}} = 0 \;\; (j \neq j'), \qquad \text{and} \qquad \rho_{e_j u_j} = 0 .$$

By substitution the linear relationship becomes

$$Y_j = \beta_0 + \beta' X_j - \beta' u_j + e_j$$

$$Y_j = \beta_0 + \beta' X_j + w_j .$$

The covariance of error term $w_j$ and $X_j$ is

$$w_j\,(X_j - E(X_j)) = w_j\,(X_j - X'_j) = w_j u_j .$$

For individual j, his expected value is his true score.

$$E(w_j u_j) = E\{(e_j - \beta' u_j)\,u_j\} = -\beta' \sigma_u^2 .$$

Since the error term is not independent from the covariable, the
least squares procedure is inappropriate. However, DeGracie (1968)
proved that the least squares estimator was biased by the fraction
of $\sigma_u^2 / \sigma_{X'}^2$, or

$$E(\hat{\beta}) = \frac{\beta'}{1 + \sigma_u^2 / \sigma_{X'}^2}$$

as discussed previously. Hence,

$$E\left(\frac{\hat{\beta}}{\rho_{XX}}\right) = \beta'$$

where $\rho_{XX}$ denotes reliability as defined in classical measurement
theory (Gulliksen, 1950), i.e.,

$$\rho_{XX} = \frac{\sigma_{X'}^2}{\sigma_{X'}^2 + \sigma_u^2} .$$

Lord (1960) proposed a U-statistic to solve the ANCOVA problem
for two groups with duplicate measures on a single fallible covariable.
His statistic was defined

$$U = (\bar{Y}'' - \bar{Y}') - \tfrac{1}{2}\hat{\beta}_0\,(\bar{X}''_1 + \bar{X}''_2 - \bar{X}'_1 - \bar{X}'_2)$$

where U was adjusted mean difference,
$\bar{Y}'$ was unadjusted mean of dependent variable for first sample,
$\bar{Y}''$ was unadjusted mean of dependent variable for second sample,
$\bar{X}'_1$ was mean of first duplicate fallible measure for first sample,
$\bar{X}''_1$ was mean of first duplicate fallible measure for second sample,
$\bar{X}'_2$ was mean of second duplicate fallible measure for first sample,
$\bar{X}''_2$ was mean of second duplicate fallible measure for second sample, and

2
2
5 1 1 _ .Y_ 182 _ 1 —H —n__v__v A
Var(U) (NT + N") {g w + 580(k w)}+z-»(X1+X2 X1 X2)2Var so
where S -.X. l - k RT“
0 w { Now }
N° - N'+ "
Var(é ) - —1——{v2(l‘—2- - 21‘- + 1- (k+w)}
o Nowz w2 w) 2 3
. ley-l-szy
where Oxy - —— within sample
2
3: +3:
“ l 2

o --——————-wdthin sample

0" cv“::
k a N oxx +N o}SE
O
N

 

N'O' +N"O’"
KY

 

NO

I I H H
N Sx1x2+N lexz
w = and

 

Duplicate measures on the fallible covariable provided the
estimate of error variance or reliability of the covariable to be
used as the correction term for the relationship between the
dependent variable and the first duplicate measures of the covariable.

Lord's U-statistic was normally distributed for large sample size.

A more general procedure which does not necessarily require
duplicate measures on the covariable was developed by Porter (1967).
It requires knowledge of the reliability of the covariable or some
estimate of it. Computation is straightforward. As described
previously, instead of using the fallible covariable, the fallible
measures of the covariable are transformed into estimated true
scores. The estimated true scores are then used in regular analysis
of covariance computations. Estimated true scores are defined as

$$\hat{X}_{ij} = \bar{X}_{i.} + \rho_{XX}\,(X_{ij} - \bar{X}_{i.})$$

where $\rho_{XX}$ denotes the reliability of X. Since the estimated true
score covariable is a linear transformation of the fallible covariable
within levels of i, then:

1. The within i correlation between the dependent variable
and fallible covariable is equal to the correlation between the
dependent variable and the estimated true score covariable.

2. The group means, i.e., for each level of i, and grand mean
of the estimated true score covariable are equal to those of the
fallible covariable as well as unbiased estimates of the infallible
covariable means which cannot be observed.

3. The slope of the regression line between the dependent
variable and the estimated true score covariable equals the slope
of the regression line between the latent true variables.
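
The three properties listed above can be checked numerically. The following is a minimal sketch, not Porter's program: the data values, the reliability of .7, and the function names are all hypothetical and chosen only for illustration. It shrinks each score toward its own group mean and compares the pooled within slope before and after the transformation.

```python
from statistics import mean


def estimated_true_scores(groups, reliability):
    """Shrink each score toward its own group mean by the reliability."""
    out = []
    for scores in groups:
        m = mean(scores)
        out.append([m + reliability * (x - m) for x in scores])
    return out


def pooled_within_slope(x_groups, y_groups):
    """Pooled within-group least squares slope of Y on X."""
    sxy = sxx = 0.0
    for xs, ys in zip(x_groups, y_groups):
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: two treatment groups, fallible covariable X, outcome Y.
x = [[4.0, 6.0, 8.0, 10.0], [1.0, 2.0, 4.0, 5.0]]
y = [[3.0, 4.0, 6.0, 7.0], [1.0, 1.5, 3.0, 2.5]]
rho_xx = 0.7  # assumed (known) reliability of X

x_hat = estimated_true_scores(x, rho_xx)
b_fallible = pooled_within_slope(x, y)
b_true_score = pooled_within_slope(x_hat, y)
```

In this sketch the group means of `x_hat` equal those of `x`, and `b_true_score` equals `b_fallible / rho_xx`, which is the correction for attenuation described in the text.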

The direct effect of the estimated true score transformation
was to reduce the variance of the covariable. The variance of
estimated true scores equals the variance of the fallible
covariable times the square of the reliability coefficient. To
understand the estimated true scores, one should refer to the classical
theory of measurement. Let

$$X = X' + u$$

where X denotes the observed score,
X' denotes the true score, and
u denotes the error of measurement.
Under assumptions of the classical theory,

$$\sigma_X^2 = \sigma_{X'}^2 + \sigma_u^2 .$$

Let $\hat{X}$ denote the estimated true scores, then within levels of i,

$$\mathrm{Var}(\hat{X}) = \mathrm{Var}(\bar{X}_{i.}) + 2\rho_{XX}\,\mathrm{Cov}(\bar{X}_{i.},\, X_{ij} - \bar{X}_{i.}) + \rho_{XX}^2\,\mathrm{Var}(X_{ij} - \bar{X}_{i.}) .$$

Treating $\bar{X}_{i.}$ as a constant, hence,

$$\sigma_{\hat{X}}^2 = \rho_{XX}^2\,\sigma_X^2 = \sigma_X^2 - \sigma_X^2\,(1 - \rho_{XX}^2) = \sigma_X^2 - \sigma_\varepsilon^2 ,$$

where $\sigma_\varepsilon^2$ denotes error variance of predicting true scores by observed
scores. Therefore, the transformation of the covariable to estimated
true scores is equivalent to taking errors of measurement
variance from the observed variance of fallible scores.

Porter's estimated true score ANCOVA produced F-ratios distributed
approximately as the central F-distribution when the null
hypothesis defined on the latent true variables was true. The fit
of Porter's procedure was as good as Lord's U-statistic for the
two group design.

Another approach has been presented by DeGracie (1968). He
discussed relaxation of the assumption of having a fixed covariable
measured without error. He showed that when the covariable was
random but free from error, unbiased estimates and valid confidence
intervals as well as valid statistical tests could be obtained
from the usual analysis procedures, but that the variance of the
estimates was averaged over all values of the covariable. His discussion
went on to consider the situation where the covariable was measured
with error but fixed. A corrected regression slope, $b_3$, was proposed
and proven to be an unbiased estimator of the true slope.

o2 o:
b3.7x.¥.{1+-:;(E“+ %—2“ )}-1
Sxx Sxx Sxx
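where

$$S_{XY} = \frac{1}{t(n-1)} \sum_i \sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.}) ,$$

$X_{ij} = X'_{ij} + u_{ij}$ where $u_{ij}$ was the error of measurement,

$$\hat{S}_{XX} = \frac{1}{t(n-1)} \Big\{ \sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2 - t(n-1)\,\sigma_u^2 \Big\} ,$$

$\sigma_u^2$ was the variance of errors of measurement and known,

t was the number of treatments, and

n was the cell frequency.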

Since there was a relationship between $b_3$ and the adjusted mean
square within, the test statistic was not distributed as F. Cramér's
theorem, which Lord also employed, was used to find
an approximate solution. Finally DeGracie concluded that for a two
group design the test statistic

$$\frac{(\hat{T}_i - \hat{T}_j) - (T_i - T_j)}{\hat{\sigma}_{\hat{T}_i - \hat{T}_j}} ,$$

where $\hat{T}_i$, $\hat{T}_j$ were estimates of adjusted treatment effects,
$T_i$, $T_j$ were true treatment effects, and
$\hat{\sigma}_{\hat{T}_i - \hat{T}_j}$ was an estimate of the standard deviation of the sampling
distribution of differences,
was asymptotically normal. Later, DeGracie illustrated the use of
b3 to form an index of response and analysis of covariance under
the assumption that the fallible covariate was random. For large
sample size, his test statistic, the ratio of adjusted mean square
treatment (MST) and adjusted mean square error (MSE) was distributed
approximately as F with t-l and t(n-l)-l degrees of freedom. His
analysis of covariance is shown in Table 1. His procedure was
similar to Porter's and he cited Porter's empirical results to
support the utility of his test statistic.

When measurement error variance is known, another approach has
been developed by Stroud (1972). Stroud's procedure compares
conditional variances with conditional means for a two group design.
He claimed that his procedure can be extended to more than the two

group design with "no great technical difficulties." In his

Table l. DeGracie's Analysis of Covariance Using Corrected Slope b3

 

 

Sources df SS ' Corrected SS df M.S. F

Treatment t-l £(V; 4?. )2

Error t(n-1) 2(Y114Ti.)2 (1) t(n-l)-l MSE

Total nt-l 2(Y114T' )2 (2) nt-2

Adj. Treatment (2)-(l) t-l MST MST/
MSE

 

(l) . ij(Yij-Y1 )z-b b3 13(x13‘x 1 )2 -2t(n-l)o:

(2) . §1(Y134Y;.)2-b§ ij(xij4§;.)2-2(tn—1)o§

 

development he set measurement error variance to be 1.0. The Wald
statistic for the comparison of two conditional means of dependent
variable was given as follows:
- - - - 2
vv _ l___ 'u..-
(g1+gz)(b1 b2)+h1{(Y1Y2hz(Y1 Y2)} +h2{Y1 Y2 b1(X1- 1(2)}2

— — f
(g' 1+32) (h' 1+115) + (xl-xz) hihé

 

where b1 - SXY/(Sxx-l) within group 1,

b2 - SXY/(Sxx-l) within group 2,

n1 - (sisxx +b2)/(Sxx -1> (1 = 1.2)
1 xx1
8i ' 81/31:
hi - hi/ni’ and
2 -2
gll - sW 'SJZKY1(SXX -1)1+sxY (Sxx -1) .

i i 1H1

The Wald statistic was distributed as $\chi^2$ with 2 degrees of freedom.
Data from a study in Portland, Oregon, were presented as an example
of the application of the procedure.

CHAPTER III

ANCOVA WITH MULTIPLE FALLIBLE COVARIABLES

Unfortunately the problem of multiple fallible covariables in
ANCOVA is more complex. Consider the null hypothesis for a one-way
ANCOVA with two random fallible covariables:

$$\sum_{i=1}^{I} \{\alpha_{Y_i} - \beta_X\,\alpha_{X_i} - \beta_Z\,\alpha_{Z_i}\}^2 = 0 ,$$

where again the $\alpha$'s denote treatment effects on the dependent variable
Y and the covariables X and Z, and $\beta_X$ and $\beta_Z$ are the pooled within i
least squares regression coefficients for predicting Y from X and Z
respectively. As before, the estimates of the $\alpha$'s are left unbiased
by the introduction of errors of measurement which follow classical
assumptions, but the least squares regression coefficients are biased
estimates of the corresponding regression coefficients for the latent
true variables. The nature of the bias is

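$$\beta_X = \frac{\rho_{XX}\,\beta'_X\,(1 - \rho'^2_{XZ}\,\rho_{ZZ}) + \beta'_Z\,\beta'_{Z.X}\,\rho_{XX}\,(1 - \rho_{ZZ})}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}}$$

and

$$\beta_Z = \frac{\rho_{ZZ}\,\beta'_Z\,(1 - \rho'^2_{XZ}\,\rho_{XX}) + \beta'_X\,\beta'_{X.Z}\,\rho_{ZZ}\,(1 - \rho_{XX})}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}} ,$$

where primes denote statistics for the latent true variables, $\rho_{XX}$
and $\rho_{ZZ}$ are the reliabilities of X and Z respectively, and $\rho'_{XZ}$ is
the correlation between the latent true X and Z (Cochran, 1968). A
useful restatement of the above two expressions in terms of bivariate
statistics is

$$\beta_X = \frac{\rho_{XX}\,\beta'_{Y.X} - \rho_{XX}\,\rho_{ZZ}\,\beta'_{Y.Z}\,\beta'_{Z.X}}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}}$$

and

$$\beta_Z = \frac{\rho_{ZZ}\,\beta'_{Y.Z} - \rho_{XX}\,\rho_{ZZ}\,\beta'_{Y.X}\,\beta'_{X.Z}}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}} .$$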
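
The bias expression for the X coefficient can be checked against direct covariance algebra. The sketch below is an illustration only: it assumes latent covariables with unit variance, and the function names and the parameter values (.5, .3, .4, .7, .9) are hypothetical. One function computes the least squares slopes from the observed variances and covariances; the other evaluates the closed form given above.

```python
def observed_slopes(bx_true, bz_true, rho_xz_true, rho_xx, rho_zz):
    """Least squares slopes of Y on fallible X and Z from covariance algebra.

    Assumes latent X' and Z' have unit variance and correlation rho_xz_true.
    """
    var_x = 1.0 / rho_xx          # observed variance = true variance / reliability
    var_z = 1.0 / rho_zz
    cov_xz = rho_xz_true          # independent errors leave covariances unchanged
    cov_yx = bx_true + bz_true * rho_xz_true
    cov_yz = bz_true + bx_true * rho_xz_true
    det = var_x * var_z - cov_xz ** 2
    bx = (var_z * cov_yx - cov_xz * cov_yz) / det
    bz = (var_x * cov_yz - cov_xz * cov_yx) / det
    return bx, bz


def bias_formula_x(bx_true, bz_true, rho_xz_true, rho_xx, rho_zz):
    """Closed form for the fallible X coefficient in terms of latent quantities."""
    bzx_true = rho_xz_true        # slope of Z' on X' when both have unit variance
    num = (rho_xx * bx_true * (1 - rho_xz_true ** 2 * rho_zz)
           + bz_true * bzx_true * rho_xx * (1 - rho_zz))
    return num / (1 - rho_xz_true ** 2 * rho_xx * rho_zz)


params = (0.5, 0.3, 0.4, 0.7, 0.9)   # hypothetical latent slopes, correlation, reliabilities
bx_direct, bz_direct = observed_slopes(*params)
bx_closed = bias_formula_x(*params)
```

With these values both routes give a coefficient of about .343 for X, against a latent value of .5. Setting the latent correlation to zero reduces the coefficient to the reliability times the latent slope.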

For more than two predictors, Cochran (1970) gave a general relation-

ship between Bk and Bi as

 

2
O
- ' § ' B!
Bk kakk- k k' Okk' k'
l-o
0‘ Bk ‘ kakk ‘ k5k"—-%:E;'Bk'.k3k'
p'kk'
'2

where 81"..k' . B£!.k - pkk' '

Cochran concluded that

"Thus the direct effect of an error in $X_k$ on $\beta_k$ is to
decrease its absolute value to $\rho_{kk}\beta'_k$ or something less,
but $\beta_k$ also receives contributions from errors of
measurement in any other $X_{k'}$ that is correlated with
$X_k$. Even if such errors occur in only $X_k$, they can
affect the values of all the $\beta_{k'}$. By working a few
examples with varying $\rho_{kk}$ and $\beta'_k$, it becomes evident
that interpretation of the $\beta_k$ as if they were the $\beta'_k$
can become quite misleading unless all the $\rho_{kk}$ are
big" (p. 34, notation changed to be consistent with
that in present paper).

A solution to the multiple fallible covariable problem requires
a procedure that provides unbiased estimates of the regression
coefficients defined on the latent true variables. The substitution
of estimated true scores for the observed covariables does not
adequately solve the general problem, but does provide a solution
in the restricted case of uncorrelated latent true covariables.

For uncorrelated latent true covariables,

$$\rho'_{XZ} = 0 \quad \text{and} \quad \beta'_{X.Z} = \beta'_{Z.X} = 0 ,$$

and the two regression coefficients become

$$\beta_X = \rho_{XX}\,\beta'_{Y.X}, \qquad \beta_Z = \rho_{ZZ}\,\beta'_{Y.Z} .$$

Applying estimated true scores gives

$$\beta_{\hat{X}} = \beta_X / \rho_{XX} = \beta'_{Y.X} \quad \text{and} \quad \beta_{\hat{Z}} = \beta_Z / \rho_{ZZ} = \beta'_{Y.Z} .$$

The point of breakdown for the estimated true scores solution
to the general problem provided a suggestion for a new procedure,
Method A. Use of estimated true scores corrected the bivariate regression
coefficients between the dependent variable and each covariable,
but left uncorrected the bivariate regression coefficients among
the covariables. Thus Method A consisted of 1) substituting
estimated true scores for each observed covariable, and 2) correcting
for attenuation the relationships between the estimated true
score covariables.

A second approach to the solution of the general problem,
Method B, was motivated by the simplified situation which exists for
uncorrelated covariables. Method B can be described for two
covariables as follows: 1) one covariable is transformed to make
it orthogonal to the other; 2) estimated true scores are substituted
for the two orthogonal covariables and computations proceed as for
regular ANCOVA.

Method A
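First consider the effects of Method A on the pooled within
regression coefficients for a one-way ANCOVA having two fallible
covariables. Using standard ANCOVA procedures, the population
regression coefficient for one of the covariables, X, is

$$\beta_X = \frac{\mathrm{SWZ} \cdot \mathrm{SWYX} - \mathrm{SWXZ} \cdot \mathrm{SWYZ}}{\mathrm{SWX} \cdot \mathrm{SWZ} - \mathrm{SWXZ}^2}$$

where SW denotes a sum of squares within. Substituting estimated
true scores for X and Z replaces

SWX with $\rho_{XX}^2\,\mathrm{SWX}$,
SWZ with $\rho_{ZZ}^2\,\mathrm{SWZ}$,
SWXZ with $\rho_{XX}\rho_{ZZ}\,\mathrm{SWXZ}$,
SWYX with $\rho_{XX}\,\mathrm{SWYX}$, and
SWYZ with $\rho_{ZZ}\,\mathrm{SWYZ}$,

where $\rho_{XX}$ and $\rho_{ZZ}$ are the sample reliabilities of X and Z respectively.

It follows that

$$\beta_{\hat{X}} = \frac{\rho_{ZZ}^2\rho_{XX}\,\mathrm{SWZ} \cdot \mathrm{SWYX} - \rho_{ZZ}^2\rho_{XX}\,\mathrm{SWXZ} \cdot \mathrm{SWYZ}}{\rho_{ZZ}^2\rho_{XX}^2\,\mathrm{SWX} \cdot \mathrm{SWZ} - \rho_{ZZ}^2\rho_{XX}^2\,\mathrm{SWXZ}^2}$$

where $\hat{X}_{ij} = \bar{X}_{i.} + \rho_{XX}\,(X_{ij} - \bar{X}_{i.})$ denotes estimated true scores for X.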

 

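The expression can be simplified to

$$\beta_{\hat{X}} = \frac{\dfrac{1}{\rho_{XX}}\dfrac{\mathrm{SWYX}}{\mathrm{SWX}} - \dfrac{1}{\rho_{XX}}\dfrac{\mathrm{SWXZ}}{\mathrm{SWX}} \cdot \dfrac{\mathrm{SWYZ}}{\mathrm{SWZ}}}{1 - \dfrac{\mathrm{SWXZ}}{\mathrm{SWX}} \cdot \dfrac{\mathrm{SWXZ}}{\mathrm{SWZ}}} .$$

Further correction for attenuation of the relationship between the
covariates by dividing SWXZ by the square root of the product of the
reliabilities of X and Z results in

$$\beta_{\hat{X}} = \frac{\dfrac{1}{\rho_{XX}}\dfrac{\mathrm{SWYX}}{\mathrm{SWX}} - \sqrt{\dfrac{\rho_{ZZ}}{\rho_{XX}}} \cdot \dfrac{1}{\rho_{XX}}\dfrac{\mathrm{SWXZ}}{\mathrm{SWX}} \cdot \dfrac{1}{\rho_{ZZ}}\dfrac{\mathrm{SWYZ}}{\mathrm{SWZ}}}{1 - \dfrac{1}{\rho_{XX}} \cdot \dfrac{1}{\rho_{ZZ}} \cdot \dfrac{\mathrm{SWXZ}}{\mathrm{SWX}} \cdot \dfrac{\mathrm{SWXZ}}{\mathrm{SWZ}}} .$$

By substitution,

$$\beta_{\hat{X}} = \frac{\beta'_{Y.X} - \sqrt{\rho_{ZZ}/\rho_{XX}}\;\beta'_{Z.X}\,\beta'_{Y.Z}}{1 - \rho'^2_{XZ}} .$$

Thus when $\rho_{ZZ} = \rho_{XX}$, $\beta_{\hat{X}}$ is equal to the regression coefficient for
the latent true X. Similar steps result in the parallel conclusion
that the regression coefficient $\beta_{\hat{Z}}$, provided by Method A, is identical
to the regression coefficient for the latent true Z when $\rho_{ZZ} = \rho_{XX}$.
Since substitution of estimated true scores does not change the means
of the covariables, it follows that Method A tests the desired
hypothesis.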
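
The equal-reliability condition for Method A can be illustrated with population moments. The sketch below is an assumption-laden illustration rather than the thesis program: latent covariables are given unit variance, population variances and covariances stand in for the within sums of squares, and the function names and parameter values are hypothetical.

```python
from math import sqrt


def method_a_slope_x(swx, swz, swxz, swyx, swyz, rho_xx, rho_zz):
    """Within slope for X after estimated true scores plus the Method A step."""
    # Step 1: sums of squares and cross-products of the estimated true scores.
    swx_h = rho_xx ** 2 * swx
    swz_h = rho_zz ** 2 * swz
    swxz_h = rho_xx * rho_zz * swxz
    swyx_h = rho_xx * swyx
    swyz_h = rho_zz * swyz
    # Step 2: correct the covariable cross-product for attenuation.
    swxz_a = swxz_h / sqrt(rho_xx * rho_zz)
    return (swz_h * swyx_h - swxz_a * swyz_h) / (swx_h * swz_h - swxz_a ** 2)


def population_moments(bx_true, bz_true, rho_xz_true, rho_xx, rho_zz):
    """Population moments of fallible X, Z and Y for unit-variance latent scores."""
    return dict(
        swx=1.0 / rho_xx,
        swz=1.0 / rho_zz,
        swxz=rho_xz_true,
        swyx=bx_true + bz_true * rho_xz_true,
        swyz=bz_true + bx_true * rho_xz_true,
    )


# Hypothetical latent slopes .5 and .3, latent correlation .4.
equal = method_a_slope_x(**population_moments(0.5, 0.3, 0.4, 0.7, 0.7),
                         rho_xx=0.7, rho_zz=0.7)
unequal = method_a_slope_x(**population_moments(0.5, 0.3, 0.4, 0.7, 0.9),
                           rho_xx=0.7, rho_zz=0.9)
```

With equal reliabilities of .7 the Method A coefficient recovers the latent slope of .5; with reliabilities of .7 and .9 it is about .468, which is why the Monte Carlo design keeps the two reliabilities equal.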

Method B
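Method B starts with a transformation on the second covariable
that results in a new variable which is orthogonal to the first
covariable. The transformation used was

$$W_{ij} = Z_{ij} - \beta_{Z.X}\,X_{ij} ,$$

where $\beta_{Z.X}$ denotes the pooled within regression coefficient for
predicting Z from knowledge of X. It should be noted that for
perfectly reliable X and Z, use of covariables X and W does not
change the hypothesis tested.

The null hypothesis tested by Method B is

$$\sum_{i=1}^{I} \Big\{\alpha_{Y_i} - \frac{1}{\rho_{XX}}\,\beta_{Y.X}\,\alpha_{X_i} - \frac{1}{\rho_{WW}}\,\beta_{Y.W}\,\alpha_{W_i}\Big\}^2 = 0 ,$$

where $\beta_{Y.X}$ and $\beta_{Y.W}$ are bivariate regression coefficients since X
and W are uncorrelated. The question is whether this null hypothesis
is identical to the desired null hypothesis stated in terms of the
latent true variables Y, X and Z.

Since W is a new variable, its reliability must first be
estimated. By definition,

$$\rho_{WW} = \frac{\mathrm{var}(W')}{\mathrm{var}(W)} ,$$

where W' denotes the latent true W. But

$$\mathrm{var}(W) = \mathrm{var}(Z) + \beta_{Z.X}^2\,\mathrm{var}(X) - 2\beta_{Z.X}\,\mathrm{cov}(X,Z) = \mathrm{var}(Z)\,(1 - \rho_{XZ}^2) ,$$

and

$$\mathrm{var}(W') = \mathrm{var}(Z') + \beta_{Z.X}^2\,\mathrm{var}(X') - 2\beta_{Z.X}\,\rho_{XZ}\sqrt{\mathrm{var}(Z)\,\mathrm{var}(X)}$$

$$= \rho_{ZZ}\,\mathrm{var}(Z) + \rho_{XZ}^2\,\frac{\mathrm{var}(Z)}{\mathrm{var}(X)}\,\rho_{XX}\,\mathrm{var}(X) - 2\rho_{XZ}\sqrt{\frac{\mathrm{var}(Z)}{\mathrm{var}(X)}}\;\rho_{XZ}\sqrt{\mathrm{var}(X)\,\mathrm{var}(Z)}$$

$$= \mathrm{var}(Z)\,(\rho_{ZZ} - 2\rho_{XZ}^2 + \rho_{XZ}^2\,\rho_{XX}) .$$

Thus,

$$\rho_{WW} = \frac{\rho_{ZZ} - 2\rho_{XZ}^2 + \rho_{XZ}^2\,\rho_{XX}}{1 - \rho_{XZ}^2} .$$

Further,

$$\beta_{Y.W} = \frac{\mathrm{cov}(Y,W)}{\mathrm{var}(W)} = \frac{\mathrm{cov}(Y,Z) - \beta_{Z.X}\,\mathrm{cov}(Y,X)}{\mathrm{var}(Z)\,(1 - \rho_{XZ}^2)} = \frac{\beta_{Y.Z} - \beta_{X.Z}\,\beta_{Y.X}}{1 - \rho_{XZ}^2}$$

and

$$\mu_W = \mu_Z - \beta_{Z.X}\,\mu_X .$$

The last two terms in the squared quantity for stating the null
hypothesis for Method B can now be restated as

$$-\frac{1}{\rho_{XX}}\,\beta_{Y.X}\,(\mu_{X_{i.}} - \mu_{X_{..}}) - \frac{1}{\rho_{WW}}\,\beta_{Y.W}\,(\mu_{Z_{i.}} - \beta_{Z.X}\,\mu_{X_{i.}} - \mu_{Z_{..}} + \beta_{Z.X}\,\mu_{X_{..}})$$

$$= -\Big(\frac{\beta_{Y.X}}{\rho_{XX}} - \frac{\beta_{Y.W}\,\beta_{Z.X}}{\rho_{WW}}\Big)(\mu_{X_{i.}} - \mu_{X_{..}}) - \frac{\beta_{Y.W}}{\rho_{WW}}\,(\mu_{Z_{i.}} - \mu_{Z_{..}}) .$$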

By further substitution,

BY.x/°xx ' BY.w82.x/°ww

becomes

 

2
29 p
rxz 2 xx
' _ -———— _ I 1
BY.X(1 p88 Zszpa ) BY.% 2 xBxx
- 3
p 9
XS 2 xx

and BY.W/pww becomes

where again primes denote statistics for the latent true variables.
Since the two expressions do not simplify to the regression coefficients
for the latent true X and Z covariables, it follows that
Method B does not test the desired hypothesis. In retrospect the
error in logic was that the transformation forced the manifest
variables to be orthogonal, but not their latent true counterparts.
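
The closing remark about Method B can be made concrete with a small calculation. The following is a sketch under the assumption of unit-variance latent covariables; the function name and the values .4 and .7 are hypothetical. It computes the covariance of W with X for the observed variables and for their latent true counterparts.

```python
def method_b_covariances(rho_xz_true, rho_xx):
    """Covariance of W with X for manifest and for latent true variables.

    W = Z - b_zx * X, where b_zx is the slope of fallible Z on fallible X.
    Latent X' and Z' are assumed to have unit variance.
    """
    var_x_obs = 1.0 / rho_xx                 # observed variance of X
    b_zx = rho_xz_true / var_x_obs           # cov(X, Z) / var(X)
    cov_manifest = rho_xz_true - b_zx * var_x_obs   # cov(W, X)
    cov_latent = rho_xz_true - b_zx * 1.0           # cov(W', X'), var(X') = 1
    return cov_manifest, cov_latent


cov_manifest, cov_latent = method_b_covariances(rho_xz_true=0.4, rho_xx=0.7)
```

Here the manifest covariance is zero by construction, while the latent covariance equals the latent X-Z covariance times one minus the reliability of X (.4 times .3, or .12), so the latent counterparts remain correlated whenever X is fallible and the latent covariables are related.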

Monte Carlo Study

Thus far the two modifications of ANCOVA have been considered
as to whether or not they test the correct hypothesis when there are
two fallible covariables in a quasi-experiment. The conclusions
were: 1) if the latent true covariables are uncorrelated, estimated
true scores ANCOVA tests the desired hypothesis; 2) when the latent
true covariables are correlated but have equal reliability, Method
A tests the correct hypothesis; and 3) Method B does not appear to
test the hypothesis of interest under any circumstances. The
remaining question to be answered was, how do the small sample distri-
butions of the various test statistics behave?

A computer program for the CDC 6500 computer system at the
Michigan State University Computer Center was written to get empirical
F distributions of the estimated true score ANCOVA when two
random fallible covariates were independent of each other, and of the
two proposed correction methods when two random fallible covariables
were related. The program was composed of seven major parts:
parameter setting, data generating, estimating reliability, computing
ANCOVA on true scores, estimated true scores with the Method A
correction and with the Method B correction, and distribution
building. The remainder of Chapter III provides an overview of
the computations involved in a one-way ANCOVA with two covariables
followed by a discussion of each of the seven parts of the computer
program.

Analysis of covariance was defined by the following linear
model.

$$Y_{ij} = \mu_{Y_{..}} + \alpha_i + \beta_X\,(X_{ij} - \mu_{X_{..}}) + \beta_Z\,(Z_{ij} - \mu_{Z_{..}}) + e_{ij} .$$

The null hypothesis about treatment main effects is:

$$H_0: \sum_{i=1}^{I} (\alpha_i)^2 = 0 \qquad (i = 1, 2, \ldots, I) .$$

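Sums-of-squares and cross-products are presented in Table 2.
Using the computational guides from Kirk (1968) and Winer (1962),
the regression coefficients in Table 2 were defined as follows:

$$b_{TX} = \frac{\mathrm{STZ} \cdot \mathrm{STYX} - \mathrm{STXZ} \cdot \mathrm{STYZ}}{\mathrm{STX} \cdot \mathrm{STZ} - \mathrm{STXZ}^2} ,$$

$$b_{TZ} = \frac{\mathrm{STX} \cdot \mathrm{STYZ} - \mathrm{STXZ} \cdot \mathrm{STYX}}{\mathrm{STX} \cdot \mathrm{STZ} - \mathrm{STXZ}^2} ,$$

$$b_{WX} = \frac{\mathrm{SWZ} \cdot \mathrm{SWYX} - \mathrm{SWXZ} \cdot \mathrm{SWYZ}}{\mathrm{SWX} \cdot \mathrm{SWZ} - \mathrm{SWXZ}^2} ,$$

and

$$b_{WZ} = \frac{\mathrm{SWX} \cdot \mathrm{SWYZ} - \mathrm{SWXZ} \cdot \mathrm{SWYX}}{\mathrm{SWX} \cdot \mathrm{SWZ} - \mathrm{SWXZ}^2} .$$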

Table 2. Sources of Variation for Analysis of Covariance with Two
Covariables

                       Sums of squares and cross-products
Sources    df          Y      X      Z      YX      YZ      XZ
Between    t-1         SBY    SBX    SBZ    SBYX    SBYZ    SBXZ
Within     t(n-1)      SWY    SWX    SWZ    SWYX    SWYZ    SWXZ
Total      tn-1        STY    STX    STZ    STYX    STYZ    STXZ

Adjusted
Total      tn-3        STY* = STY - bTX STYX - bTZ STYZ
Within     t(n-1)-2    SWY* = SWY - bWX SWYX - bWZ SWYZ
Between    t-1         SBY* = STY* - SWY*
 

The notation for a sum-of-squares is as follows:
S denotes a sum-of-squares or cross-products,
the letter following S denotes between (B), within (W) or total (T),
the final letter(s) denote(s) the variables involved.
The subscripts on the b's are similar:
the first subscript denotes total (T) or within (W),
the second subscript denotes the covariable.

The F test statistic was defined as

$$F = \frac{\mathrm{SBY}^* / (t-1)}{\mathrm{SWY}^* / (t(n-1)-2)}$$

which was distributed as the F distribution with t-1 and t(n-1)-2
degrees of freedom.
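
The computations of Table 2 and the F ratio can be sketched as follows. This is an illustrative reimplementation in Python, not the original CDC 6500 program; the function name, the data layout (lists of (y, x, z) tuples with equal cell sizes), and the example values are assumptions made for the sketch.

```python
def ancova_two_covariables(groups):
    """One-way ANCOVA with two covariables, following Table 2.

    groups: list of treatment groups, each a list of (y, x, z) tuples,
    with equal cell sizes. Returns (F, b_wx, b_wz).
    """
    t = len(groups)
    n = len(groups[0])
    everyone = [obs for g in groups for obs in g]

    def sscp(rows):
        """Sums of squares and cross-products about the mean of rows."""
        k = len(rows)
        m = [sum(r[c] for r in rows) / k for c in range(3)]
        d = [[r[c] - m[c] for c in range(3)] for r in rows]
        return {
            "Y": sum(v[0] ** 2 for v in d),
            "X": sum(v[1] ** 2 for v in d),
            "Z": sum(v[2] ** 2 for v in d),
            "YX": sum(v[0] * v[1] for v in d),
            "YZ": sum(v[0] * v[2] for v in d),
            "XZ": sum(v[1] * v[2] for v in d),
        }

    total = sscp(everyone)
    within = {key: 0.0 for key in total}
    for g in groups:
        for key, val in sscp(g).items():
            within[key] += val

    def adjusted(s):
        """Adjusted sum of squares for Y and the two regression coefficients."""
        den = s["X"] * s["Z"] - s["XZ"] ** 2
        bx = (s["Z"] * s["YX"] - s["XZ"] * s["YZ"]) / den
        bz = (s["X"] * s["YZ"] - s["XZ"] * s["YX"]) / den
        return s["Y"] - bx * s["YX"] - bz * s["YZ"], bx, bz

    sty_adj, _, _ = adjusted(total)
    swy_adj, b_wx, b_wz = adjusted(within)
    sby_adj = sty_adj - swy_adj
    f_ratio = (sby_adj / (t - 1)) / (swy_adj / (t * (n - 1) - 2))
    return f_ratio, b_wx, b_wz


# Hypothetical data: two groups of four (y, x, z) observations.
example = [
    [(5.0, 1.0, 2.0), (7.0, 2.0, 1.0), (8.0, 3.0, 4.0), (6.0, 4.0, 3.0)],
    [(9.0, 2.0, 5.0), (12.0, 4.0, 3.0), (10.0, 5.0, 6.0), (13.0, 7.0, 4.0)],
]
f_ratio, b_wx, b_wz = ancova_two_covariables(example)
```

For the Method A and Method B analyses the same routine would be applied after the covariables (and, for Method A, the covariable cross-products) have been corrected, which this sketch does not attempt.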

Parameter Setting

One of the purposes of this study was to compare the results
with Porter's study (1967). To facilitate the comparisons most
parameters were set the same as Porter's. Table 3 indicates all
possible combinations of the parameters included in this Monte
Carlo study. An "X" indicates the cells used in this study. Reliabilities
of the two covariables were equal in all situations investigated
due to the previously noted limitations of Method A.

Consistent with Cox's suggestion (Cox, 1957), the lowest
correlation between a latent true covariable and the dependent
variable was .6. An intermediate value of .7, and a maximum value
of .9 were also in the design. Reliabilities of both fallible
covariables were .5, .7, and .9. Correlations higher than .9 and
reliabilities lower than .5 were considered out of the range of
interest to educational researchers. The one-way ANCOVA design was
balanced, with sample sizes of 10, 15, 20 and 40 observations per cell.
The number of treatment groups was 2 or 4.

[Table 3: combinations of parameters included in the Monte Carlo study, with cell sizes N1 = 10, N2 = 15, N3 = 20 and N4 = 40.]

The intercorrelation between the two latent true covariables
was kept low, varying from .0 to .2 and .4. Using multiple covariables
with high intercorrelations has little practical utility and should
be avoided.

The number of iterations per configuration was 1000. Three
levels of nominal $\alpha$ (probability of a Type I error) were chosen,
i.e., .10, .05 and .01. These three levels of significance are
those most frequently used in practice. Further, comparisons for
$\alpha$'s lower than the .001 level may sometimes be misleading (Glass,
Peckham and Sanders, 1972). Use of .001 itself is rarely seen in
research and decision making.

For studying statistical power, the same nominal $\alpha$'s were used.
The non-central case was created by adding one-half standard deviation
of the marginal distribution of the dependent variable to each
observation in one treatment group. Theoretical statistical power
for latent true variables would be .99 in the two treatment design
and .98 in the four group case at nominal $\alpha$ of .05, cell size of
40 and multiple correlation of .9
($\rho'_{YX} = .7$, $\rho'_{YZ} = .7$ and $\rho'_{XZ} = .2$).

A flow chart for the computer program is presented in Figure 2.

[Figure 2. Flow chart of the computer program.]

In the first phase of the computer program, as shown in Figure
2, the following parameters were defined:

NT, number of treatment groups
NB, number of observations per cell
RHOX, reliability of X covariable
RHOZ, reliability of Z covariable
RHYX, latent true correlation between Y and X
RHYZ, latent true correlation between Y and Z
RHXZ, latent true correlation between X and Z
F(1), theoretical F value at .10 nominal $\alpha$ level
F(2), theoretical F value at .05 nominal $\alpha$ level
F(3), theoretical F value at .01 nominal $\alpha$ level

Data Generating

Each pseudo-random normal deviate was obtained by calling subroutine
RANN which was written in machine language (COMPASS). RANN
was a CDC 6500 version of RANSS which was created for the University
of Wisconsin computer (Porter, 1967). The unit normal generator
involved two stages. First, the multiplicative congruential method
was used to generate sixteen pseudo-random numbers from a uniform
distribution. Second, the sixteen numbers were summed and linearly
rescaled to provide a pseudo-random unit normal deviate via the
Central Limit Theorem. The RANSS program has been tested on randomness
as well as goodness of fit and found to have good properties
on both criteria (Porter, 1967).
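
The two-stage generator described above can be sketched in a few lines. This is not RANN or RANSS: the multiplier and modulus below are the widely used Lehmer "minimal standard" constants, adopted here purely as an assumption, and the class and method names are hypothetical.

```python
from math import sqrt


class SumOfUniformsNormal:
    """Approximate unit normal deviates from sums of sixteen uniform deviates."""

    MODULUS = 2 ** 31 - 1     # assumed constants (Lehmer "minimal standard"),
    MULTIPLIER = 16807        # not the constants used by RANN

    def __init__(self, seed):
        # The state must be a nonzero residue for a multiplicative generator.
        self.state = seed % self.MODULUS or 1

    def uniform(self):
        """One multiplicative congruential step, scaled to the interval (0, 1)."""
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state / self.MODULUS

    def normal(self):
        """Sum 16 uniforms (mean 8, variance 16/12) and rescale to mean 0, variance 1."""
        total = sum(self.uniform() for _ in range(16))
        return (total - 8.0) / sqrt(16.0 / 12.0)


generator = SumOfUniformsNormal(seed=12345)   # hypothetical starting number
draws = [generator.normal() for _ in range(20000)]
```

A deviate produced this way can never exceed about 6.93 in absolute value, so the extreme tails are truncated relative to a true normal distribution; for the tail probabilities of .10, .05 and .01 studied here that limitation is of little consequence.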

The subroutine RANN required a starting number in octal digits
specified as parameter RANDOM. Each time the program was run,
parameter RANDOM was changed to ensure independence among the resulting
distributions of F and of beta weights. Changing the starting number
was achieved by changing the RANDOM card in subroutine RANN. These
starting numbers were selected randomly prior to running the program.

To create dependent variable Y and latent true covariables X'
and Z', RANN was called three times giving three random normal
deviates a, b and c. The three deviates were used in the following
relationships:

$$X' = a$$

$$Z' = a\,\rho'_{XZ} + b\,\sqrt{1 - \rho'^2_{XZ}}$$

$$Y = a\,\rho'_{YX} + \beta'_Z\,b\,\sqrt{1 - \rho'^2_{XZ}} + c\,\sqrt{1 - R'^2}$$

where R' was the multiple correlation for predicting Y from X' and Z',
i.e.,

$$R'^2 = \beta'_X\,\rho'_{YX} + \beta'_Z\,\rho'_{YZ} .$$

The resulting variables X', Y and Z' were distributed as trivariate
normal each with expected value zero, variance one and bivariate
intercorrelations $\rho'_{XZ}$, $\rho'_{YZ}$ and $\rho'_{YX}$.
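
The claim that these three relationships reproduce the intended correlations can be verified algebraically. The sketch below assumes that $\beta'_X$ and $\beta'_Z$ are the standardized partial regression coefficients, which is consistent with the definition of R' above; the function names are hypothetical. It stores the weights placed on the independent deviates a, b and c and computes the covariances those weights imply.

```python
from math import sqrt


def generator_weights(rho_yx, rho_yz, rho_xz):
    """Weights on independent unit deviates (a, b, c) for X', Z' and Y."""
    s = sqrt(1.0 - rho_xz ** 2)
    beta_x = (rho_yx - rho_yz * rho_xz) / (1.0 - rho_xz ** 2)
    beta_z = (rho_yz - rho_yx * rho_xz) / (1.0 - rho_xz ** 2)
    r_squared = beta_x * rho_yx + beta_z * rho_yz
    return {
        "X": (1.0, 0.0, 0.0),
        "Z": (rho_xz, s, 0.0),
        "Y": (rho_yx, beta_z * s, sqrt(1.0 - r_squared)),
    }


def implied_covariance(weights, first, second):
    """Covariance of two linear combinations of independent unit deviates."""
    return sum(p * q for p, q in zip(weights[first], weights[second]))


# Parameter values taken from the example reported in Table 4.
w = generator_weights(rho_yx=0.70, rho_yz=0.90, rho_xz=0.40)
var_y = implied_covariance(w, "Y", "Y")
cov_yx = implied_covariance(w, "Y", "X")
cov_yz = implied_covariance(w, "Y", "Z")
cov_xz = implied_covariance(w, "X", "Z")
```

For these parameter values the implied variance of Y is one and the implied covariances are .70, .90 and .40, matching the intended population correlations.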

Example distributions of random normal deviates generated by
RANN, correlation coefficients, means and variances of the dependent
variable Y and the two latent true covariables X' and Z' are contained
in Table 4. Each statistic in Table 4 was based on 10,000
trivariate cases. All statistics were in close agreement with the
known parameter values, thus providing strong support for the
validity of the data generator.

The next step was to create fallible X and Z variables. Two
more random normal variates were called from RANN. These two
variates were multiplied by their corresponding standard errors of
measurement and the results were added to X' and Z' respectively.

Table 4. Example Distribution Based on 10,000 Trivariate Cases
Generated by RANN

Variable    Mean       Variance   Skewness        Kurtosis
Normal       0.0       1.00        0.0             0.0
a           -0.0078    1.0066      0.0367         -0.0857
b           -0.0073    0.9826      0.0457         -0.1192
c            0.0053    0.9814     -0.0257          0.0144
Y            0.0031    0.9869     r_YX = .699     rho'_YX = .70
X           -0.0039    0.9859     r_YZ = .898     rho'_YZ = .90
Z            0.0053    0.9863     r_XZ = .394     rho'_XZ = .40

 

Using classical measurement theory (Gulliksen, 1950), error variances
corresponding to reliabilities were found.

Given that

$$\rho_{XX} = \frac{\sigma_{X'}^2}{\sigma_{X'}^2 + \sigma_u^2} ,$$

then

$$\sigma_u^2 = \frac{1 - \rho_{XX}}{\rho_{XX}}\,\sigma_{X'}^2 ,$$

and the positive square root is the standard error of measurement.

Let X and Z denote the fallible X' and Z' covariables. Then,

$$X_1 = X' + d_1\,\sigma_u ,$$

$$Z_1 = Z' + d_2\,\sigma_v ,$$

where $d_1$, $d_2$ were random normal deviates,
$\sigma_u$ was the standard error of measurement for X, and
$\sigma_v$ was the standard error of measurement for Z.
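
As a worked illustration of the step above, the sketch below converts a reliability into a standard error of measurement and adds scaled normal error to a set of true scores. It uses Python's standard `random` module in place of RANN, and the function names and example values are hypothetical.

```python
import random
from math import sqrt


def standard_error_of_measurement(reliability, true_sd=1.0):
    """Square root of the error variance implied by a reliability."""
    return true_sd * sqrt((1.0 - reliability) / reliability)


def make_fallible(true_scores, reliability, rng):
    """Add independent normal measurement error to unit-variance true scores."""
    sem = standard_error_of_measurement(reliability)
    return [x + rng.gauss(0.0, 1.0) * sem for x in true_scores]


rng = random.Random(2024)                      # hypothetical starting number
true_x = [rng.gauss(0.0, 1.0) for _ in range(5)]
fallible_x = make_fallible(true_x, reliability=0.7, rng=rng)
sem_07 = standard_error_of_measurement(0.7)
```

For unit-variance true scores, reliabilities of .5, .7 and .9 correspond to standard errors of measurement of 1.0, about .655 and about .333 respectively.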

Estimating Reliabilities

Lord's idea of duplicated variables (Lord, 1960) was followed
to provide test-retest (or parallel form) reliability estimates.
Therefore, an additional set of fallible observations was needed
for each latent true covariable. RANN was called twice again. Each
random normal variate was multiplied by the corresponding standard
error of measurement and the result was added to the true score on
X' and Z' respectively,

$$X_2 = X' + d_3\,\sigma_u , \qquad Z_2 = Z' + d_4\,\sigma_v ,$$

where $d_3$ and $d_4$ were the additional random normal deviates.

All seven variables (Y, X', Z', X1, Z1, X2 and Z2) were created
until the desired sample size was achieved. The correlation coefficient
between X1 and X2 was computed and served as the estimate
of the reliability of X. Similarly, the correlation between Z1 and
Z2 served as the estimate of the reliability of Z. The two reliability
coefficients were denoted by $\hat{\rho}_{XX}$ and $\hat{\rho}_{ZZ}$. An example of 100
reliability coefficients generated by this procedure with sample
size of 40 yielded an average of .704 and a variance of .0015 for the
parameter value of .700.

 


A quasi-experimental design was simulated by creating differ-
ences among the covariable means. After all observations were
generated, a different constant was added to each observation in
a treatment group. The set of constants was the same for each
covariable. Means of both covariables for the first treatment
group were 6.0 in both the 2 and the 4 treatment group designs.

In the two treatment case, means for the second group were 0.0.
In the four treatment group design, means for both second and third
groups were 3.0 and the mean was 0.0 for the fourth group.

Group means on the dependent variable were computed internally
in the program. Latent multiple regression coefficients were used
to calculate these means so that for latent true covariables, all
adjusted treatment effects would be zero in the central case. These
means were computed by the following relationship:

$$\mu_{Y_{i.}} = \beta'_X\,\mu_{X_{i.}} + \beta'_Z\,\mu_{Z_{i.}} .$$

To create the non-central case, the value of 0.5 (half standard
deviation of the marginal distribution of the dependent variable)
was added to each observation in the first treatment group. In
doing this, the non-central case data and the central case were
related to each other. Since there was no attempt to compare sta-
tistical power with Type I error, however, dependency between the
central and the non-central F distributions was not considered a
problem. Obviously, the double use of pseudo-random normal deviates

provided a great saving in computer time.

ANCOVA

Basic statistics, sums-of-squares and cross-products were computed
as shown previously in Table 2. Analysis of covariance on
the dependent variable Y with X' and Z' as covariables was calculated
first. Accuracy of the program was tested against the Finn program
(Finn, 1968) and agreement was found. The F ratios as well as beta
weights were compared. To ensure accuracy, hand calculation
was performed on the same set of test data and agreement was obtained.

In addition to ANCOVA on the true covariables, ANCOVA on the
two fallible covariables was also performed. Again, the outcomes
were compared to those from the Finn program and the hand calculation
and were in agreement. Finally, the two proposed correction
methods A and B were calculated and checked for accuracy.

Method A ANCOVA
Fallible variables $X_1$ and $Z_1$ were used as covariables in the
Method A ANCOVA. The two fallible covariables were transformed to
their estimated true scores,

$$ \hat{X}_{ij} = \bar{X}_{1i.} + \hat{\rho}_{XX}\,(X_{1ij} - \bar{X}_{1i.}) $$

and

$$ \hat{Z}_{ij} = \bar{Z}_{1i.} + \hat{\rho}_{ZZ}\,(Z_{1ij} - \bar{Z}_{1i.}) $$

where $\bar{X}_{1i.}$ and $\bar{Z}_{1i.}$ were treatment means for the two respective
covariables. Basic computations were done again to find sums of
squares and cross-products. Before the adjusted sums-of-squares of
the dependent variable Y were calculated, the cross-products between
the two estimated true score covariables were corrected for
attenuation:

$$ \mathrm{STXZA} = \frac{\mathrm{STXZ}}{\rho_{XX}\,\rho_{ZZ}} $$

and

$$ \mathrm{SWXZA} = \frac{\mathrm{SWXZ}}{\rho_{XX}\,\rho_{ZZ}} . $$

The adjusted cross-products STXZA and SWXZA were used in the computation
of the total and pooled within regression slopes of the $X_1$
and $Z_1$ covariables.
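
The two Method A steps can be illustrated with a small sketch. The function names and toy data are hypothetical, and the correction is assumed to divide the cross-product by the product of the two reliabilities; the sketch is not the original program.

```python
from statistics import mean

def group_means(values, groups):
    """Mean of `values` within each treatment group."""
    return {g: mean(v for v, gg in zip(values, groups) if gg == g)
            for g in set(groups)}

def estimated_true_scores(values, groups, reliability):
    """Regress each score toward its own treatment-group mean."""
    means = group_means(values, groups)
    return [means[g] + reliability * (v - means[g])
            for v, g in zip(values, groups)]

def within_cross_product(x, z, groups):
    """Pooled within-group sum of cross-products of x and z."""
    mx, mz = group_means(x, groups), group_means(z, groups)
    return sum((a - mx[g]) * (b - mz[g]) for a, b, g in zip(x, z, groups))

# Toy data (hypothetical): two treatment groups of three units each.
groups = [1, 1, 1, 2, 2, 2]
x1 = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
z1 = [2.0, 1.0, 4.0, 6.0, 9.0, 8.0]
rho_xx = rho_zz = 0.70

x_hat = estimated_true_scores(x1, groups, rho_xx)
z_hat = estimated_true_scores(z1, groups, rho_zz)

sw_fallible = within_cross_product(x1, z1, groups)
sw_estimated = within_cross_product(x_hat, z_hat, groups)
# Assumed form of the attenuation correction: divide by both reliabilities.
sw_corrected = sw_estimated / (rho_xx * rho_zz)
```

In this sketch the within cross-product of the estimated true scores is the fallible cross-product shrunk by the product of the two reliabilities, so dividing by that product restores the fallible value, which under classical measurement assumptions estimates the cross-product of the latent true scores.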

Method B ANCOVA
In the Method B correction procedure, the second fallible
covariable, $Z_1$, was transformed into a new variable, W, which was
uncorrelated with the first covariable, $X_1$. This transformation
was done by taking out the part of the second fallible covariable
that was predictable from the first fallible covariable. Thus the
new covariable was defined

$$ W_{ij} = Z_{1ij} - \beta_{Z \cdot X}\,X_{1ij} $$

with reliability,

$$ \rho_{WW} = \frac{\rho_{ZZ} - 2\rho_{XZ}^{2} + \rho_{XZ}^{2}\,\rho_{XX}}{1 - \rho_{XZ}^{2}} . $$

Analysis of covariance using the two estimated true score
covariables of $X_1$ and W was then performed. The F ratio was obtained
as well as the two within regression coefficients.
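
The Method B transformation can be sketched as an ordinary residualization. The function names and toy data below are hypothetical. The reliability function assumes standardized X and Z with observed intercorrelation `rho_xz`, and the value .14 used in the example is an assumed figure (a latent intercorrelation of .2 attenuated by reliabilities of .70), not one reported in the text.

```python
from statistics import mean

def residualize(z, x):
    """Return w = z - b*x and b, where b is the least-squares slope of z on x."""
    mx, mz = mean(x), mean(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    slope = sxz / sxx
    return [b - slope * a for a, b in zip(x, z)], slope

def reliability_of_w(rho_xx, rho_zz, rho_xz):
    """Reliability of W for standardized X and Z with observed correlation rho_xz."""
    return (rho_zz - 2 * rho_xz**2 + rho_xz**2 * rho_xx) / (1 - rho_xz**2)

# Toy data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]
w, slope = residualize(z, x)

# Sum of cross-products between x and w; zero by construction.
mx, mw = mean(x), mean(w)
cross_xw = sum((a - mx) * (b - mw) for a, b in zip(x, w))

rho_ww = reliability_of_w(0.70, 0.70, 0.14)
```

When the two covariables are uncorrelated, the formula returns the reliability of the second covariable unchanged, as would be expected.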


Distribution Building

Twelve distributions were built in the frequency counting
phase. Four of them were F distributions corresponding to the four
ANCOVA's performed. Only the right tail of each of the four F distributions
was accumulated. Three frequencies corresponding to the probabilities
of Type I error at nominal α of .10, .05 and .01 were collected
for each F distribution. The theoretical F values at each of the
three α levels for the specified sample sizes and number of treatment
groups were read in the first phase of the program, the parameter
reading routine. These F values served as reference points for
the frequency counts.

In addition the distributions of eight regression coefficients
(two for each of the forms of ANCOVA's) were built. The distribu-
tions covered the range of values from .5 to .9 in intervals of

width .025.
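
The right-tail frequency counting can be sketched as below. The function names are hypothetical, the F ratios are made-up values, and the critical values are rounded illustrative figures rather than the tabled values read by the original program.

```python
def tail_frequencies(f_ratios, critical_values):
    """Count how many simulated F ratios exceed each critical value."""
    return {alpha: sum(1 for f in f_ratios if f > crit)
            for alpha, crit in critical_values.items()}

def empirical_rates(counts, n_iterations):
    """Convert exceedance counts to empirical rejection rates."""
    return {alpha: c / n_iterations for alpha, c in counts.items()}

# Illustrative critical values keyed by nominal alpha (assumed, rounded).
critical_values = {0.10: 2.77, 0.05: 3.97, 0.01: 6.98}
# Made-up F ratios standing in for the simulated ones.
f_ratios = [0.3, 1.2, 2.9, 4.1, 7.5, 0.8, 3.0, 0.1, 5.0, 2.0]

counts = tail_frequencies(f_ratios, critical_values)
rates = empirical_rates(counts, len(f_ratios))
```

Applied to F ratios from the central case, the rates estimate Type I error; applied to the non-central case, they estimate power.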

Print Output

After 1000 iterations were done, the average and variance of
each regression coefficient, mean square (treatment and within) and
adjusted treatment mean was computed. Then all distributions, means
and variances of regression coefficients, mean squares and adjusted

treatment means were printed in the final phase (see Figure 2).

CHAPTER IV

RESULTS

Estimated True Scores ANCOVA with Independent Covariables

The results of the Monte Carlo investigation of estimated true
scores ANCOVA for two uncorrelated random fallible covariables are
provided in Tables 5 through 11. As stated previously, the sample
size per treatment group was 40, and the correlations of latent
true covariables with the dependent variable were each .7 as were
the reliabilities of each covariable.

Empirical Type I errors, statistical power, average mean
squares, and average adjusted means for the two treatment design
are given in Table 5. The Type I error rates for estimated true
scores ANCOVA were slightly liberal but within two standard errors
for all three nominal values. By contrast the results using latent
true covariables were slightly conservative but also within two
standard errors. The inappropriateness of using fallible covariables
in ANCOVA for quasi-experiments was clearly supported by the .999
empirical Type I errors for all three nominal values.
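
The "within two standard errors" criterion can be checked with a short sketch. It assumes the standard error is the binomial standard error of a proportion at the nominal rate over 1000 iterations; the text does not state how the standard error was computed, so this is an assumption, and the function name is hypothetical.

```python
import math

def within_two_standard_errors(empirical, nominal, n_iterations=1000):
    """True if the empirical rate lies within two binomial SEs of nominal."""
    se = math.sqrt(nominal * (1.0 - nominal) / n_iterations)
    return abs(empirical - nominal) <= 2.0 * se

nominal = [0.10, 0.05, 0.01]
# Central-case rates for the two treatment design as reported in Table 5.
est_true = [0.115, 0.063, 0.013]
true_cov = [0.092, 0.044, 0.005]
fallibles = [0.999, 0.999, 0.999]

est_true_ok = [within_two_standard_errors(e, a) for e, a in zip(est_true, nominal)]
true_cov_ok = [within_two_standard_errors(e, a) for e, a in zip(true_cov, nominal)]
fallibles_ok = [within_two_standard_errors(e, a) for e, a in zip(fallibles, nominal)]
```

Under this assumption the estimated true scores and latent true covariable rates all pass the check, while the fallible covariable rates do not.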

As was expected, use of latent true covariables resulted in
substantially greater power than estimated true scores ANCOVA. The
difference in power is explained by two factors. First, the multiple
correlation of estimated true scores is identical to that for the

Table 5. Empirical Type I Error, Statistical Power, Average Mean
Square and Average Adjusted Means for Estimated True
Scores ANCOVA with t = 2, n = 40, ρ_XX = .70, ρ_ZZ = .70,
ρ'_YX = .70, ρ'_YZ = .70, ρ'_XZ = .00, μ_X1 = μ_Z1 = 6.0,
μ_X2 = μ_Z2 = 0, μ_Y1 = 8.4, μ_Y2 = 0.0

                       CENTRAL                      NON-CENTRAL
Nominal α        .10     .05     .01           .10     .05     .01
TRUE COV.       .092    .044    .005          .973    .950    .814
EST. TRUE       .115    .063    .013          .200    .128    .046
FALLIBLES       .999    .999    .999         1.000   1.000   1.000

MEAN SQU.        BETWEEN          WITHIN            BETWEEN          WITHIN
TRUE COV.     .0178(.0006)    .0200(.00001)      .2694(.0203)    .0200(.00001)
EST. TRUE     .3411(.2253)    .3150(.0026)       .5292(.4906)    .3150(.0026)
FALLIBLES    9.3305(13.25)    .3150(.0025)     13.2614(20.38)    .3150(.0026)

ADJ. MEANS         T1               T2               T1            T2
TRUE COV.     -0.002(.018)    -0.0007(.0005)     .497(.018)        *
EST. TRUE     -0.055(.508)    -0.0002(.0115)     .444(.508)        *
FALLIBLES      2.523(.208)    -0.0008(.0086)    3.022(.208)        *

Primes denote parameters on latent true variables.
* Same as in central case.

fallible covariables, while the multiple correlation for the latent
true covariables is consistent with a correction for attenuation.
Second, the variance of the estimated true scores covariables is
equal to the variance of the corresponding fallible covariables
multiplied by the respective squared reliabilities. The variance
of the latent true covariables, however, is equal to the variance
of the corresponding fallible covariables multiplied by the respec-
tive reliabilities. The smaller variance of estimated true scores
covariables operates to further dampen the power of the procedure.
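
The two variance statements follow from the definition of estimated true scores under classical measurement assumptions. As a sketch, write a fallible covariable within a treatment group as X = T + E, with T the latent true score, E an uncorrelated error, and reliability $\rho_{XX} = \sigma_T^2 / \sigma_X^2$ (the symbols T, E and $\sigma^2$ are introduced here only for illustration):

$$ \hat{X} = \bar{X} + \rho_{XX}\,(X - \bar{X}) \;\Rightarrow\; \sigma_{\hat{X}}^2 = \rho_{XX}^2\,\sigma_X^2 , \qquad \sigma_T^2 = \rho_{XX}\,\sigma_X^2 . $$

With $\rho_{XX} = .70$, the estimated true scores have variance $.49\,\sigma_X^2$, compared with $.70\,\sigma_X^2$ for the latent true scores.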

The slight liberal tendency of estimated true scores ANCOVA
was also reflected in the average mean squares for the central
case, which showed the average mean square between to be slightly
larger than the average mean square within. Support for the
earlier analytic demonstration that the procedure tests the correct
null hypothesis was given by the average adjusted means. For the
central case the average adjusted means were -.055 and -.0002, which
were very close to the desired values of zero.

Average pooled within regression coefficients and cumulative
distributions for the 1000 samples are provided in Table 6. The
average coefficients were .703 and .707 for the two covariables,
which were very close to the .7 value of the population coefficient
for the latent true covariables. As was expected, the standard
errors for the regression coefficients were substantially larger
for estimated true scores than for latent true covariables. The
reasons are the same as those given previously for the discussion

of statistical power.

Table 6. Empirical Cumulative Distribution of Regression
Coefficients from Estimated True Scores ANCOVA
with t = 2, n = 40, ρ_XX = .70, ρ_ZZ = .70, ρ'_YX = .70,
ρ'_YZ = .70, ρ'_XZ = .00, β'_X = .70, β'_Z = .70

            TRUE COV         EST. TRUE        FALLIBLES
            X       Z        X       Z        X       Z
.50                          4       1      572     569
.55                         22      21      873     878
.60                         92      83      981     975
.65         0       0      258     237      997     998
.70       490     506      515     494     1000    1000
.75      1000     999      746     743
.80              1000      884     875
.85                        958     956
.90                        985     978
.95                        990     995
MEAN     .701    .700     .703    .707     .489    .490
VAR     .0003   .0002    .0069   .0065    .0029   .0030

Primes denote parameters on latent true variables.


Tables 7 and 8 contain comparable data for the four treatment
group design. Number of treatment groups did not noticeably alter
the results for ANCOVA using latent true and fallible covariables.
The estimated true scores ANCOVA empirical Type I errors, however,
were markedly discrepant from the nominal values, i.e., .177, .109,
and .037 for nominal values of .10, .05, and .01 respectively.
This was true despite the average adjusted means being quite close
to the desired value of zero, i.e., -.021, -.017, -.011, -.0020.
Further, the average pooled within regression coefficients, as
shown in Table 8, were .703 and .700.

It was hypothesized that the use of sample reliabilities for
calculating estimated true scores was the cause of the liberal
nature of the F test statistic. Therefore, the simulations were
replicated using population reliabilities. The empirical Type I
errors and statistical powers for the replications are reported
in Table 9. For both the 2 and 4 treatment groups designs the
empirical Type I errors for estimated true scores ANCOVA were
slightly closer to the nominal values than they had been using
estimated reliabilities. For the 4 treatment group design, however,
the empirical Type I errors were still quite liberal, i.e., .163,
.097, .029 for nominal values of .10, .05, and .01 respectively.
The liberalness of the estimated true scores test statistic for
the four treatment design is consistent with, but much more pro-
nounced than, that found for a single fallible covariable (Porter,
1967). With a single fallible covariable, however, the empirical

Type I error rates were still within the bounds of practical utility,

Table 7. Empirical Type I Error, Statistical Power, Average Mean
Squares and Average Adjusted Means for Estimated True Scores ANCOVA
with t = 4, n = 40. [Table body illegible in the source scan.]

Table 8. Empirical Cumulative Distribution of Regression
Coefficients from Estimated True Scores ANCOVA
with t = 4, n = 40, ρ_XX = .70, ρ_ZZ = .70, ρ'_YX = .70,
ρ'_YZ = .70, ρ'_XZ = .00, β'_X = .70, β'_Z = .70

            TRUE COV         EST. TRUE        FALLIBLES
            X       Z        X       Z        X       Z
.50                          0       0      612     600
.55                          1       2      953     944
.60                         25      29      998     997
.65         0       0      178     196     1000    1000
.70       532     497      490     517
.75      1000    1000      796     814
.80                        959     943
.85                        991     993
.90                        998     999
.95                       1000    1000
MEAN    .6999    .700     .703    .700     .490    .490
VAR    .00014  .00012    .0031   .0033    .0014   .0014

Primes denote parameters on latent true variables.

Table 9. Empirical Type I Error and Statistical Power for
Estimated True Scores ANCOVA Using Population
Reliabilities with n = 40, ρ_XX = .70, ρ_ZZ = .70,
ρ'_YX = .70, ρ'_YZ = .70, ρ'_XZ = .00

                       CENTRAL                      NON-CENTRAL
Nominal α        .10     .05     .01           .10     .05     .01
t = 2
TRUE COV        .099    .045    .008          .963    .943    .819
EST. TRUE       .100    .056    .012          .219    .127    .032
FALLIBLES       .999    .999    .992          .999    .999    .999
t = 4
TRUE COV        .098    .045    .007         1.000   1.000   1.000
EST. TRUE       .163    .097    .029          .789    .703    .479
FALLIBLES      1.000   1.000   1.000         1.000   1.000   1.000

Primes denote parameters on latent true variables.

i.e., .111, .058, and .013. Porter's single fallible covariable
results were for the same parameters except sample size, which was

twenty rather than forty per treatment group.

Methods A and B

The results of the Monte Carlo investigation of Methods A and
B for two correlated fallible covariables are presented in Tables
10 and 11. The earlier analytic demonstrations suggested that
Method A should test the right hypothesis while Method B should not.
Nevertheless, Method B was investigated on the chance that it might
have some practical utility. The parameters of the Monte Carlo
simulations were as before, with the exception that the latent true
covariables had a .2 intercorrelation.

Empirical Type I error, statistical power, average mean squares,
and average adjusted means for the two treatment designs are given
in Table 10. The average adjusted means for Methods A and B sup-
ported our analytic findings. The averages for Method A were in
close agreement with the desired zero values, i.e., -.070 and -.002,
while the averages for Method B were not, i.e., -.514 and -.003.
Further support for the analytic work is provided in Table 11. The
average values of the regression coefficients for Method A were in
close agreement with the desired .58 value, i.e., .586 and .591,
while for Method B there was little agreement, i.e., .705 and .640.

Unfortunately the empirical Type I error rates were not within
the range of practical utility for either method. The finding for
Method B was not surprising, but greater hopes were held for Method
A. The too liberal nature of the F test statistic for Method A

Table 10. Empirical Type I Error, Statistical Power, Average Mean
Squares and Average Adjusted Means for Method A and
Method B with t = 2, n = 40, ρ_XX = .70, ρ_ZZ = .70,
ρ'_YX = .70, ρ'_YZ = .70, ρ'_XZ = .20, μ_X1 = μ_Z1 = 6.0,
μ_X2 = μ_Z2 = 0.0, μ_Y1 = 6.96, μ_Y2 = 0.0

                       CENTRAL                      NON-CENTRAL
Nominal α        .10     .05     .01           .10     .05     .01
TRUE COV.       .106    .055    .010          .377    .254    .100
METHOD A       1.000   1.000   1.000         1.000   1.000   1.000
METHOD B        .180    .096    .027          .120    .057    .013
FALLIBLES       .980    .956    .845          .998    .996    .979

MS               BETWEEN         WITHIN            BETWEEN         WITHIN
TRUE COV       .188(.066)     .184(.0009)        .507(.313)          *
METHOD A      179.1(945.6)    .429(.0050)     204.56(1205.7)         *
METHOD B      1.242(1.99)     .399(.0043)        .733(.958)          *
FALLIBLES     5.980(9.48)     .398(.0043)       9.369(16.05)         *

ADJ. MEAN          T1              T2                T1              T2
TRUE COV       .015(.155)     -.000(.005)        .515(.155)          *
METHOD A      -.070(.606)     -.002(.014)        .429(.606)          *
METHOD B      -.514(.566)     -.003(.015)       -.014(.566)          *
FALLIBLES     1.848(.236)     -.001(.011)       2.349(.236)          *

Primes denote parameters for latent true variables.
* Same as in central case.


Table 11. Empirical Cumulative Distribution of Regression
Coefficients for Method A and Method B with t = 2,
n = 40, ρ_XX = .70, ρ_ZZ = .70, ρ'_YX = .70, ρ'_YZ = .70,
ρ'_XZ = .20, β'_X = .58, β'_Z = .58

              TRUE COV        METHOD A        METHOD B        FALLIBLES
              X      Z        X      Z        X      Z        X      Z
.50          53     63      189    160       29     56      881    868
.55         261    248      366    331       76    153      982    975
.60         639    638      564    552      169    353     1000    996
.65         910    912      732    743      316    562            1000
.70         993    994      886    876      496    762
.75         997   1000      961    937      672    874
.80        1000             986    979      805    943
.85                         995    994      901    981
.90                         997    999      958    995
.95                         999    999      976    997
MEAN       .581   .582     .586   .591     .705   .640     .427   .431
VARIANCE  .0025  .0025    .0093  .0092    .0129  .0090    .0035  .0037

Primes denote parameters on latent true variables.


stemmed from a far too large average adjusted mean square for
treatments, i.e., 179.1.

Three modifications of Method A were proposed in an attempt
to decrease the adjusted mean square for treatments, none of which
resulted in empirical Type I error rates within the bounds of
practical utility. Empirical Type I errors and statistical powers
of the three modifications are presented in Table 12 for the four
group design with parameters as before. Since the adjusted mean
square for treatments was obtained by subtraction of the adjusted
mean square within from the adjusted mean square total, all three
modifications attempted to reduce the adjusted mean square for
total.

By studying Method A in detail from the print output of basic
computations for 10 iterations, it was observed that correction
of the sum-of-squares total cross-product resulted in too low total
regression coefficients for both covariables. If the correction
was not done to the sum-of-squares total cross-product, the adjusted
mean square total would be decreased. The first modification, then,
was to apply the correction to the sum-of-squares within cross-product
only. The Type I error rates (row 1 of Table 12) were
found too conservative, i.e., .031, .021 and .007 for nominal α
of .10, .05 and .01 respectively. Statistical powers were also low.

The second modification was motivated by the argument that the
reliability of a covariate for the total sample should be greater
than the pooled within treatment reliability. Thus the estimated

true scores and correction to the within treatment cross-products

Table 12. Empirical Type I Error and Statistical Power of
Estimated True Scores ANCOVA with Some Additional
Correction Methods with n = 40, t = 4, ρ_XX = .70,
ρ_ZZ = .70, ρ'_YX = .70, ρ'_YZ = .70, ρ'_XZ = .20

                       CENTRAL                      NON-CENTRAL
Nominal α        .10     .05     .01           .10     .05     .01
FIRST
TRUE COV.       .109    .053    .015          .974    .932    .818
METHOD A        .031    .021    .007          .157    .122    .073
SECOND*
TRUE COV.        .1      .0      .0            .8      .8      .5
METHOD A        1.0     1.0     1.0           1.0     1.0     1.0
THIRD
TRUE COV.       .108    .052    .015          .965    .936    .824
METHOD A        .999    .999    .999          .999    .999    .999

* Only 10 iterations were performed.


were calculated using pooled within reliabilities while the correc-
tion to the total cross-products used the reliabilities for the
total sample. For ten iterations the obtained Type I error was
1.0 for all three nominal values.

The third modification was an attempt to decrease the adjusted
mean square within treatments after the first modification was
employed. The decrease was brought about by correcting for attenu-
ation the sum of cross-products between the dependent variable and
each covariable. The effect was so great that the Type I error
rates jumped to .999 for all three nominal α values.

CHAPTER V

SUMMARY AND CONCLUSION

In educational research it appears that random assignment of
experimental units to treatments is frequently not accomplished,
yet the researchers are interested in testing whether treatments
cause differences in the dependent variable. Program evaluation
efforts provide numerous examples to support this contention;
for example, the National Follow Through Program and the Head
Start Planned Variation Program. In the past, ANCOVA has frequently
been employed to control for at least some of the rival treatment
explanations as to any differences or lack of differences found.

As has been seen, ANCOVA is not an appropriate strategy if the
covariables have less than perfect reliability, which is nearly
always true in practice. Estimated true scores ANCOVA provides

an increasingly popular solution to the single covariate case;
however, most evaluations have multiple fallible covariates. The
two procedures (Methods A and B) suggested here are a first attempt
at providing solutions to meet their needs.

It should be noted that the investigation was an attempt to
find a solution to the problem of errors of measurement in random
covariables for use in quasi-experiments. In the investigator's
opinion, however, there is no perfectly acceptable solution to the

problem of estimating causal relationships from quasi- or naturally-
occurring experiments. Probably the best approach is to use
multiple analysis strategies each having somewhat different
assumptions.

At least three categories of procedures, matching, ANCOVA
and ANOVA of the Indices of Responses, are useful in quasi-
experiments. When results are invariant across multiple analysis
strategies for a given set of quasi-experimental data, conclusions
about cause can be relatively strong. When results differ, greater
caution in interpretation is warranted.

The effects of having random fallible covariables in ANCOVA
are twofold. First, the unreliability of the covariables decreases
the statistical power of the omnibus F test. This is indicated
by inspection of the expected value of the mean square error for
ANCOVA, which contains the factor

$$ \sigma_e^2\,(1 - R^2) $$

where $\sigma_e^2$ is the population within treatments variance on the dependent
variable and R is the multiple correlation of the covariables
with the dependent variable. The unreliability of the covariables
attenuates the multiple correlation, thus inflating the expected
mean square error and bringing about a concomitant decrease in the
statistical power of the test for treatment effects. The loss in
power is somewhat analogous to the loss in power of ANOVA due to
unreliability of the dependent variable. No suggestions for correcting
these problems with power were offered, other than the
obvious one of using the most reliable but still valid measures
possible in both ANCOVA and ANOVA.

The second effect of having random fallible covariables in
ANCOVA is far more disquieting. The unreliability of the covariables
can cause ANCOVA to provide biased estimates of the treatment main
effect, i.e., test the wrong hypothesis. Consider a treatment
effect for ANCOVA with two covariables:

$$ \tilde{\alpha}_{Y_{i.}} = \alpha_{Y_{i.}} - \beta_X\,\alpha_{X_{i.}} - \beta_Z\,\alpha_{Z_{i.}} , $$

where $\alpha_{Y_{i.}} = \mu_{Y_{i.}} - \mu_{Y_{..}}$, treatment effect on Y;
$\alpha_{X_{i.}} = \mu_{X_{i.}} - \mu_{X_{..}}$, treatment effect on X;
$\alpha_{Z_{i.}} = \mu_{Z_{i.}} - \mu_{Z_{..}}$, treatment effect on Z; and
$\beta_X$ and $\beta_Z$ are within treatment group regression coefficients.

If random assignment is a part of the design, all $\alpha_{X_{i.}}$'s and $\alpha_{Z_{i.}}$'s
are zero and the ANCOVA treatment effects, $\tilde{\alpha}_{Y_{i.}}$, are equal to the
unadjusted ANOVA of Y treatment effects, $\alpha_{Y_{i.}}$. This is true regardless
of the values of $\beta_X$ and $\beta_Z$. The sole purpose of using ANCOVA
rather than ANOVA is to improve statistical power. Since no bias is
present in the adjusted treatment effects, ANCOVA is still appropriate.

When random assignment has not been employed, the $\alpha_{X_{i.}}$'s and
$\alpha_{Z_{i.}}$'s will typically not be zero and the primary motivation for
using ANCOVA is to remove the initial difference on X and Z from
the effects on Y. When X and Z are fallible, Cochran (1968) has
shown that $\beta_X$ and $\beta_Z$ will be biased estimates of the regression
slopes defined on the latent true variables. Since $\alpha_{X_{i.}}$ and $\alpha_{Z_{i.}}$
are unaffected by errors of measurement, given classical measurement
assumptions, the use by classical ANCOVA of the biased regression
coefficients will remove the wrong amounts of initial difference
on X and Z, and thus not test the treatment effects defined
on the latent true variables. Since the researchers' hypotheses
are in terms of the latent true variables, ANCOVA using fallible
covariables tests the wrong hypothesis. What is needed is a procedure
which tests the hypothesis that the adjusted effects defined
on the latent true variables are equal to zero.
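
The size of the bias can be illustrated with the two treatment, independent covariable design. The sketch below works with differences between the two groups rather than deviations from the grand mean, which is equivalent because the adjustment is linear. The function name is hypothetical, and the fallible slopes of .49 are an assumed population value (the latent slope of .70 attenuated by a reliability of .70), consistent with the average fallible coefficients of about .49 reported in Table 6.

```python
def adjusted_effect(effect_y, slopes, covariable_effects):
    """Treatment effect on Y after removing slope-weighted covariable effects."""
    return effect_y - sum(b * a for b, a in zip(slopes, covariable_effects))

# Group 1 minus group 2 differences in the two treatment design.
diff_y = 8.4
diff_covariables = [6.0, 6.0]

true_slopes = [0.70, 0.70]       # slopes on the latent true covariables
fallible_slopes = [0.49, 0.49]   # assumed attenuated slopes, .70 * .70

unbiased = adjusted_effect(diff_y, true_slopes, diff_covariables)
biased = adjusted_effect(diff_y, fallible_slopes, diff_covariables)
```

With the latent slopes the adjusted difference is essentially zero, as intended in the central case. With the attenuated slopes it is about 2.52, close to the difference between the average adjusted means of 2.523 and -0.0008 reported for fallible covariables in Table 5.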

Lord (1960) provided the first method for correcting ANCOVA to
test the hypothesis of no adjusted treatment effects defined on the
latent true variables, but his solution was restricted to the case
of a single independent variable with only two levels and required
multiple observations on a single covariate. Porter (1967, 1973)
provided a solution to the problem which can be used in complex
designs and requires only an estimate of the reliability of the
covariable, but is restricted to the case of only a single covariable.
DeGracie (1968) has provided a solution similar to Porter's but
computationally more difficult. Procedures suggested here deal with
the multiple fallible covariable case and both are extensions of
the reasoning underlying Porter's single covariable solution.

The study investigated three procedures and two situations:
Situation I, two independent random fallible covariables; and Situation
II, two intercorrelated random fallible covariables. In Situation
I, estimated true score covariables were used in ANCOVA. In
Situation II, two proposed procedures, Method A and Method B, were
investigated. Method A consisted of 1) substituting estimated true
scores for each observed covariable, and 2) correcting for attenuation
the relationships between the estimated true score covariables.
Method B had two steps: 1) one covariable was transformed to make
it orthogonal to the other; 2) estimated true scores were substi-
tuted for the two orthogonal covariables and computations proceeded
as for classical ANCOVA. Analytic investigation showed that the
estimated true score procedure in Situation I and Method A in
Situation II (given equally reliable covariables) test the right
hypothesis. Method B did not test the right hypothesis under any
conditions.

A Monte Carlo study was conducted to investigate the small
sample distributional properties of the three procedures. For
independent random fallible covariables, the estimated true scores
ANCOVA provided satisfactory Type I error rates for the two group
design but too liberal Type I error rates for the four group
design. None of the procedures provided satisfactory Type I error
rates for two correlated random fallible covariables. This was
true despite the fact that Method A yielded average adjusted
treatment means and average pooled within regression coefficients
in close agreement with desired values. As yet the problem of
multiple random fallible covariables in ANCOVA of quasi-experiments

is unresolved.

BIBLIOGRAPHY


Adcock, R. J. A problem in least square. Analyst, 1878, 2, 53-54.

Atiqullah, M. The estimation of residual variance in quadratically
balanced least squares problems and the robustness of the F
test. Biometrika, 1962, 42, 83-92.

Atiqullah, M. The robustness of the covariance analysis of a one
way classification. Biometrika, 1964, 51, 365-372.

Berkson, J. Are there two regressions? Journal of American Statistical
Association, 1950, 32, 164-180.

Box, George E. P. and Andersen, S. L. Permutation theory in the
derivation of robust criteria and the study of departures
from assumption. Journal of the Royal Statistical Society,
1955, 17, 1-26.

Cochran, W. G. Analysis of covariance: Its nature and uses.
Biometrics, 1957, 13, 261-281.

Cochran, W. G. Errors of measurement in statistics. Technometrics,

Cochran, W. G. Some effects of errors of measurement on multiple
correlation. Journal of American Statistical Association,
1970, 62, 22-34.

Cox, David R. The use of a concomitant variable in selecting an
experimental design. Biometrika, 1957, 44, 150-158.

Cox, David R. Planning of Experiments. New York: Wiley, 1958.

Cronbach, L. J. and Furby, L. How we should measure "change"--or
should we? Psychological Bulletin, 1970, 14, 68-80.

DeGracie, James. Analysis of covariance when the concomitant

variable is measured with error. Unpublished Ph.D. Thesis,
Iowa State University, 1968.


Dorff, Martin. Large and small sample properties of estimators
for a linear functional relationship. Unpublished Ph.D.
Thesis, Ames, Iowa, 1960.

Elashoff, Janet D. Analysis of covariance: A delicate instrument.
American Educational Research Journal, 1969, 6, 383-401.

Evans, S. H. and Anastasio, E. J. Misuse of analysis of covariance
when treatment effect and covariance are confounded. Psychological
Bulletin, 1968, 62, 225-234.

Finn, Jeremy D. Multivariance...Univariate and multivariate analysis
of variance, covariance, and regression: A FORTRAN IV program.
State University of New York at Buffalo, June 1968.

Finney, D. J. Stratification, balance, and covariance. Biometrics,
1957, 12, 373-386.

Glass, Gene V., Peckham, Percy D. and Sanders, James R. Consequences
of failure to meet assumptions underlying the fixed
effects analysis of variance and covariance. Review of
Educational Research, 1972, 42, 237-288.

Gulliksen, Harold. Theory of Mental Tests. New York: Wiley, 1950.

Harnquist, K. Relative changes in intelligence from 13 to 18.
Scandinavian Journal of Psychology, 1968, 2, 50-82.

Harris, Chester W. Problems in Measuring Change. Madison, Wisconsin:
University of Wisconsin Press, 1963.

John, Peter W. M. Statistical Design and Analysis of Experiments.
New York: Macmillan, 1971.

Kendall, M. G. Regression, structure and functional relationship,
Part I. Biometrika, 1951, 38, 11-25.

Kendall, M. G. Regression, structure and functional relationship,
Part II. Biometrika, 1952, 32, 96-108.

Kirk, Roger E. Experimental Design: Procedures for the Behavioral
Sciences. Belmont, California: Brooks/Cole, 1968.

Lord, F. M. Large sample covariance analysis when the control
variable is fallible. American Statistical Association
Journal, 1960, 25, 309-321.

Lord, F. M. A paradox in interpretation of group comparisons.
Psychological Bulletin, 1967, 68, 304-305.


Lord, F. M. Statistical adjustments when comparing pre-existing
groups. Psychological Bulletin, 1969, 12, 336-337.

Madansky, Albert. The fitting of straight lines when both variables
are subject to error. American Statistical Association
Journal, 1959, 24, 173-205.

McSweeney, Maryellen and Porter, A. C. Small sample properties of
nonparametric index of response and rank analysis of
covariance. Paper presented at the AERA Convention,

New York, 1971.

Myers, Jerome L. Fundamentals of Experimental Design. Boston:
Allyn and Bacon, 1966.

Peckham, Percy D. An investigation of the effects of non-
homogeneity of regression slopes upon the F test of analysis
of covariance. Laboratory of Educational Research, Report
No. 16, University of Colorado, Boulder, Colorado, 1968.

Porter, Andrew C. The effects of using fallible variables in the
analysis of covariance. Unpublished Ph.D. Thesis, Madison,
University of Wisconsin, 1967.

Porter, A. C. How errors of measurement affect ANOVA, regression
analyses, ANCOVA and factor analyses. Paper presented at
the AERA Convention, New York, 1971.

Porter, A. C. Analysis strategies for some common evaluation para-
digms. Paper presented at the AERA Convention, New Orleans,
February 1973.

Porter, A. C. and Chibucos, T. R. Selecting analysis strategy. In
Gary Borich (ed.), Evaluating Educational Programs and
Products. Educational Technology Press, 1974.

Roos, C. F. A general invariant criterion of fit for lines and
planes where all variates are subject to error. Metron,
1937, 13, 3-20.

Scheffé, Henry. The Analysis of Variance. New York: John Wiley
& Sons, 1959.

Smith, H. Fairfield. Interpretation of adjusted treatment means and
regressions in analysis of covariance. Biometrics, 1957,
13, 282-308.

Stroud, T. W. F. Comparing conditional means and variances in a
regression model with measurement errors of known variances.
Journal of the American Statistical Association, 1972, 61,
407-414.


Stallings, Jane A. Follow Through Program Classroom Observation
Evaluation 1971-1972. California: Stanford Research
Institute, August 1973.

Tukey, John W. Components in regression. Biometrics, 1951, 1,
33-70.

Wald, Abraham. The fitting of straight lines if both variables
are subject to error. Annals of Mathematical Statistics,
1940, 11, 284-300.

Winer, B. J. Statistical Principles in Experimental Design. New
York: McGraw-Hill, 1962.

 
