INVESTIGATION OF TWO PROPOSED SOLUTIONS TO THE MULTIPLE FALLIBLE COVARIABLE PROBLEM FOR QUASI-EXPERIMENTS

Dissertation for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
KOWIT PRAVALPRUK
1974

This is to certify that the thesis entitled INVESTIGATION OF TWO PROPOSED SOLUTIONS TO THE MULTIPLE FALLIBLE COVARIABLE PROBLEM FOR QUASI-EXPERIMENTS presented by Kowit Pravalpruk has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

ABSTRACT

INVESTIGATION OF TWO PROPOSED SOLUTIONS TO THE MULTIPLE FALLIBLE COVARIABLE PROBLEM FOR QUASI-EXPERIMENTS

By

Kowit Pravalpruk

One of the assumptions underlying classical analysis of covariance (ANCOVA) with random covariables is that the covariables are observed free from errors of measurement. When random assignment of experimental units to levels of the treatment independent variable is an aspect of the experimental design, failure to meet the perfectly reliable covariables assumption decreases the statistical power of ANCOVA but does not cause it to test biased treatment effects. When random assignment is not an aspect of the design, however, and ANCOVA is being used to correct for initial differences among treatment groups on the covariables, use of less than perfectly reliable covariables not only decreases the power of ANCOVA, but also causes ANCOVA to test biased treatment effects. Several correction procedures have been suggested for the single fallible covariable design.
The intent of this thesis was to extend the earlier work by describing two alternative correction procedures for the multiple fallible covariable design, demonstrating their properties in terms of population parameters, and empirically investigating the sampling distributions of their test statistics, i.e., probability of Type I error and power.

First, a brief review of past work on the single fallible covariable problem was presented. Next, the effects of errors of measurement in multiple linear regression were incorporated into the multiple covariable model. Finally, the two proposed solutions to the multiple fallible covariable problem were described and their properties investigated, first analytically and then via a Monte Carlo study.

A solution to the multiple fallible covariable problem requires a procedure that provides unbiased estimates of the regression coefficients defined on the latent true variables. An existing single covariable solution, the substitution of estimated true scores, corrected the bivariate regression coefficients between the dependent variable and each covariable, but left uncorrected the bivariate regression coefficients among the covariables. Thus Method A consisted of 1) substituting estimated true scores for each observed covariable, and 2) correcting for attenuation the relationships between the estimated true score covariables.

A second approach to the solution of the general problem, Method B, was motivated by the simplified situation which exists for uncorrelated covariables. Method B can be described for two covariables as follows: 1) one covariable is transformed to make it orthogonal to the other; 2) estimated true scores are substituted for the two orthogonal covariables and computations proceed as for regular ANCOVA.
The two correction procedures were investigated analytically to determine whether they test the correct hypothesis when there are two fallible covariables in a quasi-experiment. The conclusions were: 1) if the latent true covariables are uncorrelated, estimated true scores ANCOVA tests the desired hypothesis; 2) when the latent true covariables are correlated but have equal reliability, Method A tests the correct hypothesis; and 3) Method B does not appear to test the hypothesis of interest under any circumstances. The remaining question to be answered was how the small sample distributions of the various test statistics behave.

A computer program for the CDC 6500 computer system at the Michigan State University Computer Center was written to obtain empirical F distributions of the estimated true score ANCOVA when two fallible covariates were independent of each other, and of the two proposed correction methods when two fallible covariables were related. All distributions were based on 1000 samples, and empirical α's and power were reported for nominal levels of .10, .05, and .01. A pseudo-random unit normal deviate generator was used to generate observations from a trivariate normal distribution with known parameters.

The first Monte Carlo investigation applied estimated true scores ANCOVA to two uncorrelated fallible covariables in a two treatment quasi-experimental design with 40 observations per treatment, where the correlations of the latent true covariables with the dependent variable were each .7, as were the reliabilities of each covariable. It yielded slightly liberal Type I error rates, but within two standard errors for all three nominal values, .10, .05 and .01. As was expected, the statistical power of estimated true scores ANCOVA was substantially lower than that for latent true covariables.
When the number of treatments increased to four, the estimated true scores ANCOVA empirical Type I errors were markedly discrepant from the nominal values (.177, .109 and .037 at nominal values of .10, .05 and .01 respectively). These discrepancies were found even though average pooled within regression coefficients were correct (.703 and .700).

The empirical Type I error rates from the Monte Carlo study of Methods A and B for two correlated fallible covariables (.2 intercorrelation between the two latent true covariables) were not within the range of practical utility for either method, even though the analytic demonstration suggested that Method A tested the right hypothesis. Several modifications of Method A were proposed in an attempt to decrease the too liberal Type I error rate, none of which was successful.

INVESTIGATION OF TWO PROPOSED SOLUTIONS TO THE MULTIPLE FALLIBLE COVARIABLE PROBLEM FOR QUASI-EXPERIMENTS

By

Kowit Pravalpruk

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services and Educational Psychology

1974

ACKNOWLEDGEMENTS

I am especially grateful to my committee chairman, Professor Andrew C. Porter, for his endless advice, time and encouragement throughout all phases of my study. Special thanks go to Professor Maryellen McSweeney, Professor Robert L. Ebel and Professor Kenneth J. Arnold for their help and valuable comments.

Working in the Office of Research Consultation provided me the most valuable experience, which I will never forget. Many thanks to my friends and colleagues in the Office who were very kind to me, to Professor Andrew C. Porter and Dr. John H. Schweitzer, who gave me this job, and to Professor Maryellen McSweeney, who gave me two years of research assistantship prior to the Office job.

I wish to thank my wife, Sor-Wasna, and my daughter, Puttaporn, for giving me some time to work on my study.
My parents, sisters and brothers are appreciated for their support and sympathy. Without funds from the Thai Government and skillful typing from Ms. Janice Fuller, this research could not have been completed.

TABLE OF CONTENTS

CHAPTER

I INTRODUCTION
    Classical Analysis of Covariance (ANCOVA)
    Random Covariable Measured with Error
    Proposed Problem
    Purpose of Study

II REVIEW OF ANCOVA WITH A SINGLE FALLIBLE COVARIABLE

III ANCOVA WITH MULTIPLE FALLIBLE COVARIABLES
    Method A
    Method B
    Monte Carlo Study
        Parameter Setting
        Data Generating
        Estimating Reliabilities
        ANCOVA
        Method A ANCOVA
        Method B ANCOVA
        Distribution Building
        Print Output

IV RESULTS
    Estimated True Scores ANCOVA with Independent Covariables
    Methods A and B

V SUMMARY AND CONCLUSION

BIBLIOGRAPHY

LIST OF TABLES

Table

1 DeGracie's Analysis of Covariance Using Corrected Slope b3
2 Sources of Variation for Analysis of Covariance with Two Covariables
3 Design of Study
4 Example Distribution Based on 10,000 Trivariate Cases Generated by RANN
5 Empirical Type I Error, Statistical Power, Average Mean Square and Average Adjusted Means for Estimated True Scores ANCOVA
6 Empirical Cumulative Distribution of Regression Coefficients from Estimated True Scores ANCOVA
7 Empirical Type I Error, Statistical Power, Average Mean Squares and Average Adjusted Mean for Estimated True Scores ANCOVA
8 Empirical Cumulative Distribution of Regression Coefficients from Estimated True Scores ANCOVA
9 Empirical Type I Error and Statistical Power for Estimated True Scores ANCOVA Using Population Reliabilities
10 Empirical Type I Error, Statistical Power, Average Mean Squares and Average Adjusted Means for Method A and Method B
11 Empirical Cumulative Distribution of Regression Coefficients for Method A and Method B
12 Empirical Type I Error and Statistical Power of Estimated True Scores ANCOVA with Some Additional Correction Methods

LIST OF FIGURES

Figure

1 Effect of errors of measurement, Lord's example
2 Flow chart of the computer program

CHAPTER I

INTRODUCTION

Analysis of covariance (ANCOVA) has been employed widely in educational research both in random and non-random assignment designs. The latter is of particular interest here. A current example of giant proportions is Follow Through "planned variation", which has used ANCOVA to compare students' achievement between those in Follow Through and non-Follow Through groups and among various "sponsors" within Follow Through (Stallings, 1973). The desired comparisons were whether, "other things being equal", there were differences on dependent variables among various groups. Since there was no random assignment of students to "sponsors" nor to Follow Through/non-Follow Through, potential confounding variables, e.g., age, sex, ethnic origin, months of Head Start experience, days absent, and Wide Range Achievement Test score (pre-test), needed to be "controlled" post hoc.
Thus far analyses of Follow Through data have relied heavily on ANCOVA procedures.

The heavy reliance on ANCOVA to control for initial differences in non-random assignment designs, coupled with the fact that ANCOVA was originally conceived for other purposes, prompts careful reconsideration of its assumptions. One of the assumptions is that random covariables are observed free from errors of measurement. When random assignment of experimental units to levels of the treatment independent variable is an aspect of the experimental design, failure to meet the perfectly reliable covariables assumption decreases the statistical power of ANCOVA but does not cause it to test biased treatment effects. When random assignment is not an aspect of the design, however, and ANCOVA is being used to correct for initial differences among treatment groups on the covariables, use of less than perfectly reliable covariables not only decreases the power of ANCOVA, but also causes ANCOVA to test biased treatment effects. Several correction procedures have been suggested for the single fallible covariable design. The purpose of this thesis was to extend the earlier work by describing two proposed correction procedures for the multiple fallible covariable design, demonstrating their properties in terms of population parameters, and empirically investigating the sampling distributions of their test statistics, i.e., probability of Type I error and power.

First, however, an important caveat is necessary. There is no perfectly acceptable solution to the problem of estimating causal relationships from quasi or naturally occurring experiments. Perhaps ANCOVA is a useful procedure in some situations, but it is clearly limited by the care and ingenuity used in selecting the covariables. The problem here addressed is not how to select a useful set of covariables or even whether that task can ever be accomplished.
Rather, the problem concerns the effects of errors of measurement in situations where a useful set of covariables has been identified.

The remainder of this chapter presents the linear model for a one-way ANCOVA with one and two covariables, summarizes the full set of assumptions, and discusses in detail the assumption of error free covariates. Finally, the purpose of the present work is stated formally. Chapter II provides a review of the literature on ANCOVA with a single fallible covariable and Chapter III provides a review of the literature of the multiple fallible covariable problem, a description of the two proposed solutions, and an analytic demonstration of the hypothesis tested by each. Chapter III concludes by describing the Monte Carlo study designed to investigate the distributional properties of each proposed solution. Results of the Monte Carlo study are presented in Chapter IV and conclusions of the thesis in Chapter V.

Classical Analysis of Covariance (ANCOVA)

For simplicity, discussion of ANCOVA will be based on the one-way analysis of covariance model, although generalization to more complex designs is not difficult.

The one-way analysis of covariance linear model for a single covariable is:

$$Y_{ij} = \mu_{Y..} + \tilde{\alpha}_{Y_i} + \beta_{Y.X}(X_{ij} - \mu_{X..}) + e_{ij},$$

where
$Y_{ij}$ denotes the dependent variable for the $i$th treatment and $j$th individual,
$\mu_{Y..}$ denotes the grand mean of the dependent variable,
$\tilde{\alpha}_{Y_i}$ denotes the adjusted $i$th treatment effect,
$\beta_{Y.X}$ denotes the pooled within bivariate regression slope of Y on X,
$X_{ij}$ denotes the covariate for the $i$th treatment and $j$th individual,
$\mu_{X..}$ denotes the grand mean of the covariable, and
$e_{ij}$ denotes the error term.

The null hypothesis to be tested is:

$$H_0: \sum_{i=1}^{I} \tilde{\alpha}_{Y_i}^2 = 0.$$

For $P$ covariables, the adjusted treatment effects are defined:

$$\tilde{\alpha}_{Y_i} = \alpha_{Y_i} - \sum_{k=1}^{P} \beta_k \alpha_{X_k i},$$

where the $\alpha$'s are defined as previously and $\beta_k$ denotes the pooled within multiple regression coefficient of the $k$th covariable.

When two covariables are included in ANCOVA, the linear model has one more term, the explainable part of the dependent variable which belongs to the second covariable,

$$Y_{ij} = \mu_{Y..} + \tilde{\alpha}_{Y_i} + \beta_X(X_{ij} - \mu_{X..}) + \beta_Z(Z_{ij} - \mu_{Z..}) + e_{ij},$$

where
$Z_{ij}$ is the second covariable,
$\beta_X$ is the pooled within multiple regression coefficient for X,
$\beta_Z$ is the pooled within multiple regression coefficient for Z,
$\mu_{Z..}$ is the mean for the second covariable Z,
and other notations are defined as previously.

The adjusted treatment effect can be expressed in terms of the unadjusted dependent variable effects, the first covariable effects and the second covariable effects:

$$\tilde{\alpha}_{Y_i} = \alpha_{Y_i} - \beta_X \alpha_{X_i} - \beta_Z \alpha_{Z_i},$$

and the null hypothesis to be tested is:

$$H_0: \sum_{i=1}^{I} (\alpha_{Y_i} - \beta_X \alpha_{X_i} - \beta_Z \alpha_{Z_i})^2 = 0.$$

The use of covariables in ANCOVA serves two basic purposes. First, to the extent that the covariables are related to the dependent variable, the error variance for hypothesis testing and interval estimation is reduced. For example, if $\rho_{YX}$ denotes the correlation of dependent variable Y and covariate X and $\sigma_e^2$ denotes the error variance when the covariable has not been used, the error variance for ANCOVA is

$$\sigma_e^2 (1 - \rho_{YX}^2)\left\{1 + \frac{1}{f_e - 2}\right\},$$

where $f_e$ denotes the degrees of freedom for the estimate of $\sigma_e^2$.

The second purpose of ANCOVA is to remove group differences on the covariable. In other words, all treatment effects on the dependent variable are adjusted by subtracting out a multiple of the group differences due to the covariate. The adjusted treatment effects are defined

$$\tilde{\alpha}_{Y_i} = \alpha_{Y_i} - \beta_{Y.X}\alpha_{X_i},$$

where the $\alpha$'s denote treatment effects (treatment group mean minus the grand mean), $\beta_{Y.X}$ is the pooled within regression coefficient for predicting the dependent variable from the covariable, and a ~ sign denotes adjusted.
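The single covariable adjustment just defined can be illustrated with a short computation. The following is a minimal sketch in Python, not the program used in this study; the function names and the small data set are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
def pooled_within_slope(groups):
    """Pooled within-group least squares slope of Y on X.

    groups: list of (x_values, y_values) pairs, one pair per treatment group.
    """
    sxy = 0.0
    sxx = 0.0
    for xs, ys in groups:
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        # Accumulate within-group cross products and sums of squares.
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx += sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def adjusted_effects(groups):
    """Return the pooled within slope and the adjusted treatment effects,
    i.e., (Y effect) minus slope times (X effect) for each group."""
    n_total = sum(len(xs) for xs, _ in groups)
    grand_x = sum(sum(xs) for xs, _ in groups) / n_total
    grand_y = sum(sum(ys) for _, ys in groups) / n_total
    slope = pooled_within_slope(groups)
    effects = []
    for xs, ys in groups:
        effect_x = sum(xs) / len(xs) - grand_x   # group mean minus grand mean on X
        effect_y = sum(ys) / len(ys) - grand_y   # group mean minus grand mean on Y
        effects.append(effect_y - slope * effect_x)
    return slope, effects


# Hypothetical data: the groups differ on X, and Y = 2X exactly in both groups,
# so the whole group difference on Y is accounted for by the covariable.
example = [([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
           ([4.0, 5.0, 6.0], [8.0, 10.0, 12.0])]
slope, effects = adjusted_effects(example)
```

In this constructed example the unadjusted effects on Y are -3 and +3, yet both adjusted effects are zero, because the pooled within slope of 2 removes exactly the group difference carried by the covariable.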
Assumptions and conditions underlying the analysis of covariance model are explained in three categories: assumptions on error in the model, assumptions and side conditions on the treatment effects, and assumptions on the covariable. Although the third category of assumptions provided the motivation for this thesis, a quick overview of all assumptions is appropriate.

First consider assumptions on the error component:

1. Errors ($e_{ij}$) are normally distributed.
2. Variances of the distributions of errors are the same across all treatment groups.
3. There is no relationship between errors and treatment effects and among errors themselves.

Violation of the normality assumption only slightly affects Type I error and statistical power given that the covariable is normally distributed (Box and Anderson, 1962; Atiqullah, 1964). The robustness of the F test against violations of the homogeneity of variance assumption depends upon the degree of heterogeneity of variance in the covariable (Potthoff, 1965). The effect of violation of the independence assumption is the same as for ANOVA. Elashoff (1969) and Glass, Peckham and Sanders (1972) have given complete discussions of the robustness of the F test statistic concerning violation of these assumptions in ANCOVA. Scheffé (1959) and Atiqullah (1962, 1964) have also discussed ANCOVA robustness both analytically and from empirical results.

Now consider assumptions and side conditions on the treatment effects.

1. There is random assignment of subjects to treatment groups (Elashoff, 1969, p. 386).
2. The sum of all weighted treatment effects equals zero ($\sum n_i \alpha_i = 0$).
3. For all subjects within a treatment group, the treatment effects are the same, i.e., the dependent variable is a linear combination of mutually independent components: grand mean, treatment effect, a linear regression on the covariable and error term.

There is some disagreement about the first assumption.
Lord (1969) and Cronbach and Furby (1970) both recommended against using ANCOVA when randomization has not been used. In addition, Evans and Anastasio (1968) warned that ANCOVA was likely to be misleading when used in situations where intact groups and treatments occur together naturally. In a letter to Porter, however, Cronbach said that ANCOVA can be useful to adjust for bias due to a covariate when randomization has not been done. Porter (1973), Elashoff (1969) and Harnquist (1968) likewise agreed upon the utility of analysis of covariance even when randomization could not be carried out.

The second condition is a restriction on the model, or side condition, for the fixed model. The third assumption is one of no treatment by subject interaction, i.e., an assumption of additivity. This assumption is primarily useful for interpretation of results (Porter, 1973) and violation of the assumption implies the need for a more complex model.

The third set of assumptions concerns the covariable.

1. The covariable is fixed and measured without error.
2. There is no treatment effect on the covariable.
3. There is a linear* relationship between the dependent variable and covariable.
4. The regression slope is the same across all treatment groups. This is subsumed under the third assumption in the second category of assumptions.

Violation of the first assumption will be discussed in more detail later. The second assumption is needed when the covariable is measured after the treatments have been operating. In situations where marked departures from a linear relationship are likely, blocking or matching procedures should be used or the appropriate degree of relationship should be added to the model (Atiqullah, 1964; Finney, 1957; Kirk, 1968). Violation of the fourth assumption indicates that there is an interaction between the covariable and the treatment (Elashoff, 1969).
In the presence of this treatment-slope interaction the F-test may yield misleading results (Atiqullah, 1964). Peckham (1968) found that the F-test became conservative as the differences among slopes were increased.

For more discussion of the effect of violating these assumptions, see Elashoff (1969), Glass, Peckham and Sanders (1972) and Smith (1957). On additivity see Porter (1973).

* ANCOVA is also available for a non-linear relationship when the appropriate model is used.

Random Covariable Measured with Error

The assumption of primary interest here is that the covariable be fixed and measured without error. Violations of the assumption of a fixed covariable do not have serious effects on the validity of testing hypotheses about means. The least squares method still can be applied and unbiased estimates obtained. "The only difference from the classical result is that the variance of the estimates ($\alpha_i$ and $\beta$) are averaged over all values of X (covariable)" (DeGracie, 1968, p. 48).

Throughout the discussion in this study, errors are assumed to be random and to meet all classical assumptions. Let $u_j$ represent the error of measurement on the true covariable $X'_j$; then $u_j$ satisfies the following assumptions.

1. $u_j$ is normally distributed with mean of zero and variance of $\sigma_u^2$, i.e., $E(u_j) = 0$ for all $j$.
2. All correlations among errors are zero, i.e., $\rho_{uu'} = 0$.
3. There is no relationship between errors of measurement and their associated true scores, i.e., $\rho_{uX'} = 0$.

In educational research, the potential covariable is usually measured by one of the standardized tests, e.g., IQ tests, achievement tests, and aptitude tests. Scores from these tests are considered to be random and less than perfectly reliable. When the covariable is random and measured with error, the least squares regression slope is a biased estimator of the slope defined on the latent true variable.
Consider the following relationship in the linear regression model:

$$Y'_j = \beta'_0 + \beta' X'_j \quad (j = 1, 2, \ldots, n),$$

where
$Y_j = Y'_j + v_j$,
$X_j = X'_j + u_j$,
$X_j$, $Y_j$ denote observed scores,
$X'_j$, $Y'_j$ denote true scores,
$u_j$, $v_j$ denote errors, and
$\beta'_0$, $\beta'$ denote the true regression coefficients.

The least squares estimate is then an unbiased estimator not of $\beta'$ but of

$$E(\hat{\beta}) = \frac{\beta'}{1 + \dfrac{\sigma_u^2}{\sigma_{X'}^2}},$$

where
$\beta'$ is the slope defined on the latent true variables, i.e., the structural relationship,
$\hat{\beta}$ is the least squares slope defined on the observed variables, and
$\sigma_{X'}^2$ is the variance of latent true X.

Thus,

$$E\left\{\left(1 + \frac{\sigma_u^2}{\sigma_{X'}^2}\right)\hat{\beta}\right\} = \beta'.$$

From classical measurement theory (Gulliksen, 1950),

$$1 + \frac{\sigma_u^2}{\sigma_{X'}^2}$$

is the reciprocal of the reliability of the covariate. In other words, the least squares regression slope divided by the reliability of the predictor is an unbiased estimator of the structural relationship (Berkson, 1950; Porter, 1967; DeGracie, 1968).

There are several ways to get an estimate of the reliability. When one form of the test instrument is administered on only one occasion, internal consistency reliability can be obtained in a variety of ways, e.g., Kuder-Richardson methods, Hoyt's procedure through ANOVA, and the split-half method. When more than one form of the same instrument is used or the same form has been administered twice to the same group of subjects, the correlation between the two measures is an estimate of the reliability. Choice of method for estimating reliability is an important and complex topic in its own right and will not be discussed further here. Given the definition of reliability and the properties of errors of measurement, a good point estimate is needed. Simulations in this thesis used a parallel forms method of estimating reliability.

Although the bias of the least squares regression coefficient for estimating the structural relation has been recognized since 1878 (Adcock, 1878), the information was not considered in work on ANCOVA until as late as 1960 (Lord, 1960).
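The attenuation of the least squares slope by the reliability of the predictor can be checked numerically. The sketch below is only an illustration under stated assumptions: the parameter values (structural slope 1.0, reliability .7, unit true score variance, error standard deviation .5 on Y), the seed, and the function names are all hypothetical, and the population reliability is treated as known, whereas the simulations in this thesis estimated it from parallel forms.

```python
import random


def ls_slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_attenuation(true_slope=1.0, reliability=0.7, n=50000, seed=1974):
    """Generate true scores, add measurement error to X, and return the
    observed least squares slope and the slope divided by the reliability."""
    rng = random.Random(seed)
    var_true = 1.0
    # reliability = var_true / (var_true + var_error), solved for var_error
    var_error = var_true * (1.0 - reliability) / reliability
    sd_error = var_error ** 0.5
    xs, ys = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, 1.0)
        y = true_slope * x_true + rng.gauss(0.0, 0.5)   # Y depends on the TRUE score
        xs.append(x_true + rng.gauss(0.0, sd_error))    # observed, fallible X
        ys.append(y)
    observed = ls_slope(xs, ys)
    return observed, observed / reliability


observed_slope, corrected_slope = simulate_attenuation()
```

With these settings the observed slope should fall near .7 (the structural slope times the reliability) and the corrected slope near 1.0, up to sampling error.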
Lord (1960) described a situation where there was no treatment effect on the dependent variable, but errors of measurement on the covariable affected the adjusted means, resulting in a spurious treatment main effect. Figure 1 illustrates the situation discussed by Lord (1960). If there was no error on X, conditional variances were small as represented by the narrower contours in Figure 1. When X was measured with error, conditional variances were larger as represented by the outer ellipses. When there was no error of measurement, the intercepts of both groups A and B were the same. When there were errors of measurement, intercepts were not the same. The difference between intercepts $Y_A$ and $Y_B$ represents the spurious treatment main effect produced by errors of measurement. Porter (1967) has illustrated all possible situations of treatment effects resulting from errors of measurement on a single random fallible covariable.

[Figure 1. Effect of errors of measurement, Lord's example.]

Lord proposed a large sample solution which is restricted to a two group design with two observations on a single random fallible covariable. Building on Lord's work, Porter proposed an estimated true scores solution which at least in theory is not limited by the complexity of the design to be analyzed (Porter, 1967; Porter and Chibucos, 1974), but is limited to a single covariate and does require an estimate of the reliability of the covariable. Briefly, estimated true scores ANCOVA is computationally identical to traditional ANCOVA with the single exception that estimated true scores are used as a substitute covariable (Porter and Chibucos, 1974). A Monte Carlo investigation of the two procedures for the single covariate two group design indicated that both were equally satisfactory (Porter, 1967). In the same study, the utility of estimated true scores ANCOVA was also demonstrated for a one-way layout with four treatment groups.
More recently DeGracie (1968) has proposed a solution quite similar to estimated true scores ANCOVA, and cited the above-mentioned Monte Carlo investigation as support for the utility of his test statistic. Stroud (1972) has also proposed a solution for the two group case with a single fallible covariable and argued that it is readily extendable to more complex designs. As yet, however, no small sample distributional investigations have been done on the Stroud statistic. Thus far work has been restricted to a single random fallible covariable.

In conclusion, Lord's statistic is restricted by the number of treatment groups, and computation of DeGracie's ANCOVA requires knowledge about error of measurement variance. Porter's estimated true score procedure is direct and simple. Once the covariable is transformed, the computation can be performed by any classical ANCOVA computer program. For a one-way design, the transformation is

$$\hat{X}_{ij} = \bar{X}_{i.} + \rho_{XX}(X_{ij} - \bar{X}_{i.}),$$

where
$\hat{X}_{ij}$ represents the estimated true score of $X_{ij}$,
$\bar{X}_{i.}$ represents the $i$th group mean on X, and
$\rho_{XX}$ represents the reliability of X.

Since Porter's procedure has been studied in a one-way design with one covariable only, an attempt should be made to broaden the procedure, both by complexity of design and number of covariables.

When Porter's estimated true score procedure is used in complex designs, the question of which means should be used in calculating estimated true scores is raised. Discussing this problem, Porter suggested that for large sample size per cell, cell means produced stable estimates of means and should be used. He demonstrated that using cell means, one can test all hypotheses with correct F ratios. However, when the sample size per cell is small, cell means are probably not good. In this situation marginal means provide better estimates when testing the hypothesis about main effects. When one set of marginal means is used, however, other main effects and interactions are not corrected.
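For the one-way case, the estimated true score transformation described above can be sketched in a few lines. This is an illustrative sketch only, assuming that a reliability estimate is already available and that group (cell) means are the means used; the function name and the numbers are hypothetical.

```python
def estimated_true_scores(groups, reliability):
    """Regress each observed covariable score toward its own group mean.

    groups: list of lists of observed covariable scores, one list per
            treatment group.
    reliability: reliability of the covariable (assumed known or estimated).
    """
    transformed = []
    for scores in groups:
        group_mean = sum(scores) / len(scores)
        # Estimated true score = group mean + reliability * deviation from it.
        transformed.append(
            [group_mean + reliability * (x - group_mean) for x in scores]
        )
    return transformed


# Hypothetical scores for two treatment groups and a reliability of .5.
observed = [[8.0, 10.0, 12.0], [18.0, 20.0, 22.0]]
estimated = estimated_true_scores(observed, 0.5)
# estimated == [[9.0, 10.0, 11.0], [19.0, 20.0, 21.0]]
```

The group means are unchanged while the within group deviations shrink by the factor of the reliability, so the pooled within slope of the dependent variable on the transformed covariable equals the observed slope divided by the reliability, which is the correction the procedure is meant to supply.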
In a memorandum to Smith, Porter concluded that

"Estimated true scores have no effect on tests of dimensions summed over in calculating the means used in calculating the estimated true scores nor on the interactions of those dimensions with other dimensions in the design. Therefore, it is important to use means which do not involve summing over dimension for which controlling covariable differences are important or for which interactions with dimensions for which controlling covariable differences are important. Using cell means will work for any design having more than one observation per cell, however, the fewer the observations the less stable the procedure. Obviously using the cell mean for designs having one observation per cell is a do nothing operation."

Proposed Problem

In the multiple covariable case, Porter's estimated true score ANCOVA is not directly applicable. If estimated true scores are used, correction is made only for the effect of the first fallible covariable on the dependent variable and the effect of the second fallible covariable on the dependent variable. There is no correction for the combined effect of both fallible covariables. This combined effect vanishes when there is no correlation among the covariables, or when the reliabilities of both covariables are high (Cochran, 1968).

For the two covariable case, a possible solution is to correct the bivariate regression slope between the two fallible covariables before estimated true scores are substituted. The other possibility is to make the two fallible covariables independent of each other by transforming the second fallible covariable to be independent of the first fallible covariable, and then to apply the estimated true scores procedure.

The hypothesis tested and the sampling distribution for these two proposed solutions are in question. Which of the two is better? Are these two procedures as good as estimated true scores in the one fallible covariable case?
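The two building blocks behind these proposals can be sketched as follows. This is a hedged illustration, not the procedure as implemented in Chapter III: the first function is the classical correction for attenuation of a correlation, the second removes from one covariable its least squares linear dependence on the other within a single sample, and the function names and numerical values are assumptions made only for the example.

```python
import math


def disattenuated_correlation(r_xz, rel_x, rel_z):
    """Classical correction for attenuation: observed correlation divided by
    the square root of the product of the two reliabilities."""
    return r_xz / math.sqrt(rel_x * rel_z)


def orthogonalize(zs, xs):
    """Return Z with its least squares linear dependence on X removed, so the
    result is uncorrelated with X in the sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope_z_on_x = sxz / sxx
    return [z - slope_z_on_x * (x - mean_x) for x, z in zip(xs, zs)]


# Illustrative values: an observed correlation of .14 between two covariables
# that each have reliability .7 corresponds to a corrected correlation of .2.
corrected = disattenuated_correlation(0.14, 0.7, 0.7)

# Hypothetical scores: after the transformation Z is uncorrelated with X.
x_scores = [1.0, 2.0, 3.0, 4.0]
z_scores = [2.0, 1.0, 4.0, 3.0]
z_orthogonal = orthogonalize(z_scores, x_scores)
```

Whether the orthogonalization should use pooled within group or total sums of squares is a design choice that this sketch does not settle; it operates on a single sample for simplicity.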
Purpose of Study

The purpose of this study was to investigate whether Porter's estimated true score ANCOVA yielded a good fit when applied with additional corrections to the two fallible covariable situation for one-way designs. Two additional correction methods were proposed.

Method A. Correct for attenuation of correlation between the two covariables before computing beta weights of the two estimated true score covariables.

Method B. Transform the second fallible covariable to be independent of the first fallible covariable and then apply the estimated true scores procedure.

The first part of the study investigated analytically whether the two proposed solutions test the right hypothesis. The second part of the study was a Monte Carlo investigation of the distributional properties of several F test statistics at three levels of nominal α, i.e., .10, .05 and .01.

Specifically, the investigations were done in two separate situations:

Situation I, estimated true scores applied to ANCOVA with two independent fallible covariables;

Situation II, Method A and Method B applied to ANCOVA with two intercorrelated fallible covariables.

In the empirical investigation of Situation I, F distributions were generated for 1) error free covariates, 2) fallible covariates, 3) estimated true score covariates. The criteria of Type I error rate and power for a single non-central case were used to compare the three types of F distributions.

In Situation II, F distributions were generated for 1) error free covariates, 2) fallible covariates, 3) Method A, 4) Method B. The same criteria for comparing F distributions were used for Situation II as for Situation I.

Both Situations were also described by average mean squares, average adjusted treatment effects and their variances.

Finally, empirical cumulative distributions of the two within regression coefficients from all configurations described earlier were collected.
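The Type I error criterion can be made concrete with a small sketch. It assumes that each Monte Carlo sample yields one F statistic, that the critical value for the nominal level is supplied by the user from an F table, and that the standard error of an empirical rate is the binomial one; that last point is an assumption about how "within two standard errors" is judged, and the function names are hypothetical.

```python
import math


def empirical_rejection_rate(f_statistics, critical_value):
    """Proportion of simulated F statistics exceeding the critical value.
    Under a true null hypothesis this is the empirical Type I error rate;
    under a non-central case it is the empirical power."""
    rejections = sum(1 for f in f_statistics if f > critical_value)
    return rejections / len(f_statistics)


def within_two_standard_errors(empirical, nominal, n_samples):
    """Check an empirical rate against the nominal level using the binomial
    standard error sqrt(nominal * (1 - nominal) / n_samples)."""
    standard_error = math.sqrt(nominal * (1.0 - nominal) / n_samples)
    return abs(empirical - nominal) <= 2.0 * standard_error


# Four-treatment rates reported in the abstract, each based on 1000 samples.
reported = [(0.177, 0.10), (0.109, 0.05), (0.037, 0.01)]
flags = [within_two_standard_errors(rate, nominal, 1000)
         for rate, nominal in reported]
# flags == [False, False, False]
```

Under this binomial assumption, none of the three four-treatment rates lies within two standard errors of its nominal value, which agrees with their description in the abstract as markedly discrepant.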
Averages and variances of these regression coefficients were compared with each other and with the known desired values.

CHAPTER II

REVIEW OF ANCOVA WITH A SINGLE FALLIBLE COVARIABLE

That ANCOVA tests biased treatment effects when there are initial differences on a random fallible covariable can be seen through inspection of the null hypothesis for a one-way ANCOVA. The null hypothesis can be stated

\[ \sum_{i=1}^{I} \{\alpha_{Y_i} - \beta_{Y.X}\,\alpha_{X_i}\}^2 = 0, \]

where $\alpha_{Y_i}$ is the ith treatment effect on the dependent variable Y, $\alpha_{X_i}$ is the ith treatment effect on the random fallible covariable X, and $\beta_{Y.X}$ is the least squares pooled within-i slope of the regression of Y on X.

Although errors of measurement satisfying classical measurement assumptions do not cause bias in the estimation of $\alpha_{Y_i}$ and $\alpha_{X_i}$, they do cause a bias in using the least squares regression coefficient as an estimate of the regression coefficient defined on the latent true variables, i.e., the structural relationship of Y on X. The bias of the least squares regression coefficient for estimating the structural relationship is a function of the reliability of the random covariate and can be stated as

\[ \beta_{Y'.X'} = \frac{1}{\rho_{XX}}\,\beta_{Y.X}, \]

where primes denote latent true variables and $\rho_{XX}$ denotes the reliability of X.

For a fallible covariable the effect of using the least squares regression coefficient in ANCOVA is a function of the values of the $\alpha_{X_i}$. If, as in a random assignment design, the $\alpha_{X_i}$'s are all zero, ANCOVA will test the desired hypothesis. For quasi-experiments, however, the $\alpha_{X_i}$'s are typically not zero, and so ANCOVA tests biased estimates of the adjusted treatment effects. The bias can result in either a spurious rejection or a spurious retention of the null hypothesis, depending upon the values of the $\alpha_{X_i}$'s (Porter, 1967).
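The attenuation relationship stated above can be illustrated with a small simulation. The sketch below is not part of the original study; the function names, the seed, the residual standard deviation of 0.5, and the choice of a unit-variance latent covariable are all assumptions made for illustration. It generates a latent true X', adds measurement error sized to give a chosen reliability, and compares the least squares slope on the fallible X with the product of the reliability and the latent slope.

```python
import random


def least_squares_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_fallible_slope(beta_true, reliability, n, seed=0):
    """Slope of Y on a fallible X when the latent slope is beta_true.

    Assumes var(X') = 1, so the error variance that yields the requested
    reliability is (1 - reliability) / reliability.
    """
    rng = random.Random(seed)
    sd_u = ((1.0 - reliability) / reliability) ** 0.5
    x_obs, y = [], []
    for _ in range(n):
        x_true = rng.gauss(0.0, 1.0)
        y.append(beta_true * x_true + rng.gauss(0.0, 0.5))  # illustrative residual sd
        x_obs.append(x_true + rng.gauss(0.0, sd_u))         # fallible covariable
    return least_squares_slope(x_obs, y)


slope = simulate_fallible_slope(beta_true=0.8, reliability=0.7, n=50000)
print("least squares slope on fallible X:", round(slope, 3))
print("reliability times latent slope:   ", 0.7 * 0.8)
```

With a large sample the two printed values should agree closely, which is the sense in which dividing the least squares slope by the reliability recovers the structural slope.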
Most studies of the effect of errors of measurement in regression (Madansky, 1959; Dorff, 1960; Lord, 1960; Porter, 1967; DeGracie, 1968) credit Adcock (1878) as the first person to notice that the least squares procedure provides a biased estimate of the structural relationship when the predictor is fallible. Biasedness of the least squares estimate of the regression slope defined on the latent true variable was also reported in Roos' paper (1937). Corrado Gini was first to point out that the biased regression slope from the least squares procedure was larger than the regression slope for the latent true variables. Madansky (1959), Dorff (1960), Cochran (1968) and Porter (1967, 1971) have presented complete reviews of the effects of errors of measurement in regression, structural and functional relationships.

Following DeGracie (1968), consider the linear equation

\[ Y_j = \beta'_0 + \beta' X'_j + e_j \qquad (j = 1, 2, \ldots, n) \]

where $X'_j$ denotes the true measure, $X'_j = X_j - u_j$, and $u_j$ denotes the error of measurement. Assume that $u_j$ is normally distributed with $E(u_j) = 0$, $\rho_{u_j u_{j'}} = 0$ for $j \neq j'$, and $u_j$ uncorrelated with $e_j$. By substitution the linear relationship becomes

\[ Y_j = \beta'_0 + \beta' X_j - \beta' u_j + e_j \]
\[ Y_j = \beta'_0 + \beta' X_j + w_j , \qquad w_j = e_j - \beta' u_j . \]

The covariance of the error term $w_j$ and $X_j$ is the expectation of

\[ w_j\,(X_j - E(X_j)) = w_j\,(X_j - X'_j) = w_j u_j , \]

since for individual j the expected value of the observed score is the true score. Then

\[ E(w_j u_j) = E\{(e_j - \beta' u_j)\,u_j\} = -\beta' \sigma_u^2 . \]

Since the error term is not independent of the covariable, the least squares procedure is inappropriate. However, DeGracie (1968) proved that the least squares estimator was biased by the fraction $\sigma_u^2 / \sigma_{X'}^2$, or

\[ E(\hat{\beta}) = \frac{\beta'}{1 + \sigma_u^2 / \sigma_{X'}^2} , \]

as discussed previously. Hence,

\[ E\!\left(\frac{1}{\rho_{XX}}\,\hat{\beta}\right) = \beta' , \]

where $\rho_{XX}$ denotes reliability as defined in classical measurement theory (Gulliksen, 1950), i.e.,

\[ \rho_{XX} = \frac{\sigma_{X'}^2}{\sigma_{X'}^2 + \sigma_u^2} . \]

Lord (1960) proposed a U-statistic to solve the ANCOVA problem for two groups with duplicate measures on a single fallible covariable.
His statistic was defined as

\[ U = (\bar{Y}' - \bar{Y}'') - \tfrac{1}{2}\,\hat{\beta}_0\,(\bar{X}'_1 + \bar{X}'_2 - \bar{X}''_1 - \bar{X}''_2) \]

where
U was the adjusted mean difference,
$\bar{Y}'$ was the unadjusted mean of the dependent variable for the first sample,
$\bar{Y}''$ was the unadjusted mean of the dependent variable for the second sample,
$\bar{X}'_1$ was the mean of the first duplicate fallible measure for the first sample,
$\bar{X}''_1$ was the mean of the first duplicate fallible measure for the second sample,
$\bar{X}'_2$ was the mean of the second duplicate fallible measure for the first sample, and
$\bar{X}''_2$ was the mean of the second duplicate fallible measure for the second sample.

Lord also gave a large-sample variance for U. Its terms involve $(1/N' + 1/N'')$, the squared covariable mean difference $(\bar{X}'_1 + \bar{X}'_2 - \bar{X}''_1 - \bar{X}''_2)^2$ multiplied by $\mathrm{Var}(\hat{\beta}_0)$, and pooled within-sample variances and covariances of Y and the duplicate measures, with $N^{\circ} = N' + N''$; the complete expressions for $\mathrm{Var}(U)$ and $\mathrm{Var}(\hat{\beta}_0)$ are given in Lord (1960).

Duplicate measures on the fallible covariable provided the estimate of error variance, or reliability, of the covariable to be used as the correction term for the relationship between the dependent variable and the first duplicate measure of the covariable. Lord's U-statistic was normally distributed for large sample sizes.

A more general procedure which does not necessarily require duplicate measures on the covariable was developed by Porter (1967). It requires knowledge of the reliability of the covariable or some estimate of it. Computation is straightforward. As described previously, instead of using the fallible covariable, the fallible measures of the covariable are transformed into estimated true scores. The estimated true scores are then used in regular analysis of covariance computations. Estimated true scores are defined as

\[ \hat{X}_{ij} = \bar{X}_{i.} + \rho_{XX}\,(X_{ij} - \bar{X}_{i.}) \]

where $\rho_{XX}$ denotes the reliability of X. Since the estimated true score covariable is a linear transformation of the fallible covariable within levels of i, then:

1.
The within-i correlation between the dependent variable and the fallible covariable is equal to the correlation between the dependent variable and the estimated true score covariable.

2. The group means, i.e., for each level of i, and the grand mean of the estimated true score covariable are equal to those of the fallible covariable and are unbiased estimates of the infallible covariable means, which cannot be observed.

3. The slope of the regression line between the dependent variable and the estimated true score covariable equals the slope of the regression line between the latent true variables.

The direct effect of the estimated true score transformation was to reduce the variance of the covariable. The variance of the estimated true scores equals the variance of the fallible covariable times the square of the reliability coefficient. To understand the estimated true scores, one should refer to the classical theory of measurement. Let

\[ X = X' + u \]

where X denotes the observed score, X' denotes the true score, and u denotes the error of measurement. Under the assumptions of the classical theory,

\[ \sigma_X^2 = \sigma_{X'}^2 + \sigma_u^2 . \]

Let $\hat{X}$ denote the estimated true scores; then within levels of i,

\[ \mathrm{Var}(\hat{X}) = \mathrm{Var}(\bar{X}_{i.}) + 2\rho_{XX}\,\mathrm{Cov}(\bar{X}_{i.},\, X_{ij} - \bar{X}_{i.}) + \rho_{XX}^2\,\mathrm{Var}(X_{ij} - \bar{X}_{i.}) . \]

Treating $\bar{X}_{i.}$ as a constant, hence,

\[ \sigma_{\hat{X}}^2 = \rho_{XX}^2\,\sigma_X^2 = \sigma_X^2 - \sigma_X^2\,(1 - \rho_{XX}^2) = \sigma_X^2 - \sigma_{\varepsilon}^2 \]

where $\sigma_{\varepsilon}^2$ denotes the error variance of predicting true scores by observed scores. Therefore, the transformation of the covariable to estimated true scores is equivalent to taking the errors of measurement variance from the observed variance of the fallible scores.

Porter's estimated true score ANCOVA produced F ratios distributed approximately as the central F distribution when the null hypothesis defined on the latent true variables was true. The fit of Porter's procedure was as good as Lord's U-statistic for the two group design.

Another approach has been presented by DeGracie (1968).
He discussed relaxation of the assumption of having a fixed covariable measured without error. He showed that when the covariable was random but free from error, unbiased estimates and valid confidence intervals as well as valid statistical tests could be obtained from the usual analysis procedures, but that the variance of the estimates was averaged over all values of the covariable. His discussion went on to consider the situation where the covariable was measured with error but fixed. A corrected regression slope, $b_3$, was proposed and proven to be an unbiased estimator of the true slope. The estimator has the form of the ratio $S_{XY}/S_{xx}$ multiplied by the inverse of a correction factor that depends on $\sigma_u^2$ and $S_{xx}$ (the full expression is given in DeGracie, 1968), where

\[ S_{XY} = \frac{1}{t(n-1)} \Big\{ \sum_i \sum_j (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.}) \Big\} , \]

\[ X_{ij} = x_{ij} + u_{ij} , \]

where $u_{ij}$ was the error of measurement,

\[ S_{xx} = \frac{1}{t(n-1)} \Big\{ \sum_i \sum_j (X_{ij} - \bar{X}_{i.})^2 - t(n-1)\,\sigma_u^2 \Big\} , \]

$\sigma_u^2$ was the variance of the errors of measurement and known, t was the number of treatments, and n was the cell frequency.

Since there was a relationship between $b_3$ and the adjusted mean square within, the test statistic was not distributed as F. Cramér's theorem, which Lord also employed, was used to find an approximate solution. Finally DeGracie concluded that for a two group design the test statistic

\[ \frac{(\hat{T}_i - \hat{T}_j) - (T_i - T_j)}{\hat{\sigma}_{\hat{T}_i - \hat{T}_j}} , \]

where
$\hat{T}_i$, $\hat{T}_j$ were estimates of the adjusted treatment effects,
$T_i$, $T_j$ were the true treatment effects, and
$\hat{\sigma}_{\hat{T}_i - \hat{T}_j}$ was an estimate of the standard deviation of the sampling distribution of differences,

was asymptotically normal.

Later, DeGracie illustrated the use of $b_3$ to form an index of response and an analysis of covariance under the assumption that the fallible covariate was random. For large sample sizes, his test statistic, the ratio of the adjusted mean square treatment (MST) to the adjusted mean square error (MSE), was distributed approximately as F with t-1 and t(n-1)-1 degrees of freedom. His analysis of covariance is shown in Table 1. His procedure was similar to Porter's and he cited Porter's empirical results to support the utility of his test statistic.
When measurement error variance is known, another approach has been developed by Stroud (1972). Stroud's procedure compares conditional variances with conditional means for a two group design. He claimed that his procedure can be extended to more than the two group design with "no great technical difficulties." In his development he set the measurement error variance to be 1.0.

Table 1. DeGracie's Analysis of Covariance Using Corrected Slope $b_3$

Sources           df        SS                                                   Corrected SS   df           M.S.   F
Treatment         t-1       $n\sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$
Error             t(n-1)    $\sum_i\sum_j (Y_{ij} - \bar{Y}_{i.})^2$             (1)            t(n-1)-1     MSE
Total             nt-1      $\sum_i\sum_j (Y_{ij} - \bar{Y}_{..})^2$             (2)            nt-2
Adj. Treatment                                                                   (2)-(1)        t-1          MST    MST/MSE

(1) $= \sum_i\sum_j (Y_{ij} - \bar{Y}_{i.})^2 - b_3^2 \{ \sum_i\sum_j (X_{ij} - \bar{X}_{i.})^2 - t(n-1)\,\sigma_u^2 \}$
(2) $= \sum_i\sum_j (Y_{ij} - \bar{Y}_{..})^2 - b_3^2 \{ \sum_i\sum_j (X_{ij} - \bar{X}_{..})^2 - (tn-1)\,\sigma_u^2 \}$

The Wald statistic for the comparison of the two conditional means of the dependent variable was built from the within-group corrected slopes

\[ b_i = \frac{S_{XY}}{S_{XX} - 1} \quad \text{within group } i \quad (i = 1, 2), \]

where the subtracted 1 is the known measurement error variance, together with the group means of Y and X and estimated variance terms $g_i$ and $h_i$; the full expression is given in Stroud (1972). The Wald statistic was distributed as $\chi^2$ with 2 degrees of freedom. Data from a study in Portland, Oregon, were presented as an example of the application of the procedure.

CHAPTER III

ANCOVA WITH MULTIPLE FALLIBLE COVARIABLES

Unfortunately the problem of multiple fallible covariables in ANCOVA is more complex. Consider the null hypothesis for a one-way ANCOVA with two random fallible covariables:

\[ \sum_{i=1}^{I} \{ \alpha_{Y_i} - \beta_X\,\alpha_{X_i} - \beta_Z\,\alpha_{Z_i} \}^2 = 0 \]

where again the $\alpha$'s denote treatment effects on the dependent variable Y and the covariables X and Z, and $\beta_X$ and $\beta_Z$ are the pooled within-i least squares regression coefficients for predicting Y from X and Z respectively.
As before, the estimates of the $\alpha$'s are left unbiased by the introduction of errors of measurement which follow classical assumptions, but the least squares regression coefficients are biased estimates of the corresponding regression coefficients for the latent true variables. The nature of the bias is

\[ \beta_X = \frac{\beta'_X\,\rho_{XX}\,(1 - \rho'^2_{XZ}\,\rho_{ZZ}) + \beta'_Z\,\beta'_{Z.X}\,(1 - \rho_{ZZ})\,\rho_{XX}}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}} \]

and

\[ \beta_Z = \frac{\beta'_Z\,\rho_{ZZ}\,(1 - \rho'^2_{XZ}\,\rho_{XX}) + \beta'_X\,\beta'_{X.Z}\,(1 - \rho_{XX})\,\rho_{ZZ}}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}} \]

where primes denote statistics for the latent true variables, $\rho_{XX}$ and $\rho_{ZZ}$ are the reliabilities of X and Z respectively, and $\rho'_{XZ}$ is the correlation between the latent true X and Z (Cochran, 1968). A useful restatement of the above two expressions in terms of bivariate statistics is

\[ \beta_X = \frac{\rho_{XX}\,\beta'_{Y.X} - \rho_{XX}\,\rho_{ZZ}\,\beta'_{Y.Z}\,\beta'_{Z.X}}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}} \]

and

\[ \beta_Z = \frac{\rho_{ZZ}\,\beta'_{Y.Z} - \rho_{XX}\,\rho_{ZZ}\,\beta'_{Y.X}\,\beta'_{X.Z}}{1 - \rho'^2_{XZ}\,\rho_{XX}\,\rho_{ZZ}} . \]

For more than two predictors, Cochran (1970) gave a general relationship between $\beta_k$ and $\beta'_k$ in which $\beta_k$ equals $\rho_{kk}\,\beta'_k$ plus contributions, summed over the other covariables $k' \neq k$, that involve the reliabilities and the latent coefficients $\beta'_{k'}$ and $\beta'_{k'.k}$, where

\[ \beta'_{k.k'}\,\beta'_{k'.k} = \rho'^2_{kk'} . \]

Cochran concluded that

"Thus the direct effect of an error in $X_k$ on $\beta_k$ is to decrease its absolute value to $\rho_{kk}\beta'_k$ or something less, but $\beta_k$ also receives contributions from errors of measurement in any other $X_{k'}$ that is correlated with $X_k$. Even if such errors occur in only $X_k$, they can affect the values of all the $\beta_k$. By working a few examples with varying $\rho$ and $\beta'_k$, it becomes evident that interpretation of the $\beta_k$ as if they were the $\beta'_k$ can become quite misleading unless all the $\rho_{kk}$ are big" (p. 34, notation changed to be consistent with that in present paper).

A solution to the multiple fallible covariable problem requires a procedure that provides unbiased estimates of the regression coefficients defined on the latent true variables. The substitution of estimated true scores for the observed covariables does not adequately solve the general problem, but does provide a solution in the restricted case of uncorrelated latent true covariables.
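The two-covariable bias expressions attributed above to Cochran (1968) can be checked numerically against a direct computation from observed variances and covariances. The following is a minimal sketch under stated assumptions: the function names and the parameter values are hypothetical, errors of measurement are taken to be classical (they inflate observed variances by the reciprocal of the reliability and leave covariances unchanged), and the quantities are population values rather than sample estimates.

```python
def fallible_coefficients(beta_x, beta_z, rho_xz, rel_x, rel_z,
                          var_x=1.0, var_z=1.0):
    """Least squares coefficients on fallible X and Z from the bias formulas.

    beta_x, beta_z: partial coefficients defined on the latent true covariables.
    rho_xz: correlation between the latent true covariables.
    var_x, var_z: variances of the latent true covariables.
    """
    cov_xz = rho_xz * (var_x * var_z) ** 0.5
    b_zx = cov_xz / var_x  # latent slope of Z' on X'
    b_xz = cov_xz / var_z  # latent slope of X' on Z'
    denom = 1.0 - rho_xz ** 2 * rel_x * rel_z
    bx = (beta_x * rel_x * (1.0 - rho_xz ** 2 * rel_z)
          + beta_z * b_zx * (1.0 - rel_z) * rel_x) / denom
    bz = (beta_z * rel_z * (1.0 - rho_xz ** 2 * rel_x)
          + beta_x * b_xz * (1.0 - rel_x) * rel_z) / denom
    return bx, bz


def direct_coefficients(beta_x, beta_z, rho_xz, rel_x, rel_z,
                        var_x=1.0, var_z=1.0):
    """Same coefficients from the observed variances and covariances."""
    cov_xz = rho_xz * (var_x * var_z) ** 0.5
    cov_yx = beta_x * var_x + beta_z * cov_xz
    cov_yz = beta_x * cov_xz + beta_z * var_z
    obs_var_x = var_x / rel_x  # error variance inflates the observed variance
    obs_var_z = var_z / rel_z
    det = obs_var_x * obs_var_z - cov_xz ** 2
    bx = (obs_var_z * cov_yx - cov_xz * cov_yz) / det
    bz = (obs_var_x * cov_yz - cov_xz * cov_yx) / det
    return bx, bz


params = dict(beta_x=0.5, beta_z=0.3, rho_xz=0.4, rel_x=0.7, rel_z=0.9,
              var_x=2.0, var_z=3.0)
print(fallible_coefficients(**params))
print(direct_coefficients(**params))
```

The two printed pairs should coincide up to rounding, and setting both reliabilities to 1.0 returns the latent coefficients unchanged.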
For uncorrelated latent true covariables,

\[ \rho'_{XZ} = 0 \quad \text{and} \quad \beta'_{X.Z} = \beta'_{Z.X} = 0 , \]

and the two regression coefficients become

\[ \beta_X = \rho_{XX}\,\beta'_{Y.X} , \qquad \beta_Z = \rho_{ZZ}\,\beta'_{Y.Z} . \]

Applying estimated true scores gives

\[ \beta_{\hat{X}} = \beta_X / \rho_{XX} = \beta'_{Y.X} \quad \text{and} \quad \beta_{\hat{Z}} = \beta_Z / \rho_{ZZ} = \beta'_{Y.Z} . \]

The point of breakdown for the estimated true scores solution to the general problem provided a suggestion for a new procedure, Method A. Use of estimated true scores corrected the bivariate regression coefficients between the dependent variable and each covariable, but left uncorrected the bivariate regression coefficients among the covariables. Thus Method A consisted of 1) substituting estimated true scores for each observed covariable, and 2) correcting for attenuation the relationships between the estimated true scores covariables.

A second approach to the solution of the general problem, Method B, was motivated by the simplified situation which exists for uncorrelated covariables. Method B can be described for two covariables as follows: 1) one covariable is transformed to make it orthogonal to the other; 2) estimated true scores are substituted for the two orthogonal covariables and computations proceed as for regular ANCOVA.

Method A

First consider the effects of Method A on the pooled within regression coefficients for a one-way ANCOVA having two fallible covariables. Using standard ANCOVA procedures, the population regression coefficient for one of the covariables, X, is

\[ \beta_X = \frac{SWZ \cdot SWYX - SWXZ \cdot SWYZ}{SWX \cdot SWZ - SWXZ^2} \]

where SW denotes a sum of squares within. Substituting estimated true scores for X and Z replaces SWX with $\rho_{XX}^2 SWX$, SWZ with $\rho_{ZZ}^2 SWZ$, SWXZ with $\rho_{XX}\rho_{ZZ} SWXZ$, SWYX with $\rho_{XX} SWYX$, and SWYZ with $\rho_{ZZ} SWYZ$, where $\rho_{XX}$ and $\rho_{ZZ}$ are the sample reliabilities of X and Z respectively. It follows that

\[ \beta_{\hat{X}} = \frac{\rho_{ZZ}^2\rho_{XX}\, SWZ \cdot SWYX - \rho_{ZZ}^2\rho_{XX}\, SWXZ \cdot SWYZ}{\rho_{ZZ}^2\rho_{XX}^2\, SWX \cdot SWZ - \rho_{ZZ}^2\rho_{XX}^2\, SWXZ^2} \]

where $\hat{X}_{ij} = \bar{X}_{i.} + \rho_{XX}(X_{ij} - \bar{X}_{i.})$ denotes estimated true scores for X. The expression can be simplified to

\[ \beta_{\hat{X}} = \frac{\dfrac{1}{\rho_{XX}}\dfrac{SWYX}{SWX} - \dfrac{1}{\rho_{XX}}\dfrac{SWXZ}{SWX}\cdot\dfrac{SWYZ}{SWZ}}{1 - \dfrac{SWXZ}{SWX}\cdot\dfrac{SWXZ}{SWZ}} . \]

Further correction for attenuation of the relationship between the covariates, by dividing the estimated true score cross-product $\rho_{XX}\rho_{ZZ} SWXZ$ by the square root of the product of the reliabilities of X and Z, results in

\[ \beta_{\hat{X}} = \frac{\dfrac{1}{\rho_{XX}}\dfrac{SWYX}{SWX} - \dfrac{\sqrt{\rho_{XX}\rho_{ZZ}}}{\rho_{XX}}\cdot\dfrac{1}{\rho_{XX}}\dfrac{SWXZ}{SWX}\cdot\dfrac{1}{\rho_{ZZ}}\dfrac{SWYZ}{SWZ}}{1 - \dfrac{1}{\rho_{XX}}\cdot\dfrac{1}{\rho_{ZZ}}\cdot\dfrac{SWXZ}{SWX}\cdot\dfrac{SWXZ}{SWZ}} . \]

By substitution,

\[ \beta_{\hat{X}} = \frac{\beta'_{Y.X} - \sqrt{\dfrac{\rho_{ZZ}}{\rho_{XX}}}\;\beta'_{Z.X}\,\beta'_{Y.Z}}{1 - \rho'^2_{XZ}} . \]

Thus when $\rho_{ZZ} = \rho_{XX}$, $\beta_{\hat{X}}$ is equal to the regression coefficient for the latent true X. Similar steps result in the parallel conclusion that the regression coefficient $\beta_{\hat{Z}}$, provided by Method A, is identical to the regression coefficient for the latent true Z when $\rho_{ZZ} = \rho_{XX}$. Since substitution of estimated true scores does not change the means of the covariables, it follows that Method A tests the desired hypothesis when the two reliabilities are equal.

Method B

Method B starts with a transformation on the second covariable that results in a new variable which is orthogonal to the first covariable. The transformation used was

\[ W_{ij} = Z_{ij} - \beta_{Z.X}\,X_{ij} , \]

where $\beta_{Z.X}$ denotes the pooled within regression coefficient for predicting Z from knowledge of X. It should be noted that for perfectly reliable X and Z, use of covariables X and W does not change the hypothesis tested. The null hypothesis tested by Method B is

\[ \sum_{i=1}^{I} \Big\{ \alpha_{Y_i} - \frac{1}{\rho_{XX}}\beta_{Y.X}\,\alpha_{X_i} - \frac{1}{\rho_{WW}}\beta_{Y.W}\,\alpha_{W_i} \Big\}^2 = 0 \]

where $\beta_{Y.X}$ and $\beta_{Y.W}$ are bivariate regression coefficients since X and W are uncorrelated. The question is whether this null hypothesis is identical to the desired null hypothesis stated in terms of the latent true variables Y, X and Z.

Since W is a new variable, its reliability must first be estimated. By definition,

\[ \rho_{WW} = \frac{\mathrm{var}(W')}{\mathrm{var}(W)} , \]

where W' denotes the latent true W. But

\[ \mathrm{var}(W) = \mathrm{var}(Z) + \beta_{Z.X}^2\,\mathrm{var}(X) - 2\beta_{Z.X}\,\mathrm{cov}(X,Z) = \mathrm{var}(Z)\,(1 - \rho_{XZ}^2) , \]

and

\[ \mathrm{var}(W') = \mathrm{var}(Z') + \beta_{Z.X}^2\,\mathrm{var}(X') - 2\beta_{Z.X}\,\rho_{XZ}\sqrt{\mathrm{var}(Z)\,\mathrm{var}(X)} \]
\[ = \rho_{ZZ}\,\mathrm{var}(Z) + \rho_{XZ}^2\,\frac{\mathrm{var}(Z)}{\mathrm{var}(X)}\,\rho_{XX}\,\mathrm{var}(X) - 2\rho_{XZ}\sqrt{\frac{\mathrm{var}(Z)}{\mathrm{var}(X)}}\;\rho_{XZ}\sqrt{\mathrm{var}(X)\,\mathrm{var}(Z)} \]
\[ = \mathrm{var}(Z)\,(\rho_{ZZ} - 2\rho_{XZ}^2 + \rho_{XZ}^2\,\rho_{XX}) . \]
Thus,

\[ \rho_{WW} = \frac{\rho_{ZZ} - 2\rho_{XZ}^2 + \rho_{XZ}^2\,\rho_{XX}}{1 - \rho_{XZ}^2} . \]

Further,

\[ \beta_{Y.W} = \frac{\mathrm{cov}(Y,W)}{\mathrm{var}(W)} = \frac{\mathrm{cov}(Y,Z) - \beta_{Z.X}\,\mathrm{cov}(Y,X)}{\mathrm{var}(Z)\,(1 - \rho_{XZ}^2)} = \frac{\beta_{Y.Z} - \beta_{X.Z}\,\beta_{Y.X}}{1 - \rho_{XZ}^2} \]

and

\[ \mu_W = \mu_Z - \beta_{Z.X}\,\mu_X . \]

The last two terms in the squared quantity for stating the null hypothesis for Method B can now be restated as

\[ -\frac{1}{\rho_{XX}}\beta_{Y.X}\,(\mu_{X_{i.}} - \mu_{X_{..}}) - \frac{1}{\rho_{WW}}\beta_{Y.W}\,(\mu_{Z_{i.}} - \beta_{Z.X}\,\mu_{X_{i.}} - \mu_{Z_{..}} + \beta_{Z.X}\,\mu_{X_{..}}) \]
\[ = -\Big(\frac{\beta_{Y.X}}{\rho_{XX}} - \frac{\beta_{Y.W}\,\beta_{Z.X}}{\rho_{WW}}\Big)(\mu_{X_{i.}} - \mu_{X_{..}}) - \frac{\beta_{Y.W}}{\rho_{WW}}\,(\mu_{Z_{i.}} - \mu_{Z_{..}}) . \]

By further substitution, $\beta_{Y.X}/\rho_{XX} - \beta_{Y.W}\,\beta_{Z.X}/\rho_{WW}$ becomes

\[ \frac{\beta'_{Y.X}\,(1 - 2\rho_{XX}\,\rho'^2_{XZ} + 2\rho_{XX}^2\,\rho'^2_{XZ}) - \rho_{XX}\,\beta'_{Z.X}\,\beta'_{Y.Z}}{1 - 2\rho_{XX}\,\rho'^2_{XZ} + \rho_{XX}^2\,\rho'^2_{XZ}} \]

and $\beta_{Y.W}/\rho_{WW}$ becomes

\[ \frac{\beta'_{Y.Z} - \rho_{XX}\,\beta'_{X.Z}\,\beta'_{Y.X}}{1 - 2\rho_{XX}\,\rho'^2_{XZ} + \rho_{XX}^2\,\rho'^2_{XZ}} \]

where again primes denote statistics for the latent true variables. Since the two expressions do not simplify to the regression coefficients for the latent true X and Z covariables, it follows that Method B does not test the desired hypothesis. In retrospect the error in logic was that the transformation forced the manifest variables to be orthogonal, but not their latent true counterparts.

Monte Carlo Study

Thus far the two modifications of ANCOVA have been considered as to whether or not they test the correct hypothesis when there are two fallible covariables in a quasi-experiment. The conclusions were: 1) if the latent true covariables are uncorrelated, estimated true scores ANCOVA tests the desired hypothesis; 2) when the latent true covariables are correlated but have equal reliability, Method A tests the correct hypothesis; and 3) Method B does not appear to test the hypothesis of interest under any circumstances. The remaining question to be answered was, how do the small sample distributions of the various test statistics behave?

A computer program for the CDC 6500 computer system at the Michigan State University Computer Center was written to get empirical F distributions of the estimated true score ANCOVA when two random fallible covariates were independent of each other, and of the two proposed correction methods when two random fallible covariables were related.
The program was composed of seven major parts: parameter setting, data generating, estimating reliability, computing ANCOVA on true scores, estimated true scores with the Method A correction and with the Method B correction, and distribution building. The remainder of Chapter III provides an overview of the computations involved in a one-way ANCOVA with two covariables, followed by a discussion of each of the seven parts of the computer program.

Analysis of covariance was defined by the following linear model:

\[ Y_{ij} = \mu_{Y_{..}} + \alpha_i + \beta_X\,(X_{ij} - \mu_{X_{..}}) + \beta_Z\,(Z_{ij} - \mu_{Z_{..}}) + e_{ij} . \]

The null hypothesis about treatment main effects is

\[ H_0: \alpha_i = 0 \text{ for all } i , \]

or

\[ H_0: \sum_i (\alpha_i)^2 = 0 \qquad (i = 1, 2, \ldots, I) . \]

Sums-of-squares and cross-products are presented in Table 2. Using the computational guides from Kirk (1968) and Winer (1962), the regression coefficients in Table 2 were defined as follows:

\[ b_{TX} = \frac{STZ \cdot STYX - STXZ \cdot STYZ}{STX \cdot STZ - STXZ^2} , \qquad b_{TZ} = \frac{STX \cdot STYZ - STXZ \cdot STYX}{STX \cdot STZ - STXZ^2} , \]

\[ b_{WX} = \frac{SWZ \cdot SWYX - SWXZ \cdot SWYZ}{SWX \cdot SWZ - SWXZ^2} , \qquad b_{WZ} = \frac{SWX \cdot SWYZ - SWXZ \cdot SWYX}{SWX \cdot SWZ - SWXZ^2} . \]

Table 2. Sources of Variation for Analysis of Covariance with Two Covariables

                          Sums of squares and cross-products
Sources     df            Y      X      Z      YX      YZ      XZ
Between     t-1           SBY    SBX    SBZ    SBYX    SBYZ    SBXZ
Within      t(n-1)        SWY    SWX    SWZ    SWYX    SWYZ    SWXZ
Total       tn-1          STY    STX    STZ    STYX    STYZ    STXZ

Adjusted
Total       tn-3          STY* = STY - b_TX STYX - b_TZ STYZ
Within      t(n-1)-2      SWY* = SWY - b_WX SWYX - b_WZ SWYZ
Between     t-1           SBY* = STY* - SWY*

The notation for a sum-of-squares is as follows:
S denotes a sum-of-squares,
the letter following S denotes between (B), total (T) or within (W),
the final letter(s) denote(s) the variables involved.

The subscripts on the b's are similar:
the first subscript denotes total (T) or within (W),
the second letter denotes the covariable.

The F test statistic was defined as

\[ F = \frac{SBY^{*} / (t-1)}{SWY^{*} / (t(n-1)-2)} \]

which was distributed as the F distribution with t-1 and t(n-1)-2 degrees of freedom.
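The computations summarized in Table 2 can be sketched in a few lines. The code below is an illustrative reimplementation, not the original CDC 6500 program; the function names, the data layout (a list of treatment groups, each a list of (y, x, z) tuples), and the small random data set are assumptions introduced here. It forms the total and within sums of squares and cross-products, the total and within regression coefficients, the adjusted sums of squares, and the F ratio.

```python
import random


def cross_product(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))


def sscp(rows):
    """Sums of squares and cross-products for a list of (y, x, z) rows."""
    y, x, z = (list(col) for col in zip(*rows))
    return {"Y": cross_product(y, y), "X": cross_product(x, x),
            "Z": cross_product(z, z), "YX": cross_product(y, x),
            "YZ": cross_product(y, z), "XZ": cross_product(x, z)}


def adjusted(s):
    """Adjusted sum of squares for Y and the two regression coefficients."""
    den = s["X"] * s["Z"] - s["XZ"] ** 2
    bx = (s["Z"] * s["YX"] - s["XZ"] * s["YZ"]) / den
    bz = (s["X"] * s["YZ"] - s["XZ"] * s["YX"]) / den
    return s["Y"] - bx * s["YX"] - bz * s["YZ"], bx, bz


def ancova_two_covariates(groups):
    """One-way ANCOVA with two covariables; returns F, df, within slopes."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    total = sscp([row for g in groups for row in g])
    per_group = [sscp(g) for g in groups]
    within = {k: sum(s[k] for s in per_group) for k in total}
    sty_adj, _, _ = adjusted(total)
    swy_adj, b_wx, b_wz = adjusted(within)
    df_between = t - 1
    df_within = n_total - t - 2  # t(n-1)-2 for a balanced design
    f_ratio = ((sty_adj - swy_adj) / df_between) / (swy_adj / df_within)
    return f_ratio, df_between, df_within, b_wx, b_wz


# Hypothetical data: two groups of ten observations each.
rng = random.Random(1)
groups = []
for shift in (0.0, 0.5):
    rows = []
    for _ in range(10):
        x, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append((shift + 0.6 * x + 0.4 * z + rng.gauss(0.0, 0.5), x, z))
    groups.append(rows)

print(ancova_two_covariates(groups))
```

Keeping the total and within computations in one helper mirrors the structure of Table 2, where the same formula for the regression coefficients is applied to the T and W rows.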
Parameter Setting

One of the purposes of this study was to compare the results with Porter's study (1967). To facilitate the comparisons, most parameters were set the same as Porter's. Table 3 indicates all possible combinations of the parameters included in this Monte Carlo study. An "X" indicates the cells used in this study. Reliabilities of the two covariables were equal in all situations investigated because of the previously noted limitation of Method A. Consistent with Cox's suggestion (Cox, 1957), the lowest correlation between a latent true covariable and the dependent variable was .6. An intermediate value of .7 and a maximum value of .9 were also in the design. Reliabilities of both fallible covariables were .5, .7, and .9. Correlations higher than .9 and reliabilities lower than .5 were considered out of the range of interest to educational researchers. The design was a balanced one-way ANCOVA with sample sizes of 10, 15, 20 and 40 observations per cell. The number of treatment groups was 2 or 4.

[Table 3. Design of the study, crossing sample size per cell (N1 = 10, N2 = 15, N3 = 20, N4 = 40), number of treatment groups, reliabilities, correlations of the latent true covariables with the dependent variable, and intercorrelation of the latent true covariables.]

The intercorrelation between the two latent true covariables was kept low, taking the values .0, .2 and .4. Using multiple covariables with high intercorrelations has little practical utility and should be avoided.

The number of iterations per configuration was 1000. Three levels of nominal $\alpha$ (probability of a Type I error) were chosen, i.e., .10, .05 and .01. These three levels of significance are those most frequently used in practice. Further, comparisons for $\alpha$'s lower than the .001 level may sometimes be misleading (Glass, Peckham and Sanders, 1972). Use of .001 itself is rarely seen in research and decision making.

For studying statistical power, the same nominal $\alpha$'s were used.
The non-central case was created by adding one-half standard deviation of the marginal distribution of the dependent variable to each observation in one treatment group. Theoretical statistical power for latent true variables would be .99 in the two treatment design and .98 for the four group case at nominal $\alpha$ of .05, cell size of 40 and multiple correlation of .9 ($\rho'_{YX}$ = .7, $\rho'_{YZ}$ = .7 and $\rho'_{XZ}$ = .2).

A flow chart for the computer program is presented in Figure 2. In the first phase of the computer program, as shown in Figure 2, the following parameters were defined:

NT, number of treatment groups
NB, number of observations per cell
RHOX, reliability of X covariable
RHOZ, reliability of Z covariable
RHYX, latent true correlation between Y and X
RHYZ, latent true correlation between Y and Z
RHXZ, latent true correlation between X and Z
F(1), theoretical F value at .10 nominal $\alpha$ level
F(2), theoretical F value at .05 nominal $\alpha$ level
F(3), theoretical F value at .01 nominal $\alpha$ level

[Figure 2. Flow chart of the computer program.]

Data Generating

Each pseudo-random normal deviate was obtained by calling subroutine RANN, which was written in machine language (COMPASS). RANN was a CDC 6500 version of RANSS, which was created for the University of Wisconsin computer (Porter, 1967). The unit normal generator involved two stages. First, the multiplicative congruential method was used to generate sixteen pseudo-random numbers from a uniform distribution. Second, the sixteen numbers were summed and linearly rescaled to provide a pseudo-random unit normal deviate via the Central Limit Theorem.
The RANSS program has been tested for randomness as well as goodness of fit and found to have good properties on both criteria (Porter, 1967).

The subroutine RANN required a starting number in octal digits, specified as parameter RANDOM. Each time the program was run, parameter RANDOM was changed to ensure independence among the resulting F distributions and distributions of beta weights. Changing the starting number was achieved by changing the RANDOM card in subroutine RANN. These starting numbers were selected randomly prior to running the program.

To create the dependent variable Y and the latent true covariables X' and Z', RANN was called three times, giving three random normal deviates a, b and c. The three deviates were used in the following relationships:

\[ X' = a \]
\[ Z' = a\,\rho'_{XZ} + b\sqrt{1 - \rho'^2_{XZ}} \]
\[ Y = a\,\rho'_{YX} + \beta'_Z\,b\sqrt{1 - \rho'^2_{XZ}} + c\sqrt{1 - R'^2} \]

where R' was the multiple correlation for predicting Y from X' and Z', i.e.,

\[ R'^2 = \beta'_X\,\rho'_{YX} + \beta'_Z\,\rho'_{YZ} . \]

The resulting variables X', Y and Z' were distributed as trivariate normal, each with expected value zero and variance one, and with bivariate intercorrelations $\rho'_{YX}$, $\rho'_{YZ}$ and $\rho'_{XZ}$.

Example distributions of random normal deviates generated by RANN, together with correlation coefficients, means and variances of the dependent variable Y and the two latent true covariables X' and Z', are contained in Table 4. Each statistic in Table 4 was based on 10,000 trivariate cases. All statistics were in close agreement with the known parameter values, thus providing strong support for the validity of the data generator.

The next step was to create the fallible X and Z variables. Two more random normal variates were called from RANN. These two variates were multiplied by their corresponding standard errors of measurement and the results were added to X' and Z' respectively.

Table 4.
Example Distribution Based on 10,000 Trivariate Cases Generated by RANN

Variable    Mean       Variance   Skewness   Kurtosis
Normal       0.0       1.00        0.0        0.0
a           -0.0078    1.0066      0.0367    -0.0857
b           -0.0073    0.9826      0.0457    -0.1192
c            0.0053    0.9814     -0.0257     0.0144

Variable    Mean       Variance   Sample correlation   Parameter
Y            0.0031    0.9869     r_YX = .699          rho_YX = .70
X'          -0.0039    0.9859     r_YZ = .898          rho_YZ = .90
Z'           0.0053    0.9863     r_XZ = .394          rho_XZ = .40

Using classical measurement theory (Gulliksen, 1950), error variances corresponding to reliabilities were found. Given that

\[ \rho_{XX} = \frac{\sigma_{X'}^2}{\sigma_{X'}^2 + \sigma_u^2} , \]

then

\[ \sigma_u^2 = \frac{1 - \rho_{XX}}{\rho_{XX}}\,\sigma_{X'}^2 , \]

and the positive square root is the standard error of measurement. Let $X_1$ and $Z_1$ denote the fallible X' and Z' covariables. Then,

\[ X_1 = X' + d_1\,\sigma_u , \]
\[ Z_1 = Z' + d_2\,\sigma_v , \]

where
$d_1$, $d_2$ were random normal deviates,
$\sigma_u$ was the standard error of measurement for X, and
$\sigma_v$ was the standard error of measurement for Z.

Estimating Reliabilities

Lord's idea of duplicated variables (Lord, 1960) was followed to provide test-retest (or parallel form) reliability estimates. Therefore, an additional set of fallible observations was needed for each latent true covariable. RANN was called twice again. Each random normal variate was multiplied by the corresponding standard error of measurement and the result was added to the true score on X' and Z' respectively,

\[ X_2 = X' + d_3\,\sigma_u , \]
\[ Z_2 = Z' + d_4\,\sigma_v , \]

where $d_3$ and $d_4$ were the additional random normal deviates.

All seven variables (Y, X', Z', $X_1$, $Z_1$, $X_2$ and $Z_2$) were created until the desired sample size was achieved. The correlation coefficient between $X_1$ and $X_2$ was computed and served as the estimate of the reliability of X. Similarly, the correlation between $Z_1$ and $Z_2$ served as the estimate of the reliability of Z. The two reliability coefficients were denoted by $\hat{\rho}_{XX}$ and $\hat{\rho}_{ZZ}$. For example, 100 reliability coefficients generated by this procedure with a sample size of 40 had an average of .704 and a variance of .0015 for a parameter value of .700.
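The data generating and reliability estimating steps described above can be sketched as follows. This is an illustration only: Python's `random.Random.gauss` stands in for the RANN subroutine, the function names and the seed are hypothetical, and the parameter values (.7, .7, .2 for the latent correlations and .7 for both reliabilities) are simply one configuration from the design.

```python
import random


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


def generate_case(rho_yx, rho_yz, rho_xz, rel_x, rel_z, rng):
    """One case: Y, latent X' and Z', and duplicate fallible measures."""
    a, b, c = (rng.gauss(0.0, 1.0) for _ in range(3))
    beta_x = (rho_yx - rho_yz * rho_xz) / (1.0 - rho_xz ** 2)
    beta_z = (rho_yz - rho_yx * rho_xz) / (1.0 - rho_xz ** 2)
    r_squared = beta_x * rho_yx + beta_z * rho_yz
    root = (1.0 - rho_xz ** 2) ** 0.5
    x_true = a
    z_true = a * rho_xz + b * root
    y = a * rho_yx + beta_z * b * root + c * (1.0 - r_squared) ** 0.5
    # Standard errors of measurement for unit-variance latent covariables.
    sd_u = ((1.0 - rel_x) / rel_x) ** 0.5
    sd_v = ((1.0 - rel_z) / rel_z) ** 0.5
    x1 = x_true + rng.gauss(0.0, 1.0) * sd_u
    z1 = z_true + rng.gauss(0.0, 1.0) * sd_v
    x2 = x_true + rng.gauss(0.0, 1.0) * sd_u  # duplicate measure of X'
    z2 = z_true + rng.gauss(0.0, 1.0) * sd_v  # duplicate measure of Z'
    return y, x_true, z_true, x1, z1, x2, z2


rng = random.Random(12345)
cases = [generate_case(0.7, 0.7, 0.2, 0.7, 0.7, rng) for _ in range(20000)]
y, x_true, z_true, x1, z1, x2, z2 = (list(col) for col in zip(*cases))

print("estimated reliability of X:", round(pearson(x1, x2), 3))
print("estimated reliability of Z:", round(pearson(z1, z2), 3))
print("latent correlation X', Z': ", round(pearson(x_true, z_true), 3))
```

The correlation between the duplicate measures estimates the reliability because the two measures share the true score and have independent errors.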
A quasi-experimental design was simulated by creating differences among the covariable means. After all observations were generated, a different constant was added to each observation in a treatment group. The set of constants was the same for each covariable. Means of both covariables for the first treatment group were 6.0 in both the 2 and the 4 treatment group designs. In the two treatment case, means for the second group were 0.0. In the four treatment group design, means for both the second and third groups were 3.0 and the mean was 0.0 for the fourth group.

Group means on the dependent variable were computed internally in the program. Latent multiple regression coefficients were used to calculate these means so that, for latent true covariables, all adjusted treatment effects would be zero in the central case. These means were computed by the following relationship:

\[ \mu_{Y_{i.}} = \beta'_X\,\mu_{X_{i.}} + \beta'_Z\,\mu_{Z_{i.}} . \]

To create the non-central case, the value of 0.5 (half a standard deviation of the marginal distribution of the dependent variable) was added to each observation in the first treatment group. In doing this, the non-central case data and the central case data were related to each other. Since there was no attempt to compare statistical power with Type I error, however, dependency between the central and the non-central F distributions was not considered a problem. Obviously, the double use of pseudo-random normal deviates provided a great saving in computer time.

ANCOVA

Basic statistics, sums-of-squares and cross-products were computed as shown previously in Table 2. Analysis of covariance on the dependent variable Y with X' and Z' as covariables was calculated first. Accuracy of the program was tested against the Finn program (Finn, 1968) and agreement was found. The F ratios as well as the beta weights were compared. To ensure accuracy, hand calculation was performed on the same set of test data and agreement was obtained.
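The rule used earlier in this section for the dependent-variable group means, each mean being the latent regression coefficients applied to the covariable means, can be checked with a short calculation. The sketch below assumes unit-variance latent variables, so that the latent regression coefficients are the standardized partial coefficients; the function names are hypothetical. For the configuration with latent correlations of .7, .7 and .0 and covariable means of 6.0 in the first group, it gives the value 8.4 that is listed for the first group mean of Y in the caption of Table 5.

```python
def latent_betas(rho_yx, rho_yz, rho_xz):
    """Partial regression coefficients for unit-variance latent variables."""
    den = 1.0 - rho_xz ** 2
    beta_x = (rho_yx - rho_yz * rho_xz) / den
    beta_z = (rho_yz - rho_yx * rho_xz) / den
    return beta_x, beta_z


def dependent_means(rho_yx, rho_yz, rho_xz, covariable_means):
    """Group means of Y that make all latent adjusted effects zero.

    covariable_means: list of (mu_x, mu_z) pairs, one pair per group.
    """
    beta_x, beta_z = latent_betas(rho_yx, rho_yz, rho_xz)
    return [beta_x * mu_x + beta_z * mu_z for mu_x, mu_z in covariable_means]


two_groups = dependent_means(0.7, 0.7, 0.0, [(6.0, 6.0), (0.0, 0.0)])
four_groups = dependent_means(0.7, 0.7, 0.0,
                              [(6.0, 6.0), (3.0, 3.0), (3.0, 3.0), (0.0, 0.0)])
print(two_groups)
print(four_groups)
```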
In addition to ANCOVA on the true covariables, ANCOVA on the two fallible covariables was also performed. Again, the outcomes were compared to those from the Finn program and the hand calculation and were in agreement. Finally, the two proposed correction methods A and B were calculated and checked for accuracy.

Method A ANCOVA

Fallible variables $X_1$ and $Z_1$ were used as covariables in the Method A ANCOVA. The two fallible covariables were transformed to their estimated true scores,

\[ \hat{X}_{ij} = \bar{X}_{1i.} + \hat{\rho}_{XX}\,(X_{1ij} - \bar{X}_{1i.}) \]

and

\[ \hat{Z}_{ij} = \bar{Z}_{1i.} + \hat{\rho}_{ZZ}\,(Z_{1ij} - \bar{Z}_{1i.}) \]

where $\bar{X}_{1i.}$ and $\bar{Z}_{1i.}$ were the treatment means for the two respective covariables. Basic computations were done again to find sums of squares and cross-products. Before the adjusted sums-of-squares of the dependent variable Y were calculated, the cross-products between the two estimated true score covariables were corrected for attenuation:

\[ STXZA = \frac{STXZ}{\sqrt{\hat{\rho}_{XX}\,\hat{\rho}_{ZZ}}} \quad \text{and} \quad SWXZA = \frac{SWXZ}{\sqrt{\hat{\rho}_{XX}\,\hat{\rho}_{ZZ}}} . \]

The adjusted cross-products STXZA and SWXZA were used in the computation of the total and pooled within regression slopes of the $X_1$ and $Z_1$ covariables.

Method B ANCOVA

In the Method B correction procedure, the second fallible covariable, $Z_1$, was transformed into a new variable, W, which was uncorrelated with the first covariable, $X_1$. This transformation was done by taking out the part of the second fallible covariable that was predictable from the first fallible covariable. Thus the new covariable was defined

\[ W_{ij} = Z_{1ij} - \beta_{Z.X}\,X_{1ij} \]

with reliability

\[ \hat{\rho}_{WW} = \frac{\hat{\rho}_{ZZ} - 2\rho_{XZ}^2 + \rho_{XZ}^2\,\hat{\rho}_{XX}}{1 - \rho_{XZ}^2} . \]

Analysis of covariance using the two estimated true score covariables of $X_1$ and W was then performed. The F ratio was obtained as well as the two within regression coefficients.

Distribution Building

Twelve distributions were built in the frequency counting phase. Four of them were F distributions corresponding to the four ANCOVA's performed. Only the right tail of each of the four F distributions was accumulated.
Three frequencies corresponding to the probabilities of Type I error at nominal $\alpha$ of .10, .05 and .01 were collected for each F distribution. The theoretical F values at each of the three $\alpha$ levels for specified sample sizes and number of treatment groups were read in the first phase of the program, the parameter reading routine. These F values served as reference points for the frequency counts.

In addition, the distributions of eight regression coefficients (two for each of the forms of ANCOVA) were built. The distributions covered the range of values from .5 to .9 in intervals of width .025.

Print Output

After 1000 iterations were done, the average and variance of each regression coefficient, mean square (treatment and within) and adjusted treatment mean were computed. Then all distributions, means and variances of regression coefficients, mean squares and adjusted treatment means were printed in the final phase (see Figure 2).

CHAPTER IV

RESULTS

Estimated True Scores ANCOVA with Independent Covariables

The results of the Monte Carlo investigation of estimated true scores ANCOVA for two uncorrelated random fallible covariables are provided in Tables 5 through 11. As stated previously, the sample size per treatment group was 40, and the correlations of latent true covariables with the dependent variable were each .7, as were the reliabilities of each covariable.

Empirical Type I errors, statistical power, average mean squares, and average adjusted means for the two treatment design are given in Table 5. The Type I error rates for estimated true scores ANCOVA were slightly liberal but within two standard errors for all three nominal values. By contrast, the results using latent true covariables were slightly conservative but also within two standard errors. The inappropriateness of using fallible covariables in ANCOVA for quasi-experiments was clearly supported by the .999 empirical Type I errors for all three nominal values.
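The "within two standard errors" criterion can be checked against the Table 5 rejection rates. The sketch below is illustrative only: it assumes the standard error is the binomial standard error of a proportion evaluated at the nominal level with 1000 iterations, which the text does not state explicitly, and the helper names are hypothetical.

```python
import math

N_ITER = 1000  # iterations per simulated condition, as stated in the text

def binomial_se(nominal, n_iter=N_ITER):
    """Standard error of an empirical rejection rate under the nominal level
    (assumed form: sqrt(p * (1 - p) / n))."""
    return math.sqrt(nominal * (1.0 - nominal) / n_iter)

def within_two_se(empirical, nominal, n_iter=N_ITER):
    """True if the empirical rate lies within two standard errors of nominal."""
    return abs(empirical - nominal) <= 2.0 * binomial_se(nominal, n_iter)

nominal_levels = [0.10, 0.05, 0.01]
central_rates = {
    "TRUE COV.": [0.092, 0.044, 0.005],
    "EST. TRUE": [0.115, 0.063, 0.013],
    "FALLIBLES": [0.999, 0.999, 0.999],
}

checks = {
    label: [within_two_se(rate, nominal)
            for rate, nominal in zip(rates, nominal_levels)]
    for label, rates in central_rates.items()
}

for label, flags in checks.items():
    print(label, flags)
```

Under this assumed criterion the latent true and estimated true scores analyses pass at all three nominal levels, while the fallible covariable analysis fails at every level, which agrees with the description above.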
Table 5. Empirical Type I Error, Statistical Power, Average Mean Square and Average Adjusted Means for Estimated True Scores ANCOVA with $t = 2$, $n = 40$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .00$, $\mu_{X_1} = \mu_{Z_1} = 6.0$, $\mu_{X_2} = \mu_{Z_2} = 0$, $\mu_{Y_1} = 8.4$, $\mu_{Y_2} = 0.0$

                           CENTRAL                          NON-CENTRAL
Nominal α          .10       .05       .01          .10       .05       .01
TRUE COV.         .092      .044      .005         .973      .950      .814
EST. TRUE         .115      .063      .013         .200      .128      .046
FALLIBLES         .999      .999      .999        1.000     1.000     1.000

MEAN SQUARES      BETWEEN          WITHIN             BETWEEN           WITHIN
TRUE COV.       .0178(.0006)    .0200(.00001)      .2694(.0203)     .0200(.00001)
EST. TRUE       .3411(.2253)    .3150(.0026)       .5292(.4906)     .3150(.0026)
FALLIBLES      9.3305(13.25)    .3150(.0025)     13.2614(20.38)     .3150(.0026)

ADJ. MEANS          T1               T2                 T1               T2
TRUE COV.      -0.002(.018)    -0.0007(.0005)       .497(.018)           *
EST. TRUE      -0.055(.508)    -0.0002(.0115)       .444(.508)           *
FALLIBLES       2.523(.208)    -0.0008(.0086)      3.022(.208)           *

Primes denote parameters on latent true variables.
* Same as in central case.

As was expected, use of latent true covariables resulted in substantially greater power than estimated true scores ANCOVA. The difference in power is explained by two factors. First, the multiple correlation of estimated true scores is identical to that for the fallible covariables, while the multiple correlation for the latent true covariables is consistent with a correction for attenuation. Second, the variance of the estimated true scores covariables is equal to the variance of the corresponding fallible covariables multiplied by the respective squared reliabilities. The variance of the latent true covariables, however, is equal to the variance of the corresponding fallible covariables multiplied by the respective reliabilities. The smaller variance of estimated true scores covariables operates to further dampen the power of the procedure.
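The two factors just described follow from the linear form of the estimated true score transformation and can be illustrated with simulated data. This is a hedged sketch rather than the study's procedure: it uses a single group, a known population reliability of .70 instead of a sample estimate, a unit-variance latent covariable, and an arbitrary seed and sample size.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 5000                         # illustrative sample size, not the study's n
rel = 0.70                       # assumed known reliability of the covariable

# Latent true score (variance 1), fallible observed score (variance 1 / rel),
# and a dependent variable correlating .7 with the latent true score.
true_x = rng.normal(size=n)
x = true_x + rng.normal(scale=np.sqrt((1.0 - rel) / rel), size=n)
y = 0.7 * true_x + rng.normal(scale=np.sqrt(1.0 - 0.49), size=n)

# Estimated true score: shrink each observation toward the group mean.
x_hat = x.mean() + rel * (x - x.mean())

# Factor 2: variance is the fallible variance times the squared reliability.
var_ratio = np.var(x_hat) / np.var(x)

# Factor 1: the correlation with Y is unchanged by the transformation.
corr_obs = np.corrcoef(x, y)[0, 1]
corr_hat = np.corrcoef(x_hat, y)[0, 1]

# The slope, however, is disattenuated by the factor 1 / rel.
slope_obs = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
slope_hat = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

print(var_ratio, corr_obs, corr_hat, slope_obs, slope_hat)
```

The first three quantities are exact algebraic identities of the transformation; only the closeness of `slope_hat` to the latent slope of .7 depends on sampling.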
The slight liberal tendency of estimated true scores ANCOVA was also reflected in the average mean squares for the central case, which showed the average mean square between to be slightly larger than the average mean square within. Support for the earlier analytic demonstration that the procedure tests the correct null hypothesis was given by the average adjusted means. For the central case the average adjusted means were -.055 and -.0002, which were very close to the desired values of zero.

Average pooled within regression coefficients and cumulative distributions for the 1000 samples are provided in Table 6. The average coefficients were .703 and .707 for the two covariables, which were very close to the .7 value of the population coefficient for the latent true covariables. As was expected, the standard errors for the regression coefficients were substantially larger for estimated true scores than for latent true covariables. The reasons are the same as those given previously for the discussion of statistical power.

Table 6. Empirical Cumulative Distribution of Regression Coefficients from Estimated True Scores ANCOVA with $t = 2$, $n = 40$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .00$, $\beta'_X = .70$, $\beta'_Z = .70$

            TRUE COV           EST. TRUE          FALLIBLES
            X       Z          X       Z          X       Z
.50                            4       1        572     569
.55                           22      21        873     878
.60                           92      83        981     975
.65         0       0        258     237        997     998
.70       490     506        515     494       1000    1000
.75      1000     999        746     743
.80              1000        884     875
.85                          958     956
.90                          985     978
.95                          990     995
MEAN     .701    .700       .703    .707       .489    .490
VAR     .0003   .0002      .0069   .0065      .0029   .0030

Primes denote parameters on latent true variables.

Tables 7 and 8 contain comparable data for the four treatment group design. Number of treatment groups did not noticeably alter the results for ANCOVA using latent true and fallible covariables. The estimated true scores ANCOVA empirical Type I errors, however, were markedly discrepant from the nominal values, i.e., .177, .109, and .037 for nominal values of .10, .05, and .01 respectively.
This was true despite the average adjusted means being quite close to the desired value of zero, i.e., -.021, -.017, -.011, -.0020. Further, the average pooled within regression coefficients, as shown in Table 8, were .703 and .700.

Table 7. Empirical Type I Error, Statistical Power, Average Mean Squares and Average Adjusted Means for Estimated True Scores ANCOVA, four treatment group design. [Table body not recoverable from the source scan.]

Table 8. Empirical Cumulative Distribution of Regression Coefficients from Estimated True Scores ANCOVA with $t = 4$, $n = 40$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .00$, $\beta'_X = .70$, $\beta'_Z = .70$

            TRUE COV           EST. TRUE          FALLIBLES
            X       Z          X       Z          X       Z
.50                            0       0        612     600
.55                            1       2        953     944
.60                           25      29        998     997
.65         0       0        178     196       1000    1000
.70       532     497        490     517
.75      1000    1000        796     814
.80                          959     943
.85                          991     993
.90                          998     999
.95                         1000    1000
MEAN    .6999    .700       .703    .700       .490    .490
VAR    .00014  .00012      .0031   .0033      .0014   .0014

Primes denote parameters on latent true variables.

It was hypothesized that the use of sample reliabilities for calculating estimated true scores was the cause of the liberal nature of the F test statistic. Therefore, the simulations were replicated using population reliabilities. The empirical Type I errors and statistical powers for the replications are reported in Table 9. For both the two and four treatment group designs the empirical Type I errors for estimated true scores ANCOVA were slightly closer to the nominal values than they had been using estimated reliabilities. For the four treatment group design, however, the empirical Type I errors were still quite liberal, i.e., .163, .097, .029 for nominal values of .10, .05, and .01 respectively.

Table 9. Empirical Type I Error and Statistical Power for Estimated True Scores ANCOVA Using Population Reliabilities with $n = 40$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .00$

                           CENTRAL                          NON-CENTRAL
Nominal α          .10       .05       .01          .10       .05       .01
T = 2
TRUE COV          .099      .045      .008         .963      .943      .819
EST. TRUE         .100      .056      .012         .219      .127      .032
FALLIBLES         .999      .999      .992         .999      .999      .999
T = 4
TRUE COV          .098      .045      .007        1.000     1.000     1.000
EST. TRUE         .163      .097      .029         .789      .703      .479
FALLIBLES        1.000     1.000     1.000        1.000     1.000     1.000

Primes denote parameters on latent true variables.

The liberalness of the estimated true scores test statistic for the four treatment design is consistent with, but much more pronounced than, that found for a single fallible covariable (Porter, 1967). With a single fallible covariable, however, the empirical Type I error rates were still within the bounds of practical utility, i.e., .111, .058, and .013. Porter's single fallible covariable results were for the same parameters except sample size, which was twenty rather than forty per treatment group.

Methods A and B

The results of the Monte Carlo investigation of Methods A and B for two correlated fallible covariables are presented in Tables 10 and 11. The earlier analytic demonstrations suggested that Method A should test the right hypothesis while Method B should not. Nevertheless, Method B was investigated on the chance that it might have some practical utility. The parameters of the Monte Carlo simulations were as before, with the exception that the latent true covariables had a .2 intercorrelation.

Empirical Type I error, statistical power, average mean squares, and average adjusted means for the two treatment designs are given in Table 10. The average adjusted means for Methods A and B supported our analytic findings. The averages for Method A were in close agreement with the desired zero values, i.e., -.070 and -.002, while the averages for Method B were not, i.e., -.514 and -.003.
Further support for the analytic work is provided in Table 11. The average values of the regression coefficients for Method A were in close agreement with the desired .58 value, i.e., .586 and .591, while for Method B there was little agreement, i.e., .705 and .640.

Unfortunately the empirical Type I error rates were not within the range of practical utility for either method. The finding for Method B was not surprising, but greater hopes were held for Method A. The too liberal nature of the F test statistic for Method A stemmed from a far too large average adjusted mean square for treatments, i.e., 179.1.

Table 10. Empirical Type I Error, Statistical Power, Average Mean Squares and Average Adjusted Means for Method A and Method B with $t = 2$, $n = 40$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .20$, $\mu_{X_1} = \mu_{Z_1} = 6.0$, $\mu_{X_2} = \mu_{Z_2} = 0.0$, $\mu_{Y_1} = 6.96$, $\mu_{Y_2} = 0.0$

                           CENTRAL                          NON-CENTRAL
Nominal α          .10       .05       .01          .10       .05       .01
TRUE COV.         .106      .055      .010         .377      .254      .100
METHOD A         1.000     1.000     1.000        1.000     1.000     1.000
METHOD B          .180      .096      .027         .120      .057      .013
FALLIBLES         .980      .956      .845         .998      .996      .979

MS                BETWEEN          WITHIN             BETWEEN           WITHIN
TRUE COV         .188(.066)     .184(.0009)        .507(.313)            *
METHOD A       179.1(945.6)     .429(.0050)     204.56(1205.7)           *
METHOD B        1.242(1.99)     .399(.0043)        .733(.958)            *
FALLIBLES       5.980(9.48)     .398(.0043)       9.369(16.05)           *

ADJ. MEAN           T1               T2                 T1               T2
TRUE COV         .015(.155)     -.000(.005)        .515(.155)            *
METHOD A        -.070(.606)     -.002(.014)        .429(.606)            *
METHOD B        -.514(.566)     -.003(.015)       -.014(.566)            *
FALLIBLES       2.349(.236)          *            1.848(.236)      -.001(.011)

Primes denote parameters for latent true variables.
* Same as in central case.

Table 11. Empirical Cumulative Distribution of Regression Coefficients for Method A and Method B with $t = 2$, $n = 40$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .20$, $\beta'_X = .58$, $\beta'_Z = .58$

             TRUE COV          METHOD A          METHOD B          FALLIBLES
             X       Z         X       Z         X       Z         X       Z
.50         53      63       189     160        29      56       881     868
.55        261     248       366     331        76     153       982     975
.60        639     638       564     552       169     353      1000     996
.65        910     912       732     743       316     562              1000
.70        993     994       886     876       496     762
.75        997    1000       961     937       672     874
.80       1000               986     979       805     943
.85                          995     994       901     981
.90                          997     999       958     995
.95                          999     999       976     997
MEAN      .581    .582      .586    .591      .705    .640      .427    .431
VARIANCE .0025   .0025     .0093   .0092     .0129   .0090     .0035   .0037

Primes denote parameters on latent true variables.

Three modifications of Method A were proposed in an attempt to decrease the adjusted mean square for treatments, none of which resulted in empirical Type I error rates within the bounds of practical utility. Empirical Type I errors and statistical powers of the three modifications are presented in Table 12 for the four group design with parameters as before. Since the adjusted mean square for treatments was obtained by subtraction of the adjusted mean square within from the adjusted mean square total, all three modifications attempted to reduce the adjusted mean square for total.

By studying Method A in detail from the print output of basic computations for 10 iterations, it was observed that correction of the sum-of-squares total cross-product resulted in too low total regression coefficients for both covariables. If the correction was not applied to the sum-of-squares total cross-product, the adjusted mean square total would be decreased. The first modification, then, was to apply the correction to the sum-of-squares within cross-product only. The Type I error rates (row 1 of Table 12) were found to be too conservative, i.e., .031, .021 and .007 for nominal $\alpha$ of .10, .05 and .01 respectively. Statistical powers were also low.
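As a population-level cross-check on the average regression coefficients reported in Table 11, the slopes implied by each procedure can be computed from population moments. The sketch below is an illustration under stated assumptions, not the study's program: latent covariables and Y are taken to have unit variance, population reliabilities of .70 replace sample estimates, the Method A attenuation correction is read as division of the cross-product by $\sqrt{\rho_{XX}\rho_{ZZ}}$, and all variable and function names are hypothetical.

```python
import numpy as np

rel = 0.70          # reliability of each covariable
r_y = 0.70          # latent correlation of each covariable with Y
r_xz = 0.20         # latent intercorrelation of the covariables

def slopes(var_x, var_z, cov_xz, cov_xy, cov_zy):
    """Partial regression slopes of Y on two covariables from their moments."""
    s = np.array([[var_x, cov_xz], [cov_xz, var_z]])
    return np.linalg.solve(s, np.array([cov_xy, cov_zy]))

# Latent true covariables (unit variance assumed)
latent = slopes(1.0, 1.0, r_xz, r_y, r_y)

# Fallible covariables: variance 1 / rel, covariances unchanged by error
var_obs = 1.0 / rel
fallible = slopes(var_obs, var_obs, r_xz, r_y, r_y)

# Method A: estimated true scores, cross-product corrected for attenuation
var_hat = rel ** 2 * var_obs
cov_hat = rel * rel * r_xz / np.sqrt(rel * rel)
method_a = slopes(var_hat, var_hat, cov_hat, rel * r_y, rel * r_y)

# Method B: W = Z - b * X, then estimated true scores of X and W
b_zx = r_xz / var_obs
r_xz_obs = r_xz / var_obs                      # observed X, Z correlation
var_w = var_obs * (1.0 - r_xz_obs ** 2)
rel_w = (rel - 2 * r_xz_obs ** 2 + r_xz_obs ** 2 * rel) / (1 - r_xz_obs ** 2)
cov_wy = r_y - b_zx * r_y
method_b = np.array([
    (rel * r_y) / (rel ** 2 * var_obs),        # slope on estimated true X
    (rel_w * cov_wy) / (rel_w ** 2 * var_w),   # slope on estimated true W
])

print(latent, fallible, method_a, method_b)
```

Under these assumptions Method A reproduces the latent slopes of about .58, the fallible slopes fall near .43, and Method B gives .70 for the first covariable and a value above .58 for the second, the same pattern as the Table 11 averages.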
The second modification was motivated by the argument that the reliability of a covariate for the total sample should be greater than the pooled within treatment reliability. Thus the estimated true scores and the correction to the within treatment cross-products were calculated using pooled within reliabilities, while the correction to the total cross-products used the reliabilities for the total sample. For ten iterations the obtained Type I error was 1.0 for all three nominal values.

The third modification was an attempt to decrease the adjusted mean square within treatments after the first modification was employed. The decrease was brought about by correcting for attenuation the sum of cross-products between the dependent variable and each covariable. The effect was so great that the Type I error rates jumped to .999 for all three nominal $\alpha$ values.

Table 12. Empirical Type I Error and Statistical Power of Estimated True Scores ANCOVA with Some Additional Correction Methods with $n = 40$, $t = 4$, $\rho_{XX} = .70$, $\rho_{ZZ} = .70$, $\rho'_{YX} = .70$, $\rho'_{YZ} = .70$, $\rho'_{XZ} = .20$

                           CENTRAL                          NON-CENTRAL
Nominal α          .10       .05       .01          .10       .05       .01
FIRST
TRUE COV.         .109      .053      .015         .974      .932      .818
METHOD A          .031      .021      .007         .157      .122      .073
SECOND*
TRUE COV.          .1        .0        .0           .8        .8        .5
METHOD A          1.0       1.0       1.0          1.0       1.0       1.0
THIRD
TRUE COV.         .108      .052      .015         .965      .936      .824
METHOD A          .999      .999      .999         .999      .999      .999

* Only 10 iterations were performed.

CHAPTER V

SUMMARY AND CONCLUSION

In educational research it appears that random assignment of experimental units to treatments is frequently not accomplished, yet the researchers are interested in testing whether treatments cause differences in the dependent variable. Program evaluation efforts provide numerous examples to support this contention; for example, the National Follow Through Program and the Head Start Planned Variation Program.
In the past, ANCOVA has frequently been employed to control for at least some of the rival explanations for any differences, or lack of differences, found among treatments. As has been seen, ANCOVA is not an appropriate strategy if the covariables have less than perfect reliability, which is nearly always true in practice. Estimated true scores ANCOVA provides an increasingly popular solution to the single covariate case; however, most evaluations have multiple fallible covariates. The two procedures (Methods A and B) suggested here are a first attempt at providing solutions to meet their needs.

It should be noted that the investigation was an attempt to find a solution to the problem of errors of measurement in random covariables for use in quasi-experiments. In the investigator's opinion, however, there is no perfectly acceptable solution to the problem of estimating causal relationships from quasi- or naturally-occurring experiments. Probably the best approach is to use multiple analysis strategies, each having somewhat different assumptions. At least three categories of procedures, matching, ANCOVA and ANOVA of the Indices of Responses, are useful in quasi-experiments. When results are invariant across multiple analysis strategies for a given set of quasi-experimental data, conclusions about cause can be relatively strong. When results differ, greater caution in interpretation is warranted.

The effects of having random fallible covariables in ANCOVA are twofold. First, the unreliability of the covariables decreases the statistical power of the omnibus F test. This is indicated by inspection of the expected value of the mean square error for ANCOVA, which contains the factor

$$\sigma_e^2 (1 - R^2)$$

where $\sigma_e^2$ is the population within treatments variance on the dependent variable and R is the multiple correlation of the covariables with the dependent variable.
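A small numerical illustration of this factor, using the parameters of the independent covariables simulation, may help fix ideas. The sketch assumes uncorrelated covariables (so that the squared multiple correlation is the sum of squared correlations), a perfectly reliable dependent variable with $\sigma_e^2 = 1$, and classical attenuation of each correlation by the square root of the covariable's reliability; the function name is hypothetical.

```python
import math

def expected_error_factor(r_latent, reliabilities, sigma2_e=1.0):
    """sigma_e^2 * (1 - R^2) for uncorrelated covariables.

    Each observed correlation is the latent correlation times sqrt(reliability),
    so each covariable contributes reliability * r_latent**2 to R^2.
    """
    r_squared = sum(rel * r ** 2 for r, rel in zip(r_latent, reliabilities))
    return sigma2_e * (1.0 - r_squared)

# Latent true covariables: reliabilities of 1.0
latent_factor = expected_error_factor([0.7, 0.7], [1.0, 1.0])

# Fallible covariables: reliabilities of .70
fallible_factor = expected_error_factor([0.7, 0.7], [0.7, 0.7])

print(round(latent_factor, 4), round(fallible_factor, 4))
```

The two values, about .02 and .314, are close to the average within mean squares of .0200 and .3150 reported in Table 5 for the latent true and fallible covariable analyses.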
The unreliability of the covariables attenuates the multiple correlation, thus inflating the expected mean square error and bringing about a concomitant decrease in the statistical power of the test for treatment effects. The loss in power is somewhat analogous to the loss in power of ANOVA due to unreliability of the dependent variable. No suggestions for correcting these problems with power were offered, other than the obvious one of using the most reliable but still valid measures possible in both ANCOVA and ANOVA.

The second effect of having random fallible covariables in ANCOVA is far more disquieting. The unreliability of the covariables can cause ANCOVA to provide biased estimates of the treatment main effect, i.e., to test the wrong hypothesis. Consider a treatment effect for ANCOVA with two covariables:

$$\bar{\alpha}_{Y_{i.}} = \alpha_{Y_{i.}} - \beta_X \alpha_{X_{i.}} - \beta_Z \alpha_{Z_{i.}}$$

where

$\alpha_{Y_{i.}} = \mu_{Y_{i.}} - \mu_{Y_{..}}$, treatment effect on Y;
$\alpha_{X_{i.}} = \mu_{X_{i.}} - \mu_{X_{..}}$, treatment effect on X;
$\alpha_{Z_{i.}} = \mu_{Z_{i.}} - \mu_{Z_{..}}$, treatment effect on Z; and
$\beta_X$ and $\beta_Z$ are within treatment group regression coefficients.

If random assignment is a part of the design, all $\alpha_{X_{i.}}$'s and $\alpha_{Z_{i.}}$'s are zero and the ANCOVA treatment effects, $\bar{\alpha}_{Y_{i.}}$, are equal to the unadjusted ANOVA of Y treatment effects, $\alpha_{Y_{i.}}$. This is true regardless of the values of $\beta_X$ and $\beta_Z$. The sole purpose of using ANCOVA rather than ANOVA is to improve statistical power. Since no bias is present in the adjusted treatment effects, ANCOVA is still appropriate.

When random assignment has not been employed, the $\alpha_{X_{i.}}$'s and $\alpha_{Z_{i.}}$'s will typically not be zero and the primary motivation for using ANCOVA is to remove the initial difference on X and Z from the effects on Y. When X and Z are fallible, Cochran (1968) has shown that $\beta_X$ and $\beta_Z$ will be biased estimates of the regression slopes defined on the latent true variables. Since $\alpha_{X_{i.}}$ and $\alpha_{Z_{i.}}$ are unaffected by errors of measurement, given classical measurement assumptions, the use by classical ANCOVA of the biased regression coefficients will remove the wrong amounts of initial difference on X and Z, and thus not test the treatment effects defined on the latent true variables. Since the researchers' hypotheses are in terms of the latent true variables, ANCOVA using fallible covariables tests the wrong hypothesis. What is needed is a procedure which tests the hypothesis that the adjusted effects defined on the latent true variables are equal to zero.

Lord (1960) provided the first method for correcting ANCOVA to test the hypothesis of no adjusted treatment effects defined on the latent true variables, but his solution was restricted to the case of a single independent variable with only two levels and required multiple observations on a single covariate. Porter (1967, 1973) provided a solution to the problem which can be used in complex designs and requires only an estimate of the reliability of the covariable, but is restricted to the case of only a single covariable. DeGracie (1968) has provided a solution similar to Porter's but computationally more difficult. The procedures suggested here deal with the multiple fallible covariable case and both are extensions of the reasoning underlying Porter's single covariable solution.

The study investigated three procedures and two situations: Situation I, two independent random fallible covariables; Situation II, two intercorrelated random fallible covariables. In Situation I, estimated true score covariables were used in ANCOVA. In Situation II, two proposed procedures, Method A and Method B, were investigated. Method A consisted of 1) substituting estimated true scores for each observed covariable, and 2) correcting for attenuation the relationships between the estimated true score covariables.
Method B had two steps: 1) one covariable was transformed to make it orthogonal to the other; 2) estimated true scores were substituted for the two orthogonal covariables and computations proceeded as for classical ANCOVA.

Analytic investigation showed that the estimated true score procedure in Situation I and Method A in Situation II (given equally reliable covariables) test the right hypothesis. Method B did not test the right hypothesis under any conditions.

A Monte Carlo study was conducted to investigate the small sample distributional properties of the three procedures. For independent random fallible covariables, the estimated true scores ANCOVA provided satisfactory Type I error rates for the two group design but too liberal Type I error rates for the four group design. None of the procedures provided satisfactory Type I error rates for two correlated random fallible covariables. This was true despite the fact that Method A yielded average adjusted treatment means and average pooled within regression coefficients in close agreement with desired values. As yet the problem of multiple random fallible covariables in ANCOVA of quasi-experiments is unresolved.

BIBLIOGRAPHY

Adcock, R. J. A problem in least squares. Analyst, 1878, 5, 53-54.

Atiqullah, M. The estimation of residual variance in quadratically balanced least squares problems and the robustness of the F test. Biometrika, 1962, 49, 83-92.

Atiqullah, M. The robustness of the covariance analysis of a one way classification. Biometrika, 1964, 51, 365-372.

Berkson, J. Are there two regressions? Journal of the American Statistical Association, 1950, 45, 164-180.

Box, George E. P. and Andersen, S. L. Permutation theory in the derivation of robust criteria and the study of departures from assumption. Journal of the Royal Statistical Society, 1955, 17, 1-26.

Cochran, W. G. Analysis of covariance: Its nature and uses. Biometrics, 1957, 13, 261-281.

Cochran, W. G. Errors of measurement in statistics. Technometrics, 1968.

Cochran, W. G. Some effects of errors of measurement on multiple correlation. Journal of the American Statistical Association, 1970, 65, 22-34.

Cox, David R. The use of a concomitant variable in selecting an experimental design. Biometrika, 1957, 44, 150-158.

Cox, David R. Planning of Experiments. New York: Wiley, 1958.

Cronbach, L. J. and Furby, L. How we should measure "change"--or should we? Psychological Bulletin, 1970, 74, 68-80.

DeGracie, James. Analysis of covariance when the concomitant variable is measured with error. Unpublished Ph.D. Thesis, Iowa State University, 1968.

Dorff, Martin. Large and small sample properties of estimators for a linear functional relationship. Unpublished Ph.D. Thesis, Ames, Iowa, 1960.

Elashoff, Janet D. Analysis of covariance: A delicate instrument. American Educational Research Journal, 1969, 6, 383-401.

Evans, S. H. and Anastasio, E. J. Misuse of analysis of covariance when treatment effect and covariate are confounded. Psychological Bulletin, 1968, 69, 225-234.

Finn, Jeremy D. Multivariance...Univariate and multivariate analysis of variance, covariance, and regression: A FORTRAN IV program. State University of New York at Buffalo, June 1968.

Finney, D. J. Stratification, balance, and covariance. Biometrics, 1957, 13, 373-386.

Glass, Gene V., Peckham, Percy D. and Sanders, James R. Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance. Review of Educational Research, 1972, 42, 237-288.

Gulliksen, Harold. Theory of Mental Tests. New York: Wiley, 1950.

Harnquist, K. Relative changes in intelligence from 13 to 18. Scandinavian Journal of Psychology, 1968, 9, 50-82.

Harris, Chester W. Problems in Measuring Change. Madison, Wisconsin: University of Wisconsin Press, 1963.

John, Peter W. M. Statistical Design and Analysis of Experiments. New York: Macmillan, 1971.

Kendall, M. G. Regression, structure and functional relationship, Part I. Biometrika, 1951, 38, 11-25.

Kendall, M. G. Regression, structure and functional relationship, Part II. Biometrika, 1952, 39, 96-108.

Kirk, Roger E. Experimental Design: Procedures for the Behavioral Sciences. Belmont, California: Brooks/Cole, 1968.

Lord, F. M. Large sample covariance analysis when the control variable is fallible. American Statistical Association Journal, 1960, 55, 309-321.

Lord, F. M. A paradox in interpretation of group comparisons. Psychological Bulletin, 1967, 68, 304-305.

Lord, F. M. Statistical adjustments when comparing pre-existing groups. Psychological Bulletin, 1969, 72, 336-337.

Madansky, Albert. The fitting of straight lines when both variables are subject to error. American Statistical Association Journal, 1959, 54, 173-205.

McSweeney, Maryellen and Porter, A. C. Small sample properties of nonparametric index of response and rank analysis of covariance. Paper presented at the AERA Convention, New York, 1971.

Myers, Jerome L. Fundamentals of Experimental Design. Boston: Allyn and Bacon, 1966.

Peckham, Percy D. An investigation of the effects of non-homogeneity of regression slopes upon the F test of analysis of covariance. Laboratory of Educational Research, Report No. 16, University of Colorado, Boulder, Colorado, 1968.

Porter, Andrew C. The effects of using fallible variables in the analysis of covariance. Unpublished Ph.D. Thesis, Madison, University of Wisconsin, 1967.

Porter, A. C. How errors of measurement affect ANOVA, regression analyses, ANCOVA and factor analyses. Paper presented at the AERA Convention, New York, 1971.

Porter, A. C. Analysis strategies for some common evaluation paradigms. Paper presented at the AERA Convention, New Orleans, February 1973.

Porter, A. C. and Chibucos, T. R. Selecting analysis strategy. In Gary Borich (ed.), Evaluating Educational Programs and Products. Educational Technology Press, 1974.

Roos, C. F. A general invariant criterion of fit for lines and planes where all variates are subject to error. Metron, 1937, 13, 3-20.

Scheffé, Henry. The Analysis of Variance. New York: John Wiley & Sons, 1959.

Smith, H. Fairfield. Interpretation of adjusted treatment means and regressions in analysis of covariance. Biometrics, 1957, 13, 282-308.

Stroud, T. W. F. Comparing conditional means and variances in a regression model with measurement errors of known variances. Journal of the American Statistical Association, 1972, 67, 407-414.

Stallings, Jane A. Follow Through Program Classroom Observation Evaluation 1971-1972. California: Stanford Research Institute, August 1973.

Tukey, John W. Components in regression. Biometrics, 1951, 7, 33-70.

Wald, Abraham. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics, 1940, 11, 284-300.

Winer, B. J. Statistical Principles in Experimental Design. New York: McGraw-Hill, 1962.