This is to certify that the thesis entitled "The Utilization of Antecedent Data in Conjunction with Test Results for Curricular Decision Making," presented by Bernhard Darwin Kaufman, Jr., has been accepted towards fulfillment of the requirements for the Ph.D. degree in Measurement and Evaluation.

Major professor

Date: February 13, 1980

THE UTILIZATION OF ANTECEDENT DATA IN CONJUNCTION WITH TEST RESULTS FOR CURRICULAR DECISION MAKING

By

Bernhard Darwin Kaufman, Jr.

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services and Educational Psychology

1980

ABSTRACT

THE UTILIZATION OF ANTECEDENT DATA IN CONJUNCTION WITH TEST RESULTS FOR CURRICULAR DECISION MAKING

By

Bernhard Darwin Kaufman, Jr.

Decisions about mastery of an achievement domain are frequently made on the basis of a small sample of items. Because of the small number of items, the possibility of incorrect decisions is high. One way of improving these decisions is to utilize additional information in concert with the test information. This study sought to determine the efficacy of incorporating non-test information into test-based decision models. These models were compared on the basis of classification accuracy.

The non-test information variables of the study were instructional time history, instructional testing history, mathematics achievement, and sex. The history variables were captured from files maintained on students in a computer-managed instructional program. The standard by which the models were compared was mastery classification based on a 156-item test covering a unit on multiplication and division. This variable also served as the dependent variable in model development.

There were three phases of analysis in the research. The first used stepwise regression to discover the relationships which existed among the non-test information variables, a set of subtests drawn from the 156-item test, and the results of the 156-item test itself. Also during this phase, the incremental validity of subtests was determined, as well as the functional length of subtests combined with instructional time and mathematics achievement.

During Phase II, least squares and Bayesian models were developed for the purpose of making decisions about mastery of the domain. The least squares model contained mathematics achievement and instructional time as non-test information. In order to apply the Bayesian model, a parameter indicating the value of prior information needed to be set. The coefficient which resulted in the best decision precision established the value of prior information at 2.75 test items.

The final phase compared the Bayesian and least squares decision approaches with the raw score, or proportion correct, approach for making mastery classifications. Mastery levels of .70, .75, .80, .85, and .90 were examined. None of the approaches stood out as being more effective. Comparison of classifications based on the least squares models containing the non-test information variables, with and without a six-item subset of the domain, indicated that adding the test information did not improve classification accuracy.

Four conclusions were reached as a result of the analysis. First, a six-item test does not improve mastery classification beyond what was possible with pre-existing information.
Second, learning rate represents information which is independent of mathematics achievement. Third, neither least squares nor Bayesian approaches improve decision precision over that obtained using raw scores. Finally, decision precision is improved when twelve items are used rather than six.

It was recommended that teachers develop ways of using pre-existing information as they monitor pupils. Having measures of achievement and learning rate, one may need only to keep track of on-task behavior. Pupil behaviors suggesting frustration can be taken to indicate a need for diagnosis. At such a point, a test of sufficient length to yield accurate decisions can be administered. In sum, if pupils are initially well placed in the curriculum, and instructional methods and materials are carefully selected, testing can be restricted to points where diagnosis is indicated by off-task behavior reflecting frustration whose cause the teacher cannot easily identify.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS

Chapter
I. THE PROBLEM
   Problem
   Solution
   Need for the Study
   Purpose of the Study
   Definition of Terms
II. REVIEW OF LITERATURE
   Definitions
   Estimating Domain Scores
      Proportion correct
      Classical Model II
      Bayesian Model II
      Binomial Model
   Criterion Referenced Decisions
   Validity
   Domain Test Length
   Summary
III. DESIGN AND PROCEDURES
   Population
   Sample
   Variables
   Methodology
      Phase I
      Phase II
      Phase III
IV. FINDINGS
   Variables
   Phase I
   Phase II
   Phase III
V. INTERPRETATION, CONCLUSIONS AND RECOMMENDATIONS
LIST OF REFERENCES
LIST OF NOTES

LIST OF TABLES

1. Bayesian and classical variance components
2. Contrasts for the model factor
3. Descriptive statistics for all variables used in the study
4. Descriptive statistics for DOMAIN, SUBTEST(J) and items comprising the six objectives
5. Intercorrelation of information variables and domain achievement
6. Stepwise regression statistics for all permutations of the information variables with DOMAIN
7. Partial correlations and coefficients of alienation for the information variables with DOMAIN
8. Regression statistics relating TIME and STEP to DOMAIN
9. Statistics for incremental validity analysis
10. Coefficients of correlation and determination for SUBTEST(J) with DOMAIN
11. Regression statistics for TIME and STEP with DOMAIN
12. Regression statistics for TIME, STEP and SUBTEST(6) with DOMAIN
13. p* values for three values of t for subtests of length 6, 12 and 18
14. Number and percent of correct classification
15. Analysis of variance statistics
16. Means and variances for levels of model and mastery level
17. Scheffé contrast statistics for the model factor

LIST OF FIGURES

1. Reduction of uncertainty for combinations of 3 information variables
2. The relationship of R² and subtest length
3. The relationship of t to correct classification for the mastery level of .70
4. The relationship of t to correct classification for the mastery level of .75
5. The relationship of t to correct classification for the mastery level of .80
6. The relationship of t to correct classification for the mastery level of .85
7. The relationship of t to correct classification for the mastery level of .90

KEY TO SYMBOLS AND NOMENCLATURE

π_i: An individual's domain proportion correct
π_0: Mastery level proportion
π̂_i: An estimate of an individual's domain proportion correct
ω_i: An individual's mastery/non-mastery classification
T: An individual's true score
T̂: An estimate of an individual's true score
ρ_xx′: The reliability of a test
X_i: An individual's raw score
μ_x: Mean raw score for a group
σ_T²: Classical true variance
σ_E²: Classical error variance
μ_T: Mean classical true score for a group
σ_X²: Classical observed variance
γ_i: Arcsin transformation of π_i
g_i: Tukey-Freeman arcsin transformation of X_i
n: Number of items in a test
g.: Mean of a group of g_i's
φ̂_c: Classical estimate of true variance where scores have been subjected to the Tukey-Freeman transformation
φ_gc: Classical estimate of observed variance
φ_Ec: Classical estimate of error variance where scores have been subjected to the Tukey-Freeman transformation
γ̂_ic: Classical estimate of the arcsin transformation of π_i
Inverse chi square distribution
λ: Scale parameter of the inverse chi square distribution
ν: Degrees of freedom for the inverse chi square distribution
Mean of the inverse chi square distribution
t: Test information parameter
γ̂_ib: Bayesian estimate of the arcsin transformation of π_i
γ.: Mean of arcsin transformed π_i's
N: Number of individuals
γ̂._b: Bayesian estimate of the mean of arcsin transformed π_i's
φ̂_b: Bayesian estimate of true variance where scores have been subjected to the Tukey-Freeman transformation
φ_Eb: Bayesian estimate of error variance where scores have been subjected to the Tukey-Freeman transformation
φ_gb: Bayesian estimate of observed variance where scores have been subjected to the Tukey-Freeman transformation
γ̂_ibm: Bayesian marginal mean estimate of the arcsin transformation of π_i
p*: Bayesian marginal mean estimate of the proportion of true to observed variance
γ_0: Arcsin transformation of π_0
Mastery
Non-mastery
Loss associated with a false positive
Loss associated with a false negative
Expected loss
Mean of the posterior marginal distribution
Variance of the posterior marginal distribution
π_1, π_2: Proportion boundaries of the indifference region
X_1, X_2: Raw score boundaries of the indifference region
TEST: Instructional testing history
TIME: Instructional time history
STEP: Sequential Tests of Educational Progress, Mathematics Concepts Test
SUBTEST(J): Domain item samples of length J, J = 6, 12, ..., 60
DOMAIN: Score on the 156-item division and multiplication test
SIX: Classification based on SUBTEST(6)
TWELVE: Classification based on SUBTEST(12)
YHAT: Classification based on the least squares estimate containing TIME and STEP
YHATP: Classification based on the least squares estimate containing TIME, STEP and SUBTEST(6)
BAYES6: Classification based on the Bayesian marginal mean estimate containing SUBTEST(6)
BAYES12: Classification based on the Bayesian marginal mean estimate containing SUBTEST(12)
X̄_c: Mean of classifications based on model c
ψ: Scheffé contrast
σ²_ψ: Variance of a Scheffé contrast

CHAPTER I

THE PROBLEM

Problem

Individualized instruction requires frequent decisions about each person passing through the curriculum. The basis for these decisions is often an estimate of a domain score based on a small sample of items from the domain. Because of the small sample the possibility of incorrect decisions is great. Millman (1973) has shown that with a mastery level of eighty percent, more than a third of those students whose actual domain achievement is sixty percent will get at least four of five items correct and thus be misclassified as having mastered the objective or unit.

The test data available in such decision situations are not the only existing information which is pertinent to the decisions. In fact, there is usually information present prior to testing. Cronbach and Gleser (1965) have challenged testers to show that the application of their instruments results in an improvement in the quality of decisions. To use Sechrest's (1963) term, testers should demonstrate the incremental validity of the tests they employ. No such investigation has been done with domain referenced tests. Thus, it is not known whether estimates of domain scores based on small item samples yield new information for decision making.

Solution

One way of improving the quality of decisions made with the aid of domain test estimates is to utilize additional information in conjunction with the estimate. Such information, once identified, may be joined with test information in a mathematical model which should yield improved domain estimates. There are two statistical approaches to such modeling: Bayesian and least squares regression. Both of these will likely yield an improved estimate. No research has been done in an applied setting with the domain score known. Therefore there is no empirical basis for recommending one procedure over the other.

Need for the Study

Domain tests are being widely used for decision making. It is conceivable that decisions based on short tests alone may be worse than those made knowing only historical information. While it may not be feasible to eliminate tests from an instructional sequence, educators should be alerted to the fact that test results alone are not a sound basis for decisions. If test data do not provide information, decision makers should be made aware of that fact. Further, if the solutions proposed are sound, this should be demonstrable in an applied setting. Then guidance in the application of the procedures should be made available to practitioners.

Purpose of the Study

There are two components to the research reported herein. One has to do with the investigation of the information value of several variables, including test data, with respect to results on a domain test. Once these various information relationships were illuminated, two models were compared to each other and to a raw score approach for their efficacy as a basis for criterion referenced decisions. Objectives 1 and 2 below form the first component, Objective 3 the second.
Specifically stated the objectives were: 1. To determine the information existant in four antecedent and collateral variables relative to domain achievement. 2. To couple information with test results in order to determine: a. the incremental validity of short domain tests, b. if decision precision can be improved by using antecedent and collateral data with test results, c. the functional lengths of several short domain tests. 3. to compare the Bayesian marginal mean model, the least square regression model: andthe raw score approach with respect to decision precision. Definition of terms Given below are definitions of several terms which are used throughout this thesis. Domain test "Any test consisting of a random or stratified random sample of items selected from a well defined set or class of tasks."(Millman, 1974, p. 315) Criterion referenced testing The use of a test to make decisions about a criterion. Information Datum is information if and only if it reduces the uncertainty involved in making a decision. Functional test length The length of test necessary to provide informa- tion equivalent to that provided by collateral, antecedent and test information. Incremental validity The extent to which a multiple correlation is raised by the addition of test results to a set of prior existing information. Domain achievement The prOportion of items correct on a set of items which comprehensively cover an objective or set of objectives. Decision precision The proportion of correct classifications made on the basis of a given decision algorithm. CHAPTER II REVIEW OF THE LITERATURE Two excellent reviews have been prepared which cover criterion referenced testing comprehensively. These are Millman (1974) and Hambleton, Swaminathan, Algina, and Coulson (1978). Because of the comprehen- siveness of these monographs the present review draws heavily on these two papers. The topics to be covered in this review are: 1) definitions, 2) estimation of domain scores, 3) criterion referenced decisions, 4) validity, and 5) test length. Some of these tOpics are covered in greater depth than others. The criteria for depth of coverage was the topic's direct relevance to the research. For example, the estimation of domain scores is the direct focus of the study and thus the greatest amount of space is devoted to this area. Definitions As Hambleton et a1. (1978) have observed there is by no means a single accepted definition of a criterion referenced test. Two quotations which are at opposite poles of the generality continuum illustrate this. 6 The first is the most restrictive. "A pure criterion referenced test is one consisting of a sample of production tasks drawn from a well defined population of performances, a sample that may be used to estimate the proportion of perfor- mances in that population at which the student can succeed." (Harris and Stewart, 1971), p. 1) Ivens defined a criterion referenced test, in most general terms, as one "comprised of items keyed to behavioral objectives." (Ivens, 1972, p. 2) Clearly one must have a referent which is more specifically defined than is the case if both of these quotations are allowed within the class of the concept "criterion referenced test." The purpose of this section of the review will be to arrive at a term for and definition of the kind 'of test we are investigating in this research. To do this we will allude to some terms and corresponding referents which will help delimit our concept. Hambleton et al. 
(1978) point out that criterion refers to a minimal acceptable level of functioning. This definition is consistent with Glaser and Nitko (1971), Millman (1974L and Harris, et al. (1974). So a criterion referenced test could be one which was used to make a decision about this minimal acceptable level of functioning. Herein lies the problem, when one applies the accepted definition of criterion; cri- terion referenced implies only that the test has some relationship to a decision about level of functioning. Looking at it from this point of view, Iven's definition seems most appropriate. That is, a test comprised of items keyed to behavioral objectives defined as Mager (1962) does would be criterion referenced in the sense that the results could be used to make a decision about the minimal acceptable level of function- ing. Glaser and Nitko (1971), consistent with Harris and Stewart (1971), speak of production standard in their definition of criterion referenced but also, as do Harris and Stewart, they use the words "well defined population of performances." So, not only should these tests measure a level of functioning, that level should be generalizable to some larger domain or population. What Harris and Stewart do not allude to is criterion in the sense of minimal acceptable level of functioning. Hively, et al. (1968), Bormuth (1970) and Osburn (1968) have specified algorithmic procedures for defining a domain of test items. Popham (1975) describes what he calls an amplified objective which specifies in detail the testing situation, response alternatives and a criterion of correctness, in effect, defining the domain of items. Baker (1974) also provides pro- cedures for carefully defining the item domain of an objective. The direction of the work in this area seems to underline the importance of the notion of domain. As one might suspect the importance of the domain has motivated the term Domain Referenced Test. Millman (1974) defines such tests as: '"any test consisting of a random or stratified random sample of items selected from a well defined set or class of tasks." (Millman, 1974, p. 315) It should be noted that such a definition does not refer to a criterion. The definition of a test can be separated from the specification of a desired level of functioning (as Harris and Stewart's (1971) definition also illustratesl In fact, a single domain referenced test can be used to make decisions about more than one criterion. Admittedly, there is a connection between the decision criterion to be addressed with the results of a domain-referenced test and the definition of the "set or class of tasks." However, in developing the test items the emphasis is on content domain, the cri— terion can be established separately. Thus, it seems most appropriate to refer to domain tests. In current practice such tests are most often 10 used to make decisions about a person's status relative to a criterion. It is appropriate to say that scores are domain referenced and decisions based on the scores are criterion referenced. The use of the term criterion-referenced testing to describe general approaches whose overall aim is to make decisions about a criterion is useful. Domain or objective referenced tests are but tools which can be employed in this pursuit. Estimating Domain Scores The basic problem is; given an individual's ob- served score on a criterion referenced test, what is his score on the domain, and further,.does this represent mastery or non-mastery status (Hambleton and Novick, 1973). 
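Stated concretely, the estimation problem takes a count of correct answers on a short item sample and returns both a domain-score estimate and a mastery decision. The small Python sketch below frames exactly that interface; the six-item test length and the .80 mastery level are illustrative choices for the example, not values fixed by the literature reviewed here.

```python
def estimate_domain_score(items_correct, items_administered):
    """Naive domain-score estimate: the observed proportion correct."""
    return items_correct / items_administered

def classify_mastery(domain_estimate, mastery_level):
    """Return True for a mastery decision, False for non-mastery."""
    return domain_estimate >= mastery_level

# A pupil answering 5 of 6 sampled items is called a master at the .80 level,
# even though the true domain proportion behind that small sample is unknown.
estimate = estimate_domain_score(5, 6)
print(estimate, classify_mastery(estimate, mastery_level=0.80))
```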
To use the symbols which appear most consistently in the literature (Swaminathan, Hambleton and Algina, 1975; Hambleton and Novick, 1973; Novick, Lewis and Jackson, 1973): if X_i (an individual's observed score) is known, what is π_i (the domain score), and further, what is ω_i (ω_i = 1 if mastery, ω_i = 0 if non-mastery)? So the problem is to obtain π̂_i (an estimate of π_i) and ω̂_i (an estimate of ω_i).

There are five distinguishable procedures which have been described in the literature for solving this problem. These are: 1) proportion correct, 2) Classical Model II, 3) Bayesian Model II, 4) Bayesian marginal mean, and 5) the binomial (Note 1). The first four of these differ from the fifth in that they provide a single direct estimate π̂_i. The binomial procedure yields information about the probability that π_i is greater than some given mastery level π_0. The remainder of this section provides discussion of each of these five procedures.

Proportion Correct

The estimate of the proportion correct is the ratio of correct items to the length of the test. This value can also be thought of as the raw score multiplied by a constant which is the inverse of the number of items. For a small number of items this estimate yields tenuous results. Millman (1974) has shown that for a mastery level of 80 percent, more than a third of those who could achieve only 60 percent of the domain of items will get at least four of five items correct, and thus the decision of mastery will be in error. Hambleton et al. (Note 1) observed that "procedures which take other information into account are more desirable."

Classical Model II

The Classical Model II and Bayesian Model II allow for the inclusion of other information in the decision making process. The classical model includes the mean of the group in which the individual is a member. This is collateral information. The Bayesian Model II considers, in addition to the group mean, an investigator's subjective feeling regarding the prior status of the group. The remainder of this section discusses the Classical Model II in detail.

Jackson (1972) observed that Truman Kelley's (1927) estimate of true score effectively joined test results with the collateral data of the group mean. Lord and Novick (1968) state Kelley's formula for the estimate of true score (T) as

T̂ = ρ_xx′ X + (1 − ρ_xx′) μ_x    (1)

where ρ_xx′ is the reliability, X the test score and μ_x the mean for the group. Thus test data are incorporated through X and the collateral data by way of μ_x. Novick and Jackson (1974) observe that

T̂ = (σ_T² X + σ_E² μ_T) / (σ_T² + σ_E²)    (2)

Classical true score theory (Lord and Novick, 1968) assumes that μ_T = μ_x. Thus expression (2) can be rewritten in the form

T̂ = (σ_T² X + σ_E² μ_x) / (σ_T² + σ_E²)    (3)

Further, true score theory assumes σ_X² = σ_T² + σ_E², so that

T̂ = (σ_T²/σ_X²) X + (σ_E²/σ_X²) μ_x    (4)

This expression makes clear the fact that Kelley estimates are "...a weighted sum of two separate estimates, one based upon the individual's observed score X and the other based on the mean of the group to which he belongs..." (Lord and Novick, 1968, p. 65). It can further be observed that when the test is highly reliable (i.e., σ_E² is small) the test data are weighted heavily. If the test is not highly reliable, then the estimate is more dependent on the collateral data, namely μ_x.
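As a concrete check on the two ideas above, the short Python sketch below first uses the binomial distribution to reproduce Millman's misclassification figure (the chance that an examinee whose true domain proportion is .60 answers at least four of five items correctly), and then forms a Kelley estimate as the weighted sum in equation (4). The reliability, group mean, and observed score used in the Kelley illustration are hypothetical values chosen only for the example; they are not taken from the thesis data.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Millman's example: true domain proportion .60, five-item test,
# mastery declared when at least four items are answered correctly.
p_misclassify = binom_tail(n=5, k=4, p=0.60)
print(f"P(at least 4 of 5 correct | pi = .60) = {p_misclassify:.3f}")  # about .337, over a third

def kelley_estimate(x, reliability, group_mean):
    """Kelley regressed true-score estimate, as in equations (1) and (4)."""
    return reliability * x + (1 - reliability) * group_mean

# Hypothetical illustration: observed score of 4 on a 5-item test,
# assumed reliability .40 and assumed group mean of 3.1 items correct.
print(kelley_estimate(x=4, reliability=0.40, group_mean=3.1))  # 3.46, pulled toward the mean
```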
In order to utilize Kelley's procedure in situations where binary decisions, such as mastery/non-mastery, are to be made, Jackson modified the above procedure. He applied the Tukey-Freeman arcsine transformation to individual scores (X_i) and obtained the transformed estimate

g_i = 1/2 [ sin⁻¹ √(X_i / (n+1)) + sin⁻¹ √((X_i + 1) / (n+1)) ]    (5)

Under this transformation of X_i, the corresponding transformed variable γ_i for the proportion correct π_i is given by the expression

γ_i = sin⁻¹ √π_i    (6)

If the number of test items is at least eight, then the distribution of g_i will be approximately normal, with the mean being the transformed value of the proportion correct (γ_i) and the variance (4n + 2)⁻¹ (Anscombe, 1948). That is, g_i ~ N(γ_i, (4n+2)⁻¹). In classical notation this can be written as g_i ~ N(T, σ_E²).

The statement g_i ~ N(γ_i, (4n+2)⁻¹) is about a fixed person (i) under the hypothetical condition of a finite number of repeated testings. If there is a single testing of a finite number (N) of persons (i.e., i = 1, 2, ..., N), then Jackson (1972) has shown that the mean is given by

g. = Σ_{i=1}^{N} g_i / N    (7)

and the true variance (φ_c) by

φ̂_c = [ Σ_{i=1}^{N} (g_i − g.)² − N(4n+2)⁻¹ ] / (N − 1)    (8)

This expression can be rewritten as

φ̂_c = Σ_{i=1}^{N} (g_i − g.)² / (N − 1) − (4n+2)⁻¹ N / (N − 1)    (9)

to facilitate the determination of its connection with true score theory. The first term of the expression is the observed variance; the second term is error variance. We can write

φ̂_c = φ_gc − φ_Ec    (10)

and note that φ̂_c is the analogue of σ_T². Also, φ_gc is analogous to σ_X² and φ_Ec to σ_E². Returning to (2), the Kelley formula for the transformed variables becomes

γ̂_ic = (φ̂_c g_i + φ_Ec g.) / (φ̂_c + φ_Ec)    (11)

or

γ̂_ic = (φ̂_c / φ_gc) g_i + (φ_Ec / φ_gc) g.    (12)

This is clearly a weighted sum of the transformed test score and the mean of the scores, the mean's weight being inversely related to the reliability of the test. Once the transformed true proportion correct (γ̂_ic) is obtained, one can return to the original scale by a sine transformation of γ̂_ic, namely

π̂_i = (1 + .5/n) sin²(γ̂_ic) − .25/n    (13)

This value (π̂_i) is the estimated domain proportion correct and is based not only on the proportion correct of a subset of items from the domain but also on the group's performance on the same subset of items.

Bayesian Model II

This model estimate uses test data (X_i) and collateral data (X̄), as well as prior information. The method requires setting a prior distribution representing an investigator's belief prior to testing and then making revised estimates after testing. These revised estimates are based on prior beliefs as well as an individual's test results and the group mean. The distribution which takes all three pieces of information into account is called the posterior distribution.

The question of determining the correct prior distribution has been the subject of considerable theoretical study by Novick and his colleagues (Novick et al., 1973; Swaminathan et al., 1975). The current status of these investigations suggests the following.

a) The specification of the mean is not particularly important and may be represented by a uniform distribution in which any score is equally likely.

b) The prior beliefs about variance can be adequately represented by an inverse chi square distribution with two parameters, scale and degrees of freedom.

   i) The degrees of freedom parameter (ν) should be set at 8.

   ii) The scale parameter (λ) can then be solved for in an equation with a single unknown, namely the variance. The equation is

   λ = (ν − 2) φ̂_bm    (14)

   iii) The necessary estimate of the variance (φ̂_bm) can be obtained as follows:

   a) Specify the true proportion correct for the typical examinee in the sample.
   b) "...Specify the number of test items, t, that would have to be administered to the examinee in order to obtain as much information about π_i as is deemed to be available" (Note 1, p. 31).

   c) φ̂_bm is then defined by the equation

   φ̂_bm = (4t + 2)⁻¹    (15)

   d) The true proportion correct (γ_ib) is then estimated by

   γ̂_ib = { g_i [λ + Σ_j (g_j − g.)²]/(N + ν − 1) + γ̂._b (4n+2)⁻¹ } / { [λ + Σ_j (g_j − g.)²]/(N + ν − 1) + (4n+2)⁻¹ }    (16)

   and the mean of the transformed proportions correct (γ̂._b) by

   γ̂._b = Σ γ̂_ib / N    (17)

   e) Novick et al. (1973) observe that this is equivalent to

   γ̂_ib = [ g_i φ̂_b + γ̂._b (4n+2)⁻¹ ] / [ φ̂_b + (4n+2)⁻¹ ]    (18)

   where

   φ̂_b = (N + ν − 1)⁻¹ [ λ + Σ_j (g_j − g.)² ]    (19)

φ̂_b is the Bayesian true variance estimate for γ̂_ib, φ_Eb is the Bayesian error variance estimate for γ̂_ib, and φ_gb is the Bayesian observed variance estimate for γ̂_ib. Using this notation, (16) can be rewritten as

γ̂_ib = (φ̂_b / φ_gb) g_i + (φ_Eb / φ_gb) γ̂._b    (20)

As Novick et al. (1973) indicate, this estimate has a form analogous to Kelley's true score estimation procedure. The differences between γ̂_ib as estimated by equation (18) and γ̂_ic as estimated by equation (12) result from the procedures used for determining the several variance components and from the use of γ̂._b as the true mean rather than g.. Table 1 allows a comparison of the Bayesian and classical variance estimates.

[Table 1. A comparison of Bayesian and classical variance estimates: the tabled formulas are not legible in the source.]

Examination of the formulae in the table indicates that prior information is incorporated into the estimate of γ̂_b through the estimation procedure for φ̂_b. λ is determined by

λ = (ν − 2)(4t + 2)⁻¹    (21)

where t is the number of test items that would need to be administered to the examinee to obtain as much information about π_i as is deemed available prior to testing. Further, because of the iterative nature of the solution of equation (16), the γ̂._b obtained for the concluding iteration will have been influenced by the value of t. Thus differences in estimated values for γ are a function of differing amounts of regression due to the variance estimates, as well as of a different "true" mean on which the regressions occur. Theoretically, the advantage of the Bayesian Model II procedure rests on an improvement in the estimates of true variance, observed variance and the true mean, accomplished by incorporating prior information through the parameter λ.

Bayesian Marginal Mean

Lewis, Wang and Novick (1973) observe that if one wishes to make overall decisions about all groups, joint estimates such as those of Bayesian Model II are appropriate. However, they note that for individualized instruction, decisions about each individual are usually desired, and therefore marginal estimates are indicated. Hambleton et al. (Note 1) note that the Bayesian Model II requires complicated iterative solutions. Tables prepared by Wang (1973) allow relatively easy computation of marginal estimates. The procedure demands that the degrees of freedom parameter be set (again at 8, according to Novick et al., 1973) and that φ̂_bm be determined by specifying t in the manner described above. With these values, p* can be read from Wang's table and the estimate of γ_i is

γ̂_ibm = g. + p* (g_i − g.)    (22)

which can then be transformed to π̂_i by equation (13).
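A compact way to see how these pieces fit together is to code the chain from raw score to marginal mean estimate: transform each raw score with equation (5), shrink it toward the group mean by p* as in equation (22), and return to the proportion scale with equation (13). The Python sketch below does this. It is only an illustration under assumed inputs: the scores are invented, and p* is supplied directly as if it had been read from Wang's (1973) tables for the chosen t, since those tables are not reproduced here.

```python
from math import asin, sin, sqrt

def tukey_freeman(x, n):
    """Equation (5): arcsine-transformed score g_i for x correct out of n items."""
    return 0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))

def back_transform(gamma, n):
    """Equation (13): return a transformed estimate to the proportion-correct scale."""
    return (1 + 0.5 / n) * sin(gamma) ** 2 - 0.25 / n

def marginal_mean_estimates(scores, n, p_star):
    """Equation (22): shrink each transformed score toward the group mean by p*."""
    g = [tukey_freeman(x, n) for x in scores]
    g_bar = sum(g) / len(g)
    return [back_transform(g_bar + p_star * (gi - g_bar), n) for gi in g]

# Hypothetical example: six-item subtest, invented scores, and an assumed p* of .55.
nu, t = 8, 2.75
lam = (nu - 2) / (4 * t + 2)          # equation (21): prior information worth t = 2.75 items
print(f"lambda = {lam:.4f}")

scores = [2, 3, 4, 5, 6, 6]
print(marginal_mean_estimates(scores, n=6, p_star=0.55))
```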
The marginal mean procedure is an extension of the Bayesian Model II and as such effectively considers the three types of data: test, collateral, and prior beliefs. It should be understood that all of the Bayesian estimates have been designed for use when one's knowledge of prior status can at best be represented by subjective belief about t. It is this subjective belief which is quantified by the method described for establishing the Bayesian true variance. The parameter p* is an estimate of φ̂/(φ̂ + φ_E), a reliability indicator. Lewis, Wang and Novick (1973) report that an empirical study of φ̂_b/(φ̂_b + φ_Eb) and p* indicates that "...p* is substantially larger than ρ for moderate n" (p. 12). As the number of items increases, the discrepancy between ρ and p* becomes smaller (p. 13), and thus estimates of γ_b and γ_bm become increasingly similar.

One might expect that if the Bayesian methods do allow for a meaningful incorporation of prior information into the computation of ρ, then these values would be larger than the corresponding classically computed values. However, in at least one empirical study this was not the case (see Novick et al., 1973, pp. 39-41). In this instance, the investigators questioned the estimates of φ̂. It would seem that dilemmas such as this are best addressed by studying the quality of decisions made with the various estimates.

Binomial Model

One may use the binomial model discussed by Millman (1974) for making probability statements about the true achievement status of an individual. In order to do this, three parameters are needed: minimum passing score, number of items, and the level of certainty required for establishing mastery. With these values specified, mastery/non-mastery decisions can be made to a prescribed probability level knowing only the actual score on a test. Tables prepared by Millman (1972) make this model very simple to apply.

As Millman (1974) has observed, all Bayesian approaches yield a regressed estimate of domain scores. That is, if an individual's obtained score is below the group's mean, her estimated domain score will be higher than her obtained score. Analogously, if her obtained score is above the mean, her estimated score will be lower. These statements also hold for Classical Model II. Such statements do not hold for Millman's binomial model.

Criterion Referenced Decisions

Only Hambleton, Novick and their colleagues (Hambleton et al., 1973; Hambleton, 1974; Swaminathan et al., 1975; Hambleton et al., 1978) seem to have given attention to the problem of making decisions based on domain estimates. It will be recalled from the previous review of procedures for estimating domain scores that once π̂_i is obtained one must determine the appropriate value of ω_i. In the binary classification, if π̂_i ≥ π_0 then ω_i = 1; if π̂_i < π_0 then ω_i = 0.

[The remainder of Chapter II, covering the decision-theoretic treatment of criterion referenced decisions, validity, domain test length, and the chapter summary, as well as all of Chapter III (Design and Procedures) and the opening of Chapter IV, is not legible in the source. The legible text resumes with the Chapter IV findings below.]

CHAPTER IV

FINDINGS

[Table 3. Descriptive statistics for all variables used in the study: values not legible in the source.]

[Table 4. Descriptive statistics for DOMAIN, SUBTEST(J) and items comprising the six objectives: values not legible in the source.]

Phase I

Based on the very low correlations of SEX with the other information variables (see Table 5), it was eliminated from further consideration in the study.

[Table 5. Intercorrelation of information variables and domain achievement: values not legible in the source.]

Figure 1 depicts graphically the information value of various combinations of the variables STEP, TEST, and TIME. The vertical axis represents the coefficient of alienation, while the horizontal axis indicates the configuration of variables under consideration. By following the lines on the graph from left to right one can gain insight into the uncertainty reduction which will accrue by adding the indicated variable. If one compares the slopes of line segments with the same initial point but different ending points, the relative informational value of the added variable will be apparent. For example, comparing the relevant slopes indicates that STEP provides more information about the dependent variable than does TEST. In fact, examination of the segments representing the addition of TEST to equations indicates that TEST contributes very little (if any) information. TIME appears to provide some information, but not as much as STEP.

[Figure 1. Reduction of uncertainty for combinations of 3 information variables: vertical axis, uncertainty (coefficient of alienation); horizontal axis, number of variables combined.]

Another way of looking at the value of a variable's information is seen in Table 6.

[Table 6. Stepwise regression statistics for all permutations of the information variables with DOMAIN: values not legible in the source.]

Study of the sixth table confirms that STEP is the most informative variable. TIME is the second most informative. Examination of the results of the six equations (especially equation 4) suggests that TEST has no useful relationship with DOMAIN which is independent of STEP and TIME. Table 7 gives additional insight into the relationships of the information inherent in the four variables. In particular, the zero order partials for the three information variables indicate that each has a significant information factor relative to DOMAIN. However, the lack of significance of the first and second order partials r_DB.α, r_DB.γ and r_DB.αγ suggests that the information in TEST relative to DOMAIN is accounted for by STEP and TIME.

[Table 7. Partial correlations and coefficients of alienation for the information variables with DOMAIN: values not legible in the source.]

Based on the data presented in Tables 6 and 7 and the earlier elimination of sex as a useful variable, the most parsimonious regression equation relating non-test informational variables to DOMAIN included the independent variables TIME and STEP only. The basic statistics for this equation are presented in Table 8.

[Table 8. Regression statistics relating TIME and STEP to DOMAIN: values not legible in the source.]

In order to determine the utility of including test information in the decision process regarding domain achievement, incremental validity was explored. Table 9 provides the data for assessing this incremental validity. The base, non-test, information accounts for over thirty-five percent of the variance in the dependent variable, domain achievement. The six item subtest result accounts for an additional twenty percent. If it is assumed that the relationship between information and test length is approximately linear in the interval between six and twelve items, a ten item test will augment the information in the base variables by an amount equal to that contained in the base variables. Since the F tests listed in Table 9 are for the partial regression coefficients relative to the dependent variable DOMAIN, each test significantly augments the base variables.

[Table 9. Statistics for incremental validity analysis: values not legible in the source.]

Tables 9 and 10 may be used to deduce the functional length of SUBTEST(6) coupled with STEP and TIME. Table 9 shows that the coefficient of determination (R²) for the base variables plus SUBTEST(6) is .5521. Reference to Table 10 allows one to see that this R² value lies between the r² for SUBTEST(6) and SUBTEST(12). The graph of Figure 2 shows the relationship between the length of the subtests and the corresponding r² with DOMAIN. Based on this graph, it seems reasonable to obtain the functional length of the six item test by linear interpolation. This process yields a value of 8.03, which is the functional length of the 6 item subtest augmented by the two information variables.

[Table 10. Coefficients of correlation and determination for SUBTEST(J) with DOMAIN: values not legible in the source.]

[Figure 2. The relationship of r² and subtest length: vertical axis, r²; horizontal axis, subtest length in items.]

Clearly, the coefficients of determination in Tables 9 and 10 suggest that the antecedent variables provide a decreasing amount of information in combination with test data as the number of items in subtests of the domain increases. In effect, the functional length of subtests containing 12 or more items is the same as the length of the specific subtest.

Phase II

Tables 11 and 12 contain the basic statistics of the two least squares regression models which are to be the basis for classification. The first of these two tables contains only the information variables STEP and TIME. Table 12 presents the statistics for the information model with SUBTEST(6) added. The standard errors of these two models are 22.42 and 19.40 respectively.

[Table 11. Regression statistics for TIME and STEP with DOMAIN: values not legible in the source.]

[Table 12. Regression statistics for TIME, STEP and SUBTEST(6) with DOMAIN: values not legible in the source.]

The statistic of the Bayesian model which is roughly analogous to the regression weights of the least squares model is p*. Table 13 presents values of p* for three values of t which span the range of t values used in this study. Numbers are given for SUBTEST(6), SUBTEST(12), and SUBTEST(18). Reference to equation (22) of Chapter II suggests that as t increases, the influence of the mean becomes larger. Also, in all cases the influence of a subject's score becomes greater as test length increases.

[Table 13. p* values for three values of t for subtests of length 6, 12 and 18: values not legible in the source.]

Figures 3, 4, 5, 6 and 7 illustrate the effect of the t parameter on the classification of the Bayesian model. One can see that in all cases the most accurate classification can be achieved with t set equal to 2.75. Thus, for the purpose of comparing models, it was judged appropriate to use the Bayesian model with t equal to 2.75. This is the apparent "best" Bayesian model available for the present data. The means and variances of the two raw score decision criteria are given previously in Table 1.

Phase III

The concluding set of findings yields information about the relative effectiveness of the three approaches for making decisions about mastery or non-mastery of the achievement domain. Table 14 presents the number and percentage of correct classifications for each of the six models at each mastery level. The remainder of this section discusses the results of the statistical analysis of these data. As is shown in Table 15, the analysis of variance yielded significant mastery level and model effects. With respect to the mastery level factor, the proportion of correct classifications appears to decrease as the mastery level increases. This can be seen in Table 16.
[Figures 3 through 7. The relationship of t to correct classification for mastery levels of .70, .75, .80, .85 and .90: horizontal axis, t; vertical axis, percent correctly classified; separate curves for subtests of 6, 12 and 18 items. Not reproducible from the source.]

[Table 14. Number and percent of correct classification: values not legible in the source.]

[Table 15. Analysis of variance statistics: values not legible in the source.]

[Table 16. Means and variances for levels of model and mastery level: values not legible in the source.]

The Scheffé contrasts in Table 17 suggest that the model effect stems from the classification differences between the two raw score models and the two Bayesian models. There are no significant differences among the three decision approaches. It is notable that the variances for correct classification by SUBTEST(12) and BAYES12 are considerably lower than is the case for the other four models. This fact is appropriately considered together with the change in R² values between SUBTEST(6) and SUBTEST(12) in Table 10.
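Underlying every entry in Tables 14 through 17 is the same elementary computation: decision precision is the proportion of pupils whose model-based mastery call agrees with the call implied by the full 156-item DOMAIN score. The Python sketch below computes that proportion for competing sets of estimates across the five mastery levels examined in this study. The pupil data and the two models' estimates are invented for illustration only; they are not the thesis values.

```python
def classify(estimates, mastery_level):
    """Mastery (1) / non-mastery (0) decisions for a list of estimated proportions."""
    return [1 if e >= mastery_level else 0 for e in estimates]

def decision_precision(model_estimates, domain_proportions, mastery_level):
    """Proportion of pupils whose model-based decision matches the DOMAIN-based decision."""
    truth = classify(domain_proportions, mastery_level)
    decisions = classify(model_estimates, mastery_level)
    agree = sum(d == t for d, t in zip(decisions, truth))
    return agree / len(truth)

# Invented data: DOMAIN proportions for eight pupils and two hypothetical models.
domain = [0.55, 0.62, 0.71, 0.78, 0.83, 0.88, 0.92, 0.97]
six_item = [3/6, 4/6, 5/6, 4/6, 5/6, 6/6, 5/6, 6/6]            # raw proportion on 6 items
regression = [0.58, 0.66, 0.74, 0.75, 0.81, 0.86, 0.90, 0.95]  # least squares estimates

for level in (0.70, 0.75, 0.80, 0.85, 0.90):
    print(level,
          decision_precision(six_item, domain, level),
          decision_precision(regression, domain, level))
```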
This chapter has summarized the findings of the three phases of analysis. Initially, the utility of the information variables for reducing uncertainty about domain achievement was reported. Then the parameters of the decision models were presented. Finally, the results of the statistical comparisons of the six models were given. The final chapter of this thesis discusses the implications of the findings and presents conclusions which can be drawn from them.

Table 17. Scheffé contrast statistics for the model factor

    Contrast                                              Variance of contrast    ψ/σ_ψ
    (X̄_SIX + X̄_TWELVE)/2 − (X̄_YHAT + X̄_YHATP)/2            .0022                   1.1304
    (X̄_SIX + X̄_TWELVE)/2 − (X̄_BAYES6 + X̄_BAYES12)/2        .0022                   −.08483
    (X̄_BAYES6 + X̄_BAYES12)/2 − (X̄_YHAT + X̄_YHATP)/2        .0022                   −1.2153
    X̄_SIX − X̄_TWELVE                                        .0272                   5.588*
    X̄_SIX − X̄_YHAT                                          .0272                   −1.081
    X̄_BAYES6 − X̄_BAYES12                                    .0272                   5.294*
    X̄_SIX − X̄_BAYES6                                        .0272                   −.293

    *Significant at the .05 level (F.05 with 1 and 4 degrees of freedom = 7.71).

CHAPTER V

INTERPRETATION, CONCLUSIONS, RECOMMENDATIONS

In this chapter the data presented in the previous chapter are evaluated in terms of the objectives given in Chapter I. Conclusions based on the evaluation are also given, along with recommendations for practice and subsequent research.

The first objective of this project was: to determine the information existent in four antecedent and collateral variables relative to domain achievement. It should be recalled that data are considered information if and only if they reduce the uncertainty involved in making a decision. Analysis of the four information variables suggested that only two truly yielded information. Sex was unrelated to any of the variables of the study. TEST, while correlated with domain achievement, contained no information not present in TIME. The other variable which contained information relative to DOMAIN was STEP achievement.

The two significant information variables indicate prior mathematics achievement and learning rate. The relationship between prior mathematics achievement and subsequent test performance was certainly expected. The fact that learning rate had predictive utility independent of achievement is of interest. This finding is consistent with Carroll's (1963) hypothesis that time is a central factor in achievement. The findings of this research suggest that if two pupils have identical prior achievement and different prior learning rates, the student with the higher rate will be expected to score higher on subsequent achievement measures. Thus, in terms of estimating posterior scores, everything else being equal, quicker students should surpass the less quick ones. In addition, students with slightly inferior achievement but higher learning rates should be expected to catch pupils with higher achievement but lower learning rates. It seems reasonable to conclude that in the long run, if opportunity and motivation are equal, the advantage will always be with the quicker student.

Classroom teachers trying to summarize the useful prior information they possess relevant to subsequent achievement should consider both achievement and rate of learning. Achievement level seems to be most important; however, rate, being a dynamic variable, should be considered in terms of the length of time which has passed since the last appraisal of achievement level. Gettinger and White (1979) have recently reported an approach to measuring time to learn which would allow teachers in traditional classroom settings to easily appraise learning rate. It is recommended that teachers familiarize themselves with their approach and apply it routinely.
The procedure is as follows: pupils study stan- dard materials,which they have not mastered,for a speci- fied length of time and are then tested. This is repeated until mastery at some arbitrary upper limit has been reached. Time to learn is then said to be the number of trials required. The cited authors had students follow the process for six types of tasks and set time to Learn as the mean number of trials needed for mastery. The second objective of this study aimed to determine: 1) the incremental validity of short domain tests, 2) if decision precision can be improved by using antecedent and collateral data with test results, and 3) the functional lengths of several short domain tests. Incremental validity refers to the extent to which a multiple correlation is raised by the addition of test results to a set of prior existing information. 81 Thus the incremental validity of SUBTEST(6), SUBTEST(12), and SUBTEST(18) is .1478, .3167, and .3272 respectively. The incremental validity of the six item test is less one quarter of the base information (assuming no prior information with respect to the bases). Cronbach and Gleser (1965) have written that "tests should be judged by the increase in validity which they offer." In terms of information, as this study has defined it, the six item test does provide some. In order to determine if the amount of informa- tion is meaningful with respect to mastery-nonmastery decisions, the decision precision based on the prior information and the prior information combined with the six items subtest was compared. (It should be recalled that "decision precision" has previously been defined as the proportion of correct classifications made on the basis of a given decision algorithm. The decision based on the application of the algorithm to the domain achievement score is the correct one.) The results of this comparison indicated that decision precision was not improved by using the six item test. The implication of this finding is clear. Test data do not provide decision relevant information that was not available prior to testing. Thus, while use of the tests might be justified on instructional 82 grounds, a decision to test with six items is not justi- fiable as a means of improving decisions about mastery- nonmastery. This is true regardless of whether the prior information is incorporated by least squares or Bayesian approach. The number of test items necessary to provide information equivalent to that of the collateral, ante— cedent and test information already available is referred to as the functional length of a test. Thus TEST, TIME and SUBTEST(6) have a functional length of 8.03. One could use Figure 2 to set a functional length for the base prior information. The value would be slightly more than five. It is clear that if one considers the prior information and then the six item test, the information value of the test is reduced to that of about three items. The findings of phase III of the analysis suggest that this is not a sufficient number of items to improve decision precision, vis-a-vis mastery- nonmastery, significantly. For subtests of 12 items or more the functional length is the same as the actual length. Thus one would expect that the decision precision of an algorithm incorporating prior information would be the same as one based solely on test score. To address the final objective of this research, 83 comparisons of the three decision approaches were made. With respect to decision precision the three approaches do not differ. 
In order to spur insights into the result of no difference among the approaches it is useful to compare the approaches in detail. While each of the models is linear, the least squares approach is not directly comparable algebraically to the other two. However, the Bayesian and raw score approach are analogous and comparison of their algebraic basis is instructive. In order to do this, one should recall the Kelley model for estimating true scores. The Kelley model is T = pXX, X + (l-pxx,)X Where pxx' is the proportion of true to observed variance, X is an observed score and X is the mean of such scores (T = X). The raw score approach is the specific case where pxx, = 1 and thus T = X. The Bayesian Marginal Mean Model has the same form as Kelley's Model. Like Kelley's approach, it contains a parameter which is, in part, a function of score variance. However, this parameter is also influenced by prior subjective estimates about the sample in question. Specifically, this prior information 84 is incorporated into the model by specification of a value for prior information in terms of the number of test items the information is worth. Table 13 indi- cates that p* is clearly a function of t. However, Figures 3 through 7 as well as the results of the Scheffe' contrasts suggest that the decision about mastery- nonmastery is not particularly sensitive to t. It appears that for the purpose of classifications of mastery or non-mastery, incorporation of prior informa- tion by means of t has little value. For after the complex calculations of the Bayesian Model are completed it functions as the raw score form of Kelley's Model. For making the kinds of decisions made most frequently by educators, the raw score model is clearly indicated because of its simplicity. The following three points summarize the comparison of the models. 1. Decision precision was the same for the six item raw score model and the least square model containing only ante- cedent and concommitant information. 2. Decision precision was improved when 12 items were used rather than six. 3. The raw score model is preferred to the Bayesian model. 85 All of the preceding discussion holds for mastery levels of .70, .75, .80, .85, and .90. However, across all models the precision decreases as the mastery level is increased. This trend does not appear to be uniform for all models. It seems as though the models containing the least information decline most in precision. Both the raw and Bayesian approaches using six items show the greatest consistent decline. The finding of this research which seems to have the greatest utility for current classroom practice is that selected prior information appropriately weighted, can be used to yield decisions about subsequent achievement which are as accurate as decisions based on a six item test. This fact can be useful as teachers informally monitor pupils on a day to day or even minute to minute basis. Assuming that a teacher has prior measures of achievement and rate of learning is invariant (at least within a subject and group of pupils) it may be sufficient to keep track only of students on task behavior to assure they are progressing. Perhaps students can be taught that frustration in learning attempts signals a diagnosis point where they should ask for help. If the teacher can't easily identify the problem, then a test of suffi— cient length to diagnose the difficulty is called for. It may be that the frequent tests called for by current 86 individualized programs are unnecessary. 
The finding of this research which seems to have the greatest utility for current classroom practice is that selected prior information, appropriately weighted, can be used to yield decisions about subsequent achievement which are as accurate as decisions based on a six item test. This fact can be useful as teachers informally monitor pupils on a day to day or even minute to minute basis. Assuming that a teacher has prior measures of achievement and that rate of learning is invariant (at least within a subject and group of pupils), it may be sufficient to keep track only of students' on task behavior to assure they are progressing. Perhaps students can be taught that frustration in learning attempts signals a diagnosis point where they should ask for help. If the teacher cannot easily identify the problem, then a test of sufficient length to diagnose the difficulty is called for. It may be that the frequent tests called for by current individualized programs are unnecessary. What may be called for instead is a sound initial placement of instructional materials and methods based on learning rate. After this start, subsequent testing can be done when frustration is indicated by off task behavior or identified by the student. Such an approach would probably result in some students taking frequent tests and others taking very few. It would reduce unnecessary assessment and assure that when a test was given its purpose would be clear to both teacher and student. Hopefully, it would allow tests with sufficient items to assure infrequent errors in instructional decisions. These suggestions will need further investigation.

This research cannot be generalized beyond the curriculum and grade level of focus. Such extension would require further research. It is suggested that efforts be focused on issues related to classroom practice as discussed in the previous paragraphs rather than on replication of the present study.

LIST OF REFERENCES

Anscombe, F.J. The transformation of Poisson, binomial and negative binomial data. Biometrika, 1948, 35, 246-254.

Baker, E.L. Beyond objectives: Domain-referenced tests for evaluation and instructional improvement. Educational Technology, 1974, 14, 10-16.

Bormuth, J.R. On the Theory of Achievement Tests. Chicago: University of Chicago Press, 1970.

Carroll, J.B. A model for school learning. Teachers College Record, 1963, 64, 723-733.

Cohen, J. and Cohen, P. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1975.

Cronbach, L.J. and Gleser, G.C. Psychological Tests and Personnel Decisions. Urbana: University of Illinois Press, 1965.

Cronbach, L.J. Test validation. In R.L. Thorndike (ed.), Educational Measurement: Second Edition. Washington, D.C.: American Council on Education, 1971.

Draper, N.R. and Smith, H. Applied Regression Analysis. New York: John Wiley & Sons, Inc., 1966.

Ebel, R.L. Criterion referenced measurements: limitations. School Review, 1971, 62, 282-288.

Fhaner, S. Item sampling and decision making in achievement testing. British Journal of Statistical Psychology, 1974, 21, 172-176.

Gettinger, M. and White, M.A. Which is the stronger correlate of school learning? Time to learn or measured intelligence? Journal of Educational Psychology, 1979, 71, 405-412.

Glaser, R. and Nitko, A.J. Measurement in learning and instruction. In R.L. Thorndike (ed.), Educational Measurement. Washington: American Council on Education, 1971.

Greenhouse, S.W. and Geisser, S. On methods in the analysis of profile data. Psychometrika, 1959, 23, 95-112.

Harris, C.W., Alkin, M.C. and Popham, W.J. Problems in Criterion-Referenced Measurement. CSE Monograph Series in Evaluation, No. 3. Los Angeles: Center for the Study of Evaluation, University of California, 1974.

Harris, M.L. and Steward, D.M. Application of classical strategies to criterion referenced test construction. Paper presented at the Annual Meeting of the American Educational Research Association, 1971.

Hambleton, R.R. Testing and decision making procedures for selected individualized instructional programs. Review of Educational Research, 1974, 44, 371-400.

Hambleton, R.R. and Novick, M.R. Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 1973, 12, 159-170.

Hambleton, R.R., Swaminathan, H., Algina, J., and Coulson, D.B. Criterion-referenced testing and measurement: A review of technical issues and developments.
Review of Educational Research, 1978, 48, 1-47.

Hively, W., Patterson, H.L., and Page, S.A. A "universe-defined" system of arithmetic achievement tests. Journal of Educational Measurement, 1968, 5, 275-290.

Ivens, S.H. An investigation of item analysis, reliability and validity in relation to criterion-referenced tests. Unpublished doctoral dissertation, Florida State University, 1972.

Jackson, P.H. Simple approximations in the estimation of many parameters. British Journal of Mathematical and Statistical Psychology, 1972, 25, 213-229.

Kelley, T.L. Interpretation of Educational Measurements. Yonkers-on-Hudson, New York: World Book, 1927.

Kerlinger, F.N. and Pedhazur, E.J. Multiple Regression in Behavioral Research. New York: Holt, Rinehart and Winston, Inc., 1973.

Lewis, C., Wang, M.M. and Novick, M.R. Marginal distributions for the estimation of proportions in m groups. ACT Technical Bulletin No. 13. Iowa City, Iowa: The American College Testing Program, 1973.

Lord, F.M. and Novick, M.R. Statistical Theories of Mental Test Scores. Reading, Mass.: Addison-Wesley, 1968.

Mager, R.F. Preparing Instructional Objectives. Palo Alto, California: Fearon Publishers, Inc., 1962.

Millman, J. Determining test length: Passing scores and test lengths for objective-based tests. Los Angeles, California: Instructional Objectives Exchange, 1972.

Millman, J. Passing scores and test lengths for domain-referenced measures. Review of Educational Research, 1973, 43, 205-216.

Millman, J. Criterion-referenced measurement. In W.J. Popham (ed.), Evaluation in Education: Current Applications. Berkeley, California: McCutchan Publishing Co., 1974.

Novick, M.R. and Jackson, P.H. Statistical Methods for Educational and Psychological Research. New York: McGraw-Hill, 1974.

Novick, M.R., Lewis, C. and Jackson, P.H. The estimation of proportions for m groups. Psychometrika, 1973, 38, 19-45.

Novick, M.R. and Lewis, C. Prescribing test length for criterion-referenced measurement. In C.W. Harris, M.C. Alkin and W.J. Popham (eds.), Problems in Criterion-Referenced Measurement. Monograph Series in Evaluation No. 3. Los Angeles: Center for the Study of Evaluation, University of California, 1974.

Osborn, H.G. Item sampling for achievement testing. Educational and Psychological Measurement, 1968, 28, 95-104.

Popham, W.J. Educational Evaluation. Englewood Cliffs, New Jersey: Prentice Hall, 1975.

Rozeboom, W.W. Foundations of the Theory of Prediction. Homewood, Illinois: The Dorsey Press, 1966.

Sechrest, L. Incremental validity: A recommendation. Educational and Psychological Measurement, 1963, 33, 153-158.

Swaminathan, H., Hambleton, R.R., and Algina, J. A Bayesian decision-theoretic procedure for use with criterion-referenced tests. Journal of Educational Measurement, 1975, 12, 87-98.

Wang, M.M. Tables of constants for the posterior marginal estimates of proportions in m groups. ACT Technical Bulletin No. 14. Iowa City, Iowa: The American College Testing Program, 1973.

LIST OF NOTES

1. Hambleton, R.R., Swaminathan, H., Algina, J., and Coulson, D. Criterion Referenced Testing and Measurement: A Review of Technical Issues and Developments. Unpublished manuscript, University of Massachusetts, 1975.