A COMPARISON OF THE LINEAR AND A CONFIGURAL SCORING TECHNIQUE FOR SCORING AN OBJECTIVE TEST OF ACADEMIC MOTIVATION

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
Lawrence Leroy Koppin
1968

This is to certify that the thesis entitled A COMPARISON OF THE LINEAR AND A CONFIGURAL SCORING TECHNIQUE FOR SCORING AN OBJECTIVE TEST OF ACADEMIC MOTIVATION presented by LAWRENCE LEROY KOPPIN has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

Major professor
June 4, 1968

ABSTRACT

A COMPARISON OF THE LINEAR AND A CONFIGURAL SCORING TECHNIQUE FOR SCORING AN OBJECTIVE TEST OF ACADEMIC MOTIVATION

by Lawrence Leroy Koppin

The purpose of this study was to 1) develop a configural scoring technique, 2) apply the technique to the scoring of an objective test of academic motivation, and 3) compare the reliability and predictive validity of the configurally scored test with the linearly scored test.

The study made use of data and other materials from Farquhar's studies of academic motivation.1 In particular, the linearly scored test used in the comparison was the Generalized Situational Choice Inventory (GSCI) developed by Farquhar. The configural technique was based on McQuitty's method of item selection.2 Criterion groups of over- and underachievers were each divided into two subgroups by the application of the Rank Order Typal Analysis. Four configural tests were constructed, each one composed of items which discriminated a particular pair of over- and underachieving types. A scoring rationale was developed in which configural test scores were combined to produce a set of Type scores, and in which one of the four Type scores was selected as the overall configural test score.

The sample used to compare the linear and configural forms consisted of 256 male eleventh grade students. Although the reliability of the configural form could not be estimated from the available data, the split-half correlation of the configural scores (.56) was significantly lower than the split-half correlation of the linear scores (.71).

The predictive validities of the two forms were explored by comparing the coefficients of multiple correlation produced when the subjects' actual Grade Point Averages were compared with GPA predicted from 1) the linear GSCI score, and 2) the configural GSCI score, each in conjunction with the subject's Differential Aptitude Test-Verbal Reasoning score. The coefficients (.685 and .693, respectively) were not significantly different.

It was concluded that the configural form had no advantages over the linear form, and that, in fact, it had several disadvantages, namely, lower reliability and the greater effort required to produce the configural score.

It was suggested that the general failure of configural studies to produce encouraging results might have resulted from the basic difficulty of attempting to identify configural information in tests that had been carefully constructed along strictly linear dimensions. It was further suggested that more encouraging results might be obtained from the use of tests based on psychological theory which incorporated the concept of types and which predicted the dimensions along which the types would be discriminated.

1 William W. Farquhar, Motivation Factors Related to Academic Achievement, U. S. Office of Health, Education and Welfare, Cooperative Research Project #846, ER 9, Office of Research and Publications, College of Education, Michigan State University, East Lansing, Michigan, 1963, 506 pp.
2 Louis L. McQuitty, "Item Selection for Configural Scoring," Educational and Psychological Measurements, 1961, Vol. XXI, pp. 925-8.

A COMPARISON OF THE LINEAR AND A CONFIGURAL SCORING TECHNIQUE FOR SCORING AN OBJECTIVE TEST OF ACADEMIC MOTIVATION

by

Lawrence Leroy Koppin

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1968

ACKNOWLEDGMENT

The writer is especially indebted to Dr. William W. Farquhar for his persistent encouragement in undertaking this project, and for his generosity in making available the data from his study of academic motivation.

TABLE OF CONTENTS

Chapter                                                          Page

TABLE OF CONTENTS . . . . . . . . . . . . . . . . . . . . . . .  iii
LIST OF TABLES  . . . . . . . . . . . . . . . . . . . . . . . .    v
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . .   vi

I.   FORMULATION AND DEFINITION OF THE PROBLEM  . . . . . . . .    1
       Purpose of the Study . . . . . . . . . . . . . . . . . .    2
       Need for the Study . . . . . . . . . . . . . . . . . . .    2
       Review of the Literature and Development of a Model  . .    2
         The Linear Function  . . . . . . . . . . . . . . . . .    2
         The General Polynomial Function  . . . . . . . . . . .    5
         Other Approaches to the Problem  . . . . . . . . . . .   10
         McQuitty's Item Selection Technique  . . . . . . . . .   12
         Advantages of McQuitty's Technique . . . . . . . . . .   15
         A New Scoring Rationale  . . . . . . . . . . . . . . .   17
       Statement of the Problem . . . . . . . . . . . . . . . .   21
       The Research Hypotheses  . . . . . . . . . . . . . . . .   22
         Statement of the Research Hypotheses . . . . . . . . .   23
       Organization of the Study  . . . . . . . . . . . . . . .   24

II.  INSTRUMENTATION AND PROCEDURE  . . . . . . . . . . . . . .   25
       The Generalized Situational Choice Inventory . . . . . .   25
       Rank Order Typal Analysis  . . . . . . . . . . . . . . .   27
       Procedure  . . . . . . . . . . . . . . . . . . . . . . .   32
         Step One: The Identification of Types  . . . . . . . .   32
         Step Two: The Construction of the Configural Tests . .   33
         Step Three: The Test of the Hypotheses . . . . . . . .   34
           Hypothesis Regarding Reliability . . . . . . . . . .   35
           Hypothesis Regarding Validity  . . . . . . . . . . .   35
       Summary  . . . . . . . . . . . . . . . . . . . . . . . .   36

III. THE CONSTRUCTION OF THE CONFIGURAL FORM  . . . . . . . . .   38
       The Identification of Types  . . . . . . . . . . . . . .   38
         The Typing Procedure . . . . . . . . . . . . . . . . .   39
         The Types that were Identified . . . . . . . . . . . .   43
         Factors which Discriminate the Types . . . . . . . . .   44
       The Construction of the Configural Tests . . . . . . . .   46
         The Technique for Assigning Test Scores  . . . . . . .   47
           The Expected Score Technique . . . . . . . . . . . .   49
           The Pattern Technique  . . . . . . . . . . . . . . .   56
         The Significance Level for Item Selection  . . . . . .   60
       Summary  . . . . . . . . . . . . . . . . . . . . . . . .   65

IV.  THE TEST OF THE HYPOTHESES . . . . . . . . . . . . . . . .   67
       The Hypothesis Regarding Reliability . . . . . . . . . .   67
         The Statistical Hypotheses . . . . . . . . . . . . . .   69
         The Test Procedure . . . . . . . . . . . . . . . . . .   69
         Discussion of the Result . . . . . . . . . . . . . . .   70
       The Hypothesis Regarding Validity  . . . . . . . . . . .   73
         The Statistical Hypotheses . . . . . . . . . . . . . .   74
         The Test Procedure . . . . . . . . . . . . . . . . . .   75
         The Discussion of the Result . . . . . . . . . . . . .   76
       Summary  . . . . . . . . . . . . . . . . . . . . . . . .   81

V.   SUMMARY, CONCLUSIONS AND IMPLICATIONS  . . . . . . . . . .   84
       Summary  . . . . . . . . . . . . . . . . . . . . . . . .   84
       Conclusions  . . . . . . . . . . . . . . . . . . . . . .   86
       Implications . . . . . . . . . . . . . . . . . . . . . .
87 BIBLIOGRAPHY. . . . . . . - -.- . . . _ . 90 APPENDIX, CHI-SQUARE STATISTIC FOR RESPONSES BY ALL POSSIBLE PAIRS OF TYPES. . . . . . . . . . . . . 92 iv Table 1.1 1.2 1 3 3.2 3.3 3 4 3 5 3 6 3.7 4 2 4.3 4 4 LIST OF TABLES Page Responses to Two Test Items by Four Categories of Subjects (Hypothetical Data) . . . - . . . . . . . . l3 Configural Test Scores by Four Categories of Subjects (Hypothetical Data of Table 1.1) . . . . . . . . . . 14 Type Score Patterns for Four Categories of Subjects (Hypothetical Data) . . , . . . . . . . . . . , . , 18 Rank Order Matrix of Several Sub—categories in which A is a Member (Hypothetical Data). . . . . . . . . 30 Items which Discriminate Both Overachieving and Underachieving Types (.001 Level). . . . . . . . . , . 45 Mean Configural Test and Type Scores for All Types (Preliminary Form) . . . . . . . . . . . . . . . . . 49 Cross Tabulation of Assigned Type Versus Known Type (Preliminary Form) , . . . . . . . . . . . . . . . , . . 60 Number of Items in Each Form of the Configural Tests . . 61 Number of Correct and Incorrect Identification of Types Made Using Each Form of the Configural Tests . . . , . , 62 Split—half Correlations for Configural Test and Overall Test Scores for Each Form of the Configural Tests. . . . . . . . . . . . . . . . . . . . . . . 63 Mean Configural Test and Type Scores for All Types Final Form . . . . . . . . . . . . . . . . . . . . , 64 Cross-Tabulation of Type Identification Made Using Split—half Forms of the Configural Test. . . . . . . . . 71 Beta weights, Multiple Regression Equation Coef— ficients, and Coefficients of Mu1tip1e Correlation for the Linear and Configural Forms of the GSCI. . - . 75 Intercorrelations Among the Linear and Configural Tests. . . . . . _ . . . . . . . . . . . . . . 77 Summary of the Polar Theory of High and Low Academic Achievement Motivation Used in Constructing the GSCI . . 79 LIST OF FIGURES Patterns of Type Scores for the Ideal Types of Table 1.3 (Hypothetical Data). Development of the Main Overachieving Type (First Cycle) . . . . . . . . . . . . . . . . Pattern of Mean Type Scores for All Types. Data from Table 3.2 Frequency Distribution of Assigned Scores for Overachieving Types Scores Assigned by Expected Score Technique; . . . - . . . . . . . . . Frequency Distribution of Assigned Scores for Underachieving Types. Scores Assigned by Expected Score Technique. . . . . . . . . Type B Score Pattern Positioned to Show Type B and D Scores Equidistant from Expected Scores. . Frequency Distribution of Assigned Scores for Over- achieving Types. Scores Assigned by Pattern Technique. . . . .'. . . . . . . . . Frequency Distribution of Assigned Scores for Under~ achieving Types. Scores Assigned by Pattern Technique. vi Page 19 41 50 53 54 55 58 59 CHAPTER I FORMULATION AND DEFINITION OF THE PROBLEM A recurring subject of interest in the field of psychometrics is that of ”configural scoring”, a term which refers to all those scoring techniques which attempt to take into account the pattern of a set of scoring responses to individual items in a psychological test or inven— tory. The techniques are in contrast to conventional approaches which, it is claimed, fail to appreciate the "dynamic, global, meaningful, holistic, subtile, . . . configural, patterned, [bi] organized ." nature of the responses. One approach, hereafter called the linear technique, takes into account only the number of items responded to in a predetermined direction. 
Implicit in the consideration of configural scoring is the assumption that the configuration of a set of responses contains useful psychological information which is not reflected in test scores obtained through the linear technique. Most configural scoring studies have been directed towards developing a mathematical model for use in identifying the hidden "configural" information. Little progress, however, has been made in developing a configural scoring technique which has any advantages over the conventional scoring techniques. This study represents one more attempt to improve on conventional scoring techniques.

1 Paul Horst, "Pattern Analysis and Configural Scoring," Journal of Clinical Psychology, 1954, Vol. X, p. 4.

Purpose of the Study

The purpose of this study was to develop a technique for configural scoring, to apply the technique to the scoring of a particular psychological inventory, and to compare the results thus obtained with those obtained by the application of the more common linear scoring technique.

Need for the Study

The study can be justified solely by the fact that a scoring technique which identifies and makes use of the information supposedly hidden in the pattern of responses to items in a psychological test would be a valuable psychometric tool. The usefulness of objective psychological tests might be extended by providing an investigator with information about a subject which is not presently available. In addition, the reliability and validity of a test might be improved.

Review of the Literature and Development of a Model

There have been, in general, two phases in the development of configural scoring concepts and techniques. First, there were attempts to analyze the problem using mathematical models. Then, techniques were developed which sacrificed some mathematical sophistication in an effort to solve some of the practical problems encountered in applying the mathematical models. This study is of the latter type.

The Linear Function

The linear scoring technique commonly used by psychometricians defines the scale score as the simple sum of the items responded to in a predetermined direction. The mathematical model takes the form:

    T_i = x_{i1} + x_{i2} + x_{i3} + . . . + x_{im}                     (1.1)

The final score of individual i is given by T_i, where his m responses are symbolized x_{i1}, x_{i2}, x_{i3}, . . . , x_{im}.

Frequently, each item is assigned a weight according to some predetermined criteria of importance, and T_i becomes a differentially weighted sum. This more general model takes the form:

    T_i = w_0 + w_1 x_{i1} + w_2 x_{i2} + . . . + w_m x_{im}            (1.2)

The weights assigned to each of the m items are symbolized w_1, w_2, w_3, . . . , w_m. The w_0 weight is an additive constant. Equation (1.1) is a special form of equation (1.2) in which w_0 is zero and all the items are assigned a weight of one.

Using the linear scoring technique, it is possible for many individuals, each with a different pattern of responses, to be assigned the same scale score. With an inventory consisting of only two equally weighted dichotomous items, there are two different response patterns that produce a scale score of one, i.e., the "yes-no" and "no-yes" patterns. Theoretically, the two patterns could have quite different psychological meanings, although they produce the same scale score. If the inventory consists of four items, there are, for example, six different response patterns that produce a scale score of two.
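The rule of equations (1.1) and (1.2) is easily made concrete. The short sketch below is an editorial illustration only (Python; the function name linear_score, the weights, and the response patterns are invented for the example and are not items from any actual inventory); it computes a weighted linear score and shows two different four-item response patterns receiving the same scale score of two.

    # Minimal sketch of the linear scoring rule of equations (1.1) and (1.2).
    # Responses and weights are hypothetical and purely illustrative.

    def linear_score(responses, weights=None, w0=0.0):
        # Equation (1.2): w0 plus the sum of w_j * x_ij.
        # With w0 = 0 and unit weights this reduces to equation (1.1).
        if weights is None:
            weights = [1.0] * len(responses)
        return w0 + sum(w * x for w, x in zip(weights, responses))

    pattern_one = [1, 1, 0, 0]   # "yes" on items 1 and 2
    pattern_two = [0, 1, 0, 1]   # "yes" on items 2 and 4

    print(linear_score(pattern_one))   # 2.0
    print(linear_score(pattern_two))   # 2.0 -- same score, different configuration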
Obviously, the linear technique does not take into account the information supposedly contained in the response patterns.

Meehl exposed the possible limitation of the linear scoring technique by proposing his well-known paradox.1 The paradox was developed by asking the following question:

    Consider two dichotomously scored test items, which we wish to use in predicting a dichotomous criteria . . . , say, "schizophrenic" versus "normal". Suppose now that (in the supply) each of the two items has exactly 50 per cent difficulty within each category, so that half of the schizophrenics and half of the normals answer each item "true" and half of each group answer each item "false". Under these circumstances any of the usual methods of item analysis . . . will show both items to have "zero validity" for the criteria . . . . Under such conditions, is it possible to predict the criteria solely upon the basis of the response to these items, and if so, how well could it theoretically be predicted?

Meehl reported that nearly everyone to whom he has proposed these questions concluded that accurate prediction was impossible. He then presented a hypothetical illustration in which half of the normals answered the items with a "true-true" pattern, and half with a "false-false" pattern. Similarly, half of the schizophrenics answered with a "true-false" and half with a "false-true" pattern. If these data were analyzed using conventional techniques it would have to be concluded that the items do not discriminate between the two groups. Meehl pointed out, however, that if proper attention were given to the response pattern it would be found that the items discriminate perfectly between the two groups, inasmuch as all of the normals responded with either a "true-true" or a "false-false" pattern, and all of the schizophrenics responded with either a "true-false" or a "false-true" pattern. Meehl did not attempt to relate his analysis of the problem to a scoring methodology, but his analysis clearly illustrated the possibility that useful information might be contained in the pattern of responses in a psychological inventory.

1 Paul Meehl, "Configural Scoring," Journal of Consulting Psychology, 1950, Vol. XIV, pp. 165-6.

The General Polynomial Function

Horst1 examined the Meehl paradox by using a general polynomial approach he had proposed earlier:

    an approach to the problem of configural phenomena . . . is to consider a very general formulation involving sets of polynomials in all of the variables in such a way that each succeeding set would be one degree higher than the one immediately below, beginning with zero degree . . . . A system of this type could then be considered in such a way that each power or product term would be regarded as a separate variable.

The application of the formulation yields the following general equation:

    T_i = w_0 + \sum_j w_j x_{ij} + \sum_j \sum_k w_{jk} x_{ij} x_{ik}
              + \sum_j \sum_k \sum_l w_{jkl} x_{ij} x_{ik} x_{il} + . . .        (1.3)

Here, again, the final score of individual i is given by T_i. The summation in each instance is over the m original scores (x_{i1}, x_{i2}, x_{i3}, . . . , x_{im}); that is, j, k, l, . . . each run over the integers 1, 2, 3, . . . , m. If equation (1.2) is written in a parallel form,

    T_i = w_0 + \sum_j w_j x_{ij} ,                                              (1.4)

it can be seen that the first two terms of the right-hand member of equation (1.3) are identical with the right-hand members of equation (1.4). The additional power and product terms in equation (1.3) represent the additional configural information. The general polynomial

1 Horst, op. cit., pp. 3-11.
Paul Horst gt six, "The Prediction of Personal Adjustment,” _§ocial Science §£§3a££h_pouncil Bulletin, 1941, p. 48. 6 equation is, therefore, one technique which, at least in theory, is capable of identifying the hidden configural information. Lubin and Osburn proposed that the problem of configural scoring be explored by directly investigating all of the possible response patterns of a test consisting of t dichotomous items, and assigning to each answer pattern a ”configural" score 1 They defined a "configural scale" as follows: Given a test of‘t items and a quantative criteria, form all possible answer patterns and assign to each subject a score which is the mean criterion score for all subjects in the answer pattern. Through a strictly mathematical analysis they demonstrated that the configural score was a more valid predictor of the original quantitative criteria than three common scoring techniques. They did not, however, empirically validate the theoretical excellence of the configural score by applying the technique to any sample data. They further demonstrated that the configural scale could be represented exactly by the general polynomial equation proposed by Horst. Although Lubin and Osburn demonstrated the theoretical superior- ity of their scoring technique over the linear techniques, their work also illustrated the limitation of the general polynomial approach. There are 2n possible answer patterns to an inventory consisting of‘n . . n . . . dichotomous items, and also 2 terms in the general polynomial equation. With an n'of only ten, there are 1054 possible answer patterns. H fi w v v—v lArdie Lubin and H. G. Osburn, ”A Theory of Pattern Analysis for the Prediction of a Quantitative Criteria," sychometrika, 1957, Vol. XXII, pp 63-74 2 Ibid., p. 64. 7 Obviously, as 3 increases, the number of terms quickly becomes unmanage- able. Of course, the availability of high speed computers partly a11e~ viates this difficulty. There remains, however, the problem of sample size. In order to obtain a reasonably good estimate of the criterion mean for the population with each response pattern, the sample would have to be sufficiently large to ensure some minimum number of subjects in each pattern group. If there were only ten subjects in each pattern group, the sample size for the case mentioned above would be more than ten thousand. Inasmuch as some patterns would probably occur rarely in any sample, the actual sample would have to be considerably larger than ten thousand to ensure a minimum of ten in the smallest pattern group. If, in an effort to minimize these problems, the test was limited to a small number of items, the question of test reliability would, doubtless, become critical. It seems obvious, therefore, that the general poly— nomial equation, as first proposed, cannot provide a practical solution to the configural scoring problem. Both Alfl and Horst2 deve10ped techniques for reducing the number of terms in the polynomial equation. Although the techniques they developed differ somewhat, they both employed a multiple regression analysis to identify and eliminate the terms that added little or noth- ing to the total variance. Lunneborg investigated the possibility of using a factor analyt- ii fiv vv—f -—v E” F, Alf, ”Configural Scaring and Predictien,” (Doctoral Dissertation, University of Washington, 1956) Technical Report, Public Health Research Grant M—743(C2), University of Washington Division of Counseling and Testing Services, November 1956. 
Paul Horst, ”The Uniqueness of Configural Test Item Scores,” Journal 2: Clinical Psychology, 1957, Vol. XIII, pp. 107-114. 8 ical technique to eliminate those terms in the general polynomial 1 equation which contained little or no configural information. He proposed that the terms of the polynomial equation be computed for each of a large number of subjects, and that a factor or dimensional analysis be made of the resulting matrix. Those terms which did not contribute to any of the resulting dimensions could be eliminated. Of course, if each of the polynomial terms is considered as a separate variable, the number of variables in the matrix could easily be so large as to make a dimensional analysis impractical or impossible. To reduce the number of terms, Lunneborg developed a technique for combining several variables to form a single variable. This was a modification of Horst's multiple 2 regression analysis approach. Lunneborg pursued the development one step further in order to arrive at a technique that would enable him to test the assumption upon which the concept of configural scoring is based, namely, that the pattern of responses contains usable information not found in the summed scores. He reaSoned that if the answer patterns did contain any config- ural information, the rank of the matrix that contained the higher order configural variables would be higher than the rank of the matrix that contained only the first order variables. In other words, if a factor analysis of the simple item responses yielded, say, five factors, then the factor analysis of the matrix containing configural variables should yield more than five factors. Any factors more than five could be l C. E. Lunneborg, ”Dimensional Analysis, Latent Structures, and the Problem of Patterns,” (unpublished Doctoral Dissertation, University of Washington, 1959). 2 . Horst, loc. Cit. 9 accounted £01 by the gresence of configural information in the anSWcr patterns. Unfortunately, when Lunneborg tested this hypothesis, the configural matrix yielded the same number of factors as did the matrix of simple responses, and he had to reject the hypothesis that the answer patterns contained configural information. In effect, this meant that all of the higher order terms in the general polynomial equation were eliminated. Although Lunneborg's findings casts some doubt on the validity of the assumption underlying the concept of configural scoring, the matter is far from settled. For one thing, in order to keep the number of variables within manageable limits, he used a psychological inventory consisting of only fourteen items. Such a short test might be expected to have low reliability. If he had used an inventory consisting of a greater number of items he might have found evidence of configural information. A more important consideration is the fact that a failure to identify configural information in existing psychological tests does not mean that a test cannot be constructed that would contain configural information. The consideration of configural scoring seems to imply that there are hitherto unidentified dimensions along which individuals in criterion groups can be discriminated. If the tests are constructed in ignorance of those dimensions, one should not expect the dimensions to magically appear in the test. Almost all of the literature surveyed thus far has been concerned with attempts to apply the general polynomial equation to the configural scoring problem. 
Although these attempts have generally yielded prom- ising theoretical results, the recurring problem of the large number of variables has defeated all attempts to produce a practical technique for lO configural scoring. As Lunneborg has said: It should be readily apparent . . . that research on the configural scoring problem has now reached the point where the major concern is no longer with a rational expression of the problem but rather with the overwhelming number of parameters inherent in such an expression. Unfortunately, an adequate solution to this problem has not been found. This is not to imply that the general polynomial approach will never yield more encouraging results. It does suggest, however, that it might be well to consider some alternate approach to the problem. Other Approaches to the Problem McQuitty proposed using a type of agreement analysis to score an inventory to differentiate mental patients from community persons. He developed a technique for computing an agreement score for each possible pair of subjects in each of two validation groups composed, respectively, of mental patients and community persons. The agreement scores for each group were separately listed in descending order, from most to least agreement. Successive pairs of subjects were defined as a set of subcategories called genera. First, the pair with the highest agreement score was selected. Then, the next highest pair was selected, not counting pairs in which either of the subjects in the first pair appeared. In a similar manner, the selection continued until all sub~ jects were included. Next, successive pairs of genera were combined to create a set of subcategories called Species. Then, pairs of species Lunneborg, op cit., p. 15. 2Louis L. McQuitty, ”Pattern Analysis Illustrated in Classifying Patients and Normals," Educational and Psychological Measurements, 1954 Vol. XIV, pp. 598—604. 11 were combined to form families. The process of combining pairs of lower order subcategories to form higher order subcategories continued until a complete hierarchy of subcategories was formed. The lowest order sub- category consisted of pairs of subjects; the highest order consisted of all the subjects in the validation group. A hierarchy was created separately for each validation group. For each subcategory, as genera, species, family, a response pattern was identified which was common to all the subjects in the subcategory. McQuitty then computed the agreement score of each subject in two cross-validation groups composed, respectively, of mental patients and community persons, with the reSponse patterns of each subcategory in each hierarchy. If a subject's highest agreement score was with one of the mental patient subcategories, that subject was identified as a mental patient Similarly, if the subject was most like one of the community person subcategories, he was identified as a community person. The assignments thus made were compared with the cross-validation group from which the subject was known to come. Although about twice as many correct as erroneous assignments were made, the trial failed to produce adequate evidence that the technique was superior to the common scoring techniques. Campbell developed a different configural scoring technique for 1 use with the Minnesota Vocational Interest Inventory. Each item in the inventory consists of a set of three activities. 
The subject is to indicate both the activity in which he would most, and least, like to be engaged. There are six possible response patterns for each item. Items were selected as discriminating between "men in general" and a particular occupational group when a high proportion of one group responded with a particular pattern; and, on the same item, a small proportion of the other group responded with the same pattern. The items thus selected were used to discriminate between men in two cross-validation groups. For each subject, an item was counted only when he responded with the predetermined pattern. Although Campbell found that the configural scoring technique was as reliable as the usual scoring method, it was less successful in discriminating between the men in the two cross-validation groups. Discouragement about configural scoring was doubtless expressed by Campbell in the title he gave to his paper, "Another Attempt at Configural Scoring".

1 David Campbell, "Another Attempt at Configural Scoring", Educational and Psychological Measurements, 1963, Vol. XXIII, pp. 721-7.

McQuitty's Item Selection Technique

McQuitty returned to one of the basic problems raised by the Meehl paradox, that of item selection.1 Using Meehl's illustration as a background,2 McQuitty observed that there were actually two categories, or types, of schizophrenics and two types of normals. One type of schizophrenic responded to the two items with a "true-false" pattern; the second type responded with a "false-true" pattern. Similarly, one type of normal responded with a "true-true" pattern, and the other with a "false-false" pattern. He designated these four types as A, B, C, and D, respectively. The responses of each type to each item are shown in Table 1.1 (from McQuitty).

1 Louis L. McQuitty, "Item Selection for Configural Scoring," Educational and Psychological Measurements, 1961, Vol. XXI, pp. 925-8.
2 Meehl, loc. cit.

Table 1.1
Responses to Two Test Items by Four Categories of Subjects
(Hypothetical Data)

                        Schizophrenic Type        Normal Type
    Item    Answer          A        B              C        D
      1     True           50        0             50        0
            False           0       50              0       50
      2     True            0       50             50        0
            False          50        0              0       50

McQuitty next observed that there were four tests of one item each, called tests AC, AD, BC, and BD to indicate that they differentiated between the types represented by the letters in their names. Item 1 differentiates between types A and D, and between types B and C; item 2 differentiates between types A and C, and between types B and D. If scoring weights are assigned to these tests so that the schizophrenic types receive a score of one, and the normal types receive a score of zero, the following tests are defined:

    1. Test AC, Item 2, True = 0; False = 1
    2. Test AD, Item 1, True = 1; False = 0
    3. Test BC, Item 1, True = 0; False = 1
    4. Test BD, Item 2, True = 1; False = 0

When these tests are used to score the item responses shown in Table 1.1, the scores shown in Table 1.2 are obtained for each type of individual. The sum of the scores on Tests AC and AD differentiates between schizophrenics and normals, with Types A and B receiving scores of two and zero, respectively, and Types C and D each a score of one. Similarly, the sum of the scores on Tests BC and BD differentiates between the two groups, with Types A and B receiving scores of zero and two, respectively, and Types C and D each a score of one.
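The arithmetic behind these four keys can be written out directly. The sketch below is an editorial illustration (Python; the dictionary names are invented, the keys simply restate the four one-item tests just defined, and the response patterns are the hypothetical ones of Table 1.1).

    # Sketch of the four one-item configural tests (hypothetical data of Table 1.1).
    # Each key maps an answer to the item it uses onto a weight; the schizophrenic
    # types (A and B) are keyed to receive the score of one.

    keys = {
        "AC": (2, {"True": 0, "False": 1}),   # Test AC uses item 2
        "AD": (1, {"True": 1, "False": 0}),   # Test AD uses item 1
        "BC": (1, {"True": 0, "False": 1}),   # Test BC uses item 1
        "BD": (2, {"True": 1, "False": 0}),   # Test BD uses item 2
    }

    # Item responses of the four ideal types (item 1, item 2).
    responses = {
        "A": {1: "True",  2: "False"},   # schizophrenic
        "B": {1: "False", 2: "True"},    # schizophrenic
        "C": {1: "True",  2: "True"},    # normal
        "D": {1: "False", 2: "False"},   # normal
    }

    for name, answers in responses.items():
        scores = {test: key[answers[item]] for test, (item, key) in keys.items()}
        print(name, scores,
              "AC+AD =", scores["AC"] + scores["AD"],
              "BC+BD =", scores["BC"] + scores["BD"])

Run as written, the four printed lines reproduce the rows of Table 1.2.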
McQuitty concluded that normals could be differentiated from schizophrenics on the basis of the pattern of their scores on the four tests, or the sum of scores on the combination of tests.

Table 1.2
Configural Test Scores by Four Categories of Subjects
(Hypothetical Data of Table 1.1)

                                      Test
    Subjects             AC    AD   AC+AD    BC    BD   BC+BD
    Schizophrenic Group
      Type A              1     1     2       0     0     0
      Type B              0     0     0       1     1     2
    Normal Group
      Type C              0     1     1       0     1     1
      Type D              1     0     1       1     0     1

McQuitty suggested that the procedure could be used as a general approach to the configural problem by summarizing the steps as follows:1

    1. Classify both groups of subjects separately into categories by some pattern-analytic method.
    2. Pair each category of Group 1 with every category of Group 2. Suppose Group 1 yielded Categories A and B, and Group 2 yielded Categories C and D. Then A would be paired with each C and D and likewise for B, both to yield Pairs AC, AD, BC, BD.
    3. Select all items which are related to each category pair, such as those with a significant chi square for Category A paired with Category C. These items constitute a test which could be called Test AC. There would be one test for each pair: Tests AC, AD, BC, and BD.
    4. Decide arbitrarily whether to key to have Group 1 or 2 subjects score higher; suppose the decision is in favor of Group 1.
    5. Key the significant items for each test so that the answer alternative preferred by a Group 1 category is given a weight of one and the other alternative is given a weight of zero.
    6. Score a cross-validational sample with all keys, assigning each subject to the category and thereby to the Group indicated by the several tests; use these results to assess the validity of the tests.

1 McQuitty, op. cit., pp. 927-8.

Although McQuitty did not develop the scoring methodology as completely as he might have, he did show how a common method of item analysis could be applied to select configurally significant items. He also suggested a technique by which the pattern of scores can be used to differentiate between groups. Of most significance, however, is the fact that his procedure represented a promising approach to the configural scoring problem which did not require the use of the general polynomial equation.

Advantages of McQuitty's Technique

One of the most promising features of McQuitty's technique was that it focused attention on the internal structure of the groups to be differentiated rather than on the many possible response patterns. Because it can be expected that the number of types within each group would be considerably fewer than the number of possible response patterns, the total number of configural variables would be drastically reduced. If each test, as AC, AD, BC, and BD, is considered as a configural variable, then the total number of variables would be the product of the number of types in each group. If there were, say, three types in Group 1 and four in Group 2, there would be twelve variables, regardless of the number of items in the original test. This technique raises the possibility that the number of configural variables could be kept to a minimum without placing any limitation on the number of test items. It also raises the possibility that the number of variables could easily be controlled. Just as a group might be divided into a small number of sub-groups or types, so each type might be further divided into a small number of sub-types. By controlling the level to which the typing is carried out, the number of configural variables could be controlled.
The lack of limitation on the number of test items promises a solution to a problem that had scarcely been considered in earlier con- figural scoring studies, that of test reliability. In every psycho- logical test there are factors which contribute chance variance to the scores. Any technique which places importance on the pattern of responses should be relatively free of these chance variations, that is, it should have high reliability. If the reliability is low, the small chance variations might produce important chance variations in the pattern of scores and render the analysis useless. It is generally recognized that a test consisting of a small number of items will have a lower reliability than a test consisting of a larger number of equally discriminative items. The general polynomial approach to configural scoring places severe limitations on the number of test items, and, consequently, the investigator is forced to use tests of doubtful reli~ ability. By removing this limitation, McQuitty's approach raises the possibility of using longer tests which have a higher degree of reli- ability. Because of the hypothetical advantages, McQuitty's procedure might provide the basis for the much needed ”alternate” approach to the l7 configural scoring problem. This present study was based on McQuitty's procedure, and it was specifically intended to explore the potential— ities of that procedure for the solution of the configural scoring problem. A New Scoring Rationale The procedure followed in this study was almost exactly that outlined by McQuitty.l However, some additional development of the scoring methodology was necessary. McQuitty limited the application of his technique to the assignment of an individual to one of two criterion groups, e.g., normal or schizophrenic. Although the two groups may be composed, respectively, of individuals possessing a little, and those possessing a lot, of a particular attribute, in practice few individuals fall into one of the ”none” or "all” groups. Most individuals possess ”some” of the particular characteristic, and it is desirable to be able to give them a score that indicates the extent to which they possess the attribute. Borrowing McQuitty's terminology, both Test AC and AD serve to differentiate Type A from each of the Group 2 types, C and D. The sum of these two tests would produce a Type A score, that is, the subject's score if he were a Type A person. Tests BC and BD both differentiate Type B from the Group 2 types, and the sum of these tests produces a Type B score, his score if he were a Type B person. Similarly, the sum of AC and BC produces a Type C score, and the sum of AD and BD a Type D score. There is, then, one type score corresponding to each l McQuitty, loc. cit. 18 ideal type in the analysis. On the hypothetical test under discussion each subject would have four scores, a Type A, B, C, and D score. The question is, of course, which of the four should be assigned as his score on the test? If a technique could be developed to determine to which ideal type a subject was most similar, then the corresponding type score would be assigned to him as his score. One possible technique, the "pattern" technique, makes use of the fact that each ideal type has a unique pattern or profile of type scores. The pattern of type scores produced for an ideal Type A individual dif- fers from the pattern produced for any of the other types. 
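In computational terms the four type scores are simply pairwise sums of the configural test scores, and each ideal type's profile follows at once. The sketch below is offered only as an editorial illustration (Python, continuing the hypothetical two-item example; the function name type_scores is invented here).

    # Type scores formed from the four configural test scores (hypothetical example).
    # Each type score sums the two configural tests that involve that type.

    def type_scores(t):
        # t holds the configural test scores, e.g. {"AC": 1, "AD": 1, "BC": 0, "BD": 0}.
        return {
            "A": t["AC"] + t["AD"],
            "B": t["BC"] + t["BD"],
            "C": t["AC"] + t["BC"],
            "D": t["AD"] + t["BD"],
        }

    # For an ideal Type A subject (Tests AC = AD = 1, BC = BD = 0):
    print(type_scores({"AC": 1, "AD": 1, "BC": 0, "BD": 0}))
    # {'A': 2, 'B': 0, 'C': 1, 'D': 1} -- the ideal Type A profile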
These unique patterns are shown in Table 1.3 (using McQuitty's hypothetical data from Table 1.1), and presented graphically in Figure 1.1. To apply this technique, it would be necessary to compare the type score pattern of a particular individual with the four ideal patterns, determine to which ideal pattern it is most similar, and assign the corresponding type score to the individual.

Table 1.3
Type Score Patterns for Four Categories of Subjects
(Hypothetical Data)

                           Test                     Type Score
    Ideal Type     AC    AD    BC    BD         A     B     C     D
        A           1     1     0     0         2     0     1     1
        B           0     0     1     1         0     2     1     1
        C           0     1     0     1         1     1     0     2
        D           1     0     1     0         1     1     2     0

[Figure 1.1. Patterns of Type Scores for the Ideal Types of Table 1.3 (Hypothetical Data). Four panels, one for each of Ideal Types A, B, C, and D; the horizontal axis gives the scoring type (A, B, C, D) and the vertical axis the score.]

This technique may be illustrated by supposing that the type scores for a subject were 1.8, 0.2, 0.9, and 1.1 for A, B, C, and D, respectively. This pattern of scores is most similar to the ideal Type A pattern; consequently, the individual would be assigned his Type A score of 1.8.

A second technique, the "expected score" technique, makes use of the fact that for each ideal type there is a predetermined perfect or expected type score. Referring to Table 1.3, if a subject belongs to Type A, he will have a Type A score of two; if he belongs to Type B, his Type B score will be two. That is, Type A or B subjects will have the maximum possible Type A or B score of two. Conversely, Type C or D subjects will have the minimum possible Type C or D score of zero. There is, then, defined a set of "expected" scores. Type A and B expected scores are two; Type C and D expected scores are zero. To apply this technique it would be necessary to compare the type scores obtained for a subject with the four expected scores, determine which type score was closest to its respective expected type score, and assign that type score to the subject. Referring to the illustration used above, the Type A score of 1.8 is closer to the expected Type A score of 2.0 than the Type B score of 0.2 is to its expected score of 2.0, the Type C score of 0.9 to its expected score of 0.0, or the Type D score of 1.1 to its expected score of 0.0. Therefore, the individual would be identified as a Type A individual and assigned his Type A score of 1.8.

On the surface, each of the techniques just described sounds reasonable, and should, theoretically, produce the same results. However, there are certain practical difficulties which may limit or destroy the usefulness of either technique. For example, the pattern technique requires that the profile of type scores obtained for a subject be clearly and obviously similar to one of the four "ideal" profiles. But there are other possible profiles than the four ideals. There could be a "saw-tooth", a "U" shape, an inverted "U", or a straight line. If one of these profiles occurs, the assignment of type score would have to be based on some set of arbitrary rules or a guessing procedure. In such a situation, little confidence would exist that the proper type score had been assigned. Similarly, the expected score technique requires that one of the type scores obtained for a subject be clearly closer to its respective expected score than any of the other three are to their expected scores. But what if, for example, the Type A score (in the illustration used above) was 1.85, and the Type C score was 0.16? The Type A score is closer to 2.0 than the Type C score is to 0.0, but the margin of difference is only 0.01.
That would seem to be slender evidence on which to assign a score of 1.85 rather than 0.16. The basic question is, of course, how frequently will these practical difficulties cast reason- able doubt on the assignment of type scores; or, more precisely, which of the two techniques will permit the assignment of type scores to a subject as his score on the test with the highest degree of confidence? When the study was begun it was not known which of the two scor- ing techniques would yield the best results; consequently, it was planned to use both methods for a preliminary sample of subjects, and choose the method which was most satisfactory. Statement of the Problem The problem was to determine if McQuitty's procedure for item selection, with a modification of scoring procedure, could be used to 22 develop a configural scoring technique that would produce a test that was superior to one that could be produced using the conventional pro- cedures of item selection and scoring. The instrument used to develop and test the proposed configural technique was the Generalized Situational Choice Inventory developed by Farquhar and associates2 as part of a research project sponsored by the United States Office of Education This inventory purports to measure one aspect of the academic motivation of senior high school students A configurally scored form of the GSCI was developed and compared with the original linearily scored form currently in use as part of the Michigan State M—Scales The Research Hypotheses The hypotheses tested in this study related to two important characteristics of any test: its reliability, and its validity One of the assumptions implied in both Meehl's original paradox and McQuitty's item selection technique is that there may be some test items that fail to differentiate two criterion groups, but which would differentiate a sub-group of the first group from a sub-group of the second group. If this is true, then it is reasonable to expect that lHereafter referred to as the GSCI. 2William W. Farquhar, Motivation Factors Related to Academic Achievement, U S. Office of Health, Education and Welfare, Cooperative Research Project #846, ER 9, Office of Research and Publications, College of Education, Michigan State University, East Lansing, Michigan, 1963, 506 pp. 3A Description of the GSCI and the Michigan State M-Scales may be found in Chapter II. 23 from a given pool of items more would be selected for a configurally scored test than for a linearly scored test. Because, in general, a test containing a large number of items has a higher reliability than a test containing a smaller number of items, the possibility arose that the configural form of the GSCI would have a higher reliability than the linear form. However, it was also recognized that the overall relia- bility of the configural form might actually be lower than the linear form Even though the individual configural tests might have higher reliabilities, the resulting pattern of type scores might be less reli- able. Small chance variations in the scores that constitute a profile could cause relatively larger variations in the appearance of the pro- files which, in turn, could result in very large variations in the scores assigned to individuals. It was hypothesized that the reliability of the configural form of the GSCI would be higher than the linear form. It was also hypothesized that the validity of the configural form would be higher than the linear form. 
This expectation arose out of the assumption that the configural form makes use of the "configural" infor- mation supposedly contained in the pattern of responses. If additional relevant information is available about an individual, it is reasonable to expect that a more accurate prediction of his behavior can be made. Statement of the Research Hypotheses Hypothesis 1: The reliability of the configural form of the GSCI will be higher than the reliability of the linear form. Hypothesis II: The validity of the configural form of the GSCI will be higher than the validity of the linear form 24 Organization of the Study The over—all plan of the dissertation is as follows: A descrip— tion of the GSCI, the method of identifying types, and the procedure that was followed are presented in Chapter II. In Chapter III a detailed description of how the configural scoring form of the GSCI was developed may be found. The results of the configural and linear scoring of the GSCI, the test of the hypotheses, and a discussion of the findings are presented in Chapter IV, while the summary, conclusions and implications appear in Chapter V. CHAPTER II INSTRUMENTATION AND PROCEDURE In this chapter the test instruments and the analysis procedure are described. The Generalized Situational Choice Inventory The instrument used to develop and test the proposed configural scoring technique was the Generalized Situational Choice Inventory 1 developed by Farquhar This inventory purports to measure one aSpect of the academic motivation of senior high school students. The items in the GSCI are described as follows: Items on this scale were constructed to describe the motivational situation. Students are required to make a forced choice between twc types of situations, one which depicts a high and one which depicts a low academic motivation situation. A high score on this scale indicates an individual who has a high need for academic achievement and would generally like the kind of task and activities that schools would value as part of the academic program. A low soore indicates individuals who choose activities disassociated from the school's program which did not necessarily pay off in their academic study. The GSCI is one of four scales which make up the Michigan State M-Scales, This study was concerned only with the form of the GSCI used for male students. From a pool of two hundred items, forty-five were selected as l Farquhar, 223 cit. 2 , Manual for Interpretation of the Michigan State M-Scales, duplicated manuscript, p. l. 25 26 significantly discriminating two criterion groups of male academic over- and underachievers.1 The procedure for identifying the over- and under~ achieving groups and for selecting the discriminating items is described in detail by Thorpe and is briefly summarized as follows:2 1. The original sample consisted of approximately 4200 male and female eleventh grade students in nine Michigan high schools. 2 Overachievers were defined as those whose actual Grade Point Average (GPA) was at least one standard error of estimate above a regression line of GPA predicted from the Differen- tial Aptitude Test-Verbal Reasoning (DAT-VR) score. Similarly, underachievers were defined as those whose actual GPA was at least one standard error of estimate below the regression line. The procedure resulted in identifying 171 male over- achievers and 137 male underachievers 3. 
These groups were randomly divided into validation and cross- validation groups consisting, reSpectively, of 88 over— and 66 underachievers, and 83 over- and 71 underachievers. 4. Items were selected which discriminated the over— and under— achievers in the validation groups at the .20 significance level, and which also discriminated the over- and under- achievers in the cross-validation groups at the .10 level. Forty-five items were selected from the pool of two hundred. A complete list of items may be found in Farquhar, pp. cit., pp 283-303. 2Marion D. Thorpe, ”The Factored Dimensions of an Objective Test of Academic Motivation Based on Eleventh Grade Male Over- and Under- achievers," (unpublished Doctoral Dissertation, Michigan State Univer- sity, 1961), pp. 8-13. 27 5. Responses to each item were weighted so that overachievers would receive a score of one, and underachievers a score of zero. A student's score on the GSCI was simply the number of items to which he responded in the predetermined overachiver's direction This linear scoring technique is exactly repre- sented by equation (1.1), page 3. The plan of this study was to use McQuitty's item selection tech— nique to select items from the original pool of two hundred, to construct a configurally scored form of the GSCI, and to compare the reliability and predictive validity of the test thus obtained with the reliability and predictive validity of the forty—five item, linearly scored form described above. Rank Order Typal Analysis The pattern-analytic technique used to identify the several types of over— and underachievers was the Rank Order Typal Analysis (ROTA). This technique had not yet been developed at the time McQuitty proposed his method of item selection; consequently, it was not one of the tech- niques he suggested for the purpose.2 More recently, however, McQuitty recommended ROTA as the most suitable technique to use for identifying types for purposes of item selection. A type is defined as a sub-category of n_individuals drawn from Louis L. McQuitty, "Rank Order Typal Analysis," Educational and Psychological Measurements, 1963, Vol. XXIII, pp. 55-61. Louis L. McQuitty, ”Item Selection for Configural Scoring," Educational and Psychological Measurements, 1961, Vol. XXI, p. 927. Louis L. McQuitty, personal communication. 28 a larger category in such a manner that each person in the sub-category is more like the other 271 persons in the sub-category than he is like any of the persons not in the sub—category. The procedure for identi- fying such sub-categories is as follows: 1. Calculate the agreement score of each possible pair of persons in the larger category. This score is the number of items to which response is made in the same direction by both persons. 2. Arrange the agreement scores in a matrix having identical rows and columns, one row and column for each person. The intersection of a particular row and column would contain the agreement score for the pair of individuals represented by the respective row and column. The matrix would be symmet- rical about the diagonal axis. 3. Separately convert each column of agreement scores into rank scores from most to least agreement, so that the pair with the highest agreement score would be assigned the rank of one, the next highest agreement score the rank of two, and so on. Obviously, the rank of one will always be assigned to the agreement of an individual with himself. The rank two will indicate the ”row" individual who is most similar to the ”column" individual. 
Similarly, the rank three will indicate the "row" individual who is second in similarity to the "column" individual. It should be noted that the matrix of rank orders will probably not be symmetrical about the diagonal axis.

4. For each "column" individual, select a series of sub-categories of individuals beginning with the "row" individual most like the respective "column" individual, and successively adding to the sub-category the next most similar individual until a sub-category containing one less than the total number of persons in the criterion category is reached. (It is useless to consider a category containing all individuals, inasmuch as a type consisting of all of the individuals in the criterion group would be of no value.) There would then be n - 1 sub-categories to be considered for each of the n persons in the group.

5. For each "column" individual, display the set of n - 1 sub-categories selected in Step 4 by rearranging the rank order matrix of Step 2 in such a way that the second row corresponds to the person selected for the first sub-category, the third row corresponds to the second person selected, and so on. (The first row will, of course, represent the "column" individual himself.) The first column would then contain the rank orders from one to n - 1 in ascending order. The second column would represent the same individual as the second row, the third column as the third row, and so on. The entries in these successive columns would be the rank orders corresponding to the intersection of the particular row and column in the original rank order matrix. There would then be n matrices of size n - 1 by n - 1.

6. Examine the part of each matrix that corresponds, successively, to each of the sub-categories selected in Step 4. Begin by considering the four ranks defined by the intersection of the first two columns with the first two rows, then the nine ranks defined by the intersection of the three columns with the three rows, then the sixteen ranks defined by the four columns and rows, and so on. If the matrix contains no rank higher than the number of persons in the sub-category, that sub-category qualifies as a type. If the matrix contains a rank higher than the number in the category, the sub-category fails to qualify. The higher rank indicates that the particular "column" person is more like someone outside of the sub-category than he is like one or more of the individuals in the sub-category.

This last step is illustrated in Table 2.1, which contains four columns and rows supposedly taken from a larger matrix.

Table 2.1
Rank Order Matrix of Several Sub-categories in which A is a Member
(Hypothetical Data)

                    Member
                A     B     C     D
        A       1     2     4     4
        B       2     1     2     3
        C       3     3     1     2
        D       4     4     3     1

The matrix of the sub-category consisting of individuals A and B does not contain any rank higher than two. These individuals are more like each other than they are like anyone not in the sub-category; consequently, the sub-category qualifies as a type. The sub-category containing A, B, and C does not qualify because a rank of four is found in the matrix. This indicates that individual C is more like someone not in the sub-category than he is like individual A, who is in the sub-category. The A, B, C, and D sub-category again qualifies as a type, inasmuch as no rank greater than four is found in the matrix.

Experience has shown that the first application of ROTA to any real psychological data might produce few, if any, types.
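The qualification check of Step 6 reduces to asking whether any rank among the members of a candidate sub-category exceeds the number of persons in it. The sketch below is an editorial illustration only (Python, using the hypothetical ranks of Table 2.1; the function name qualifies and the exceptions parameter are invented, the latter anticipating the relaxed definition discussed below).

    # Sketch of the Step 6 qualification check (hypothetical ranks of Table 2.1).
    # rank[i][j] is the rank of person i within person j's column of the
    # rank order matrix (1 = j himself, 2 = the person most like j, and so on).

    def qualifies(members, rank, exceptions=0):
        # A sub-category qualifies when no rank among its members exceeds its
        # size plus the number of exceptions allowed by a relaxed definition.
        limit = len(members) + exceptions
        return all(rank[i][j] <= limit for i in members for j in members)

    rank = {
        "A": {"A": 1, "B": 2, "C": 4, "D": 4},
        "B": {"A": 2, "B": 1, "C": 2, "D": 3},
        "C": {"A": 3, "B": 3, "C": 1, "D": 2},
        "D": {"A": 4, "B": 4, "C": 3, "D": 1},
    }

    print(qualifies(["A", "B"], rank))             # True  -- A and B form a type
    print(qualifies(["A", "B", "C"], rank))        # False -- a rank of four appears
    print(qualifies(["A", "B", "C", "D"], rank))   # True

The practical difficulty lies not in this check itself but in how seldom real data satisfy it.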
Two factors may account for this situation: (1) the definition of what constitutes a type may be too strict for practical use, and (2) the data may contain a large number of items that bear no relationship to the presumed types and, thus, obscure the identification of the types that may be present. McQuitty suggested a solution to each problem First, the definition of a type may be relaxed, Under the origi- nal definition, a type is identified when no rank in the sub~category matrix is larger than the number of individuals in the sub—category. The definition may be relaxed to permit any particular number of excep- tions to the rule. If, say, three exceptions were allowed, then all ranks that were up to three greater than the number in the sub-category would be permitted. In effect, this relaxation of the restrictions would permit each person in the sub-category to have a higher degree of likeness with three persons outside the sub-category than he has with those within the sub—category. If no types were identified under the strict definition, a successively increasing number of exceptions could be allowed until usable types were found. The groups thus identified 1 Louis L. McQuitty, "Rank Order Typal Analysis," Educational and Psychological Measurements, 1963, Vol. XXIII, p. 59. 32 would, technically, be ”type-like” categories, rather than pure types. For convenience, these type-like categories will hereafter be referred to simply as types. Second, an item analysis, such as the Chi-square test of signi— ficance, can be applied to the items, and those which least differentiate the types can be eliminated. The typal analysis could then be repeated. The plan of this study was to use ROTA to identify types in the criterion groups of over- and underachievers. A computer program to accomplish this was written for the IBM 1620 II computer at the Sundstrand Aviation Corporation, Denver, Colorado. A computer program was also written to compute the item Chi—square statistic for all possible pairs of types that would be identified during the various stages of the analysis. Procedure The study proceeded through three distinct steps. For each step there was a separate sample of subjects drawn from Farquhar‘s original group of 4200. Step One: The Identification of Types The Rank Order Typal Analysis was applied to the criterion groups of over— and underachievers in order to identify the types in each group. The groups were the 171 male overachievers and the 137 male under— achievers previously described. Individuals were randomly eliminated to bring the number in each group to 135 in order to meet the restrictions of the ROTA computer program. The typing was done separately for each group. 33 It was not known in advance that types did, in fact, exist. No attempt was made to predict the nature of the types. In this reSpect, the study proceeded as an exploratory study assuming that ”types prob- ably exist; therefore, identify them.” Theoretically, if only one item existed that was answered in the ”true" direction by approximately half of the criterion group, that item could be the basis for differentiating types within a group. From a practical standpoint, of course, one item would seem to be an inadequate basis for type selection If, however, there were several such items, all related to the same characteristic, then those items or, more correctly, that characteristic would be the basis for the identification of types. 
It seemed reasonable to assume that such a group of items would be found in the original pool of items. A detailed description of the typing procedure is presented in Chapter III. Step Two: The Construction of the Configural Tests The types identified in Step One were used to construct the set of configural tests described in Chapter 1, pages 12-15. A configural test was constructed for each overachieving type paired with each under- achieving type. Items which significantly discriminated the types in a pair were selected for that particular configural test. The Chi-square test of significance was used. The level of significance was not set in advance, inasmuch as it was not known how the reliability of the overall test would be affected by the selection of items at various significance levels It was also not known in advance which of the two techniques of assigning overall scores to individuals—-the "pattern” or the ”expected score” technique--would yield the best results. Experimentation during 34 Step Two was intended to answer both questions. Part of the experimentation made use of the same sample used in Step One, that is, the 135 over- and 135 underachievers. After Step One, however, two subjects were eliminated from the overachieving group because they were not part of either of the overachieving types that were identified in Step One. A second sample of 120 males was also used. It consisted of 20 overachievers, 20 underachievers, and 80 "normals”. The over- and underachievers were drawn at random from the criterion groups of over- and underachievers used in Step One. The 80 "normals” were drawn at random from the original group of 4200 (after the over- and underachievers had been withdrawn). A detailed description of the experimentation which led to the selection of the significance level for item selection and the technique of assigning overall scores is presented in Chapter III. Step Three: The Test of the Hypotheses A general group of 256 males was used to test the hypotheses. It was composed of 32 overachievers, 32 underachievers, and 192 ”normals" The over- and underachievers were chosen at random from the criterion groups of over- and underachievers used in Step One. (None of the over- and underachievers in the second sample were included in this sample.) The "Normals” were selected at random from the original group of 4200. Both the configural form of the GSCI developed in Step Two and the linear form developed by Farquhar were used to assign scores to the individuals in the general sample. These scores were used to test the hypotheses F" 35 hypothesis Reggrding_Reliability;, Each of the two forms of the GSCI were divided into "split-halves" by assigning odd numbered items to one form and even numbered items to the other. This produced two linear and two configural split-half forms. Each subject's responses were scored using the four forms. The product—moment correlation of the two linear scores was computed, as was the correlation of the two configural scores The significance of the difference in the two correlations was tested using a technique presented by Walker and Lev. The procedure just outlined does not, of course, directly approach the problem of test reliability. It is limited to a consideration of split—half correlations. In the usual case, the reliability of a test can be estimated from the split—half correlation by using the Spearman— Brown Prophecy Formula. 
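For reference, the Spearman-Brown step-up from a split-half correlation to an estimated full-length reliability is the standard formula

\[ r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}} \]

where r_hh is the correlation between the two half-test scores. Applied to the split-half correlations reported in Chapter IV (.7195 for the linear form and .5570 for the configural form), it yields the full-length estimates of .84 and .71 cited there.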
This formula, however, cannot appropriately be applied to the split—half correlation of configural scores. Configural scores contain a source of variance of unknown magnitude which is not taken into account in the formula. There is not, at this time, any way of estimating the reliability of a configural test from its Split-half correlation. Test reliability could be computed if, for example test— retest or equivalent form scores were available. Neither procedure, however, was used by Farquhar in developing the GSCI; consequently, a true reliability coefficient could not be computed. The implications of this limitation of procedure are discussed more fully in Chapter IV. Hypothesis Regarding Validity, Two predictions of Grade Point 1 Helen M. Walker and Joseph Lev, Stagistiealjlnferenee, (New York: Holt, Rinehart and Winston, 1953), pp. 255-6. 36 Average were made, one from a multiple regression equation using the full-scale linear GSCI score and the DAT-VR, and the other using the full-scale configural score and the DAT-VR. The coefficient of multiple correlation was computed for each of the predictions. The significance of the difference between the correlations was tested using Hotelling's technique as described by Walker and Lev.1 It was hypothesized that the correlation with the configural prediction would be higher; conse- quently, a one-tailed test was used at the .05 significance level The test of the hypotheses are described more fully in Chapter IV. Summary The GSCI was the instrument used to develop and test the proposed configural scoring technique. Farquhar and his associates selected forty-five items which significantly discriminated two criterion groups of over— and underachievers This was the linear form of the GSCI that was to be compared with the configural form developed in the study. The proposed procedure for selecting items for a configural form of the GSCI requires that each of the criterion groups be divided into two or more sub—groups or types based on their pattern of responses to the two hundred original items of the GSCI. The Rank Order Typal Analy~ sis was the method used to identify the types. The definition of what constituted a type was presented, and the general procedure by which types are identified was described It was anticipated that two modifi- cations to the procedure would be required: (1) the strict definition of what constitutes a type may have to be relaxed, and (2) the items which 1 Walker and Lev, op: £353, pp. 256-7. 37 least discriminate the various types that are identified may have to be eliminated and the typing procedure repeated. The procedure by which the configural form was developed and tested was described as consisting of three steps. In Step One, ROTA was used to identify the several types of over- and underachievers The configural form of the GSCI was developed in Step Two. Steps One and Two are described more fully in Chapter III. In Step Three, the hypo- theses regarding reliability and validity were tested. Step Three is described more fully in Chapter IV. CHAPTER III THE CONSTRUCTION OF THE CONFIGURAL FORM In this chapter the two steps in the construction of the con- figural form of the GSCI are described. The steps are: (l) the identi- fication of the several types of over- and underachievers, and (2) the canstruction of the configural tests. The Identification of Types The Rank Order Typal Analysis was used to identify types of over- and underachievers. 
The criterion groups were those over- and under- achievers identified by Farquhar and his associates as described in Chapter II. Individuals were randomly eliminated from the original groups to reduce the number in each to 135 in order to meet the restric— tions of the computer program. It was expected that two modifications 3f the general ROTA pro- cedure Would be made. First, if nc usable types were identified when the strict definition of a type was being followed, an increasing number of exceptions to the definition would be allowed until usable types were identified. Second, those items which least discriminated the types thus identified would be eliminated, and the procedure would be repeated. Certain difficulties enCcuntered during the typing procedure necessitated some additional modifications. The procedure finally used can best be presented by describing how it was developed. 38 39 The Typing Procedure The first attempt to use ROTA to identify types of overachievers failed to produce any types. The entire original pool of two hundred items were used, and no exceptions to the definition of a type were allowed. (The idea of "exceptions" is discussed on pages 31 and 32.) As an increasing number of exceptions were allowed, the number of types increased rapidly. When eight exceptions were allowed, twenty-four types were identified. At twenty exceptions, fifty—three types were identified. The types varied in size from two to twenty—seven An analysis of these types revealed that many of them were overlapping, that is, many types had one or more individuals in common. One type, for example, might be composed of individuals a, b, e, and g) while another type would be composed of g, d, e, and f:_ Individuals £_and 9 would be common to both types. Such a confusing array of overlapping types proved difficult to analyze. It was, therefore, decided to further modify the definition of what constitutes a type. A type was redefined as the total number of individuals in any two or more overlapping type- like categories Under this modification, the two overlapping types illustrated above would be reported as one type composed of individuals ‘a, b, e, d, e, and f. The computer program was rewritten to incorporate the modification. It was also decided that a type composed of only tWU individuals would not be reported, inasmuch as such a small type would be of little practical use. The first attempt to use the modified program to identify types of overachievers produced results that were less confusing, but on the surface, not much more usable. The most important finding was that there was only one main type. This type consisted of only four individuals 40 when one exception was allowed, and it gradually increased in size with each additional exception until, at forty exceptions, it contained almost all of the individuals in the criterion group. Several small types were identified at the earlier stages, but they were soon incorporated into the one main type. Two types, composed of three and five individuals, maintained their separate identity through the fortieth exception. The way the main type emerged is shown in Figure 3.1. The number of excep— tions allowed at each stage is shown on the horizontal axis The number of individuals in the type at each exception is shown on the vertical axis. It was hypothesized that the presence of a large number of non— discriminating items obscured the identification of types. 
If the number of factors that overachievers have in common is large in comparison to the factors that may discriminate sub-groups, the common factors may dominate the typing procedure and produce only one main type. If, for example, there were ten items that could discriminate two types of overachievers, their effect on the procedure would probably be diluted by the presence of the 190 non-discriminating items. The problem was how to use the information from the first application of ROTA to identify and eliminate the non-discriminating items. For this purpose a new category of groups was considered. These groups were thought of as "possible" types. A "possible" type was defined as any group of three or more individuals who are added to an existing type when the number of exceptions is increased by one. Referring to Figure 3.1, at two exceptions the main type is composed of five individuals; at three exceptions it is composed of nine. The four individuals that were added to the type were designated possible type B. The groups designated A, C, and D are other possible types. Twenty-five possible types were identified from the information provided by the first application of the modified ROTA program to the overachieving criterion group.

[Figure 3.1. Development of the Main Overachieving Type (First Cycle): the number of individuals in the main type plotted against the number of exceptions allowed.]

The responses of each pair of possible types to each item were analyzed using the Chi-square statistic. There were 319 Chi-squares computed for each of the two hundred items. The number of times the Chi-squares exceeded an arbitrary value (6.0 in this first case) was counted for each item. The Chi-square value was chosen so as to produce a distribution with the widest possible range. Items were eliminated from the pool beginning with the item having the lowest count and proceeding upward in the distribution until, in this first case, eighteen items had been eliminated. The typing procedure was then repeated using the remaining 182 items.

The typing procedure consisted, then, of the following steps:

1. ROTA was applied to the items which remained in the pool at the end of the previous cycle. A successively increasing number of exceptions was allowed until almost all of the 135 individuals in the criterion group were included in the types that were identified.

2. Each "possible" type was identified.

3. An item analysis of the responses of each pair of "possible" types was made using the Chi-square statistic.

4. The number of times each item discriminated between the pairs of "possible" types at some arbitrary level was counted.

5. The items which discriminated the least number of times were eliminated, and the entire procedure was repeated.

The number of items eliminated at any one time was arbitrarily limited to between ten and twenty per cent of the items that remained in the pool. The procedure was carried out separately for each criterion group.

The importance of adhering strictly to this procedure was illustrated at one point during the typing of overachievers. During the ninth cycle, when thirty-one items remained, it seemed that four types were beginning to emerge. In order to speed up the typing process, these four types were used in the item analysis, rather than the more numerous "possible" types.
The number of items was reduced to fourteen, and the procedure was continued as before However, during the eleventh cycle, when only eight items remained, all semblance of the four types disap— peared and only one large type remained The only recourse was to return to the ninth cycle and adhere strictly to the procedure as outlined. The Types that were Identified On the sixteenth cycle, when only seven items remained, two rather distinct types of overachievers were identified. Similarly, on the fifteenth cycle, when five items remained, two types of underachievers were identified The overachieving types, designated A and B, were com- posed of sixty-two and seventy-une individuals, respectively. The under- achieving types, designated C and D, were composed of sixty—eight and sixty—seven individuals, respectively. Two of the overachievers did not appear in either of the overachieving types All of the underachievers appeared in one of the underachieving types. The typing procedure developed for this study was used success- fully to identify types of over— and underachievers. In effect, the procedure succeeded in isolating a small number of items which appeared to be answered randomly by the individuals in each criterion group; but because they were answered more or less consistently by sub-categories 44 of individuals, those items provided the basis for the identification of types. During the typing process other types seemed tc be emerging. This fact suggests that there may be other groups of items which could also serve as the basis for identifying types. The procedure, however, appears to have isolated the largest group of such items. There is reason to believe that each of the four types could be further divided into smaller types on the basis of these smaller groups of items. If this more refined typing were attempted, each of the four types would be considered as a separate criterion group. Factors which Discriminate the Types It was not the purpose of this study to explore the general field of academic motivation. A preliminary analysis of the results of the typing procedure, however, produced some findings that could be of interest in that field. Of the two hundred items in the original pool, fifteen discriminated the overachieving types A and B at the .001 sig— nificance level Eight items similarly discriminated the underachieving types C and D. Seven items were common to both groups; that is, there were seven items that discriminated both the overachieving and under— achieving types. It appears that the same characteristic(s) which dif- ferentiates Type A and B also differentiates Types C and D. Only one of the seven items was included in the linear form of the GSCI. The items, therefore, would seem to represent a psychological characteristic of over— and underachievers that went largely unnoticed in the linear form. The seven items are shown in Table 3.1 Type A and C individuals tended to respond to the items in the same direction, while Type B and D individuals both tended to respond in 45 Table 3.1 Items which Discriminate Both Overachieving and Underachieving Types (.001 Level) Chi—square Item Number and Content Overs Unders I would prefer to: 18. a) Be thought of as being intelligent or 25.7 66.8 +b) Be thought of as being practical 27 +a) Receive one of several "A's" in a class, or 18.8 16.2 b) Receive the highest test grade and get the only ”A". 36. 
a) Be able to do difficult things better than other people, or  32.7  33.5
+b) Be able to do difficult things just as well as other people.

38. a) Be known as a person who can solve problems better than anyone else, or  55.2  15.2
+b) Be known as a person who can solve problems well.

57. a) Be known to my parents as an intelligent person, or  18.1  78.8
+b) Be known to my parents as a practical person.

*70. a) Be known as a person with much ability, or  22.3  10.8
+b) Be known as a person with adequate ability.

117. a) Be thought of as being smart, or  65.0  58.7
+b) Be thought of as being practical.

+ Indicates the usual response of Types A and C.
* Item 70 is included in the linear form.

the opposite direction from A and C. Although a rigorous analysis of the items was not attempted, they seem to be generally related to the image of themselves that students wish to project to their peers and parents. Type B and D individuals evidently want to be known as "smart" and "intelligent". On the other hand, Type A and C individuals would prefer to be known as "practical" and as having "adequate" ability.

It should be noted that only Types B and C conform to the logical expectations for over- and underachievers. It is reasonable to expect overachievers to want to be known as intelligent and underachievers to be content to be known as having merely adequate ability. It is surprising, therefore, to find that about half of each criterion group does not conform to these expectations. Type A overachievers do not want to be known as intelligent, but would prefer to be known as having merely adequate ability. Type D underachievers have the same desires (in this respect) as is expected of overachievers: they want to be known as intelligent and not as merely having adequate ability.

A number of useful hypotheses could be developed to account for the unexpected findings suggested above; however, it was outside the scope of this study to do so. For possible future study, therefore, a table showing the Chi-square statistic for each item for all combinations of types is presented in the Appendix.

The Construction of the Configural Tests

The plan of the study was to construct a set of configural tests which would discriminate each overachieving type from each underachieving type. Based on the types that were identified, four tests would be constructed, designated AC, AD, BC, and BD, the letters in their names indicating the types that each would discriminate. The scores earned by an individual on these tests would be combined to produce a set of type scores designated A, B, C, and D. The Type A score would be the sum of the AC and AD scores, and the Type B score the sum of the BC and BD scores. Similarly, the Type C score would be the sum of the AC and BC scores, and the Type D score the sum of the AD and BD scores. One of these four type scores would be assigned to an individual as his score on the configural test.

Two questions remained to be answered: (1) which of the two techniques described in Chapter I, pages 18-21, should be used to select one of the four type scores to be the individual's score on the test, and (2) at what significance level should items be selected for the configural tests? The plan was to construct a preliminary set of configural tests by selecting items at some arbitrary significance level. Then, using the preliminary tests, the better technique for assigning test scores would be selected. Finally, experimentation would be undertaken to determine the best significance level for item selection.
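The combination just described can be written out as a short sketch. The function below is illustrative only; it assumes the four configural test scores have already been computed (their per-cent scaling is taken up below), and it uses the Type A mean scores of Table 3.2 as a worked check.

    def type_scores(ac, ad, bc, bd):
        # Combine the four configural test scores into the four type scores.
        return {"A": ac + ad,   # overachieving Type A
                "B": bc + bd,   # overachieving Type B
                "C": ac + bc,   # underachieving Type C
                "D": ad + bd}   # underachieving Type D

    # Mean configural test scores of the Type A individuals (Table 3.2):
    print({k: round(v, 1) for k, v in type_scores(79.2, 75.0, 63.1, 75.0).items()})
    # {'A': 154.2, 'B': 138.1, 'C': 142.3, 'D': 150.0}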
The Technique for Assigning_Test Scores The preliminary set of configural tests was constructed as follows: 1. The Chi-square statistic was computed for the responses to each item by each overachieving type paired with each under- achieving type. 2. An arbitrary significance level of 01 was set for item selection. Items which discriminated Types A and C at this level were selected for configural test AC. In a similar manner items were selected for tests AD, BC, and BD. The 48 number of items each test contained is as follows: Test AC, 16 items, Test AD, 25 items, Test BC, 67 items, Test BD, 30 items. 3. Items were weighted in each test so that the overachieving type received a score of one, and the underachieving type a score of zero In the discussion of the scoring rationale in Chapter 1, pages 17-18, it was assumed that each configural test would contain an equal number of items. Each test would have equal weight in determining the various type scores. As is indicated above, however, each test contained a different number of items. Because there was no‘a priori reason to believe that the significance of a test was proportional to the number of items it contained, it was decided to report all configural test scores as the per cent of items, rather than the number of items, answered in the predetermined direction. The responses of the 268 individuals in the four types, A, B, C, and D, were used to identify the unique pattern of type scores that would be characteristic of each over— and underachieving type. For example, the responses of each of the 62 Type A individuals were scored separately for each configural test, and the mean score on each config- ural test by the Type A individuals was computed. These mean configural test scores were combined, as described above, to produce a set of mean type scores for Type A individuals. In a similar manner, the mean con- figural and type scores were produced for each of the other types. These mean scores as shown in Table 3.2, and the mean type scores are shown graphically in Figure 3.2. It was observed that the patterns of type 49 scores bear some similarity tc the type score patterns for the hypothet- ical data shown in Figure 1.1, page 19. Table 3 2 Mean Configural Test and Type Scores for all Types (Preliminary Form) Ideal Configural Test Type Type n AC AD BC BD A B C D Overachieving Types Type A 62 79 2 75 O 63.1 75 0 154.2 138.1 142.3 150.0 Type B 71 83.3 54.9 79.8 86.7 138.2 166.5 163.1 141.6 Underachieving Types Type 0 68 53.7 67.1 50.2 62.0 120.8 112.2 103.9 129.1 Type D 67 70.7 46.4 66.4 53.4 117.1 130.9 137.1 110.9 The Expected Score Technique. The expected score technique for determining which of the four type scores should be assigned to an individual as his overall test score was described in Chapter I, pages 20—21 The technique makes use of the fact that the type score that corresponds to the type designation of any particular individual has a predetermined perfect or expected value. Referring to the illustration used in Chapter I, the Type A score of a perfect Type A individual would be 2 O No other type individual is likely to have a Type A score that is even close to 2 0 Or to state it in another way, only a Type A individual could have a Type A score that was close to 2.0. 
The procedure for using this technique would be to determine which type score was closest to its respective ideal or expected score, and assign that score to the individual.

[Figure 3.2. Pattern of Mean Type Scores for All Types. Data from Table 3.2.]

For purposes of assigning scores from the actual test data, the expected type scores were defined as the mean type score produced by the individuals of the corresponding type designation. For example, the expected Type A score was the mean Type A score produced by the 62 Type A individuals in the original criterion group. The expected type scores for the preliminary form of the configural test were as follows: Type A, 154.2 (154); Type B, 166.5 (167); Type C, 103.9 (104); Type D, 110.9 (111). These scores are shown in Figure 3.2 by large black circles.

Theoretically, the type score selected would be the one having the least absolute difference from its respective expected score. However, because the expected scores were mean scores, rather than some theoretical maximum or minimum scores, provision had to be made for those cases when, for example, a Type A or B score was actually greater than its expected score, or a Type C or D score was less than its expected score. In order to accommodate these situations, the differences between type and expected scores were computed as follows: Types A and B, type score minus expected score; Types C and D, expected score minus type score. The type score having the largest algebraic difference was assigned as the test score. For example, if the set of type scores for an individual were 156, 175, 118, and 108 for A, B, C, and D, respectively, the differences would be 2, 8, -14, and 3. Because the Type B difference is the largest (algebraically), the Type B score of 175 would be the individual's configural test score.

When the above procedure was applied to the responses of the 268 individuals in the original criterion groups, there were five indeterminate cases. In two cases, the Type A and B scores had the same difference; in three cases, the Type C and D scores had the same difference. It was arbitrarily decided to select the higher score in the first two cases, and the lower score in the last three cases. Frequency distributions of the scores assigned to the individuals in each of the four types were produced. The distributions for Types A and B are shown graphically in Figure 3.3, and those for Types C and D in Figure 3.4.

The most striking characteristic of the distributions is that each of them is bimodal. The question was whether the data were, in some psychological sense, truly bimodal, or whether the unusual distributions had been artificially produced by the scoring technique. There did not seem to be a psychological explanation to recommend the first alternative; on the other hand, there was a reasonable explanation to support the second. Consider, for example, the Type B pattern of type scores shown in Figure 3.2 as a fixed geometric shape that could slide up and down on the scales. When the shape is high on the scales, the Type B score would be nearest to its respective expected score and would, therefore, be assigned. As the shape moves down the scales, a point would be reached where the Type B and Type D scores would be equidistant from their respective expected scores.
Below that point, the Type D score would be selected. This situation is illustrated in Figure 3.5. The various expected scores are indicated by the small circles. The solid line represents the Type B pattern and is positioned at the Type B and D equidistant point. An individual with a Type B pattern would be assigned either a high score (Type B) or a low score (Type D) depending on where the pattern was positioned on the scales. There would be little chance, however, for some intermediate score to be assigned. An artificial discontinuity was, therefore, introduced into the scoring system.

[Figure 3.3. Frequency Distribution of Assigned Scores for Overachieving Types; Scores Assigned by the Expected Score Technique.]

[Figure 3.4. Frequency Distribution of Assigned Scores for Underachieving Types; Scores Assigned by the Expected Score Technique.]

[Figure 3.5. Type B Score Pattern Positioned to Show Type B and D Scores Equidistant from Expected Scores.]

This finding raised a serious doubt that the expected score technique should be used. An important implication of the above finding was that the expected score technique failed to accurately identify the type category of some of the individuals in the criterion groups. A comparison was made between the type scores assigned to individuals and the type category to which they were known to belong. It was found that ninety-seven incorrect assignments had been made.

The Pattern Technique. The pattern technique for assigning test scores is described in Chapter I, pages 18-19. It makes use of the fact that there is a unique pattern of type scores for each ideal type individual. The patterns for the four types found in this study are shown in Figure 3.2, page 50. The procedure was to examine the set of type scores for each individual, determine to which ideal pattern it is most similar, and assign the corresponding type score as the configural test score.

In most cases the patterns could easily be identified; however, in about one-quarter of the cases some doubt existed. It was necessary, therefore, to establish a rule that would be used in the doubtful cases. One of the characteristics of the ideal patterns is that for the overachieving types the absolute difference between the Type A and B scores is larger than the absolute difference between the Type C and D scores. Similarly, for the underachieving types the difference between the Type C and D scores is larger than the difference between the Type A and B scores. In the doubtful cases it was necessary to calculate the absolute difference between the two overachieving and the two underachieving type scores.
If the overachieving type score difference was the larger, the individual was assumed to be an overachiever and was assigned the higher of the two overachieving type scores. If the underachieving type score difference was the larger, the individual was assigned the lower of the two underachieving type scores.

The type score patterns of the 268 individuals in the criterion groups were analyzed according to the procedure described above. In six cases the absolute differences between the two overachieving and the two underachieving types were equal. In these cases the expected score technique was used to assign scores. The frequency distribution of assigned scores for the overachieving types is shown in Figure 3.6, and for the underachieving types in Figure 3.7. It can be seen that the distributions are not bimodal as they were when the expected score technique was used to assign scores.

[Figure 3.6. Frequency Distribution of Assigned Scores for Overachieving Types; Scores Assigned by the Pattern Technique.]

[Figure 3.7. Frequency Distribution of Assigned Scores for Underachieving Types; Scores Assigned by the Pattern Technique.]

In order to answer the question as to how successful the pattern technique was in correctly identifying the type category of the individuals, a cross-tabulation was made of the type assigned by the scoring technique versus the type category to which it was known the individual belonged. The tabulation is shown in Table 3.3. The correct assignments are those shown on the diagonal axis. There were ninety-two incorrect assignments.

Table 3.3
Cross Tabulation of Assigned Type Versus Known Type (Preliminary Form)

Known       Assigned Type
Type        A     B     C     D   Total
A          45     6     7     4      62
B           8    54     3     6      71
C          18     5    36     9      68
D           5    17     4    41      67
Total      76    82    49    61     268

More than half of the errors (48) were caused by difficulty in correctly distinguishing the Type A's from the C's, and the Type B's from the D's. This finding was not surprising. The ideal patterns for Types A and C are quite similar, as are the patterns for Types B and D. Small chance variations in the configural test scores could easily result in a Type C pattern being produced by a Type A individual, and vice versa. The need for highly reliable configural tests is demonstrated by these errors.

The pattern technique was somewhat more successful than the expected score technique in correctly identifying the type category to which it was known individuals belonged. In addition, the pattern technique did not produce the scoring discontinuity that was evident when the expected score technique was used. It was, consequently, decided to adopt the pattern technique as the technique for assigning overall configural test scores.

The Significance Level for Item Selection

Items were selected for the preliminary form of the configural tests at the .01 significance level. The question remained as to whether or not this level was the most suitable. It might be that tests composed of items selected at some other level would more accurately identify the
In order to answer the question, two additional forms were constructed with items selected at the .025 and .05 levels The forms were designated the first, second, and third forms, respectively. The number of items each configural test in each form contained is shown in Table 3.4. Table 3.4 Number of Items in Each Form of the Configural Tests Configural Test Form Level AC AD BC BD l .Ol 16 25 67 30 2 .025 36 42 80 48 3 .05 43 56 95 67 The decision as to which of the three forms was the best was to be made on the following criteria: (1) the form that most accurately identified the type category of the individuals in the original criterion groups, and (2) that also had the highest correlation between overall scores produced by split-half sub-forms of each form. The second cri— teria, of course, is related to the reliabilities of the various forms. It was recognized that a decision as to which of the three forms was the best would not necessarily answer the question as to what was the best significance level for item selection. If, for example, the first form (.01 level) was best, it would still be possible that a form with items selected at the .005 or .001 level would be better 0n the other hand, if the third form (.05 level) was best, it would not be known whether or 62 not the .10 or .20 levels would be even more suitable. If either the first or third form was found to be the best of the three, the plan was to construct and test additional forms with items selected at either a higher or lower significance level. It would be concluded that any form was actually the best only when it was bracketed between less suitable forms having items selected at a higher, and a lower, level. The responses of the 268 individuals in the four types were scored using each of the three forms of the configural tests. Overall test scores were assigned using the pattern technique. The numbers of times each form correctly identified the type category of the individuals were counted. The results are shown in Table 3.5. The first and third forms were about sixty—five per cent accurate; the second form was most acccu- rate having correctly identified the type categories of seventy-five per cent of the individuals. Table 3.5 Number of Correct and Incorrect Identification of Types Made Using Each Form of the Configural Tests Test Form Identification l 2 3 Total Correct 176 201 170 547 Incorrect 92 67 98 257 Total 268 268 268 A sample of 120 subjects was used to examine the split-half Cur- relations of the three forms. This sample was described in Chapter II, page 34, and consisted of twenty overachievers, twenty underachievers, 63 and eighty ”normals". Each of the three forms was divided into odd and even halves. The responses of each subject were scored using each pair of split-half sub-forms of each form. Overall test scores were assigned using the pattern technique. Product-moment correlations were computed for both the separate configural tests, AC, AD, BC, and BD, and the assigned overall test score for each pair of split—half sub-forms of each form. These correlations are shown in Table 3.6. Although the corre- lations for the configural tests of the second form were not uniformly higher than they were for the other forms, the split-half correlation for the overall test scores was higher. It was reasonable to conclude, therefore, that the reliability of the second form was better than the reliability of the other forms. 
Table 3.6 Split-half Correlations for Configural Test and Overall Test Scores for Each Form of the Configural Tests Configural Test Overall Form AC AD BC BD Test 1 .52 .56 .84 .71 .44 2 .60 .66 .80 .79 .50 3 .58 .61 .85 .81 .41 The Second form most accurately identified the type category of individuals in the original criterion groups. The second form also had the highest split-half correlation of overall test scores. In addition, this form was bracketed by less suitable forms with items selected at a higher, and a lower, significance level. It was concluded, consequently, 64 that 025 was the most suitable significance level for item selection for the configural form of the GSCI, and that the second form should be used in the testing of the hypotheses. The mean scores produced by the indi- viduals in the original criterion groups on this form are shown in Table 3.7. _v _7 Table 3 7 Mean Configural Test and Type Scores for All Types Final Form Ideal gfiConfigural Test __ ff Type Type n AC AD BC BD A B C D Overachieving Type 0 Type A 62 76.6 76.5 62.3 72.7 153.1 135. 138.9 149.2 Type B 71 78.1 60.2 78.5 83.2 138.3 161.7 156.6 143.4 Underachieving Type Type C 68 55.1 68.6 51.0 60.7. 123.7 111.7 106.1 129.3 Type D 67 67.4 52.4 65.6 63 2 119.8 128.8 133.0 115.6 No of Items 36 42 80 48 65 92 88 69 The pattern of types scores for ideal types is about the same as the patterns shown in Figure 3.2, page 50. In the discussion of the hypotheses in Chapter I, page 22, the expectation was expressed that from a given pool of items more would be selected for a configurally scored test than for a linearly scored test. This expectation was realized. From the original pool of 200 items, 111 were selected for one or more of the configural tests. This number is in contrast tot:he 45 items that were selected for the linear form. 65 Summary The Rank Order Typal Analysis was used to identify types of over— and underachievers The two groups used in the typing procedure con- sisted of 135 overachievers, and 135 underachievers, chosen at random from the slightly larger criterion groups identified by Farquhar and his associates The procedure was used to identify a small number of items which appeared to be answered randomly by the individuals in each cri- terion group but which were actually answered more or less consistently by sub-categories of individuals within the groups. Those items, there- fore, provided the basis for the identification of types. Two types of overachievers and two types of underachievers were identified. The types were designated A, B, C, and D, respectively. In general, the items which discriminated Types A and B also discriminated Types C and D. A preliminary configural form of the GSCI was constructed. It consisted of four configural tests designated AC, AD, BC, and BD, the letters in their names indicating the types each would discriminate. Items were selected for these tests which discriminated the various types at the .01 level of significance. This preliminary form was used in answering the following questions: (1) which of the two proposed tech- niques of assigning overall test scores should be used, and (2) at what significance level should items be selected for the configural tests? In order to answer the first question, the responses of the individuals in the criterion groups were scored for the preliminary form. The con- figural test scores were combined in such a way as to produce a set of type scores designated A, B, C, and D. 
One of these type scores was to be selected as the individual's overall score for the configural form 66 Overall scores were selected using both the expected score and pattern techniques. The frequency distributions of the scores that were assigned by the two techniques were compared. There was a separate distribution for each of the four types. It was found that the distributions produced by the expected score technique were clearly bimodal, indicating an arti— ficial discontinuity in the scoring system. The distributions produced by the pattern technique were not bimodal. The two techniques were further compared by determining the accuracy with which each was able to identify the type category to which the individuals were known to belong. It was found that the pattern technique was slightly more successful in making correct identifications. Because of these two findings, the pattern technique was selected as the best technique for assigning over- all test scores. In order to answer the second question, two additional configural forms were constructed. Items for these forms were selected at the .025 and .05 levels The responses of the individuals in the criterion groups were scored for the additional forms The accuracies with which the three forms correctly identified the type categories of the individuals were compared. The .025 level form produced the largest number of correct identifications. The responses of a second sample of individuals were scored using split-half sub-forms of the three forms. The .025 level form had the highest split-half correlation of overall test scores It was concluded, therefore, that .025 was the best significance level for item selection, and that the second form would be used in testing the hypotheses. CHAPTER IV THE TEST OF THE HYPOTHESES In this chapter the testing of the two hypotheses is described, and the results of the testing are discussed, Two forms of the Generalized Situational Choice Inventory were to be compared. The first was the linear form developed by Farquhar and his associates; the second was the configural form developed in this study, the construction of which was described in Chapter III. It was hypothesized that the comparison would demonstrate that the configural form was superior to the linear form in two respects: (1) the configural form would have a higher reliability, and (2) it would have a higher pre- dictive validity than the linear form. The sample used in the comparison was described in Chapter II, page 34, and consisted of 32 overachievers, 32 underachievers, and 192 ”normals” The Hypothesis Regarding Reliability The expectation that the reliability of the configural form would be higher than the reliability of the linear form was based primarily on another expectation, namely, that from the original pool of 200 items more would be selected for a configurally scored test than for a linearly scored test. As was noted in Chapter III, page 64, this expectation was realized. That is, the number of items used to produce the Type A, B, C, and D scores was 65, 92, 88, and 69, reSpectively. The number of items used to produce any one of the type scores is greater than the number of 67 68 items in the linear form (45). Unfortunately, the question of reliability could not be directly approached with the data available for this study. 
The data used in the study were those collected by Farquhar and his associates for their study of academic motivation.1 Their procedure did not include a re-test with the same form, or the administration of equivalent forms of the GSCI; consequently, reliability coefficients must be estimated from the data obtained from a single administration of the inventory. Two methods are commonly used to produce such estimates: (1) Hoyt's Analysis of Variance, and (2) the Spearman-Brown Prophecy Formula.

Neither of these methods could appropriately be used to estimate the reliability of the configural form. The set of configural scores contains a source of variance that is not found in a set of linear scores, namely, variance caused by the process of selecting one of four type scores as the overall test score. This variance is related to the chance variations in the patterns of type scores. Neither of the two methods of estimating test reliability takes this source of variance into account; consequently, neither of them could be used in this study. The only evidence for test reliability that could be presented was simply the correlation of split-half scores; consequently, the null hypothesis related only to these correlations.

1 William W. Farquhar, Motivation Factors Related to Academic Achievement, U.S. Office of Health, Education and Welfare, Cooperative Research Project #846, ER 9, Office of Research and Publications, College of Education, Michigan State University, East Lansing, Michigan, 1963, 506 pp.

The Statistical Hypotheses

The null hypothesis was as follows: There is no difference between the product-moment correlations of split-half scores for the configural and linear forms of the GSCI.

The alternate hypothesis was as follows: The product-moment correlations of the split-half scores will be significantly higher for the configural form than for the linear form of the GSCI.

The Test Procedure

Each form of the GSCI was divided into odd and even halves, and the responses of the subjects in the sample were scored using each split-half of each form. The product-moment correlations of the split-half scores of each form were computed. The split-half correlation for the configural form was .5570, and for the linear form it was .7195. The significance of the difference between these correlations was tested using a technique presented by Walker and Lev.1 The statistic z was computed from the following equation:

\[ z = \frac{z_{r_1} - z_{r_2}}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}}, \qquad \text{where } z_r = \tfrac{1}{2}\log_e\!\frac{1 + r}{1 - r}. \]

This statistic has an N(0,1) distribution. The rejection region, for a one-tailed test at the .05 significance level, was z > 1.645. A z of -3.13 was computed. The null hypothesis was rejected. Inasmuch as the correlation for the configural form was lower than it was for the linear form, the alternate hypothesis was also rejected. It was concluded that the correlation for the configural form was less than the correlation for the linear form.

1 Helen M. Walker and Joseph Lev, Statistical Inference (New York: Holt, Rinehart and Winston, 1953), pp. 255-6.

Discussion of the Result

As was noted above, the question of reliability could not be answered from the available data. The data do, however, offer some indirect evidence for reliability. The reliability of the full-length linear form was estimated from the Spearman-Brown Prophecy Formula to be .84. The reliability of the configural form was estimated in the same way to be .71. This latter figure, however, is not a valid estimate.
The Spearman-Brown Prophecy Formula provides an estimate of what the cor- relation between two sets of scores would be if a test were used which contained a different number of items than was actually used. It cannot, however, estimate the change in the selection procedure variance that might result from using the full-length configural test rather than the split-half tests It provides only part of the information needed in order to make an accurate estimate of the configural test reliability. The variance caused by the selection of type scores was a signif- icant factor in lowering the split—half correlation of the configural form The split—half correlations of the four type scores were .64, .86, .76, and .67 for Type A, B, C, and D, respectively. These correlations are in the same order of magnitude as the linear test correlation (.7195). If the overall configural scores were thought of as being merely a col— lection of various type scores, without the intervening variable of the 71 :election process, it would be reasonable to expect the overall config- ural score correlation to be at about the mid-point of the four type score correlations. This mid-point is about .73; the split-half cor- relation was much lower (.557). The difference is due primarily to the selection procedure variance The reliability of the type score selection procedure was, obvi- ously, of crucial importance. One measure of this reliability was obtained by comparing the type identifications that were made with each split-half form for each individual. These identifications are the type categories which correspond to the type scores selected as the configural test scores. A cross tabulation was made of the first versus the second identification of each individual, for each split-half form. The cross- tabulation is presented in Table 4.1. The consistent identifications, Table 4;£ Cross-Tabulation of Type Identifications Made Using Split-half Forms of thc Configural Test First - Second Type Type A B C D Total A 23 3 33 2 61 B 19 29 9 38 95 C 15 4 15 6 40 D 17 ll 16 16 60 Total 74 47 73 62 256 that is, when the two identifications coincided, are shown on the diag- onal axis. There were only 83 consistent identifications out of the 256 72 cases. If the reliability coefficient for the type score selection procedure were defined as the prOportion of consistent identifications, it would be only .32. In order to further explore the effect of the selection procedure reliability on the split-half correlation, two additional correlations were computed. The 83 cases in which consistent identifications were made could be thought of as a sub—sample for which the selection proce— dure reliability would be 1.0. The remaining 173 cases would constitute a second sub—sample for which the selection procedure reliability would be 0.0. A comparison of the split-half correlations computed for these two samples would give some indication as to how great the effect of the selection procedure reliability could be. The correlation for the first sample was .88, and for the second sample it was .55. Obviously, the selection procedure reliability is an important factor in determining the configural test reliability. The reliability coefficient of the configural form of the GSCI could not be computed, or estimated, from the data available for the study. It is, however, possible to specify two values, a lower and an upper lfiMt, between which the reliability coefficient will probably lie. 
The lower limit is the reliability coefficient that was estimated from the configural score split-half correlation. This estimate was .71, and was made from the Spearman-Brown Prophecy Formula. This value would be the reliability coefficient if the selection procedure reliability remained the same for the full-length form as it was for the split-half forms (.32). Inasmuch as it is highly probable, however, that the selection reliability would be improved because of the improved reliabilities of the full-length type scores, it is also highly probable that the reliability of the configural form is greater than .71.

The upper limit is a reliability estimated from a theoretical split-half correlation, namely, the probable correlation if the selection procedure were not a factor, i.e., a selection procedure reliability of 1.0. Because the configural scores are composed of scores taken from the set of type scores, the configural score split-half correlation would probably be at about the mid-point of the correlations for the various type scores (assuming the selection procedure contributes no variance). This theoretical split-half correlation was previously estimated to be about .73. The reliability estimated from the Spearman-Brown Prophecy Formula is .845. This value is what the reliability coefficient would probably be if the selection reliability were 1.0. Inasmuch as it is unlikely that the selection reliability would be as high as 1.0, it is also unlikely that the configural reliability would be as high as .845.

In view of the above discussion, it is reasonable to say that the configural form reliability coefficient is greater than .71 but less than .845. The upper limit is about the same value as the estimated linear form reliability coefficient (.84). Inasmuch as the configural reliability is probably less than the upper limit value, it follows that the configural reliability is probably less than the linear reliability.

The Hypothesis Regarding Validity

The expectation that the validity of the configural form of the GSCI would be higher than the validity of the linear form is based on the assumption that a configural form would be able to identify, and make use of, the "configural" information supposedly contained in the pattern of type scores. If additional relevant information is available about an individual, it is reasonable to expect that a more accurate prediction of his behavior can be made. This assumption was, evidently, valid. The configural form of the GSCI developed in this study did identify potentially relevant information about academic over- and underachievers, namely, that there are two types of overachievers and two types of underachievers. And the configural form did make use of this information by means of the four configural tests constructed so as to discriminate each type of overachiever from each type of underachiever. The expectation that the configural validity would be higher than the linear validity was, therefore, reasonable.

The instruments that were to be compared, that is, the two forms of the GSCI, were both measures of one aspect of academic motivation. One way of comparing the predictive validity of the two forms would be to determine which form, in conjunction with a measure of academic aptitude, would most accurately predict the academic achievement of the individuals in the sample. The plan was to compare the coefficients of multiple correlation produced when the individuals' grade point averages were predicted from, first, their linear GSCI and Differential Aptitude Test-Verbal Reasoning scores, and, second, their configural GSCI and DAT-VR scores.

The Statistical Hypotheses

The null hypothesis was as follows: There is no difference between the coefficients of multiple correlation produced when GPA's are separately predicted from the linear and configural GSCI scores, each in conjunction with the DAT-VR scores.

The alternate hypothesis was as follows: The coefficient of multiple correlation will be greater when GPA's are predicted from the configural GSCI scores than it will be when GPA's are predicted from the linear scores, each in conjunction with the DAT-VR scores.

The Test Procedure

The responses of each of the individuals in the sample were scored using the full-length forms of both the linear and configural GSCI. The subjects' eleventh grade GPA's and DAT-VR scores were obtained from the data in Farquhar's study. Calculations were made to produce the beta weights, the coefficients for the multiple regression equations, and the coefficients of multiple correlation. The results are summarized in Table 4.2.

Table 4.2
Beta Weights, Multiple Regression Equation Coefficients, and Coefficients of Multiple Correlation for the Linear and Configural Forms of the GSCI

              Beta Weights (Normalized)    Equation Coefficients     Multiple Correlation
Form             DAT       GSCI              A      DAT     GSCI     Coefficient
Linear          .5212     .2862           1.178    .0429   .0293     .6847
Configural      .5115     .3093            .951    .0422   .0083     .6931

It can be seen from Table 4.2 that the coefficients of multiple correlation are very nearly the same. The significance of the difference was tested using Hotelling's technique as described by Walker and Lev.1 The statistic t was computed from the following equation:

\[ t = \left( r_{xz} - r_{yz} \right) \sqrt{\frac{(N - 3)\,(1 + r_{xy})}{2\left(1 - r_{xy}^{2} - r_{xz}^{2} - r_{yz}^{2} + 2\,r_{xy}\,r_{xz}\,r_{yz}\right)}} \]

where N is the sample size, x and y refer to the GPA predictions made by the two multiple regression equations, and z refers to the actual GPA. This statistic has a t(253) distribution. For a one-tailed test at the .05 significance level, the critical region for rejecting the null hypothesis was t > 1.645. A value of .7121 was computed for t. The null hypothesis was accepted, and the alternate was rejected. It was concluded that there was no difference between the two correlations and, consequently, no difference in the predictive validities of the two forms of the GSCI.

1 Walker and Lev, op. cit., p. 257.

The Discussion of the Result

The conclusion of no difference reported above raises a number of questions. For example, did the configural technique actually identify any significant "configural" information, as was earlier supposed? Or, if configural information was identified, was it accounted for in the scoring system in such a way as to modify the GSCI scores? Or again, was there, indeed, any significant amount of "configural" information hidden in the test responses? These and similar questions may be summarized by asking whether the conclusion of no difference was the result of some inadequacies in the configural technique developed in the study, or whether it was produced by factors outside the control of the investigator.

In an attempt to answer the above question, the intercorrelations among the four configural tests, AC, AD, BC, and BD, and the linear test were computed. These intercorrelations are presented in Table 4.3.
The Discussion of the Result

The conclusion of no difference reported above raises a number of questions. For example, did the configural technique actually identify any significant "configural" information, as was earlier supposed? Or, if configural information was identified, was it accounted for in the scoring system in such a way as to modify the GSCI scores? Or again, was there, indeed, any significant amount of "configural" information hidden in the test responses? These and similar questions may be summarized by asking whether the conclusion of no difference was the result of some inadequacies in the configural technique developed in the study, or was produced by factors outside the control of the investigator.

In an attempt to answer the above question, the intercorrelations among the four configural tests, AC, AD, BC, and BD, and the linear test were computed. These intercorrelations are presented in Table 4.3. It will be observed that of the correlations of the linear test with the four configural tests, three are relatively high, and one (with AD) is remarkably low. It can be further observed that the intercorrelations among three of the configural tests (AC, BC, and BD) are, again, relatively high, but the intercorrelations of the three with the fourth (AD) are very low.

Table 4.3

Intercorrelations Among the Linear and Configural Tests

Configural                Correlation with
Test            Linear      AD        BC        BD
AC               .864      .227      .823      .839
AD               .258               -.172      .150
BC               .828                          .859
BD               .902

It is reasonable to assume that if the correlation between two reasonably reliable tests is high, the tests are measuring the same or closely related qualities or characteristics; conversely, if the correlation is low, the tests are measuring significantly different characteristics. The intercorrelations among the linear test and configural tests AC, BC, and BD are all higher than .80; consequently, it can be assumed that they are measuring essentially the same characteristics. On the other hand, the correlations between configural test AD and the other four tests are quite low. Obviously, test AD measures some different characteristics than the other tests.

In the discussion in Chapter III, pages 44-46, of the factors which discriminate the various types of over- and underachievers, it was noted that in some respects Types A and D are the exact opposite of what one would expect of over- and underachievers. Although Type A individuals are overachievers, they would prefer to be known as having merely "adequate" ability and being "practical". On the other hand, Type D individuals are underachievers, but would like to have the reputation of being "smart" and "intelligent". These two facts are, in essence, the "configural" information identified in the study. Obviously, the configural test which discriminates Types A and D must be measuring this "configural" characteristic, as opposed to the other three configural tests, which are, in a sense, merely alternate forms of the linear test and which, in general, measure the overall characteristics which discriminate overachievers and underachievers. One configural test, AD, is different from the linear test because it makes use of the configural information identified in the study; three configural tests, AC, BC, and BD, are not essentially different from the linear test, because they do not make use of the configural information.

The observation that three of the four configural tests were not essentially different from each other, or from the linear test, is contrary to the logical expectations for those tests. If there are, indeed, two distinct types of over- and underachievers, one would expect to find that the psychological dimensions along which any one pair of types is discriminated would be somewhat different from the discriminating dimensions for any other pair of types. For example, Types B and C are discriminated along certain dimensions. The dimensions along which Types B and D would be discriminated should, in some respects, be significantly different. This should be true because Types C and D are significantly different from each other and, consequently, each should reveal different kinds of differences from Type B. Unfortunately, this expectation was not realized.
Tests BC and BD seem to have been constructed along the same dimensions; the unique differences between Types C and D are not represented in the two tests in any significant way.

An explanation for the observations reported above may be found by reviewing some of the theory out of which items were written for the GSCI. Farquhar and his associates hypothesized that over- and underachievers would be discriminated along three bi-polar dimensions.[1] These dimensions are summarized in Table 4.4. Two hundred items describing situations logically related to the extremes of the polar dimension theory were constructed. Because it was intended that the items should be logically related to the dimensions stated in Table 4.4, one cannot expect other important dimensions to magically appear in the pool of items.

Table 4.4

Summary of the Polar Theory of High and Low Academic Achievement Motivation
Used in Constructing the GSCI

High Academic Achievement Motivation        Low Academic Achievement Motivation
1. Need for Long-Term Involvement           1. Need for Short-Term Involvement
2. Need for Unique Accomplishment           2. Need for Common Accomplishment
3. Need to Compete with a Maximal           3. Need to Compete with a Minimal
   Standard of Excellence                      Standard of Excellence

[1] Farquhar, op. cit., pp. 3-100.

There was one dimension that did, in a sense, magically appear. This was the dimension which discriminated overachieving Types A and B, and also underachieving Types C and D. The most significant items which relate to this dimension are reproduced in Table 3.1, page 45. The most important factor which these items seem to have in common is one that relates to the image of themselves which students wish to project to their parents and peers. Stated in bi-polar terms, this dimension is as follows: the desire to be known as being superior versus the desire to be known as being average. On the surface, the items listed in Table 3.1 would seem to be related to the second dimension of Table 4.4. Apparently the item-writers assumed that if one wanted the reputation of being intelligent, one would be motivated to act in such a way as to earn that reputation. As reasonable as that assumption is, it evidently is not correct. But because of this mistaken assumption, items were included in the pool which did relate to the "desired reputation" dimension. Those items were subsequently identified through the configural analysis, and the dimension was incorporated into configural test AD. The unique dimensions that should have made test BC different from test BD, however, were not found through the configural analysis simply because items which would relate to those dimensions were not included in the original pool, at least not in any significant way. The "dimensional purity" of the GSCI items is, in fact, a tribute to those who wrote them. They deliberately intended to write items which would relate only to the dimensions specified in their theory; except for the "desired reputation" dimension, they seem to have succeeded in doing just that.

One of the assumptions upon which this study was based is that there was "configural" information hidden in the responses to the GSCI, that the configural technique developed in the study would be able to identify such information, and that the use of that information would permit more accurate predictions of GPA to be made.
Although some "configural" information was identified, it was evidently not of great enough significance, or in enough detail, to make more accurate predictions possible. The dimensions necessary in order to construct an effective configural test were simply not present in the data, at least not to any significant extent.

The foregoing discussion was undertaken in an attempt to answer a question regarding the failure of the configural analysis to improve on the predictive validity of the GSCI. It may be that some other configural technique might have been more successful in identifying and using the configural information. There is, however, evidence to support the conclusion that the failure was produced by factors outside the control of the investigator, namely, the relative scarcity of GSCI items which relate to any truly "configural" dimension. The implications of this conclusion are discussed in the next chapter.

Summary

In order to provide a meaningful comparison between the linear and configural forms of the Generalized Situational Choice Inventory, two hypotheses were presented and tested. The first hypothesis related to the reliabilities of the two forms. Because the reliability of the full-length configural form could not be accurately estimated from the available data, the hypothesis was limited to investigating the statistical significance of the difference between the split-half correlations of the two forms. The split-half correlation of the configural form was .5570, and for the linear form it was .7195. The difference between these correlations was significant at the .05 level. It was concluded that the split-half correlation was lower for the configural form than it was for the linear form.

In the discussion of the result of the first test, data were presented to support the explanation that the lower split-half correlation of the configural form was largely due to the relatively low reliability of the procedure for selecting one of the four type scores as the overall configural test score. Estimates were made of the full-length reliability of the configural form under two hypothetical conditions: (1) the selection procedure reliability was the same for the full-length form as it was for the split-half form, and (2) the selection procedure reliability of the full-length form was perfect, i.e., 1.0. Under the first condition the reliability would be .71; under the second it would be .845. Inasmuch as the selection procedure reliability for the full-length form is undoubtedly higher than it was for the split-half form, but less than 1.0, it was concluded that the reliability of the configural form was greater than .71 but less than .845. Because the estimated reliability of the linear form was .84, it was reasonable to assume that the reliability of the configural form was less than the linear form reliability. It was not possible, however, to test this assumption.

The second hypothesis related to the predictive validities of the two forms. The Grade Point Averages of the individuals in the sample were predicted from multiple regression equations with, first, the linear GSCI and DAT-VR scores, and, second, the configural GSCI and DAT-VR scores. The coefficient of multiple correlation in the first case was .6847; in the second it was .6931. The difference between these correlations was not significant at the .05 level. It was concluded that there was no difference in the predictive validities of the two forms.
In the discussion of the result of the second test, evidence was presented to support the explanation that the configural analysis failed to improve the predictive validity of the GSCI because there was not enough "configural" information hidden in the test responses to significantly modify the GSCI scores.

CHAPTER V

SUMMARY, CONCLUSIONS AND IMPLICATIONS

Summary

The purpose of this study was to develop a technique of configural scoring, to apply the technique to the scoring of a particular psychological inventory, and to compare the results thus obtained with those obtained by the application of a more common scoring technique. The configural scoring technique that was developed was based on a technique suggested by McQuitty.[1] The psychological inventory used was the Generalized Situational Choice Inventory developed by Farquhar and his associates to explore the motivational factors related to academic over- and underachieving.[2]

[1] Louis L. McQuitty, "Item Selection for Configural Scoring," Educational and Psychological Measurements, 1961, Vol. XXI, pp. 925-8.

[2] William W. Farquhar, Motivation Factors Related to Academic Achievement, U. S. Office of Health, Education and Welfare, Cooperative Research Project #846, ER 9, Office of Research and Publications, College of Education, Michigan State University, East Lansing, Michigan, 1963, 506 pp.

The Rank Order Typal Analysis was used to identify two types of overachievers and two types of underachievers from criterion groups of over- and underachievers which were previously identified by Farquhar. Four configural tests were constructed, one test to discriminate each overachieving type paired with each underachieving type. Items were selected which discriminated the paired types at the .025 level. Scores on the configural tests were combined in such a way as to produce a set of type scores, one type score corresponding to each of the four types of over- and underachievers. The unique configurations or patterns of type scores that would be characteristic of individuals in each of the four ideal types were identified by calculating the means of the four type scores produced by the individuals in each of the types. The four patterns of type scores were the basis upon which one of the four type scores produced for each individual would be selected as the individual's configural score. A method was developed for determining to which of the four ideal patterns the particular pattern of type scores produced by an individual was most similar. The type score which corresponded to that ideal pattern was selected as the configural score.
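The selection step described in the preceding paragraph can be illustrated schematically. The sketch below is not the study's own procedure: Python is used only for illustration, the similarity rule is assumed here to be simple Euclidean distance (the comparison rule actually used in the study may have differed), and all of the numbers are hypothetical.

    # Pick the configural score for one individual: match the profile of four
    # type scores to the most similar ideal pattern, then return the type
    # score that corresponds to that pattern.

    from math import dist  # Euclidean distance, Python 3.8+

    TYPES = ("A", "B", "C", "D")

    def configural_score(type_scores, ideal_patterns):
        """type_scores: dict mapping type -> score for one individual.
        ideal_patterns: dict mapping type -> its mean profile (dict of type -> score)."""
        profile = [type_scores[t] for t in TYPES]
        closest = min(
            ideal_patterns,
            key=lambda t: dist(profile, [ideal_patterns[t][u] for u in TYPES]),
        )
        return type_scores[closest]

    # Hypothetical ideal patterns (mean type-score profiles) and one individual.
    ideals = {
        "A": {"A": 30, "B": 24, "C": 12, "D": 18},
        "B": {"A": 24, "B": 31, "C": 15, "D": 11},
        "C": {"A": 12, "B": 16, "C": 28, "D": 22},
        "D": {"A": 17, "B": 11, "C": 23, "D": 29},
    }
    print(configural_score({"A": 28, "B": 22, "C": 13, "D": 16}, ideals))  # 28 (Type A)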
The configural form of the GSCI was compared with the linear form which had been developed by Farquhar. Two characteristics of the tests were under consideration: (1) their reliability, and (2) their predictive validity. In order to explore the question of reliability, the responses of the 256 subjects were scored using the split-halves of both the configural and linear forms. The product-moment correlations of the split-half scores of the corresponding forms were computed. The split-half correlation was .5570 for the configural form, and .7195 for the linear form. The difference between the correlations was significant at the .05 level. It was concluded that the split-half correlation was lower for the configural form than it was for the linear form. The reliability of the full-length linear test was estimated from the Spearman-Brown Prophecy Formula to be .84. Although the full-length reliability of the configural form could not be similarly estimated, evidence was presented to support the assumption that the configural reliability was less than the linear reliability. It was not possible, however, to test this assumption.

The question of predictive validity was explored by making two predictions of the subjects' Grade Point Averages, one from a multiple regression equation which used the subjects' scores on the linear GSCI and the Differential Aptitude Test-Verbal Reasoning, and the other using their configural GSCI and DAT-VR scores. The coefficients of multiple correlation for each prediction were computed. The coefficient was .6847 for the prediction with the linear GSCI, and .6931 for the prediction with the configural score. The difference between these correlations was not significant at the .05 level. It was concluded that there was no difference in the predictive validities of the two forms.

Conclusions

Based on the results of the present study, it is concluded that the configural scoring technique developed in the study has no advantages over the linear technique. It may, in fact, have several disadvantages. Although the predictive validity of the configural form was about the same as that of the linear form, no evidence was found to support the notion that the reliability of the configural form was higher than the reliability of the linear form. Some evidence suggests that the configural reliability is actually lower than the linear reliability. From a technical point of view, therefore, there is nothing to recommend the configural form over the linear form.

From a practical point of view there is also nothing to recommend the configural form. Considerably more effort is required to produce the configural score, effort which cannot be justified unless the results to be obtained are significantly better than those which can be obtained by more conventional techniques.

One possible advantage of the configural form is that, in addition to producing a test score, the scoring technique also identifies the subject as being most similar in his test responses to one of the types of over- and underachievers. It may be that knowledge of this similarity would be useful in the academic counseling of students. It was outside the scope of this study, however, to explore this possibility.

Implications

The assumption upon which this study was based is that the configuration of a set of responses contains useful psychological information which cannot be identified by conventional psychometric techniques. The results of the present study, as well as those of other configural studies, strongly suggest that the assumption is not valid. In the discussion of the result of the testing of the validity hypothesis in Chapter IV, pages 76-81, data and arguments were presented to support the assertion that little "configural" information was contained in the responses to the GSCI. It was further argued that the relative "dimensional purity" of the GSCI was intentional. Farquhar and his associates intended that the GSCI items should relate only to the dimensions which were specified in their polar theory of academic achievement motivation.
If some wonderfully new dimensions had been found, it would have meant that the item-writers had not exercised sufficient care in constructing the items. It is not reasonable to expect to find hidden "configural" dimensions in a test carefully constructed along strictly linear dimensions.

The larger dimensions of the configural scoring problem have not, heretofore, been explicated. The crux of the problem is not, as has been supposed, how to get configural information out of any given data; it is, rather, how to get information into the data. Configural scoring techniques have, in general, been regarded as merely alternate methods of analyzing data. It was hoped that configural analysis would unearth more useful information than would be unearthed by more common linear techniques. No attention, however, has been given to the data itself. It seems that the question has not been asked as to how the additional configural information is to get into the data.

McQuitty made an important contribution to the configural scoring consideration by focusing attention on the problem of item selection.[1] Apparently, he assumed that the much sought-after configural items were already present in the larger pool of items, and that the only problem was how to identify them. The more important problem, however, relates to the procedure by which the configural items are to be included in the pool. If they are not included, no amount of configural analysis is likely to unearth them.

[1] McQuitty, loc. cit.

The procedure by which items are included in a pool of items should be the same for configural items as it is for linear items. In a well-constructed test, items are included because it is thought that they are clearly related to a concise theoretical framework. If configural items are to be included, it follows that there must be a configural theory to which the items are related. The present study, as well as many previous configural scoring studies, approached the problem as if it were one of technique only. Little or no attention was paid to the kind of psychological theory demanded by the configural scoring techniques. It is not surprising, therefore, that the results of configural scoring studies have been almost uniformly discouraging.

The configural scoring technique developed in this study clearly demands a theory of types. In this particular case, the technique demands that a theory of academic achievement motivation include the idea that there are, or may be, different types of over- and underachievers, and that it specify the dimensions along which the various types may be differentiated. The present study failed primarily because such a theory had not been constructed and, consequently, items related to it were not included in the pool. If the psychological theory upon which a test is constructed does not encompass a theory of types, it should not be surprising that a configural analysis of the test would fail to produce encouraging results.
In view of the foregoing discussion, a new hypothesis relating to configural scoring is presented for possible future study: Provided that the psychological theory upon which a test is constructed encompasses a theory of types, and provided items are included in the pool which relate to the typal aspects of the theory, it is hypothesized that the procedures developed in the present study can be used to construct a configural test which will be superior to one constructed along purely linear dimensions.

Perhaps the greatest value of the present study is that it has led to a restatement of the configural scoring problem. The least that can be said is that the configural scoring problem may not be so much one of configural technique; the problem is probably one of psychological theory.

BIBLIOGRAPHY

Alf, E. F. Configural Scoring and Prediction, (Doctoral Dissertation, University of Washington), Technical Report, Public Health Research Grant M-743(C2), University of Washington Division of Counseling and Testing Services, November 1956.

Campbell, David. "Another Attempt at Configural Scoring," Educational and Psychological Measurements, 1963, Vol. XXIII, pp. 721-7.

Farquhar, William W. Motivation Factors Related to Academic Achievement, U. S. Office of Health, Education and Welfare, Cooperative Research Project #846, ER 9, Office of Research and Publications, College of Education, Michigan State University, East Lansing, Michigan, 1963.

Horst, Paul. "Pattern Analysis and Configural Scoring," Journal of Clinical Psychology, 1954, Vol. X, pp. 3-11.

________. "The Prediction of Personal Adjustment," Social Science Research Council Bulletin, 1941.

________. "The Uniqueness of Configural Test Item Scores," Journal of Clinical Psychology, 1957, Vol. XIII, pp. 107-114.

Lubin, Ardie, and H. G. Osburn. "A Theory of Pattern Analysis for the Prediction of a Quantitative Criterion," Psychometrika, 1957, Vol. XXII, pp. 63-74.

Lunneborg, C. E. Dimensional Analysis, Latent Structures, and the Problem of Patterns, Doctoral Dissertation, University of Washington, 1959.

Manual for Interpretation of the Michigan State M-Scales, (Mimeographed).

McQuitty, Louis L. "Item Selection for Configural Scoring," Educational and Psychological Measurements, 1961, Vol. XXI, pp. 925-8.

________. "Pattern Analysis Illustrated in Classifying Patients and Normals," Educational and Psychological Measurements, 1954, Vol. XIV, pp. 598-604.

________. "Rank Order Typal Analysis," Educational and Psychological Measurements, 1963, Vol. XXIII, pp. 55-61.

Meehl, Paul. "Configural Scoring," Journal of Consulting Psychology, 1950, Vol. XIV, pp. 165-71.

Thorpe, Marion D. The Factored Dimensions of an Objective Test of Academic Motivation Based on Eleventh Grade Male Over- and Underachievers, Doctoral Dissertation, Michigan State University, 1961.

Walker, Helen M., and Joseph Lev. Statistical Inference, New York: Holt, Rinehart and Winston, 1953.

APPENDIX

CHI-SQUARE STATISTIC FOR RESPONSES BY ALL POSSIBLE PAIRS OF TYPES

(Starred items were selected for the configural test indicated by the column heading. Items starred in the last column were selected for the linear form of the GSCI.)
[Table: chi-square statistics for GSCI items 1-200, tabulated by item number for all overachievers versus underachievers and for each pair of types (A-B, A-C, A-D, B-C, B-D, C-D); individual values not reproduced.]
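For reference, the kind of statistic tabulated in this appendix can be sketched as follows. This is an illustration only, not the study's scoring program: Python is used for convenience, the counts are hypothetical, and the study's handling of multiple response categories and any corrections are not shown.

    # Chi-square for a 2 x 2 table comparing how two types answered one item.
    # Items whose chi-square exceeded the .025 critical value (about 5.02 with
    # 1 degree of freedom) were starred and selected for the corresponding test.

    def chi_square_2x2(a, b, c, d):
        """a, b = counts of the two response choices in one type;
        c, d = counts of the same choices in the other type."""
        n = a + b + c + d
        return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Hypothetical counts for one item, Type A versus Type D.
    stat = chi_square_2x2(a=28, b=12, c=13, d=27)
    print(round(stat, 1), stat > 5.02)  # item would be selected if True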