MICHIGAN STATE UNIVERSITY

This is to certify that the thesis entitled

CRITERION PATTERN ANALYSIS: A METHOD FOR IDENTIFYING PREDICTIVE ITEM CONFIGURATIONS

presented by JAMES ARTHUR CLARK has been accepted towards fulfillment of the requirements for the Ph.D. degree in Psychology.

ABSTRACT

CRITERION PATTERN ANALYSIS: A METHOD FOR IDENTIFYING PREDICTIVE ITEM CONFIGURATIONS

by James Arthur Clark

Meehl, in 1950, demonstrated the potential usefulness of patterns of items for predicting. The various approaches subsequently developed to capitalize on this predictive power of patterns are classified and evaluated. In the light of their marked lack of success, nine principles for an effective pattern prediction method are proposed. The method should (1) find all major patterns which predict the criteria; (2) find patterns separately for each criterion category; (3) isolate non-configural as well as configural relationships; (4) be capable of predicting directly from patterns; (5) be capable of predicting better than linear methods on the analysis sample; (6) predict better than linear methods upon cross validation; (7) be applicable to small samples; (8) yield readily interpretable results; (9) provide readily obtainable results. Criterion Pattern Analysis (CPA), a method conforming to the above principles, has been developed.

CPA operates on discrete data, typically a matrix of the responses to a set of items made by people who have been previously classified into two or more criterion categories. For each of the criteria, patterns of responses are sought which are highly related only to that category. These patterns may be in one item, two items, three items, etc.
A pattern is accepted as relating to the criterion category if it is significantly more predictive of the criterion than is any of its subpatterns; the hypergeometric distribution is utilized in making the significance test. In checking all possible patterns, all one-item patterns are scrutinized first, then two-item patterns, then three-item patterns, etc. To overcome the impossible task of checking all possible patterns one by one, a technique of rejecting many patterns at one stroke is employed. Thus a pattern of r items, whether or not acceptable itself, is also tested as to whether it can possibly be improved through the addition of more items. Only if it can be improved significantly will it be used in the formation of patterns of r+1 items. This procedure for CPA was carefully programmed to make efficient use of the capabilities of an electronic computer.

A method for predicting directly from patterns was developed. A person for whom prediction is desired is checked for the patterns previously extracted. Of the patterns he has, the one most highly related to its criterion determines the highest prediction. In this way a hierarchy of prediction can be obtained. In an alternative prediction scheme, each person in the original response matrix is given scores of 1's and 0's according to whether or not he has each pattern. This set of scores can then be employed in one of the linear prediction methods. In this guise, CPA functions as an extension of item analysis.

Two sets of data were used to compare CPA with two linear methods, multiple regression and a multivariate normal maximum likelihood procedure. The first set involved prediction of field dependence and independence from items of the I-E scale; the second involved predicting voting behavior on a selected issue from votes on other issues in the UN General Assembly.
On the analysis samples, of 50 subjects and 55 nations, the maximum likelihood procedure predicted better than did CPA; multiple regression did better than CPA on the UN data, but not as well on the Crego data. On the cross validation samples of 49 subjects and 55 nations, CPA consistently predicted better. The combination prediction scheme yielded better results than did predicting directly from patterns. On both sets of data, patterns from CPA offered greater opportunity for substantive interpretation than did the results of the linear methods.

Various ways of applying CPA are indicated. Areas of improvement of the present method are pointed out, such as establishment of an over-all significance level for patterns. CPA is compared with several other methods purported to utilize configural properties found in data. It is suggested that types as determined by patterns from CPA might be capable of helping revise typal theory in general.

CPA is measured against the nine principles initially formulated, and is found to meet all of them with the exception of number 5; here the necessity of sacrificing maximum prediction to the analysis sample in order to obtain best cross validation prediction is asserted. Importantly, the seemingly impossible task of examining all possible patterns in search of the highly predictive ones has been achieved, and with the aid of a high speed computer the application of CPA is made a practical procedure.

CRITERION PATTERN ANALYSIS: A METHOD FOR IDENTIFYING PREDICTIVE ITEM CONFIGURATIONS

By James Arthur Clark

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Psychology

1968

ACKNOWLEDGMENTS

No dissertation is developed and written in isolation. There are many people who have been influential in the various stages of conceptualization and presentation of this thesis.
First, I wish to acknowledge the members of my committee: Dr. William Mueller and Dr. Bertram Karon provided encouragement and many helpful comments. Dr. Terrence Allen generously extended thorough word for word criticism in the final stages of preparation, as well as valuable suggestions in the development. Most of all, I owe a debt of gratitude to the chairman, Dean L. L. McQuitty. Not only were his patience, encouragement, and suggestions appreciated in the formative stages, but also his skillful instruction led to a better presentation. Certainly without Dean McQuitty's pioneer work in pattern analytic methods, this thesis would not have been possible.

Sincere thanks are owed to John Hafterson who developed many ideas similar to my own and was always ready to listen and react to my attempts to solve the problems at hand. Many people in and around the Michigan State University Computer Laboratory were very helpful in the development and running of the computer programs connected with the thesis.

Finally, I must express my gratitude to my wife, Jan, who contributed in so many ways I could not begin to recount. It is to her and our son, Paul, that this dissertation is affectionately dedicated.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES

Chapter

I. THE PROBLEM OF PATTERN PREDICTION METHODS
    Introduction
    Methods of Predicting from Patterns
    Requirements for a Pattern Prediction Method

II. THE METHOD OF CRITERION PATTERN ANALYSIS
    Introduction
    Definition of an Acceptable Predictive Pattern
    Finding Acceptable Patterns in Data
    Predicting from Patterns

III. DATA AND RESULTS
    Data
    Linear Methods
    Results from Crego Data
    Results from UN Data
    Summary

IV. DISCUSSION AND CONCLUSION
    Discussion
    Conclusion

APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table

1. An Example of Configural Prediction
2. Subsets of the Pattern 1(1) 2(2) 3(2) 4(1)
3. Example Data
4. Pattern Acceptance and Rejection
5. An Example of Patterns to be Used in Prediction
6. Results of Criterion Pattern Analysis on the Analysis Sample of the Crego Data
7. Results of Predicting to the Crego Analysis Sample from Patterns from Criterion Pattern Analysis
8. Results of Predicting to the Crego Cross Validation Sample from Patterns
9. Results of Multiple Regression on the Analysis Sample of the Crego Data
10. Results of Predicting to the Crego Analysis Sample from Regression Coefficients
11. Results of Predicting to the Crego Cross Validation Sample from Regression Coefficients
12. Results of Predicting to the Crego Analysis Sample Using Maximum Likelihood
13. Results of Predicting to the Crego Cross Validation Sample Using Maximum Likelihood
14. Results of Multiple Regression Applied to the Patterns in the Analysis Sample of the Crego Data
15. Results of Predicting to the Analysis Sample of the Crego Data Using Regression Coefficients from Patterns
16. Results of Predicting to the Crego Cross Validation Sample Using Regression Coefficients from Patterns
17. Results of Criterion Pattern Analysis on the Analysis Sample of the UN Data
18. Results of Predicting to the Analysis Sample of the UN Data from Patterns
19. Results of Predicting to the Cross Validation Sample of the UN Data from Patterns
20. Results of Multiple Regression on the Analysis Sample of the UN Data
21. Results of Predicting to the Analysis Sample of the UN Data from Regression Coefficients
22. Results of Predicting to the Cross Validation Sample of the UN Data from Regression Coefficients
23. Results of Predicting to the Analysis Sample of the UN Data Using Maximum Likelihood
24. Results of Predicting to the Cross Validation Sample of the UN Data Using Maximum Likelihood
25. Results of Multiple Regression Applied to the Patterns in the Analysis Sample of the UN Data
26. Results of Predicting to the Analysis Sample of the UN Data Using Regression Coefficients from Patterns
27. Results of Predicting to the Cross Validation Sample of the UN Data Using Regression Coefficients from Patterns

LIST OF FIGURES

Figure

1. A Flow Chart for Performing Criterion Pattern Analysis

LIST OF APPENDICES

Appendix

A. I-E Scale
B. United Nations General Assembly, Seventeenth Session: Issues Decided by Roll Call Vote
C. UN Data: Countries in Analysis Sample

CHAPTER I

THE PROBLEM OF PATTERN PREDICTION METHODS

Introduction

In 1950, Meehl, in an influential paper, asserted the importance of patterns for predicting. While single items alone may not be predictive, a pattern of two items together may be perfectly predictive. Meehl argued that how a person responds to a pair of items can uniquely reflect psychological characteristics, particularly in the clinical area (see also Meehl, 1954). Unless patterns are utilized, important psychological information can be lost.

Meehl's paper helped stimulate many workers in psychology to develop quantitative techniques incorporating patterns (Gaier and Lee, 1953; Sells et al., 1955). Some techniques were analytical in nature: patterns were used to classify people (e.g., McQuitty, 1957a, 1963, 1966). Other methods were predictive: patterns were used for predicting a priori categories of people. Many of these methods are reviewed and classified below.

In this paper, an effective new technique for isolating predictive patterns is presented.
The identification of these patterns is viewed here as an extension of item analysis, which identifies only predictive single items. The patterns extracted by the new method may be used for predicting either directly or in combination with linear procedures.

Certainly the ideal approach would be to examine all possible patterns and pick out only the most reliably predictive ones; however, the large number of patterns to be examined in most cases renders this approach too laborious even with the aid of a high speed computer. The method proposed here, to be called Criterion Pattern Analysis, identifies the same patterns as does the ideal approach, but without having to examine all possible patterns one by one.

It is the thesis here that Criterion Pattern Analysis is a practical method, more informative than linear methods, and at least as predictive as linear methods; it is more predictive when there are configural properties in the data.

Methods of Predicting from Patterns

The various methods of predicting from patterns fall into six classes: (1) cumulative, (2) reductive, (3) classification, (4) total-pattern, (5) small-pattern, and (6) pattern-search. The first two classes were proposed by McQuitty. The third was first hinted at by McQuitty under the term dual-pattern, but is now only one of several classification methods (McQuitty, 1957b). The last three classes of methods are unique to the present paper. They include some of the more often proposed methods. Criterion Pattern Analysis is best classified under pattern-search methods.

The Cumulative Method

An illustration of this rare method is contained in a study by Lubin (1954). The single item which best predicts the criterion is selected. Next, the item is selected which, when paired with the first, most improves prediction. A third item is added which most improves the item pair, and so on. Lubin applied this technique to 1474 subjects responding to 20 items from the MMPI.
He compared the results of this method with multiple regression and found that while the pattern method predicted significantly better on the analysis sample, multiple regression predicted significantly better upon cross validation.

McQuitty (1957b and 1959) has reviewed the properties of the cumulative method and found that it leaves much to be desired. The cumulative method may grossly miss important predictive patterns. Perhaps the appropriate starting point is not with the best single item, but with a poor single item. The item which pairs best with a poor item may prove to have a better predictive value. On the other hand, the best single item, being highly correlated with the criterion, is limited in its configural relations with the criterion through some other item or items. While a cumulative approach might be feasible, there is, as yet, no practical procedural scheme proposed which will get at the highly predictive item combinations.

The Reductive Method

Whereas the cumulative method starts with one item and builds it to two, three, four, etc., items, the reductive method starts with the total pattern over all items used and reduces it to a subset of fewer items. Both methods, by their respective processes, hope to improve the validity and the practicality of prediction.

McQuitty (1958) provides an example of this procedure. McQuitty first selects the person who has more item responses in common with more members of his criterion group than does any other member. The person most like him is then used to select items which they both answer in common. A third person can be added in a similar way. The resulting set of responses to a subset of items is treated as a scoring key on which each person in the analysis sample can be scored. A cutting score can then be determined in an effort to separate people belonging to one category from those belonging to another. This technique was tried on a small sample with four criterion groups.
One large group contained 64 people, the others 14, 12, and 15 respectively. When compared with a linear method, this pattern method did better even on the cross validation sample, but on the large group only. On the small groups the linear method did better. This difference was explained by pointing out that only in the large group were there enough people for different patterns to emerge corresponding to different types of people within the group.

This promising method could be exploited further. McQuitty's use of it was not strictly in accordance with what Meehl (1950) called configural scoring, which is treating each combination of items differently. When McQuitty used the patterns as a scoring key to obtain a total score for each person, some "configural" information was lost. This is because two people who have the same score can have two different patterns of responses.

The Classification Methods

In this class of methods, the people within each criterion category are first classified (by some appropriate method) into two or more groups. Predictions are then based on patterns associated with each of the groups. McQuitty used this method in several different forms. In one form, called the dual-pattern method (McQuitty, 1957b), people were classified into groups on the basis of external criterion scores. A subset of item responses held in common by all members was found for each criterion group. This pattern was then used as a predictor of the criterion category on other samples of people.

In another form, the classification of people within each category is based on the test items themselves (McQuitty, 1959, 1961a). Selection of the level of classification for identifying predictor patterns can be at the highest levels (major patterns) or at lower levels (minor patterns).
Finally, a modification of this procedure was offered (McQuitty, 1961b) in which each classification in one criterion group is paired with each classification in the other criterion groups. Items are then found for each pair which distinguish one category from another. In this fashion, a scoring key for each pair is developed. Such keys can reflect configural properties if types are present in the data.

The results of applying these methods have been inconclusive or disappointing as compared with linear methods (McQuitty, 1956, 1957b). It seems that many configural properties can be missed while concentrating on a few. Classifying within each criterion group may result in a pattern which has little to do with discriminating between criterion groups. And even if the pattern does discriminate, it was not selected on the basis of being one of the best discriminating patterns.

This method does have its merits, however. When there is a large number of items, this method offers a way of getting at predictive patterns which otherwise might never be sought. Even with the advent of high speed computers, looking at all patterns seems a formidable task. These methods offer a compromise between what is possible and what is theoretically best.

The Total-Pattern Methods

When the number of items is very small, say less than ten, and the number of people is very large, say more than a thousand, then the total-pattern methods can be employed. In these procedures the pattern of responses of each person to all the items is used. Since there are relatively few items, all empirical patterns are likely to occur with at least moderate frequencies.

Cochran and Hopkins (1961) developed one such predictive model which they used for predicting election outcomes. In such a situation there are very few items and many people who are divided into two criterion categories.
The frequency of occurrence of each pattern in each criterion category can be readily counted and used as the basis for probability statements. The prediction for a new person is made by first ascertaining his pattern of responses to the set of items and then predicting to that criterion group which has the largest probability for that pattern. A similar method for medical diagnosis was proposed by Ledley and Lusted (1959).

Another method which is quite common (Fricke, 1957; Lykken, 1956) is for situations which have quantitative criteria. For each pattern, the mean criterion score is computed over all people who have the pattern. A prediction for a new person is made by first ascertaining his pattern and then assigning to him the mean score associated with his pattern.

A more elaborate version was proposed by Horst (1954) and refined by Lubin and Osburn (1957). It was shown that a pattern of responses could be translated into a polynomial function involving all possible interaction terms. With this mathematical representation, the usual regression analysis could be performed to predict to a quantitative criterion variable. Lubin and Osburn assert that this polynomial technique will produce a minimum number of misclassifications when the criterion score is normally distributed for each pattern.

Alf (1957) and Lee (1957) tried the polynomial technique and found that the usual linear methods were better upon cross validation, although not significantly so. Lee's explanation was that configural methods tend to capitalize on chance patterns which then throw off prediction upon cross validation. Osburn and Lubin (1957) agree that while all information is considered, all information is also conserved whether reliable or not. In other words, there are often too many degrees of freedom, and the parameters in the regression equation will not be accurately estimated. All researchers with the total-pattern methods reiterate the need for very large samples of people.
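The frequency-count form of the total-pattern approach described in this section (tally each complete response pattern within each criterion category, then predict the category seen most often with a new person's pattern) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cochran and Hopkins' actual procedure: the function names, the toy responses, the use of raw within-pattern counts as the "largest probability," and the choice to return None for a pattern never seen in the analysis sample are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_total_patterns(responses, categories):
    """Tally how often each complete response pattern occurs in each category."""
    counts = defaultdict(Counter)
    for response, category in zip(responses, categories):
        counts[tuple(response)][category] += 1
    return counts


def predict_category(counts, response):
    """Return the category most often observed with this pattern.

    Returns None when the pattern never occurred in the analysis sample,
    which is the situation that makes very large samples necessary.
    """
    seen = counts.get(tuple(response))
    if not seen:
        return None
    return seen.most_common(1)[0][0]


# Hypothetical toy sample: two items, two criterion categories.
responses = [(1, 1), (1, 1), (1, 1), (1, 2), (2, 1)]
categories = ["a", "a", "b", "b", "b"]
counts = fit_total_patterns(responses, categories)

print(predict_category(counts, (1, 1)))  # pattern seen twice with "a", once with "b"
print(predict_category(counts, (2, 2)))  # pattern never observed
```

With k two-choice items there are 2^k possible total patterns, so the table of counts thins out quickly as items are added; the unseen-pattern case in the sketch is one concrete way to see why these methods call for so many people.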
The Small-Pattern Method

In an attempt to get away from the need for large samples of people in order to make accurate estimates, patterns of only two or three items are utilized instead of patterns over all items. In other respects the small-pattern methods are the same as the total-pattern methods. Lee (1957), in addition to using total patterns of eight items, also used small patterns of five and three items. Hoffman (1960) proposed the polynomial technique using all item pairs. Saunders (1955), using such a technique, found it no better than using linear predictors. While enormous samples of people are not required, the small-pattern methods still tend to capitalize on random patterns, again leading to erroneous predicting on cross validation samples.

The Pattern-Search Methods

Instead of indiscriminately using all patterns or some subset of patterns, pattern-search methods are selective. The various methods employ different criteria for selecting. Forehand and McQuitty (1959) selected patterns according to their departure from chance occurrence. They found that using significant patterns predicted better upon cross validation than did using all patterns. However, multiple regression still predicted better. They comment that part of the trouble is again too many patterns with too few subjects in each. It might be added that their selection criterion of significance of occurrence did not pick patterns which are necessarily significantly related to the criteria.

Horst (1957) suggested seeking patterns which are highly related to the criteria, after first checking to see whether they could be expressed as a linear function of smaller patterns. This was followed up by Wainwright (1966) who defined the configural phenomenon as a non-linear combination of items. Thus he selected patterns which could not be expressed in terms of single items. However, Wainwright was not interested in predicting to a criterion.
His conclusion was that a linear combination of items does not account for all information, which is what Meehl initially asserted.

In general, there seems to be a paucity of pattern-search techniques, even though this might be a fruitful approach. The widespread availability of high speed computing facilities might change the balance in the years to come.

Resume

By the end of the Fifties, most of the proposed methods to predict with patterns had been tested. The majority of the results were inconclusive or disappointing in comparison with linear methods. Loevinger (1959), in reviewing some of these studies, believed the case for configural predicting was closed. While able to predict better in the analysis sample, pattern methods failed to hold up on the cross validation sample. The linear methods did at least as well or better. Apparently the potential of McQuitty's reductive method and the success of Cochran and Hopkins' probability model had been forgotten and not fully explored. And, of course, the pattern-search methods had hardly gotten off the ground.

While the problem of predicting from patterns had apparently been solved in theory, only unusually large samples of people responding to a few items could be handled reliably. Seldom does the worker in psychology have these kinds of data. Not fully solved was how to use patterns to predict reliably on smaller samples of the kind of qualitative data with which psychologists often work. Hence, the need for a satisfactory pattern prediction method still exists.

Requirements for a Pattern Prediction Method

In the previous section various predictive approaches which attempted to capitalize on configural properties in the data were discussed. In this section a critique for an effective pattern prediction method is outlined.

1. All major patterns which predict the criteria should be found.

This requirement is fundamental, and is implicit in other requirements.
What is wanted is a set of patterns which are very reliable and which predict the criteria. No reliable predictive pattern should be excluded. If any such patterns are missing after analysis, predicting in cross validation can be jeopardized.

2. The patterns should be found separately for each criterion category.

The patterns which predict to one criterion group may be ineffective in predicting to another. Fricke (1956) was the first to point this out in a modification of Meehl's example of configural scoring. The requirement does not mean that classifying within each criterion group will be acceptable; as pointed out earlier, the procedure may miss predictive patterns. The analogous assertion for linear methods was made by Stormes (1958).

3. The method should isolate non-configural as well as configural relationships.

For example, if single items or linear combinations of them are highly predictive, they should be extracted as such. Some configural methods already allow for single-item "patterns."

4. Patterns extracted by the method should be capable of being used directly for prediction.

This implies that a method may be developed for predicting with patterns themselves rather than with derivatives or functions of the patterns. The prediction method should work separately from the method used to extract patterns and yet be tied to it logically. By using patterns directly, interpretation should be simplified (see requirement 8).

5. The method should be capable of predicting better than linear methods on the analysis sample.

In those cases where the analysis sample is the only one to which prediction is desired, the method certainly should be capable of predicting better than linear methods.

Although satisfaction of this requirement is desirable, a method which predicts better than linear methods on the analysis sample has no guarantee of predicting better on cross validation samples. Prediction to a cross validation sample is based on information common to both samples which is gleaned from the analysis sample. A high prediction on an analysis sample is likely based on much idiosyncratic information which cannot predict to a cross validation sample.

6. The method should predict better than linear methods on a cross validation sample.

Previously this requirement has been the toughest to meet; and yet, if reliable patterns are isolated, as in requirement 1, pattern methods will begin to do much better.

7. The method should be applicable to small as well as to large samples of people.

The repeated assertion about needing larger samples to show that configural properties are present can only weaken the appeal of pattern methods. The aim should be to develop methods effective on the small sample.

8. The results of a configural prediction method should be readily interpretable.

This is a plea for simplicity. With many methods, both linear and configural, it is difficult to understand the relationships between the predictors and the criterion. Configural methods have an opportunity to present a clear picture.

9. The results of a configural prediction method should be readily obtainable.

The method should not exist in theory only, but should be translatable into a practical tool. This translation can be one of the more difficult tasks in the development of a method, and will almost certainly have to be implemented on an electronic computer. Indeed, a method which meets in full the previous eight requirements might be fully translated only with difficulty, even when a computer is available.

Resume

Suggested requirements for a configural prediction method have been listed. The method proposed in this thesis will be measured against these standards in the final section.
CHAPTER II THE METHOD OF CRITERION PATTERN ANALYSIS Introduction The method of Criterion Pattern Analysis is directed toward solving the problem of predicting with nominal data: if a person responds to a set of items, what pre- diction can be made from these responses about the cri— terion category to which this person belongs, judging from a similar set of people who have responded to the same set of items and for whom the criterion categories are known? According to the critique in the last section, the "major" patterns associated with each criterion cate- gory should be ascertained first. After this is accom- plished, these major patterns are used for predicting. The details of the method are developed in three sections: (1) the definition of an acceptable predictive pattern, (2) the steps in finding patterns in the data which meet the definition, and (3) the use of patterns in predicting. l6 17 Definition of an Acceptable Predictive Pattern Zubin (1938) was one of the first to point out the usefulness of patterns in a set of items. Meehl, however, emphasized a unique role for patterns in predicting be- havior. Meehl used the term configural to indicate a combination of items which predict to a criterion when the single items treated separately do not predict (Meehl, 1950). A contrived example of configural pre— diction is given in Table 1. TABLE 1.--An example of configural prediction. Item Item Criterion Observation A B Category 1 1 l l 2 2 2 l 3 l 2 2 A 2 l 2 Item A answered "1" is equally associated with both categories of the criterion; item A answered "2" is equally associated to both categories of the criterion. Likewise both responses to item B are equally associated to both criterion caregories. These are the linear re- lationships; item A alone and item B alone are obviously not helpful in predicting the criterion. The configural relationships, however, are helpful in predicting the l8 criterion. 
Item A answered "1" together with item B answered "1" perfectly predicts category 1 of the criterion. Items A and B both answered "2" also predict category 1 of the criterion. Criterion category 2 is perfectly predicted by either item A "1" and item B "2", or A "2" and B "1".

Configural, then, refers to a greater predictability by a pattern than by its unit parts treated separately. An extension of this concept which greatly enhances its usefulness in prediction is that a pattern can have greater predictive power than any of its parts, including not only its unit parts, but also smaller configurations within the larger one. Such a pattern might be termed hyper-configural. These considerations lead to the following definition of an acceptable pattern:

A pattern of responses to items A, B, ..., R is acceptable for Criterion Pattern Analysis if and only if the pattern in the items A, B, ..., R is a better predictor of the criterion than is each of the subsets of the pattern in items A, B, ..., R. A pattern is a better predictor than a subpattern if the level of discrimination of the pattern is greater than the level of discrimination of the subpattern. The level of discrimination of a pattern is the ratio of the number of times it occurs with a specified criterion category over the total number of times it occurs (irrespective of the criterion categories with which it occurs).

The following examples help clarify the application and significance of the definition. Consider four items, one, two, three, four, answered 1, 2, 2, 1, respectively. This pattern in the items can be represented 1(1) for item one answered one; 2(2) for item two answered two; 3(2) for three, two; and 4(1) for four, one. The pattern 1(1) 2(2) 3(2) 4(1) is acceptable if it predicts better than its fourteen subpatterns, given in Table 2, and also if it predicts better than the marginal relative frequency of the criterion category. The marginal relative frequency of a criterion category is the frequency of that category over the frequency of all categories. The marginal relative frequency of the criterion category can be thought of as the discrimination level of a "subpattern" of zero items which is a subset of all patterns and against which they must be compared. In the special case of a "pattern" of one item, it is the only "subpattern" which is tested against.

TABLE 2.--Subsets of the pattern 1(1) 2(2) 3(2) 4(1).

Subpattern            Discrimination Level for Criterion Category 1
1(1) 2(2) 3(2)        2/3
1(1) 2(2) 4(1)        2/3
1(1) 3(2) 4(1)        2/3
2(2) 3(2) 4(1)        2/3
1(1) 2(2)             2/4
1(1) 3(2)             2/4
1(1) 4(1)             3/5
2(2) 3(2)             3/5
2(2) 4(1)             2/4
3(2) 4(1)             2/4
1(1)                  3/6
2(2)                  3/6
3(2)                  3/6
4(1)                  3/6
Criterion Marginal    5/10

To elaborate this illustration, let the pattern 1(1) 2(2) 3(2) 4(1) come from observations 4 and 5 of the data shown in Table 3. Also assume we are interested in predicting criterion category one.

TABLE 3.--Example data.

Observation   Item 1   Item 2   Item 3   Item 4   Criterion
     1           2        2        2        2         1
     2           1        1        1        1         1
     3           2        1        1        2         1
     4           1        2        2        1         1
     5           1        2        2        1         1
     6           1        2        2        2         2
     7           2        2        2        1         2
     8           1        2        1        1         2
     9           1        1        2        1         2
    10           2        1        1        2         2

Pattern 1(1) 2(2) 3(2) 4(1) occurs twice in association with criterion category one and not at all with category two, for a discrimination level of 2/2. The subpatterns with their discrimination levels are shown in Table 2. In each case the discrimination level is less than 2/2. Furthermore, the marginal relative frequency of criterion category one is 5/10, which also is less than 2/2. Therefore pattern 1(1) 2(2) 3(2) 4(1) satisfies the definition of an acceptable pattern. On the other hand, none of the single-item patterns (i.e., 1(1), 2(2), 3(2), or 4(1)) is an acceptable pattern, since the discrimination levels of 3/6 are not more than the criterion marginal frequency 5/10 (see Table 2).
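The discrimination levels of Table 2 can be checked mechanically against the example data of Table 3. The following is a minimal sketch in Python, not the program used in the thesis; the names DATA, counts, level and is_acceptable are illustrative assumptions, and the acceptance check implements only the simple definition given so far (strictly higher discrimination level than every subpattern, including the zero-item "subpattern").

```python
from fractions import Fraction
from itertools import combinations

# Example data of Table 3: responses to items 1-4, then the criterion category.
DATA = [
    ((2, 2, 2, 2), 1), ((1, 1, 1, 1), 1), ((2, 1, 1, 2), 1),
    ((1, 2, 2, 1), 1), ((1, 2, 2, 1), 1), ((1, 2, 2, 2), 2),
    ((2, 2, 2, 1), 2), ((1, 2, 1, 1), 2), ((1, 1, 2, 1), 2),
    ((2, 1, 1, 2), 2),
]


def counts(pattern, category, data=DATA):
    """Return (N1, N2): occurrences of the pattern inside and outside the category.

    A pattern is a dict {item number: response}, items numbered from 1.
    The empty dict is the zero-item "subpattern" (the criterion marginal).
    """
    inside = outside = 0
    for responses, criterion in data:
        if all(responses[item - 1] == resp for item, resp in pattern.items()):
            if criterion == category:
                inside += 1
            else:
                outside += 1
    return inside, outside


def level(pattern, category, data=DATA):
    """Discrimination level N1 / (N1 + N2) as an exact fraction."""
    n_in, n_out = counts(pattern, category, data)
    return Fraction(n_in, n_in + n_out)


def is_acceptable(pattern, category, data=DATA):
    """Simple definition: a higher level than every proper subpattern."""
    own = level(pattern, category, data)
    items = sorted(pattern)
    for size in range(len(items)):  # subpatterns of 0 .. r-1 items
        for subset in combinations(items, size):
            sub = {item: pattern[item] for item in subset}
            if own <= level(sub, category, data):
                return False
    return True


full = {1: 1, 2: 2, 3: 2, 4: 1}          # the pattern 1(1) 2(2) 3(2) 4(1)
print(level(full, 1))                    # discrimination level of the full pattern
print(is_acceptable(full, 1), is_acceptable({1: 1}, 1))
```

Exact fractions are used so that levels such as 3/6 and 5/10 compare as equal rather than depending on floating-point rounding.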
One consequence of the definition is that any pattern of responses that is unique is acceptable, unless a subset of the pattern is itself unique. This consequence leads to the objection that too many acceptable patterns emerge. Not only are all the unique patterns acceptable, but any pattern that has a higher discrimination level than do its subpatterns is thereby acceptable. This leads to the impossible task of recording thousands and millions of patterns which are acceptable for predicting to the criterion. Furthermore, if almost everything predicts, then very little is added by applying the method.

The difficulty can be ameliorated by stipulating a strict requirement for a pattern to be judged a better predictor than its subpatterns. Let $N_1$ be the frequency of occurrence of the pattern in association with the selected criterion category, and $N_2$ be the frequency of occurrence outside the criterion category. The terms $n_1$ and $n_2$ are the analogous frequencies for a subpattern. Then $\frac{N_1}{N_1+N_2}$ is the discrimination level of the pattern, and $\frac{n_1}{n_1+n_2}$ is the discrimination level of a subpattern. Previously a pattern was acceptable if

\[ \frac{N_1}{N_1+N_2} > \frac{n_1}{n_1+n_2}. \]

Under the revised requirements a pattern is acceptable if for each subpattern:

\[ \frac{\binom{n_1}{N_1}\binom{n_2}{N_2}}{\binom{n_1+n_2}{N_1+N_2}} + \frac{\binom{n_1}{N_1+1}\binom{n_2}{N_2-1}}{\binom{n_1+n_2}{N_1+N_2}} + \frac{\binom{n_1}{N_1+2}\binom{n_2}{N_2-2}}{\binom{n_1+n_2}{N_1+N_2}} + \cdots + \frac{\binom{n_1}{N_1+k}\binom{n_2}{N_2-k}}{\binom{n_1+n_2}{N_1+N_2}}\,{}^{*} < \alpha, \]

where $\alpha$ is a preassigned positive number less than 1. This expression is the tail of the hypergeometric distribution, and $\alpha$ is the proportion of the tail covered.

* Actually the last term of this expression is either $\binom{n_1}{N_1+k}\binom{n_2}{0} \big/ \binom{n_1+n_2}{N_1+N_2}$ or $\binom{n_1}{n_1}\binom{n_2}{N_2-k} \big/ \binom{n_1+n_2}{N_1+N_2}$, whichever occurs first in the series.

The denominator of the first term, $\binom{n_1+n_2}{N_1+N_2}$, is the number of ways that the $N_1+N_2$ occurrences of the pattern can be chosen from the $n_1+n_2$ occurrences of the subpattern.
The expression $\binom{n_1}{N_1}$ in the numerator is the number of ways that the $N_1$ occurrences of the pattern which are in the criterion can be chosen from the $n_1$ occurrences of the subpattern which are in the criterion. Similarly $\binom{n_2}{N_2}$ is the number of ways that the $N_2$ occurrences of the pattern which are outside the criterion can be chosen from the $n_2$ occurrences of the subpattern outside the criterion. The complete term,

\[ \frac{\binom{n_1}{N_1}\binom{n_2}{N_2}}{\binom{n_1+n_2}{N_1+N_2}}, \]

is the probability of having $N_1$ and $N_2$ occurrences when choosing $N_1+N_2$ occurrences which fall into two groups of $n_1$ and $n_2$. The remaining terms are probabilities for less likely events, so that the whole expression is the probability of having $N_1$ occurrences or more and $N_2$ occurrences or less when choosing $N_1+N_2$ occurrences out of $n_1+n_2$ occurrences which fall into two groups of $n_1$ and $n_2$.

In other words the first term of the expression is the probability of the observed occurrences of the pattern among the criterion categories, given the occurrences of its subpattern among the criterion categories. The whole expression is the probability of having a pattern occur in the category under scrutiny with a frequency as great or greater than that observed, given the occurrences of its subpattern. An approximation to the whole expression can be had by computing chi square for the following 2x2 table:

                                  Subpattern
                                  Minus Pattern      Pattern    Total
Within the Criterion Category     n1-N1              N1         n1
Outside the Criterion Category    n2-N2              N2         n2
Total                             n1+n2-N1-N2        N1+N2      n1+n2

The meaning of the above requirement in the analysis of data is that for every subpattern a test is made as to whether or not the pattern significantly improves the prediction of the criterion. The reasoning is that if it does not improve prediction at the level specified, then the subpattern itself might as well be used. The level of significance for each test is set at $\alpha$.
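The tail sum described above is simple to evaluate with exact integer arithmetic. The sketch below is one possible reading of the requirement, not the thesis program; the function names and the default level of .05 are assumptions made for illustration. The series is stopped at the last non-vanishing term, that is, when either N2 - k reaches zero or N1 + k reaches n1.

```python
from fractions import Fraction
from math import comb


def hypergeometric_tail(N1, N2, n1, n2):
    """Probability of N1 or more criterion occurrences among the N1 + N2
    occurrences of a pattern, given a subpattern occurring n1 times inside
    and n2 times outside the criterion category."""
    if not (0 <= N1 <= n1 and 0 <= N2 <= n2):
        raise ValueError("a pattern cannot occur more often than its subpattern")
    last = min(n1 - N1, N2)  # index k of the final term in the series
    numerator = sum(comb(n1, N1 + k) * comb(n2, N2 - k) for k in range(last + 1))
    return Fraction(numerator, comb(n1 + n2, N1 + N2))


def improves_on(N1, N2, n1, n2, alpha=Fraction(5, 100)):
    """True if the pattern is a significantly better predictor than the subpattern."""
    return hypergeometric_tail(N1, N2, n1, n2) < alpha


# Pattern 1(1) 2(2) 3(2) 4(1) (2 inside, 0 outside) against its subpattern
# 1(1) 2(2) 3(2) (2 inside, 1 outside) in the ten-observation example.
print(hypergeometric_tail(2, 0, 2, 1))
print(improves_on(2, 0, 2, 1))
```

With only ten observations the tail probability for the example pattern is 1/3, so under any small alpha the pattern that was acceptable by the simple definition no longer qualifies; this is the intended effect of the stricter requirement.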
Thus the hypergeometric distribution serves as a decision function for limiting the otherwise overwhelming number of acceptable patterns. Clearly the smaller $\alpha$ is, the fewer patterns will be accepted, and $\alpha$ can be set so that no patterns will emerge at all. On the other hand, for $\alpha$ close to 1.0, the same situation obtains as with the requirement that the pattern be just better than its subpatterns: $\frac{N_1}{N_1+N_2} > \frac{n_1}{n_1+n_2}$.

The value $\alpha$ does not represent the significance of a pattern in relation to the criterion. The level of significance has not been determined, and is not required for successful use of this method. Presently, in the analysis of data, the setting of $\alpha$ is done by trial and error, small enough to preclude a flood of patterns and large enough to admit the cream of the acceptable patterns.

Finding Acceptable Patterns in Data

In the previous section the definition of an acceptable pattern was developed. In this section the problem of finding all acceptable patterns in a set of data is discussed.

It follows from the critique in the first chapter that all possible patterns must be considered. Even with a high speed electronic computer the job of generating each possible pattern for acceptance or rejection is overwhelming. For example, consider a very small problem of ten dichotomous items. There is a total* of 59048 different patterns to be generated. This total increases very rapidly with the number of items. With fifteen dichotomous items there are over fourteen million patterns to be generated and checked. With twenty dichotomous items there are over three billion patterns. Clearly the need to reduce the number of patterns actually handled is imperative. Previously described methods have attempted to cope with the task by placing severe restrictions on the patterns considered, thereby reducing effectiveness and leading to a demand for large samples.
Criterion Pattern Analysis solves this problem by considering all possible patterns without generating and examining them one by one. A computational scheme which allows this to be accomplished will be developed and carefully programmed to make most efficient use of the computer's capabilities. In the paragraphs that follow, the procedure for finding all acceptable patterns is described in detail and related to the problem of considering all possible patterns. In discussing the solution of this problem, two closely interrelated aspects are considered: (1) the order of examining patterns; (2) the judging of each pattern as it is brought up.

* The formula for computing the total number of possible patterns for N dichotomous items is: $\sum_{r=1}^{N} \binom{N}{r} 2^{r} = 3^{N} - 1$.

Order of Examining Patterns

According to the critique at the end of the first chapter, acceptable predictive patterns are sought separately for each criterion category. Within each criterion category there are many alternative approaches. The one chosen here is to first find all acceptable one-item patterns, then all acceptable two-item patterns, then three, four, and so on. As will be shown below, this order of proceeding allows reduction in the number of patterns examined.

Judging the Patterns

Two judgments are made for every pattern. The first is whether the pattern is acceptable under the definition. The second judgment is whether prediction can possibly be improved by annexing another item to the pattern.

According to the strict definition, a pattern is acceptable if it predicts better than any of its subpatterns, where better is determined by the preassigned $\alpha$ and the hypergeometric distribution. A pattern of r items has r subpatterns of r-1 items, $\frac{r(r-1)}{2}$ subpatterns of r-2 items, $\frac{r(r-1)(r-2)}{6}$ subpatterns of r-3 items, etc. Since the number of tests to be made becomes quite large as r increases, testing all subpatterns would be too laborious.
The problem is solved here by testing only the pattern's immediate subpatterns; a pattern of r items is accepted if it predicts better than its subpatterns of r-1 items. For example, pattern 1(2) 3(1) 7(2) is accepted if it improves the prediction of all of the subpatterns of two items, i.e., 1(2) 3(1), 1(2) 7(2), and 3(1) 7(2). Occasionally patterns can be accepted which are not better predictors than some of their remote subpatterns, and also not better than predicting from the marginals of the criterion. This is more than compensated for by the fact that the procedure does include all of the patterns which enhance prediction, and does this in a reasonable fashion in terms of the amount of analysis required.*

* Testing only whether a pattern predicts better than the marginals of the criterion category was also tried at one point in the development of the Criterion Pattern Analysis method. The results were a multitude of "acceptable" patterns, most of which contained a subpattern which was a very good though not a perfect predictor. Almost any item affixed to this subpattern would have produced a pattern which was also a very good predictor. This kind of result not only contradicts the strict definition of an acceptable pattern, but also produced too many patterns, all very similar.

The second judgment made on each pattern, whether or not it has been previously judged acceptable, is whether prediction can possibly be improved by annexing another item to the pattern. Only if a pattern can be improved will it be used in the formation of larger patterns. Improvement is measured by $\alpha$ and the hypergeometric distribution in the following manner. As before, $\frac{N_1}{N_1+N_2}$ is the discrimination level of a pattern for the criterion category. Annexing another item will make its greatest improvement when it results in a discrimination level of $\frac{N_1}{N_1}$. Using the hypergeometric formula,

\[ \frac{\binom{N_1}{N_1}\binom{N_2}{0}}{\binom{N_1+N_2}{N_1}} \]

is computed. If it is less than $\alpha$, then the pattern can be improved; if greater than $\alpha$, then no possible improvement can be made, and the pattern is rejected from further consideration.

When a pattern is so rejected, automatically a class of many patterns is rejected. The patterns in the class are those which contain the original pattern as a subpattern. This effects a vast reduction in the number of patterns actually handled. As an example, suppose we are searching for patterns associated with category one of the criterion, and suppose we find that pattern 3(1) 7(2) never occurs in criterion category one. Not only can we reject pattern 3(1) 7(2), but we can also reject at the same time all those patterns of three or more items in which both 3(1) and 7(2) occur (for example, 1(2) 3(1) 7(2), or 2(1) 3(1) 4(2) 5(1) 7(2)). See Table 4.

TABLE 4.--Pattern acceptance and rejection.

                               Occurrence of a Pattern in the Criterion Category
                               Not       Few                             Very      Exclu-
                               at All    Times     Often      Many       Often     sively
Individual Pattern             reject    reject    reject     accept     accept    accept
Large Class of Patterns
Containing the Smaller                             not        not
Individual Pattern             reject    reject    reject     reject     reject    reject

If pattern 3(1) 7(2) does occur in criterion category one, but only a very small number of times, then again it can be rejected, and along with it all those patterns in which both 3(1) and 7(2) appear. If 3(1) 7(2) occurs often in category one, it may still be rejected, but larger patterns containing it cannot be rejected now. Certainly if 3(1) 7(2) occurs many times and most often in category one, it is likely to meet the requirements for acceptance, and the larger patterns will also not be rejected. However, if it occurs exclusively or almost exclusively in category one, then the larger patterns which include it must be rejected, for there is no way to improve prediction under the assigned $\alpha$.

If a class of patterns is not rejected, it is not thereby accepted; it is just not rejected.
This means that an individual pattern can still become a part of a pattern of more items which stands a chance of being accepted. In implementing this condition, the individual patterns are saved and actually used to form trial patterns with more items. In general, patterns with r items which have been saved are used in combination to form patterns of r+1 items. For example, to form the pattern 3(1) 7(2) 8(1) the list of previously saved patterns must include 3(1) 7(2); 3(1) 8(1); and 7(2) 8(1). If any one of these patterns is missing, the new pattern should not be formed. And incidentally, even if it were formed it could not satisfy the test of improving prediction over its subpatterns, since the missing subpattern has been omitted from the saved list because it could in no way be improved by the addition of other items.

The Computational Scheme

Following the procedure above, the method is implemented as shown in Figure 1. Each criterion category is considered separately, starting with the first. Then in turn, one-item patterns, two-item patterns, three-item patterns, etc., are generated and tested as outlined above, to determine whether they exist in the data, whether to accept them as predictive of the criterion category, and whether they should be saved or not.

[FIGURE 1. A flow chart for performing Criterion Pattern Analysis. Boxes, in order: Input data and start with the first criterion category. Set the number of items in a pattern equal to one (r = 1). Does r equal one? If yes, generate a one-item pattern; if no, generate a pattern with r items out of the patterns from List 2 (each of r-1 items). Does the pattern of items exist in the data? Is the pattern acceptable with regard to the criterion category? If yes, print the pattern. Should the pattern be saved (items annexed to it could make it acceptable)? If yes, store the pattern in List 1. Are any patterns stored in List 1? If yes, destroy the patterns in List 2, put the patterns from List 1 into List 2, destroy the patterns in List 1, and increase r by one. If no: done with all criterion categories? If not, go to the next criterion category.]

One-item patterns are generated without consulting the "saved" list: first 1(1) (item 1 scored 1), then 1(2), 1(3), 1(4), as far as the response categories extend. Next 2(1), 2(2), 2(3), 2(4),...; then 3(1), 3(2), 3(3), 3(4),...; and so on for all items and response categories. Each of these one-item patterns is tested as to whether or not it exists in the data. If it does, it is tested for acceptance. If accepted, it is recorded as a predictive pattern. Next, the same pattern is checked as to whether it should be saved or not. If it is saved it is stacked in a list called LIST 1. When all one-item patterns have been generated and checked, LIST 1 contains saved one-item patterns.

Two-item patterns are considered next; but before doing so, the patterns in LIST 1 are transferred into another list, LIST 2, making sure that nothing is in LIST 2 beforehand, and that nothing remains in LIST 1 after the transfer. Two-item patterns are generated out of the one-item patterns found in LIST 2. Again the various tests are made, acceptable patterns recorded, and two-item patterns to be saved are stacked into LIST 1. When all two-item patterns have been generated from LIST 2 and tested, the contents of LIST 1 are again transferred into LIST 2 and three-item patterns are considered. This process continues until LIST 1 contains nothing to be transferred to LIST 2. At this point the next criterion category is considered, starting over again with one-item patterns. When all criterion categories have been examined for acceptable patterns, the process is completed.

Help in reducing computational time is obtained by adding the stipulation that no pattern be stored in LIST 1 unless its frequency of occurrence in the criterion category is larger than some preassigned constant.
The requirement prevents loading LIST 1 with patterns that account for only a small percentage of people in the criterion category. Patterns with frequencies in the criterion smaller than the constant can still show themselves to be acceptable if their subpatterns have frequencies in the criterion larger than the constant. This somewhat artificial procedure is particularly useful when attacking data with a large number of observations. Not all the acceptable patterns will emerge, but the ones that do will, in general, be those which occur most often. Of course, for small problems the constant can be set to one.

Computer Implementation

The foregoing procedure was intended and conceptualized for a high speed computer. In order to facilitate consideration of all possible patterns in a reasonable amount of time, the computational scheme was carefully programmed to make most efficient use of the computer's capabilities. It might be informative to indicate some of the principles utilized in developing the algorithm as a computer problem.

The task of finding predictive patterns is of course an enormous one. If one were to try each possible pattern in turn, the task would be virtually impossible: if a thousand patterns could possibly be generated and checked in one second, it would still take over a month of computer time to complete the job of analyzing the more than three billion patterns associated with twenty dichotomous items. While generating and checking all possible patterns has the disadvantage of taking too much time, it does have the feature of requiring very little of the computer's memory space. By appropriately changing the method so that more memory is used, the amount of time required can be decreased. This was ultimately accomplished by using LIST 1 and LIST 2 of saved patterns.
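The level-by-level search with LIST 1 and LIST 2 can be sketched compactly in a modern language. The following Python sketch is an illustration under stated assumptions, not a reconstruction of the machine-language program: the names counts, tail and search are hypothetical, patterns are held as sets of (item, response) pairs with items numbered from 1, the frequency stipulation is read as "at least min_count", and the default alpha of .05 is only an example value.

```python
from itertools import combinations
from math import comb


def counts(pattern, category, rows):
    """(N1, N2): occurrences of the pattern inside and outside the category."""
    inside = outside = 0
    for responses, criterion in rows:
        if all(responses[item - 1] == resp for item, resp in pattern):
            if criterion == category:
                inside += 1
            else:
                outside += 1
    return inside, outside


def tail(N1, N2, n1, n2):
    """Hypergeometric tail: N1 or more criterion occurrences, given the subpattern."""
    last = min(n1 - N1, N2)
    hits = sum(comb(n1, N1 + k) * comb(n2, N2 - k) for k in range(last + 1))
    return hits / comb(n1 + n2, N1 + N2)


def search(rows, category, alpha=0.05, min_count=1):
    """Return accepted patterns for one criterion category as (pattern, N1, N2)."""
    n_items = len(rows[0][0])
    units = sorted({(i + 1, r[i]) for r, _ in rows for i in range(n_items)})
    seen = {frozenset(): counts(frozenset(), category, rows)}  # criterion marginal
    accepted = []
    candidates = [frozenset([u]) for u in units]               # one-item patterns
    while candidates:
        list1 = set()                                          # LIST 1: saved patterns
        for pattern in candidates:
            N1, N2 = counts(pattern, category, rows)
            if N1 + N2 == 0:
                continue                                       # not present in the data
            seen[pattern] = (N1, N2)
            # First judgment: better than every immediate subpattern.
            if all(tail(N1, N2, *seen[pattern - {u}]) < alpha for u in pattern):
                accepted.append((sorted(pattern), N1, N2))
            # Second judgment: could the best possible extension still pass?
            if N1 >= min_count and 1 / comb(N1 + N2, N1) < alpha:
                list1.add(pattern)
        list2, grown = list1, set()                            # transfer LIST 1 to LIST 2
        for a, b in combinations(list2, 2):
            union = a | b
            one_longer = len(union) == len(a) + 1
            distinct_items = len({item for item, _ in union}) == len(union)
            # Form an (r+1)-item pattern only if all its r-item subpatterns were saved.
            if one_longer and distinct_items and all(union - {u} in list2 for u in union):
                grown.add(union)
        candidates = sorted(grown, key=sorted)
    return accepted


# The configural data of Table 1, repeated ten times to give the test some power.
rows = [((1, 1), 1), ((2, 2), 1), ((1, 2), 2), ((2, 1), 2)] * 10
for pattern, n_in, n_out in search(rows, category=1):
    print(pattern, n_in, n_out)
```

On this contrived data no one-item pattern is accepted, but all four are saved, and the two-item patterns A(1) B(1) and A(2) B(2) are then accepted for category one. Because they occur exclusively in that category they are not saved, and the search stops at two items.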
Although the processing of lists takes more time per pattern than does simple generating and checking, fewer patterns are actually processed (see the section on the computational scheme), and the net result is a saving in computation time. Hence, the first principle applied to the problem of checking all patterns is the reciprocity of computer time and space. If a problem takes too much computer time, it may be possible to reduce the time by using more of the computer's memory. Conversely, when a problem overflows the available memory, it may still be handled by reprogramming using more computation time.

One way of extending the computer's memory capacity is to use "packed storage." This amounts to storing many numbers in a computer word (memory location), where only one number would be stored ordinarily. This is possible where many small integers are to be processed. It is not readily applicable with large integers or fractions. This procedure was applied to the present problem and helped reduce computation time. In addition it was possible to operate on all the numbers packed in a single computer word at once instead of separately. This was done when counting the occurrences of a given pattern throughout the observations. In counting the occurrence of a four-item pattern over two hundred people, only twenty computer words were used in the computation instead of 4 x 200 = 800 words. In this way computation time was grossly reduced. Hence, the second principle applied might be termed "aggregate operation." When a computer operation is performed on one computer word, it becomes tantamount to several operations on several computer words.

To expedite the principle of "aggregate operation" it is sometimes necessary and always convenient to use machine or assembly language operation codes. This means that the basic operations built into the computer are directly selected by the programmer for his program.
Such basic operation codes differ from one kind of computer to another and require a new program to be written for each computer. By using a compiler-based language such as FORTRAN, ALGOL, or COBOL, it would be possible to write programs easily transferable to a different computer. However, these languages result in a program which is not as efficient with respect to computer time as one written in a basic language (although they are usually very efficient in terms of the time required to write a program). Much of the method in this thesis was programmed in a basic machine language with a saving of computation time and computer memory. Thus the third principle applied was keeping to basic operation codes, especially in those parts of the program which would be repeated many times.

These three programming principles, the reciprocity of computer time and space, aggregate operation, and efficient basic operation codes, were employed in solving the problem of identifying all acceptable predictive patterns. Without these principles the method in this thesis could not have been developed into a practical tool.

Predicting from Patterns

Once the major predictive patterns have been found they can be used for predicting the criterion category of a person for whom only the responses to the predictive instrument are known.

Direct Prediction Method

The method of predicting directly from patterns starts by checking the new person's responses for the patterns which were previously determined. If no patterns are found for this person, no prediction can be made. If one pattern is found, the prediction is made for the criterion group with which that pattern is associated. If many patterns are found and all are associated with the same criterion group, again that group is predicted. A problem arises when patterns are found, some associated with one criterion group and some with another.
This problem is solved as follows: each pattern, as it is extracted by the method of Criterion Pattern Analysis, has associated with it a fraction, $\frac{N_1}{N_1+N_2}$, termed the level of discrimination. The denominator of the fraction is the total number of people from the original data who have that pattern; the numerator is the number who have that pattern and who are in the criterion category with which that pattern is associated. The level of discrimination multiplied by 100 gives the percentage of people having that pattern who are in the criterion category.

These considerations are now applied to the new person who is found to have patterns in more than one criterion category. For each criterion category, the one pattern is selected which has the highest level of discrimination. Then with one pattern per criterion category, the prediction is made to that criterion category with the highest discrimination level. For example, suppose the six patterns given in Table 5 were found for a person: two patterns associated with criterion one, one pattern associated with criterion two, and three patterns associated with criterion three. On the basis of pattern 2(1) 3(2) 6(1), with a discrimination level of 1.00, the prediction is made to criterion one.

TABLE 5.--An example of patterns to be used in prediction.

                    Observed Level of     Criterion
Pattern             Discrimination        Category
3(2) 14(2)          .63                   1
2(1) 3(2) 6(1)      1.00                  1
5(2) 7(1)           .95                   2
1(2)                .65                   3
3(2) 8(2)           .78                   3
1(2) 8(2) 5(2)      .80                   3

It is advisable in practical situations to consider the highest predictions made for each criterion category: in the former example, 1.00 for criterion one, .95 for criterion two, and .80 for criterion three (see Table 5). If action on the basis of the prediction to criterion category one is precluded, the next highest prediction, to criterion category two, might be chosen by the user of this method.
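The direct prediction rule can be written down in a few lines. The sketch below is illustrative only: the function names, the dictionary representation of a person's responses, and the hypothetical person who shows all six patterns of Table 5 are assumptions introduced here. A tie among the best levels is treated as "no prediction".

```python
# Patterns of Table 5: ({item: response}, level of discrimination, criterion category).
PATTERNS = [
    ({3: 2, 14: 2}, 0.63, 1),
    ({2: 1, 3: 2, 6: 1}, 1.00, 1),
    ({5: 2, 7: 1}, 0.95, 2),
    ({1: 2}, 0.65, 3),
    ({3: 2, 8: 2}, 0.78, 3),
    ({1: 2, 8: 2, 5: 2}, 0.80, 3),
]


def ranked_predictions(person, patterns=PATTERNS):
    """Best discrimination level per criterion category, highest first.

    `person` maps item numbers to responses. Categories for which the
    person shows no pattern are omitted, so the list may be empty.
    """
    best = {}
    for pattern, level, category in patterns:
        has_pattern = all(person.get(item) == resp for item, resp in pattern.items())
        if has_pattern and level > best.get(category, -1.0):
            best[category] = level
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


def predict(person, patterns=PATTERNS):
    """Predicted category, or None when no pattern is found or the top levels tie."""
    ranking = ranked_predictions(person, patterns)
    if not ranking:
        return None
    if len(ranking) > 1 and ranking[0][1] == ranking[1][1]:
        return None
    return ranking[0][0]


# A hypothetical person whose responses contain all six patterns of Table 5.
person = {1: 2, 2: 1, 3: 2, 5: 2, 6: 1, 7: 1, 8: 2, 14: 2}
print(ranked_predictions(person))
print(predict(person))
```

Returning the full ranking rather than a single category keeps the hierarchy of alternative predictions available to the user.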
In other words, this method is not an imperative for choosing one criterion category over another in an applied situation. By providing a listing of alternative predictions and their relative levels, Criterion Pattern Analysis uniquely provides additional information that can be relevant for practical use.

The procedure of predicting the criterion category whose pattern has the highest discrimination level may again lead to no prediction. This would occur, for example, if all criterion categories have the same discrimination levels associated with their best patterns. No prediction is, in one sense, a kind of information. And no prediction because patterns have the same level of discrimination is different information from no prediction because there were no patterns at all.

Combination Prediction Methods

While one method of predicting is using the patterns directly, as was done above, other methods can be developed. Viewing Criterion Pattern Analysis as an extension of item analysis suggests treating every pattern as an item, and scoring people as having or not having each pattern. Every person may then be redefined by a new set of items, each item corresponding to one of the patterns. In this way any configural information present in a pattern is accounted for through a single score. Consequently linear methods, such as multiple regression, can be applied to this new set of data. Configural acumen will thereby be combined with the mathematical strength of linear procedures.

CHAPTER III

DATA AND RESULTS

Data

In this chapter two sets of data are used. The first set is from Crego* (1966). A sample of 99 college-age women responded to 23 items of the I-E scale developed by Rotter (1966) (see Appendix A), and to the Hidden Figures Test (Test CF from the Educational Testing Service Battery). The I-E scale measures whether the subject believes the locus of control of reinforcement is in an external or an internal site.
In each item the subject selects one of two alternative statements. The usual score is the total number of external responses. However, for our purpose here, the items will be considered as a set of predictor variables.

* The writer wishes to express appreciation to Dr. Clyde Crego for permission to use his data and for his encouragement in applying configural methods to it.

The Hidden Figures Test measures field dependence-independence (Witkin et al., 1962). Thirty-two complex patterns are presented and the subject is to determine which one of several simple designs is present in each complex pattern. The total number of identified embedded figures is the score given, and indicates the degree of field independence. On the present sample the scores ranged from 2 to 25 with a median of 11.2. The subjects were given a score of 1 if they scored below the median (field dependence) and a score of 2 if they scored above it (field independence). These scores yielded the criterion categories to be predicted by the items of the I-E scale.

All subjects were divided into two samples by a table of random numbers. The first sample of 50 subjects is the analysis sample, on which all methods are initially applied. The second sample of 49 subjects is the cross validation sample.

The second set of data is from the roll-call voting record of the seventeenth session of the United Nations General Assembly (United Nations, 1964a, 1964b).* A total of 110 nations voted on 44 issues (see Appendix B). Where some nations were not yet admitted on the first few votes, or where the data were incomplete in minor ways, the procedure recommended by Wrigley (Olin, 1964) for estimating missing votes was followed. Then responses to each issue were dichotomized into "yes" votes versus all others, including "no," "abstain," etc.

* The writer wishes to thank Dr. Charles Wrigley for his generosity in welcoming use of data he has assembled.

One of the issues was selected to be the criterion item.
Selection was based on two considerations: one, a vote near the end of the session, and two, a vote having a low correlation with other votes, taken one at a time. The vote on issue 38 fulfilled both of these requirements, being the sixth vote from the last and having the lowest average correlation with other votes.

The 110 nations were then divided into two samples using a table of random numbers, as before. The first group of 55 nations is the analysis sample; the second group of 55 nations is the cross validation sample.

Linear Methods

In addition to Criterion Pattern Analysis, two linear methods were applied for comparison. They are multiple regression and a multivariate-normal maximum likelihood procedure.

Multiple regression is a well-known statistical procedure (Walker and Lev, 1950, Chapter 13), which works as follows: each of N people has a score on each of r items; let $X_1, X_2, \ldots, X_r$ be a set of scores for one person. Also, all N people have a score on another item; let Y be the score for any one person. The purpose is to predict Y from $X_1, X_2, \ldots, X_r$ for every person. Suppose for each person we find a weighted sum W of the X's:

\[ W = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_r X_r. \]

Now for each person there are two scores: Y and W. The correlation between Y and W can be found across all people. The method of multiple regression selects the weights $b_0, b_1, b_2, \ldots, b_r$ so that the correlation between Y and W is maximized. This correlation is called the coefficient of multiple correlation; it estimates how well Y can be predicted from the X's on the sample of N people. The weights $b_0, b_1, b_2, \ldots, b_r$ are called regression weights, or if the X's are first expressed in terms of standard scores, the weights are called beta weights.

A related method called discriminant function analysis (Tatsuoka and Tiedeman, 1957) also finds a set of weights $b_0, b_1, b_2, \ldots, b_r$ for the X's with which to form a sum W for each person. With discriminant function analysis there is no criterion variable Y; instead each person has previously been classified into one of several groups. The weights $b_0, b_1, b_2, \ldots, b_r$ are selected so that the mean W score of each group is most different from one group to another. In general, there is more than one set of weights found by discriminant function analysis. In the usual case if there are K different groups of people, then there will be K-1 sets of weights. When there are two groups, then there is just one set of weights and discriminant function analysis gives the same results as does multiple regression with a dichotomous criterion variable (Welch, 1939). Hence for purposes of this thesis, only the results of multiple regression will be presented.

A second, entirely different, method applied to the data is a maximum likelihood procedure using the multivariate-normal density function (Cooley and Lohnes, 1962, Chapter 7). This procedure, while not yet widely applied, offers an alternative to discriminant function analysis. Again N people have responded to r items, $X_1, X_2, \ldots, X_r$. Each person has previously been classified into one of several groups. For each group the parameters for the multivariate-normal density function are calculated. The density is then calculated for each person using his responses $X_1, X_2, \ldots, X_r$. The higher the density, the closer that person is to being at the center of the distribution for that group. This method is applied by computing for each person his density for each group: $p_{G_1}, p_{G_2}, p_{G_3}, \ldots, p_{G_k}$. The person is predicted to be in that group for which the density is highest.

Results from Crego Data

Criterion Pattern Analysis

The analysis sample of the Crego data was subjected to Criterion Pattern Analysis: thirteen patterns were found using $\alpha$ = .05. Table 6 lists these patterns: eight for criterion group one, field dependence, and five for criterion group two, field independence.
The first pattern in Table 6, 8(2), means item 8 answered with the second response alternative; 2(1) 17(2) means item 2 answered with the first alternative along with item 17 answered with the second alternative. Also listed in Table 6 are the total number of people having each pattern, the number of people in the criterion group having the pattern, and the discrimination level for the pattern.

These patterns were used to predict for the analysis sample. The results are shown in Table 7. The phi of .663 shows the correlation between the actual and the predicted groups. The same patterns were then used to predict in the cross validation sample. Table 8 shows these results. The phi dropped to .230 and the chi square of 2.58 has a probability of .12.

Interpretation of Patterns

In criterion group one, field dependence, all 26 people have at least one of the eight patterns extracted. In criterion group two, field independence, only 17 of the 24 people have at least one of the five patterns. This may mean more heterogeneity among field independent people. Looking at the patterns for group one, we find patterns 20(2) 23(2) and 17(2) 20(1) 21(2). These define two exclusive kinds of field dependent people (since no one person responds both ways to item 20). In criterion group two, pattern 20(1) 21(1) defines one set of people, and pattern 6(1) 8(1) defines another set of people.
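The item(response) notation and the remark about exclusive patterns can be illustrated with a short sketch. The function names and the sample respondent below are hypothetical and are not taken from the thesis or its computer program; only the four pattern strings come from the text above.

```python
import re

def parse_pattern(text):
    """Parse 'item(response)' notation, e.g. '2(1) 17(2)', into {item: response}."""
    return {int(item): int(resp)
            for item, resp in re.findall(r"(\d+)\((\d+)\)", text)}

def has_pattern(responses, pattern):
    """True if the responses include every item/response pair in the pattern."""
    return all(responses.get(item) == resp for item, resp in pattern.items())

def mutually_exclusive(p, q):
    """True if the two patterns require different responses to a shared item."""
    return any(item in q and q[item] != resp for item, resp in p.items())

# Patterns quoted in the text for criterion group one (field dependence).
dependent_a = parse_pattern("20(2) 23(2)")
dependent_b = parse_pattern("17(2) 20(1) 21(2)")

# Patterns quoted in the text for criterion group two (field independence).
independent_a = parse_pattern("20(1) 21(1)")
independent_b = parse_pattern("6(1) 8(1)")

# Hypothetical respondent (NOT from the Crego data): item number -> alternative chosen.
person = {17: 2, 20: 1, 21: 2, 23: 2}

print(has_pattern(person, dependent_a))                 # False: item 20 was answered 1
print(has_pattern(person, dependent_b))                 # True
print(mutually_exclusive(dependent_a, dependent_b))     # True: they disagree on item 20
print(mutually_exclusive(independent_a, independent_b)) # False: no shared item
```

Two patterns that share no item, such as 20(1) 21(1) and 6(1) 8(1), are not exclusive in this sense, so one person could in principle have both.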
TABLE 6.--Results of criterion pattern analysis on the analysis sample of the Crego data.

[Table body illegible in the source. Columns: Pattern; Total Number of People Having the Pattern; Number of People in the Criterion Group Having the Pattern; Discrimination. Sections: Patterns Associated with Field Dependence; Patterns Associated with Field Independence.]

TABLE 14.--Results of multiple regression applied to the patterns in the analysis sample of the Crego data.

[Table body illegible in the source. Columns: Pattern; Regression Coefficient; Significance of the Regression Coefficient; Correlation of Pattern with the Criterion. Footnotes: *Significant at the .05 level. **Significant at the .01 level.]

TABLE 15.--Results of predicting to the analysis sample of the Crego data using regression coefficients from patterns.

                               Actual Group
Predicted Group        Field          Field          Total
                       Dependent      Independent
Field Dependent           21              2            23
Field Independent          5             22            27
Total                     26             24            50

φ = .726

TABLE 16.--Results of predicting to the Crego cross validation sample using regression coefficients from patterns.

                               Actual Group
Predicted Group        Field          Field          Total
                       Dependent      Independent
Field Dependent           19             11            30
Field Independent          6             13            19
Total                     25             24            49

φ = .309     χ² = 4.69     p < .05

Interpretation

Pattern 13(1) 14(2) 18(1) 19(1) has the most significant regression weight and hence makes the highest independent contribution to the prediction. Pattern 11(2) 12(2) 17(2) 21(2) has the highest correlation with the criterion. Although these two patterns are the largest, there seems to be no real relationship between number of items in the pattern and its predictive power. There are smaller patterns with similar predictive values.
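As a check on the statistics reported with Tables 15 and 16, the phi coefficient and the chi square of a fourfold table can be recomputed from the cell counts. The sketch below assumes the uncorrected Pearson chi square, which for a 2 x 2 table equals N times phi squared; the thesis does not state whether a continuity correction was used, and the function name is illustrative only.

```python
import math

def phi_and_chi_square(a, b, c, d):
    """Return (phi, chi_square) for the 2x2 table [[a, b], [c, d]].

    chi_square is the uncorrected Pearson statistic, N * phi**2 (1 df).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    phi = (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)
    return phi, n * phi ** 2

# Table 15 (analysis sample): rows are predicted groups, columns are actual groups.
phi15, chi15 = phi_and_chi_square(21, 2, 5, 22)

# Table 16 (cross validation sample).
phi16, chi16 = phi_and_chi_square(19, 11, 6, 13)

print(f"Table 15: phi = {phi15:.4f}")
print(f"Table 16: phi = {phi16:.4f}, chi square = {chi16:.4f}")

# 3.841 is the .05 critical value of chi square with 1 degree of freedom.
print("Table 16 significant at .05:", chi16 > 3.841)
```

Under these assumptions the Table 15 counts give a phi of about .726, and the Table 16 counts give a phi of about .31 and a chi square of about 4.69, in line with the reported values.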
It may be noted that the significance levels of the regression coefficients here appear smaller (i.e., more significant) than those of the corresponding coefficients from multiple regression on the original items. Similarly, the correlations of the patterns with the criterion are higher than are those for the original items. Of course, this is not surprising since the patterns were chosen for their high association with the criterion. The interpretation of the regression coefficients here poses the same problem as with the regression coefficients from the original items. Fortunately, since the patterns themselves are all highly correlated with the criterion, they can be used as the basis of interpretation, as was done when the patterns were used directly in the prediction.

Resume

Using this combination of pattern and linear methods led to a successful prediction of the cross validation sample (p < .05). Therefore this method seems better than the previously applied procedures. However, in predicting to the cross validation sample the difference between multiple regression on patterns and multiple regression on the original items is significant at only p < .25.

Results from UN Data

Criterion Pattern Analysis

The analysis sample of the UN data was subjected to Criterion Pattern Analysis, with the results shown in Table 17. There are eight patterns relating to "no" on vote 38 and twelve patterns relating to "yes" on vote 38, using an assigned α of .05. Using these patterns to predict back to the analysis sample produced the results shown in Table 18. The phi coefficient between actual and predicted vote is .631. When the patterns were used to predict to the cross validation sample, the results shown in Table 19 were produced. The phi dropped to .374, and the associated chi square of 7.72 shows that this correlation is very significant (p < .006).

Interpretation of Patterns

The criterion, vote 38, involved offering technical assistance for national projects of population study.
Inspection of the nations whose patterns characterize each of the two criterion groups (see Appendix C) reveals that this issue does not divide the nations into a communist versus non-communist dichotomy. Patterns predicting a "no" vote include two types of nations: in the first is the USSR and its close satellites and some Asian and African nations (q.v. nations defined by pattern 25(1)); in the second group is the USA and some Latin American, Asian, and African nations (q.v. nations defined by pattern 4(1)). Those patterns predictive of a "yes" vote on the criterion issue do not divide the data into such well-marked groups. Perhaps nations in this criterion group follow policies independent of the USA or the USSR. Item 19(2), which is a vote

TABLE 17.--Results of criterion pattern analysis on the analysis sample of the UN data.

[Table body largely illegible in the source. Columns: Pattern; Nations Having the Pattern; Total Number of Nations Having the Pattern; Number of Nations in the Criterion Group Having the Pattern; Discrimination. Legible rows under "Patterns Associated with NO on Vote 38": 4(1), 32, 26, .8125; 19(1), 11, 11, 1.0000. Remaining rows illegible.]

[Table, number illegible in the source: results of multiple regression applied to the patterns in the analysis sample of the UN data. Columns: Pattern; Regression Coefficient; Significance of the Regression Coefficient; Correlation of Pattern with the Criterion. Footnotes: *Significant at the .05 level. **Significant at the .01 level. Table body illegible.]