AN EMPIRICAL STUDY OF SAMPLING ERROR IN FACTOR ANALYSIS

Thesis for the Degree of Ph.D.

MICHIGAN STATE UNIVERSITY

ALLAN LINDSAY LANGE

1969

This is to certify that the thesis entitled An Empirical Study of Sampling Error in Factor Analysis presented by Allan L. Lange has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

ABSTRACT

AN EMPIRICAL STUDY OF SAMPLING ERROR IN FACTOR ANALYSIS

By

Allan Lindsay Lange

The major purpose of this study was to determine empirically the statistical information necessary to make meaningful decisions about sample size and the number of factors. The study also examined how the stability of sample factor patterns might be affected by certain changes in the population factor pattern.

The data were drawn from nearly an entire freshman class at Michigan State University; the students' responses to 41 items, which inquired into their social, political, and economic views, were recorded on a five-point scale (strongly agree, agree, uncertain, disagree, strongly disagree). From a population of 5948 responses to a fixed set of 12 variables with a known factor pattern, 100 random samples were drawn for each of the sample sizes 25, 100, 400, 800, 1200, and 1600. Factor analyses were performed, and means and standard errors were computed for all the eigenvalues, for the highest rotated loading of each variable, and for the unrotated loadings in the first column of the principal axis solution.

The average of all the standard errors for middle- and high-level rotated loadings was found to be slightly larger than 1/√N, while the average for all unrotated loadings was slightly less than that for rotated loadings. Higher loadings consistently had smaller standard errors than lower loadings; in this respect both unrotated and rotated factor loadings behave like correlations.
A sample size of 400 appears necessary to consistently produce sample factor patterns that resemble the population factor pattern. Although a sample size substantially smaller than 400 is likely to yield an interpretive text that differs significantly from the one that would be written for the population factor pattern, the slightly more accurate loadings obtained by increasing the sample size beyond 400 are not likely to produce a different interpretive text.

Number of Factors

Experiments were conducted using two unifactorial factor patterns, one of three underlying dimensions and one of four. For each pattern and with N=400, several groups of 50 random samples were drawn and factor analyses performed. For each group, a different number of factors was rotated, and the means and standard errors of the highest loadings were computed. All results indicate that the average standard error of the highest loadings is at a minimum when the correct number of factors has been rotated, which suggests a way of determining the number of significant underlying dimensions in a set of variables.

Changes in the Factor Pattern

Factor patterns were manipulated in two ways: (1) the number of variables was increased from 9 to 15 by adding variables to just one of the factors, leaving the number of underlying dimensions unchanged, and (2) the number of variables was increased from 9 to 15 by adding two additional underlying dimensions, each containing three variables.

Increasing the number of dimensions from 3 to 5 approximately doubled the average magnitude of the standard errors for rotated loadings; no such increase was detected for the unrotated loadings.
Building up the number of variables without increasing the number of underlying dimensions did not produce a significant change in the size of the standard errors for either rotated or unrotated loadings; thus factorial stability appears to depend more on the number of underlying dimensions than on the number of variables.

AN EMPIRICAL STUDY OF SAMPLING ERROR IN FACTOR ANALYSIS

By

Allan Lindsay Lange

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1969

This THESIS for the DOCTOR OF PHILOSOPHY degree by Allan Lindsay Lange has been approved.

Charles F. Wrigley, Thesis Chairman, Guidance Committee
William A. Mehrens, Chairman, Guidance Committee
Irvin J. Lehmann, Reader, Guidance Committee
Clessen Martin, Reader, Guidance Committee
Lee S. Shulman, Reader, Guidance Committee

ACKNOWLEDGEMENTS

The author is indebted to the members of his guidance committee, Professor Charles F. Wrigley (Thesis Chairman), Professor William A. Mehrens (Chairman), Professor Irvin J. Lehmann, Professor Clessen Martin, and Professor Lee S. Shulman, for their advice and counsel during the course of this study. The author is especially indebted to his major professor, Charles F. Wrigley, for his personal interest, perceptive criticisms, and generous grants of time.

To Mr. Richard Rogers, Mr. James Deatherage, Mr. Layton Price, and Mrs. Hella Lange go sincere thanks for the many hours spent in preparing computer programs. Appreciation and thanks are extended to the members of the Office of Evaluation Services, and in particular to Professor Willard G. Warrington, for providing office space and the data used in this study. The author is also indebted to the Computer Institute for Social Science Research for its generous grant of computer time.

TABLE OF CONTENTS

List of Tables
List of Figures
Chapter I: PROBLEM
Statement of the Problem
Purpose of the Study
Studies on Sampling Error
Studies Relative to the Number of Factors
Chapter II: PROBLEM I: SAMPLING ERROR
Method
General Procedure
Procedure for Problem I
Data
Determination of Underlying Dimensions
Sampling Procedures
Data Analyses
Results
Sample Size and Standard Error
Loadings as Correlations
Uniformity of Change Among Standard Errors
Stability versus Type of Loading
Chapter III: PROBLEM II: STANDARD ERROR AND THE NUMBER OF ROTATED FACTORS
Method
Procedure for the First Approach
Procedure for the Second Approach
Results
Chapter IV: PROBLEM III: CHANGES IN THE FACTOR PATTERN
Method
Number of Underlying Dimensions
Number of Variables Loading on a Factor
Results
Number of Underlying Dimensions
Number of Variables Loading on a Factor
Chapter V: DISCUSSION
Problem I: Sampling Error
Sample Size and the Stability of Factor Patterns
Prediction of the Average Standard Error
Factor Loadings as Correlations
Problem II: Standard Error and the Number of Rotated Factors
Problem III: Changes in the Factor Pattern
Number of Underlying Dimensions
Number of Variables Loading on a Factor
Chapter VI: SUMMARY
Sampling Error
Number of Factors
Changes in the Factor Pattern
Questions Fostered by the Study
Bibliography
Appendices

LIST OF TABLES

Table 1. Population Values and Means of Obtained Eigenvalues for 100 Factor Analyses
Table 2. Mean Standard Errors of Eigenvalues for 100 Factor Analyses
Table 3. Population Values and Obtained Means of Unrotated Loadings for 100 Factor Analyses
Table 4. Mean Standard Errors of Unrotated Loadings for 100 Factor Analyses
Table 5. Population Values and Group Averages of Obtained Means for High and Low Rotated Loadings
Table 6. Average Standard Errors for High and Low Rotated Loadings
Table 7. Ratios of Obtained Standard Errors to Expected Standard Errors for Unrotated Loadings
Table 8. Ratios of Obtained Standard Errors to Expected Standard Errors for High and Middle Rotated Loadings
Table 9. Means of the Standard Errors of the Highest Rotated Loadings and Various Expected Values
Table 10. Average Coefficients of Congruence for Rotated and Unrotated Loadings
Table 11. Means of Obtained Loadings for the Block and Individual Methods of Selection
Table 12. Mean Standard Errors of Obtained Loadings for the Block and Individual Methods of Factor Selection
Table 13. Population Values of the Highest Loadings for the Correct Number of Rotated Factors and Means of Obtained Loadings for a Varied Number of Rotated Factors
Table 14. Average Standard Error for All Highest Loadings According to the Number of Factors Rotated
Table 15. Mean Eigenvalues for Three, Four, and Five Underlying Dimensions
Table 16. Mean Standard Errors of Eigenvalues for Three, Four, and Five Factor Situations
Table 17. Means of Unrotated Loadings for Three, Four, and Five Underlying Dimensions
Table 18. Standard Errors of Unrotated Loadings for Three, Four, and Five Underlying Dimensions
Table 19. Means of Rotated Loadings for Three, Four, and Five Underlying Dimensions
Table 20. Standard Errors of Rotated Loadings for Three, Four, and Five Underlying Dimensions
Table 21. Means of Eigenvalues for Nine through Fifteen Variables Loading on Three Underlying Dimensions
Table 22. Standard Errors of the First Five Eigenvalues for Nine through Fifteen Variables Loading on Three Underlying Dimensions
Table 23. Means of the Highest Unrotated Factor Loadings for Four Variables of the First Factor
Table 24. Standard Errors of the Highest Unrotated Factor Loadings for Four Variables of the First Factor
Table 25. Mean Values of Rotated Factor Loadings for Nine Core Variables
Table 26. Standard Errors of Rotated Factor Loadings for Nine Core Variables

LIST OF FIGURES

Figure 1. Standard Errors of Unrotated Loadings for Six Sample Sizes
Figure 2. Standard Errors of Rotated Loadings for Six Sample Sizes

CHAPTER I

PROBLEM

Factor analysis is a statistical technique used to identify the underlying dimensions in a domain of variables.
Since the statistical information needed to make meaningful decisions about sample size and the number of factors is not available, this study has been designed to provide such information in a form that meets the needs of the average researcher. The need for this information has grown in recent years, for with the advent of modern, high-speed computers, factor analysis has become widely used in psychological research. "Packaged programs" that will factor analyze any correlation matrix fed into them have made it relatively simple for the researcher to obtain mathematically complex analyses of data without becoming an expert programmer.

Because calculations consumed so much time in the precomputer era, only the most important material warranted factor analysis; today the computer can quickly provide a factor analysis of any correlation matrix, even one of only slight potential significance. Hence, though groups of tests are still frequently analyzed, attention has shifted towards the individual items of tests and other variables that might previously have been omitted. Since the scope of factor analysis has expanded with the availability of computers, it is now particularly important that the researcher be provided with guidelines regarding (1) the number of cases needed to assure stable, replicable results, and (2) a means of determining the number of underlying dimensions in a body of variables.

If research is to be of value, its results must certainly be replicable, but the principle of parsimony expressed by Ockham's Razor must also be kept in mind, since it is usually impractical for the researcher to try to assure replicability by gathering all possible data. Collecting data from every member of a population is an arduous task and frequently impossible; even if the entire population could be surveyed, the expense can seldom be justified. Massive collection of data to assure replicability is usually unnecessary.
In most cases random sampling is an efficient substitute. For example, a questionnaire should not be given to 6,000 people if a factor analysis run on a small, randomly selected portion of that number would yield nearly identical results. Hence, the researcher should gather only enough data to assure a level of reliability that meets the investigation's requirements.

Although it has not been traditional to investigate the stability of factor analysis by considering the reliability of eigenvalues and loadings, such an approach is an easy matter today with the aid of extremely fast computers. This approach requires a large number of factor analyses to be run, thus permitting the calculation of the desired standard errors. The data for each factor analysis are obtained either by a Monte Carlo technique or, if data on an entire population are available, by repeated random sampling of persons.

Specific information about which controllable aspects of the experimental design have the most effect on the stability of factor patterns can be obtained by varying the sample size or by changing the factor pattern for each group of factor analyses run. For example, one could double the size of the sample and note what effect this has on the size of the standard errors. It would also be possible to change the number of underlying dimensions, or to keep the same number of underlying dimensions but increase the number of variables loading on them, and note what effects these changes have on the standard errors. Knowing which of the controllable aspects of the experimental design most critically affect the standard errors of eigenvalues and loadings allows the experimenter to plan his research efficiently so that it yields reliable factor analyses.
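The repeated-sampling approach just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the routines used in this study: it builds a hypothetical population of 5948 simulated respondents from an assumed three-factor pattern (four variables per factor, every loading 0.7), draws repeated random samples without replacement, and takes the standard deviation of the largest eigenvalue across samples as its empirical standard error. The names `make_population`, `first_eigenvalue`, and `empirical_se` are illustrative only.

```python
import numpy as np

def make_population(n_persons=5948, loading=0.7, seed=1):
    """Hypothetical population: 12 variables, 3 factors, 4 variables per factor."""
    rng = np.random.default_rng(seed)
    pattern = np.zeros((12, 3))
    for f in range(3):
        pattern[4 * f:4 * f + 4, f] = loading
    factors = rng.standard_normal((n_persons, 3))
    # Unique part scaled so that every variable has unit variance.
    unique = rng.standard_normal((n_persons, 12)) * np.sqrt(1.0 - loading ** 2)
    return factors @ pattern.T + unique

def first_eigenvalue(sample):
    """Largest eigenvalue of the sample correlation matrix (variables in columns)."""
    R = np.corrcoef(sample, rowvar=False)
    return np.linalg.eigvalsh(R)[-1]  # eigvalsh returns eigenvalues in ascending order

def empirical_se(population, n, n_samples=100, seed=0):
    """Mean and empirical standard error of the first eigenvalue over repeated samples."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_samples)
    for i in range(n_samples):
        # Sampling without replacement: no person appears twice in one sample.
        rows = rng.choice(population.shape[0], size=n, replace=False)
        values[i] = first_eigenvalue(population[rows])
    return values.mean(), values.std(ddof=1)

population = make_population()
for n in (25, 100, 400, 1600):
    mean, se = empirical_se(population, n)
    print(f"N={n:5d}  mean={mean:.3f}  SE={se:.3f}")
```

The same loop applies to any other statistic (a particular loading, for instance) by replacing `first_eigenvalue`; the eigenvalue is used here only because it needs no matching of factors across samples.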
If the reliability of a factor pattern obtained by sampling is too low, it is highly probable that this factor pattern will differ from the total-population factor pattern and thus not give the desired information. Since a variable's greatest influence is assumed to be exerted on the factor on which its highest loading occurs, most investigators interpret the results in terms of the highest loadings. If, for example, the sample size is 100 and a given variable's highest loading is 0.40 with a normally distributed standard error of 0.20, and if, for the purposes of interpretation, it has been decided to disregard loadings smaller than 0.25, then the variable's highest loading will be ignored about 23% of the time. If the investigator knows how the standard error of loadings is affected by increasing the sample size, he is in a position to decide beforehand how much data should be gathered.

Knowledge about the standard error of loadings should also help solve a basic problem of factor analysis: that of identifying the number of underlying dimensions in a body of variables. If for clarity of interpretation the factor analyst rotates the initially obtained solution, it logically follows that he should rotate as many factors as there are significant underlying dimensions in his material. Since a loading is actually a correlation between the variable and a factor, factor theory suggests that the specific underlying dimensions act in a way that draws the highest loadings of appropriate variables to a given factor according to the correlational pattern. If one attempts to rotate too many factors, some of the highest loadings may be forced to occur on superfluous dimensions. For example, it is known that rotating as many factors as there are variables will usually produce a pattern in which each factor contains only one highest loading; uniqueness is forced if too many factors are assumed.
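The 23% figure in the sample-size example above follows directly from the normal curve. The sketch below assumes, as that example does, a normally distributed sampling error, and it counts only the lower tail (a 0.40 loading with a standard error of 0.20 almost never falls below -0.25). The second call is purely illustrative: it halves the standard error, which is roughly what quadrupling N would do if the error shrinks as 1/√N.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for X ~ Normal(mean, sd), computed from the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def prob_below_cutoff(loading, se, cutoff):
    """Probability that a sample loading falls below the interpretation cutoff."""
    return normal_cdf(cutoff, mean=loading, sd=se)

# Example from the text: loading 0.40, standard error 0.20, cutoff 0.25 (z = -0.75).
print(round(prob_below_cutoff(0.40, 0.20, 0.25), 3))  # 0.227
# Same loading with the standard error halved (z = -1.5).
print(round(prob_below_cutoff(0.40, 0.10, 0.25), 3))  # 0.067
```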
If too few factors are rotated, the highest loadings will necessarily be forced into a pattern that does not accurately represent the underlying dimensions. Which factors the variables load on when the wrong number of factors is rotated is at least partly a matter of chance, since the influence of each underlying dimension cannot be properly exercised. Hence, it can be theorized that the standard error of individual loadings will be at a minimum when the correct number of factors is rotated.

Statement of the Problem

The present study addresses three important problems. The first and third problems are concerned with the stability of factor patterns as a function of input quantities, quantities over which the experimenter can usually exercise some control. The second problem examines rotation, an aspect of the output, and tests a method of determining the number of factors that should be rotated.

Problem I. The first problem will be concerned with the effect that varying the sample size has upon the standard errors of loadings and eigenvalues. This problem has been divided into several subproblems, which ask (1) whether there is a predictable relation between sample size and standard error, (2) whether loadings behave as correlations, (3) whether changes among the standard errors of loadings are uniform as sample size varies, and (4) whether rotated loadings are more stable than unrotated ones.

Problem II. The second problem will consider the relation between the correct number of factors and the standard error of rotated loadings. More specifically, it will test the hypothesis that the standard error of rotated loadings is indeed at a minimum when the correct number of factors has been rotated.

Problem III. The final problem examines the effect of the factor pattern on the magnitude of the standard errors and on the size of loadings.
The number of factors and the number of variables loading on a factor will be varied to see what changes occur in the size and standard error of eigenvalues, unrotated loadings, and rotated loadings.

Purpose of the Study

The problems considered in this study are designed to provide much-needed information about the size of the standard errors of those quantities normally used for interpreting the number and nature of underlying dimensions in a body of variables. Without such information it is impossible to ascertain the replicability and meaningfulness of results. Certainly it is invalid to assume that a factor analysis is carried out solely to determine loadings of variables on factors that characterize only the unique group of individuals who participated in the study. In most situations, the investigator must be able to generalize from the sample of data to a larger universe of more or less equivalent data that could have been gathered. But how does an investigator know when his sample is large enough to ensure valid generalization?

Harman (1967) suggests that, for a fixed set of variables, measuring the consistency of factors from sample to sample is a classical problem in the theory of statistical sampling. He further points out, however, that little progress has been made toward solving the sampling problem in factor analysis, and he therefore suggests that an empirical approach would seem appropriate. Such an approach will be used in this study. It will not attempt to be definitive, nor will it suggest that the specific results are generalizable to other sets of data.
But it is hoped that this study will be recognized as presenting an approach of considerable heuristic value for the future solution of similar quests.

Studies on Sampling Error

The literature prior to 1963 contains many conjectures about the standard error of factor loadings, but researchers in the precomputer era were hindered from backing up their contentions with empirical evidence. Since 1963, several studies, all based on Monte Carlo techniques, have focused on the standard error of factor loadings. All of these studies, whether they considered rotated or unrotated loadings, raise questions that warrant further examination.

Unrotated Case. Joreskog (1963), using a Monte Carlo technique, examined the sampling errors of individual loadings on unrotated common factors. For N = 100, 200, and 300, he compared unrotated sample common-factor matrices to the population factor matrices and found the mean square deviation from the population values to be somewhat less than 1/√N, the approximate standard error of a zero correlation. This value decreased only slightly with sample size, and there were no consistent differences in the relative size of the standard errors for normal and skewed populations. When a 10-variable, three-factor matrix exhibiting unifactorial structure was used, the unrotated sample factors could not be confidently matched with any population factor. For given loadings the standard errors ranged from .42 for one of the zero loadings to .15 for one of the non-zero loadings; sampling errors did not decrease sharply with N. Since Joreskog attributed the large sampling errors to near equality in the size of the factors, which made the positions of the principal axes unstable, it appears that more information relating the standard error of eigenvalues to sample size is needed.
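Because 1/√N recurs throughout these comparisons as the approximate standard error of a zero correlation, the sketch below tabulates it for the sample sizes used in this study, together with the familiar large-sample approximation (1 - ρ²)/√N for a nonzero population correlation ρ, which assumes bivariate normality. Quadrupling N halves both values, and the (1 - ρ²) factor is why higher correlations have smaller errors. The value ρ = 0.6 is chosen only for illustration.

```python
from math import sqrt

def se_zero_correlation(n):
    """Approximate standard error of r when the population correlation is zero."""
    return 1.0 / sqrt(n)

def se_correlation(rho, n):
    """Large-sample approximation to the standard error of Pearson's r."""
    return (1.0 - rho ** 2) / sqrt(n)

# Sample sizes used in this study; rho = 0.6 is an arbitrary illustrative value.
for n in (25, 100, 400, 800, 1200, 1600):
    print(f"N={n:5d}  1/sqrt(N)={se_zero_correlation(n):.3f}  "
          f"(1-0.6^2)/sqrt(N)={se_correlation(0.6, n):.3f}")
```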
Rotated Case. Joreskog (1963) rotated the factor loadings described above and found the standard errors to be somewhat smaller than 1/√N. The sampling errors of the non-zero loadings tended to be smaller than those of the zero loadings. This difference appeared to be proportional to the sampling error of the Pearson correlation coefficient, which in turn is proportional to 1 - r², but the proportionality was not uniform for all variables.

Hamburger (1965), also using a Monte Carlo method, generated sample matrices to investigate the sampling fluctuations of rotated loadings. He used four different population factor matrices with varied degrees of simple structure; each matrix contained 12 variables and had four common factors. He generated forty sample correlation matrices for each factor pattern, 20 with N=100 and 20 with N=400, and factored them by the principal axis method, using squared multiple correlations (R²) as communalities. The standard errors of the factor loadings were somewhat larger than the standard errors of correlations in samples of these sizes, but for N=100 the standard errors were less than twice as large as for N=400 (a strict 1/√N relation would make them exactly twice as large). It was also noted that the standard errors tended to be smaller for patterns exhibiting good simple structure than for those exhibiting poor simple structure, but reversals did occur, which raises an interesting question.

Browne (1965) generated samples from several population matrices, extracted factors, and rotated the results. This was done for each of several methods of extracting factors, and for each method the results were compared with the population loadings. Although Browne found that Lawley's maximum likelihood method gave the smallest sampling errors (.081), three other principal-components procedures yielded average errors only slightly larger.
For N=100, Thomson's iterative method showed .087, principal factors with R² communality estimates .089, and the weighted principal factor method .088. The centroid method was somewhat larger, .107. For all of these methods, sampling errors tended to decrease as the size of the loadings increased.

All three of the above studies suggest that the sampling errors of rotated factor loadings are close to those of correlations, about 1/√N, and that sampling errors tend to decrease for larger loadings. In the case of rotated loadings, some control for the number of factors rotated is clearly also needed: these data reflect situations in which the number of factors rotated matched the number of underlying dimensions in the correlation matrices. Further investigation should also be made into the proportionality between the sampling errors of loadings and the quantity (1 - r²), where r is the size of the loading. Since not all loadings followed this relationship, it should be determined whether some of the loadings are consistent for all sample sizes and, if so, what characteristics define such loadings.

Studies Relative to the Number of Factors

Perhaps no problem is as puzzling and bothersome as that of deciding the number of factors present in a body of variables. The problem would not be too serious if rotating too few factors meant merely overlooking a psychological dimension, or if rotating too many meant only that some of the factors would break down and be rendered uninterpretable. But since the loadings on one factor apparently cannot be estimated independently of the loadings on the others, the importance of estimating the correct number of factors cannot be overemphasized.

Some methods, such as Rao's (1955), Lawley's (1953), and Joreskog's (1963), do not estimate the number of factors at all but rather estimate the uniqueness of the variables on the basis of a given number of factors.
Changing the assumed number of factors changes the communality estimates as well as the factor loadings. But such methods are of less importance to the average researcher, for the uniqueness of individual variables is not what is usually sought.

The number of factors assumed also influences the rotational process. Merrifield and Cliff (1963) found that when the varimax method is used, it is important that the number of factors to be rotated be specified correctly. If varimax requires the correct specification of the number of factors, it is reasonable to assume that other rotational procedures may also be affected by a failure to specify it correctly. Older literature suggests that solutions possessing simple structure will be invariant with respect to the number of factors rotated; present information contradicts this.

There have been many proposals for deciding on the number of factors, and such decision rules, according to Levonian and Comrey (1966), have generally been based on the concept of either statistical significance or minimal rank. It seems pertinent to look at the criticisms, as well as the possible usefulness, of some of the more important ones.

Cattell (1958) criticized the criterion of statistical significance by stating that the determination of the number of real common factors should not depend on the number of variables or subjects the investigator happened to use. This seems to be an appropriate criticism, but a statistical procedure may still help to determine the number of factors. Suppose, for instance, that the average standard error of all variables is found to be at a minimum when the correct number of factors is rotated; such information can then be used to make a meaningful decision about the number of factors.

A criticism of the minimal rank criterion by Tryon (1961) pointed
out that the minimal rank of the population correlation matrix, and hence the number of real common factors, can never be determined, while the minimal rank of the sample correlation matrix is always equal to the order of the matrix. This is, of course, a true statement, but it would appear possible to ascertain, within limits, that a sample correlation matrix has the same rank as the population matrix. One could, for instance, continue to double the sample size until the rank became stable.

Because of the difficulties with minimal rank and statistical significance, some investigators have departed from the tradition of pinpointing the number of factors and have preferred to specify only upper and lower bounds on the number of factors. Guttman (1954) holds that the lower limit is the number of non-negative latent roots of the correlation matrix whose main diagonal contains the squared multiple correlation of each variable with the remaining (n - 1) variables. Kaiser (1960) has also argued for the use of all characteristic roots greater than unity. Horn (1965) pointed out, though, that these criteria have been shown to apply only when it is assumed that we are dealing with a population of persons and a sample of tests. Tucker (1964), applying Monte Carlo techniques, investigated this type of psychometric question and found that the various rules concerning the number of factors present in a battery were not reliable estimates of the number of major factors present in his artificial factor matrices. Browne (1965) also investigated the number-of-factors rules of thumb and found that accepting the number of characteristic roots greater than unity as the number of factors gave good results in some cases but not in others. Since this rule has not proved entirely satisfactory, it is necessary to look also at the mathematical approaches.

The number-of-factors question has been approached mathematically by Lawley (1953), Rao (1955), and Joreskog (1963).
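The two root-counting rules cited earlier in this section, Guttman's lower bound (non-negative roots of the correlation matrix with squared multiple correlations in the diagonal) and Kaiser's roots-greater-than-unity rule, can be sketched in NumPy as follows. This is an illustrative sketch only: it assumes a nonsingular correlation matrix, the function names are hypothetical, and the six-variable matrix in the usage lines is invented for the example.

```python
import numpy as np

def kaiser_count(R):
    """Number of eigenvalues of the correlation matrix R greater than unity."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

def guttman_smc_count(R, tol=1e-10):
    """Number of non-negative eigenvalues of R with squared multiple
    correlations (SMCs) substituted in the main diagonal."""
    # SMC of variable j with the other n - 1 variables: 1 - 1 / (R^-1)_jj.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)
    # A small tolerance keeps round-off near zero from being miscounted.
    return int(np.sum(np.linalg.eigvalsh(reduced) >= -tol))

# Hypothetical example: two uncorrelated blocks of three variables, r = .6 within blocks.
block = np.full((3, 3), 0.6)
np.fill_diagonal(block, 1.0)
R = np.kron(np.eye(2), block)
print(kaiser_count(R), guttman_smc_count(R))  # 2 2
```

In this constructed case both rules agree; the studies cited above report that on real or artificial data with minor factors they often do not.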
Lawley, Rao, and Joreskog have each offered a statistic to test hypotheses about the number of common factors in a given correlation matrix. Lawley's and Joreskog's methods have been tested using a Monte Carlo procedure and found to give the correct number of factors in a majority of cases. But since these are tests for only a specific matrix, their value for generalization to an entire population is questionable. An appropriate procedure should enable generalization to a population; using knowledge about the size of sampling error is one possible method.

CHAPTER II

PROBLEM I: SAMPLING ERROR

The data, the means of determining the number of underlying dimensions, the sampling procedure, and the data analyses are described more fully in this chapter than in subsequent chapters, since these either remain constant or change only slightly for Problems II and III.

Method

General Procedure

A core of nine variables loading on three underlying dimensions in a 3-3-3 pattern was included in all factor analyses run for the three problems. The magnitude of changes in the size of loadings and of standard errors among these variables will be used to decide whether the different sources of variation are responsible for instability among factor patterns.

Procedure for Problem I

This problem, using a fixed sample of variables, considers the change in standard error that results from varying the sample size. For each of the sample sizes 25, 100, 400, 800, 1200, and 1600, one hundred random samples were drawn, correlation matrices computed, and factor analyses performed. A domain of twelve variables loading on three factors in a 5-3-4 pattern was used. Varimax rotations were obtained for all of the principal axis solutions, and the means and standard errors of eigenvalues, unrotated loadings, and rotated loadings were determined. Fifty pairs of factor solutions were randomly selected for each sample size, and an average "coefficient of congruence" was computed.
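The procedure above does not give the formula for the coefficient of congruence. The sketch below assumes the usual Tucker form: the sum of cross-products of two columns of loadings divided by the square root of the product of their sums of squares. It also assumes, hypothetically, that the factors of every solution are already in the same order and direction, and it averages over 50 randomly chosen pairs of solutions as in the procedure. All function names are illustrative.

```python
import numpy as np

def congruence(x, y):
    """Tucker's coefficient of congruence between two columns of loadings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def mean_congruence(A, B):
    """Average congruence over corresponding factors of two loading matrices.

    Assumes factor j of A is matched with factor j of B."""
    return float(np.mean([congruence(A[:, j], B[:, j]) for j in range(A.shape[1])]))

def average_pairwise_congruence(solutions, n_pairs=50, seed=0):
    """Average congruence over randomly selected pairs of factor solutions."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_pairs):
        i, j = rng.choice(len(solutions), size=2, replace=False)
        values.append(mean_congruence(solutions[i], solutions[j]))
    return float(np.mean(values))

# Two invented loading matrices (4 variables, 2 factors) that differ slightly.
A = np.array([[0.7, 0.1], [0.6, 0.0], [0.1, 0.8], [0.0, 0.7]])
B = A + np.array([[0.05, -0.02], [-0.03, 0.04], [0.02, 0.00], [0.01, -0.05]])
print(round(mean_congruence(A, B), 3))
```

Unlike a correlation, the coefficient is computed on raw loadings without centering, so it reflects both the pattern and the overall level of the loadings.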
Data

Since the focus of this problem is on the amount of sampling error as a function of sample size, it is necessary to obtain, as fully as possible, the responses of an entire population, for knowing population values reveals the accuracy of a generalization made from an individual sample. To satisfy the design of the problems under consideration, the set of variables to which the subjects respond must contain both the desired number of underlying dimensions and the desired degree of simple structure. As a requirement for the data to be factor analyzed, each of the variables must be responded to on some continuum, such as best to worst, most to least, or strongly agree to strongly disagree. Since principal component analysis will be used, the standard deviations of the variables should be approximately equal.

Data considered to meet these requirements sufficiently were found in the files of the Office of Evaluation Services at Michigan State University. Nearly the entire Michigan State University freshman class of 1967 responded to 41 items inquiring into the students' social, political, and economic views; these responses were recorded on a five-point scale (strongly agree - agree - uncertain - disagree - strongly disagree). For N = 5948, means, standard deviations, and correlations between items were computed. The standard deviations were mostly in the range 0.95 to 1.15, and the correlations ranged from -.30 to +.55.

Determination of Underlying Dimensions

The variables belonging to the same underlying dimensions were determined by factor analyzing the group of 41 variables and rotating the principal axis solution using the varimax criterion and the Kiel-Wrigley criterion (1960), the latter being set at 3. This means that 2, 3, ..., n factors were rotated until some factor failed to have at least 3 variables whose highest loadings occurred on that factor.
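The Kiel-Wrigley stopping rule just described can be sketched as follows. This is an illustration under stated assumptions, not the CISSR routine: a variable's "highest loading" is taken here as its largest absolute loading, and `rotate` is a hypothetical callable returning the rotated n x k loading matrix for k factors.

```python
import numpy as np


def highest_loading_counts(rotated):
    """For each factor, count the variables whose largest |loading| falls on it."""
    rotated = np.asarray(rotated, dtype=float)
    home = np.argmax(np.abs(rotated), axis=1)
    return np.bincount(home, minlength=rotated.shape[1])


def meets_kiel_wrigley(rotated, min_vars=3):
    """True if every factor carries at least min_vars highest loadings."""
    return bool(np.all(highest_loading_counts(rotated) >= min_vars))


def last_acceptable_k(rotate, k_max, min_vars=3):
    """Rotate 2, 3, ..., k_max factors; stop as soon as some factor fails the rule."""
    accepted = None
    for k in range(2, k_max + 1):
        if not meets_kiel_wrigley(rotate(k), min_vars):
            break
        accepted = k
    return accepted
```

For a clean 3-3-3 pattern, three rotated factors each receive three highest loadings and pass; adding a fourth, empty factor fails the rule, so the last acceptable number of factors is three.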
Those groups of variables whose highest loadings were always found on the same factor, no matter how many factors were rotated, were considered to form underlying dimensions. Since many other researchers have chosen factor patterns containing 12 variables and 3 underlying dimensions to investigate the problem of sampling error, such a factor pattern is also used in the present experiment, so that comparisons with other results should be more meaningful. Finally, the factor analysis of the population correlation matrix revealed an almost unifactorial structure: only three of the variables showed more than a minimal amount of their influence divided among two or more factors.

Sampling Procedure

Placing the responses of the entire population on the 12 selected variables in the core storage of a Control Data Corporation 3600 computer allowed the computer to draw random samples of the desired size quickly. No subject's responses could appear more than once in a given sample. SAMPLER, a Fortran routine (Appendix A) that permits specification of the population size, the number of samples, the desired sample size, and the number of variables to be sampled for each subject, was used to draw the random samples.

Data Analyses

Factor Analysis Program. The factor analysis program (A. Williams, 1967) of the Computer Institute for Social Science Research (CISSR) at Michigan State University performed all factor analyses. This versatile routine, which computes eigenvalues, principal axis factor loadings, and either or both of the quartimax and varimax rotations, also includes provisions for specifying the type of communality desired if a correlation matrix is calculated from raw data, and for specifying the number of factors to be rotated. It is also programmed to use the Kiel-Wrigley criterion (1960) and thus to specify the minimum number of variables that should have their highest loadings on any of the factors.
Once the minimum number of variables has been specified, rotation will continue (all rotated solutions are printed out) until fewer than the specified number of highest loadings occur on a factor. In this study, unities were inserted as communalities for all factor analyses, since these are used by many investigators and thus should provide a less controversial entry for the communalities. Unities were also considered most appropriate because they represent the simplest situation, which should be examined first.

Methods of Rotation. Since the purpose of the problem under consideration is not to make comparisons among the various rotational procedures, and since other investigators have noted little difference among such procedures, only one rotational method was used; a check was nevertheless made to see whether the quartimax solution would be similar to the varimax. Differences between corresponding loadings for the two methods were not detected until the thousandths place, but the results might not be so nearly equal if a less unifactorial structure were used.

Rotation is usually carried out to reduce the complexity of the factorial description of the variables. Since the quartimax provides a rotation that tends to increase the larger factor loadings and decrease the smaller ones for each variable of the original factor matrix, it concentrates on the rows of the factor matrix. According to Harman (1960), the object of the quartimax method is to determine an orthogonal transformation, T, which will carry the original factor matrix, A, into a new factor matrix, B, for which the variance of the squared factor loadings is a maximum. The criterion which will yield this maximum is

$$Q = \sum_{j=1}^{n} \sum_{p=1}^{m} b_{jp}^{4},$$

where $b_{jp}$ represents the rotated loading of variable j on factor p, p indexes the factors 1, 2, ..., m, and j indexes the variables 1, 2, ..., n.
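To make the quartimax criterion concrete, the sketch below evaluates Q for a loading matrix and, for a hypothetical two-factor matrix A, searches a grid of angles for the orthogonal transformation T that maximizes Q for B = AT. The grid search is only an illustration of what is being maximized; it is not the iterative procedure of Harman (1960) or of the CISSR program. Because T is orthogonal, each variable's sum of squared loadings (its communality) is unchanged by the rotation.

```python
import numpy as np


def quartimax_criterion(b):
    """Q = sum over variables j and factors p of b_jp**4."""
    return float(np.sum(np.asarray(b, dtype=float) ** 4))


def rotate_two_factors(a, theta):
    """B = A T for the 2 x 2 orthogonal transformation T through angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    t = np.array([[c, -s], [s, c]])
    return np.asarray(a, dtype=float) @ t


def best_quartimax_angle(a, n_grid=181):
    """Grid search over 0..pi/2 for the angle that maximizes Q."""
    angles = np.linspace(0.0, np.pi / 2, n_grid)
    scores = [quartimax_criterion(rotate_two_factors(a, th)) for th in angles]
    return float(angles[int(np.argmax(scores))])


# Hypothetical unrotated matrix: every variable splits evenly across two factors.
A = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, -0.5], [0.5, -0.5]])
theta = best_quartimax_angle(A)
B = rotate_two_factors(A, theta)
```

For this A, Q = 0.5. At theta = pi/4 each variable loads about .707 on one factor and about zero on the other, so Q rises to 1.0 while every communality stays at 0.5, which is the row-simplifying behavior described above.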
In contrast to the quartimax, the varimax, which attempts to approximate simple structure more closely, concentrates on simplifying the columns, or factors, of the factor matrix. To achieve a "normal" varimax criterion, the loadings in each row of the factor matrix are divided by the square root of the communality for that row. The computing procedure for a varimax solution is quite similar to that employed for a quartimax, except that the varimax requires that

$$V = n \sum_{p=1}^{m} \sum_{j=1}^{n} \left( \frac{b_{jp}}{h_j} \right)^{4} - \sum_{p=1}^{m} \left( \sum_{j=1}^{n} \frac{b_{jp}^{2}}{h_j^{2}} \right)^{2}$$

be maximized instead of Q. Here b, p, and j are as defined in the preceding paragraph, and $h_j^2$ represents the communality of variable j (so that $h_j$ is its square root).

Factor Selection Program. The program used in this study was COLMLDGS (Appendix B). The assumption behind this program is that variables which have their highest loadings in the same column of the rotated factor matrix belong to the same underlying dimension. If a group of variables is known to form an underlying dimension, the column in which this dimension is located can be determined by computing, for each column, the sum of the loadings of those variables and selecting the column with the largest sum. COLMLDGS also provides a punched output of the selected loadings, the eigenvalues, and the first column of the principal axis solution. This method was deemed sufficient for the identification of factors since, for the most part, the sample sizes used were so large that the population correlation matrix was closely approximated, and also because of the high degree of simple structure found in the rotated solution. The sample sizes 25 and 100 often failed to produce a factor pattern similar to the population factor pattern, and thus the value of the results for those two sample sizes is questionable.

Factor Comparison Program. A factor comparison program, COMPARE, was written to compare individually either rotated or unrotated factors for any two separate factor solutions.
This method, called the "Coefficient of Congruence" by Tucker (1951) and the "Coefficient of Similarity" by Barlow and Burt (1954), outwardly resembles the Pearson product-moment correlation coefficient, but it does not produce a true correlation, since the factor loadings used in the formula are not deviations from their respective means and the summations are over the number of variables rather than the number of individuals (Harman, 1960, p. 285). Recommended by Wrigley and Neuhaus (1955) and Pinneau and Newhouse (1964), the formula for this method, which shall be referred to as the Coefficient of Congruence (CC), is

$$CC_{ij} = \frac{\sum_{k=1}^{n} a_{ki} b_{kj}}{\sqrt{\sum_{k=1}^{n} a_{ki}^{2} \sum_{k=1}^{n} b_{kj}^{2}}},$$

where a and b refer to the factor loadings, i and j refer to the two factors to be compared, and k indexes the variables (1, 2, ..., n) in each factor.

Standard Error Formulas. The standard errors of the obtained loadings, both unrotated and rotated, and of the eigenvalues were computed by the formula

$$\sigma = \left( \frac{\sum x^{2}}{N - 1} \right)^{1/2},$$

where x is in deviation form and N is the number of factor analyses. The formula for computing the standard error of the correlation coefficient is

$$\sigma_r = \frac{1 - r^{2}}{(N - 1)^{1/2}},$$

where r is the correlation coefficient. This formula is considered to be an approximation to the corresponding standard error of r in the population from which the sample of N has been randomly drawn.

Results

The results will be reported relative to the subproblems, which ask (1) whether there is a predictable relation between sample size and standard error, (2) whether loadings behave as correlations, or whether changes among the standard errors of loadings are uniform as sample size varies, and (3) whether rotated loadings are more stable than unrotated ones.
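The quantities defined in this section can be computed as follows. The sketch is a Python rendering of the formulas, not the COLMLDGS or COMPARE programs themselves. The function names are hypothetical, communalities are passed as $h_j^2$, and the column selection follows the COLMLDGS rule described above: for a known group of variables, take the column with the largest summed loading.

```python
import numpy as np


def normal_varimax_criterion(b, h2):
    """V = n * sum_p sum_j (b_jp/h_j)**4 - sum_p (sum_j b_jp**2/h_j**2)**2."""
    b = np.asarray(b, dtype=float)
    n = b.shape[0]
    sq = b**2 / np.asarray(h2, dtype=float)[:, None]  # (b_jp / h_j)**2
    return float(n * np.sum(sq**2) - np.sum(sq.sum(axis=0) ** 2))


def select_column(rotated, members):
    """COLMLDGS-style rule: the column with the largest sum over the group's loadings."""
    return int(np.argmax(np.asarray(rotated, dtype=float)[members].sum(axis=0)))


def congruence(a, b):
    """Coefficient of congruence between two factors' loadings on the same variables."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def standard_error(values):
    """sqrt(sum x**2 / (N - 1)), x in deviation form, N = number of factor analyses."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    return float(np.sqrt(np.sum(x**2) / (len(x) - 1)))
```

As a check on the varimax expression, a pure simple-structure matrix such as [[.8, 0], [.8, 0], [0, .6], [0, .6]] with communalities .64, .64, .36, .36 gives V = 8, whereas a matrix in which every variable splits its communality evenly between the two factors gives V = 0.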
Sample Size and Standard Error

Since the basic question being considered here is whether a predictable relation exists between sample size and standard error, this question will be treated separately for eigenvalues, rotated loadings, and unrotated loadings.

Eigenvalues. A comparison between the magnitude of the obtained eigenvalues for each sample size and the population values is given in Table 1. The two smallest sample sizes, 25 and 100, clearly do not yield values close to the population values: large eigenvalues tended to be much larger, and small eigenvalues considerably smaller. As sample size increases, the values quickly approach those of the population. It may also be noted that for the sample sizes 25 and 100, more than the first three eigenvalues were greater than unity, although only three underlying dimensions are contained in the body of variables.

Table 1. Population Values and Means of Obtained Eigenvalues for 100 Factor Analyses

Rank of                     Sample Size                      Population
Eigenvalue     25     100    400    800    1200   1600       5948
(1)           3.24   2.78   2.66   2.66   2.66   2.67       2.65
(2)           2.09   1.75   1.63   1.61   1.60   1.60       1.61
(3)           1.61   1.40   1.34   1.33   1.32   1.32       1.32
(4)           1.26   1.11   0.99   0.95   0.95   0.93       0.91
(5)           1.01   0.95   0.90   0.88   0.87   0.87       0.86
(6)           0.79   0.83   0.83   0.82   0.82   0.82       0.82
(7)           0.62   0.74   0.76   0.77   0.77   0.76       0.76
(8)           0.47   0.65   0.71   0.72   0.73   0.73       0.75
(9)           0.36   0.57   0.65   0.67   0.68   0.68       0.68
(10)          0.26   0.49   0.59   0.62   0.63   0.64       0.66
(11)          0.18   0.41   0.51   0.54   0.54   0.54       0.55
(12)          0.10   0.32   0.41   0.43   0.43   0.43       0.44

Table 2 shows the standard errors for the entries in Table 1.

Table 2. Mean Standard Errors of Eigenvalues for 100 Factor Analyses

Rank of                     Sample Size
Eigenvalue     25     100    400    800    1200   1600
(1)           .470   .303   .164   .111   .086   .062
(2)           .251   .162   .095   .073   .049   .041
(3)           .170   .114   .071   .059   .044   .036
(4)           .125   .094   .048   .033   .031   .028
(5)           .126   .063   .032   .030   .025   .025
(6)           .115   .060   .037   .029   .027   .026
(7)           .088   .054   .034   .024   .017   .018
(8)           .089   .046   .024   .024   .019   .018
(9)           .055   .038   .034   .024   .019   .016
(10)          .052   .045   .031   .023   .021   .015
(11)          .047   .041   .035   .029   .021   .015
(12)          .037   .044   .041   .026   .017   .015

Quadrupling the sample size did not fully halve the standard errors in most cases. For all sample sizes except 1600, reversals did occur: in some cases smaller eigenvalues had larger standard errors than some of the larger eigenvalues.

Unrotated Loadings. Table 3 gives the group averages of the three highest, three middle, and three lowest unrotated loadings found in the first column of the principal axis solution. Each figure used to compute a group average is itself an average of a particular loading over 100 factor analyses. Only the sample size 25 failed to give loadings that closely approximated the population values.

Table 3. Population Values and Obtained Means of Unrotated Loadings for 100 Factor Analyses

Size of                     Sample Size                      Population
Loadings       25     100    400    800    1200   1600       5948
High          .530   .611   .613   .619   .621   .617       .619
Middle        .337   .419   .418   .420   .423   .427       .427
Low           .069   .081   .072   .072   .075   .076       .072

The average standard errors of these loadings are given in Table 4. Increasing the sample size results in a rapid decrease in standard error; it should be noted that as the average loading size decreases, the standard error increases. For the group of low loadings, which approximate zero loadings, quadrupling the sample size appears to halve the standard error. This was not the case for the medium and high loadings, although such a rule could still be used to make a rough approximation of the standard errors for these groups.

Table 4. Mean Standard Errors of Unrotated Loadings for 100 Factor Analyses

Size of                     Sample Size
Loadings       25     100    400    800    1200   1600
High          .261   .091   .047   .031   .022   .019
Middle        .347   .128   .073   .048   .033   .026
Low           .363   .196   .104   .077   .059   .049

Rotated Loadings. Since there was not a large difference in the magnitude of the highest and lowest rotated loadings for which means and standard errors were calculated, two comparison groups were formed by grouping the four highest loadings and then the four lowest loadings together. The averages of these groups of loadings are shown in Table 5.

Table 5. Population Values and Group Averages of Obtained Means for High and Low Rotated Loadings

Size of                     Sample Size                      Population
Loadings       25     100    400    800    1200   1600       5948
High          .651   .714   .756   .758   .761   .761       .764
Middle        .496   .521   .527   .537   .538   .536       .537

The average standard errors for the entries in Table 5 are given in Table 6.

Table 6. Average Standard Errors for High and Low Rotated Loadings

Size of                     Sample Size
Loadings       25     100    400    800    1200   1600
High          .235   .107   .035   .024   .019   .016
Low           .207   .152   .078   .049   .045   .037

Increasing the sample size results in a rapid decrease in standard error: for the higher loadings, quadrupling the sample size more than halved the standard error, but for the lower loadings the standard error was not fully halved by quadrupling the sample size except when going from 400 to 1600.

Loadings as Correlations

If factor loadings are behaving as simple correlations, the standard errors of loadings should conform to $\sigma_r = (1 - r^2)/(N - 1)^{1/2}$, which is the expected standard error of a correlation coefficient. But it would not be unreasonable to consider loadings as "behaving" as correlation coefficients if (1) for the same sample size, loadings of substantially different magnitudes have substantially different standard errors, with larger loadings having smaller standard errors, and (2) the ratios of the obtained standard errors to the expected standard errors, for a given magnitude of loading, are approximately equal.
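The expected standard error of a correlation, the ratio of an obtained standard error to it, and the simpler $1/\sqrt{N}$ approximation can be written as small Python functions. This is a sketch; the function names are hypothetical. The usage lines plug in the high unrotated group at N = 400 from Tables 3 and 4 (mean loading .613, standard error .047) purely as an arithmetic illustration of criterion (2).

```python
import math


def se_correlation(r, n):
    """Expected standard error of a correlation: (1 - r**2) / sqrt(N - 1)."""
    return (1.0 - r * r) / math.sqrt(n - 1)


def se_one_over_root_n(n):
    """The simpler 1/sqrt(N) approximation."""
    return 1.0 / math.sqrt(n)


def ratio_to_expected(obtained_se, loading, n):
    """Obtained standard error divided by the correlation-based expectation."""
    return obtained_se / se_correlation(loading, n)


# High unrotated group at N = 400 (Tables 3 and 4): about 1.50.
print(round(ratio_to_expected(0.047, 0.613, 400), 2))
# Under 1/sqrt(N), quadrupling N halves the standard error exactly.
print(se_one_over_root_n(1600) / se_one_over_root_n(400))
```

The first ratio comes out near the 1.52 entered for this cell in Table 7. The second line shows why "quadrupling the sample size halves the standard error" is the benchmark used throughout this section.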
Unrotated Loadings. Using the entries of Tables 3 and 4 and the formula in the preceding paragraph, Table 7 presents the computed ratios of the obtained standard errors to the expected standard errors. The average of the ratios obtained for the lowest group is 2.01; the ratios for the individual sample sizes are all quite close to this figure. For the middle and high groups, the ratios generally decline as sample size increases, although the ratio at sample size 100 is lower than at 400. Examining the columns for sample sizes 100 through 1600 reveals that a perfect rank-ordering exists between the ratios and the three sizes of loadings.

Table 7. Ratios of Obtained Standard Errors to Expected Standard Errors for Unrotated Loadings

Size of                     Sample Size
Loadings       25     100    400    800    1200   1600
High          2.12   1.47   1.52   1.41   1.22   1.24
Middle        2.12   1.56   1.78   1.66   1.37   1.30
Low           1.82   1.98   2.08   2.20   2.03   1.96

Figure 1 portrays, for each of the three loading sizes, the relationship between obtained standard errors and sample size. A perfect rank ordering exists between the size of loadings and the obtained standard error for each of the six sample sizes. It thus appears that the same general type of influence which a correlation's magnitude exerts upon the standard error of a correlation coefficient is also exerted by unrotated factor loadings upon their standard errors.

[Figure 1. Standard Errors of Unrotated Loadings for Six Sample Sizes. Vertical axis: standard error; horizontal axis: sample size.]

Rotated Loadings. Figure 2 portrays the relationship between sample size and the obtained standard errors for the two loading sizes. The rotated loadings exhibit a perfect rank-ordering of group sizes for the five largest sample sizes.

[Figure 2. Standard Errors of Rotated Loadings for Six Sample Sizes. Vertical axis: standard error; horizontal axis: sample size.]
Table 8 gives the ratios of the standard errors actually obtained to the standard errors that would be expected if the loadings were behaving exactly as correlations. With the exception of the first two sample sizes, the higher loadings yielded ratios that were quite close to one another. The middle group of loadings also centered near one value, 2.08, with the exception of sample size 25, which was considerably lower.

Table 8. Ratios of Obtained Standard Errors to Expected Standard Errors for High and Middle Rotated Loadings

Size of                     Sample Size
Loadings       25     100    400    800    1200   1600
High          1.76   2.14   1.63   1.60   1.53   1.52
Middle        1.54   2.08   2.16   1.94   2.19   2.07

Table 9 gives the means of the standard errors obtained for all the highest rotated loadings, the value of the standard error that would be expected from $1/\sqrt{N}$, and the value of the standard error if the rotated loadings were actually behaving as correlations. This table has been included because some investigators suggest using $1/\sqrt{N}$ to predict the average standard error of rotated factor loadings. Comparing the obtained values with the expected values in the bottom line of Table 9, it becomes clear that the formula $\sigma_r = (1 - r^2)/(N - 1)^{1/2}$ grossly underestimates the obtained values.

Table 9. Means of the Standard Errors of the Highest Rotated Loadings and Various Expected Values

                                 Sample Size
                           25     100    400    800    1200   1600
Obtained values           .225   .135   .059   .038   .033   .027
1/N^(1/2)                 .200   .100   .050   .035   .029   .025
(1 - r^2)/(N - 1)^(1/2)   .116   .058   .029   .020   .017   .014

Uniformity of Change among Standard Errors

As mentioned in the previous section, increasing the sample size decreases, without exception, the standard errors for all levels of loadings.
Although the ratios of obtained standard errors to expected standard errors, even at a given loading size, were not identical for all sample sizes, the standard errors for the sample sizes 400, 800, 1200, and 1600 do decrease at a rate approximately proportional to $1/\sqrt{N}$ for each of the three levels of loadings and for both rotated and unrotated loadings. These data show that the change in standard error is uniform for the larger sample sizes but erratic for the smaller sample sizes.

Stability versus Type of Loading

For equal magnitudes of loadings, the standard errors of rotated and unrotated loadings do not appear to be different. Another way to investigate the stability of loading types is to look at the results of running a congruence test for random pairs of factor analyses. The Coefficient of Congruence (CC) was obtained for 50 pairs for each sample size and for both rotated and unrotated loadings. Though for the two smallest sample sizes the average CC of the rotated loadings was higher, virtually no differences existed for the other sample sizes.

Table 10. Average Coefficients of Congruence for Rotated and Unrotated Loadings

                            Sample Size
                25     100    400    800    1200   1600
Rotated        .884   .947   .990   .996   .997   .996
Unrotated      .538   .916   .993   .998   .998   .999

CHAPTER III
PROBLEM II: STANDARD ERROR AND THE NUMBER OF ROTATED FACTORS

In examining the relation between the standard error and the number of factors rotated, this problem uses SAMPLER, several COLMLDGS routines modified slightly to meet the present problem's requirements, and the same factor analysis program used in the previous problem.

Method

Since the hypothesis under examination is that the standard errors of the highest rotated loadings of each variable will be at a minimum when the correct number of factors is rotated, it was necessary first to determine a body of variables for which the number of underlying dimensions is known: the same variables used in Problem I were considered appropriate.
Two distinct approaches seem available for testing this hypothesis. The first makes use of knowledge of the true factor pattern, and the second provides a means of determining the correct number of factors to rotate when nothing is known about the body of variables.

Knowing the true factor pattern makes it possible to vary the number of factors rotated for groups of factor analyses, to use the COLMLDGS routine of the previous chapter to identify those sample factors most closely resembling the population factors, and then to compute the standard errors of the individual variables' highest loadings. If the average standard error is at a minimum when the correct number of factors has been rotated, the hypothesis must be considered to have been supported.

Unlike the above approach, the second method considers each variable individually and seeks its highest rotated loading wherever it may occur. Thus, if the average of the standard errors of all the variables' highest loadings is at a minimum when the correct number of factors has been rotated, this approach can be used to identify the number of dimensions contained in a body of variables. Since the results from Chapter II indicate that standard errors decrease as the magnitude of loadings increases, any contrasts in standard errors brought about by rotating a different number of factors will be more significant, because everything possible is being done to ensure that only the largest loadings will be selected. If standard errors are indeed at a minimum when the correct number of factors has been rotated, an objective, statistical procedure appears to be available for identifying the number of underlying dimensions.
But this identification procedure should be used with caution, for such a method may depend upon the degree of simple structure of the population factor pattern, and thus any results obtained here may apply only to situations where a high degree of simple structure is present in the rotated factor solution.

Procedure for the First Approach

For the set of twelve variables containing the three underlying dimensions mentioned in Chapter II, the following steps were completed: (1) 50 random samples of 400 subjects each were drawn, (2) correlation matrices were computed and factor analyses performed, (3) k factors were rotated, (4) the factors most like the population factors were chosen using COLMLDGS, and (5) means and standard errors were computed for each variable. These five steps were repeated three times, each time with k at a different value ranging from two to four. This method can be called the "block" method, since it requires that the variables which approximate a population factor be located in the same column.

Procedure for the Second Approach

Two sets of twelve variables were chosen, one containing three underlying dimensions (the same one used for the first approach) and one containing four. For the set of variables containing three underlying dimensions, the steps of the first approach were completed with k ranging from 2 to 4, but with one important change: the COLMLDGS routine used to select the rotated factor loadings was modified to choose the highest loading for each variable regardless of the column in which it might appear. Compared with the first approach, this modification of COLMLDGS should result in higher mean loadings for each of the variables and thus, if anything, tend to make the standard errors more nearly equal.
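The two selection methods and the decision rule under test can be sketched as follows. This is an illustration, not the modified COLMLDGS routines. Loadings are treated as signed, with factors assumed to be oriented so that the salient loadings are positive, and `groups` (lists of row indices for the known dimensions) is a hypothetical input needed only by the block method.

```python
import numpy as np


def block_select(rotated, groups):
    """Block method: each known group takes its loadings from one shared column."""
    rotated = np.asarray(rotated, dtype=float)
    chosen = np.empty(rotated.shape[0])
    for members in groups:
        col = int(np.argmax(rotated[members].sum(axis=0)))
        chosen[members] = rotated[members, col]
    return chosen


def individual_select(rotated):
    """Individual method: each variable's highest loading, wherever it occurs."""
    return np.asarray(rotated, dtype=float).max(axis=1)


def average_se(selected):
    """selected: (n_samples, n_variables); mean over variables of each SE (ddof=1)."""
    return float(np.std(np.asarray(selected, dtype=float), axis=0, ddof=1).mean())


def k_with_min_se(avg_se_by_k):
    """Number of rotated factors whose average standard error is smallest."""
    return min(avg_se_by_k, key=avg_se_by_k.get)


# Table 12 values for the block and individual methods.
print(k_with_min_se({2: 0.092, 3: 0.059, 4: 0.156}))  # block -> 3
print(k_with_min_se({2: 0.091, 3: 0.055, 4: 0.073}))  # individual -> 3
```

Because the individual method takes each row's maximum, its selected loadings can never be lower than the block method's for the same rotated matrix, which is why the modification is expected to raise the mean loadings.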
For the set of variables containing four underlying dimensions, the steps of the first approach were repeated five times with k ranging from 2 to 6. Again, the COLMLDGS routine used was the type that selected the highest loading for each variable in each of the rotated solutions, and the means and standard errors were computed using these values.

Results

First Approach.

Table 11. Means of Obtained Loadings for the Block and Individual Methods of Selection

                         Number of Factors Rotated
               Two                  Three                Four
Variable   Block  Individual    Block  Individual    Block  Individual
(1)         .54     .56          .54     .53          .42     .58
(2)         .55     .58          .59     .59          .49     .64
(3)         .65     .66          .77     .76          .65     .81
(4)         .70     .70          .79     .79          .67     .80
(5)         .50     .50          .44     .44          .35     .59
(6)         .56     .57          .63     .62          .63     .62
(7)         .48     .46          .72     .71          .72     .73
(8)         .49     .58          .75     .76          .74     .75
(9)         .57     .59          .62     .62          .53     .59
(10)        .53     .49          .73     .72          .72     .73
(11)        .42     .49          .62     .61          .47     .65
(12)        .47     .46          .58     .57          .62     .67

When the block and individual methods are used on the correct number of factors (three), there is little difference between the means of the selected loadings (Table 11). When these two factor selection techniques are used on rotated factor solutions that do not match the number of underlying dimensions, the individual method generally has the higher means.

The mean standard errors of all loadings for a given number of rotated factors and method are presented in Table 12. As predicted, the standard error is lowest when the correct number of factors has been rotated. For the block method, the standard error with two rotated factors was about one and a half times as large, and with four rotated factors more than two and a half times as large.

Table 12. Mean Standard Errors of Obtained Loadings for the Block and Individual Methods of Factor Selection

                        Number of Factors Rotated
                        Two      Three     Four
Block Method           .092      .059      .156
Individual Method      .091      .055      .073

Second Approach. The mean values of obtained rotated loadings have been discussed in the previous section and are given in Table 11. It should also be noted that the values obtained when the correct number of factors has been rotated do not differ from the population values, and it is only in this event that the obtained values approximate the population values.

Table 13 contains the population values and the obtained means of the twelve variables containing four, rather than three, underlying dimensions. The population values have been inserted next to the column containing the four-factor rotated solutions, since it is this solution that most closely approximates the population values.

Table 13. Population Values of the Highest Loadings for the Correct Number of Rotated Factors and Means of Obtained Loadings for a Varied Number of Rotated Factors

                      Number of Factors Rotated
Variable   Two    Three   Four   Population   Five   Six
(1)        .55    .55     .56    .53          .54    .59
(2)        .69    .73     .81    .84          .84    .84
(3)        .72    .73     .81    .83          .83    .83
(4)        .37    .44     .56    .56          .70    .81
(5)        .31    .39     .59    .59          .75    .83
(6)        .37    .44     .64    .71          .70    .69
(7)        .60    .61     .62    .63          .64    .67
(8)        .52    .70     .71    .73          .70    .73
(9)        .53    .75     .76    .77          .75    .73
(10)       .51    .69     .72    .73          .73    .73
(11)       .43    .58     .64    .65          .66    .75
(12)       --     --      --     --           --     --
Mean       .51    .61     .68    .69          .71    .75

Table 14 gives the average standard error for all of the highest rotated loadings at each of the specified numbers of rotated factors. The correct number of underlying dimensions was four, and it was at this number of rotated factors that the standard error was at a minimum.

Table 14. Average Standard Error for All Highest Loadings According to the Number of Factors Rotated

Number of Factors Rotated    Two    Three    Four    Five    Six
Standard Error               --     --       --      --      --

CHAPTER IV
PROBLEM III: CHANGES IN THE FACTOR PATTERN

Although there are many possible ways to alter the factor pattern, the two variations investigated in this chapter are (1) those brought about by increasing the number of underlying dimensions, and (2) those brought about by increasing the number of variables loading on a factor while keeping the same number of underlying dimensions. Another change could certainly be extremely influential, namely varying the degree to which the structure is unifactorial, but an in-depth discussion of this variation is beyond the scope of the present study.

Method

Number of Underlying Dimensions. Although an increased number of underlying dimensions might not give reason to expect much change in the standard error of loadings in the principal axis solution, the presence of these dimensions might cause considerable wobble in the placement of rotated axes and thus increase the standard error of rotated loadings. For this reason, nine variables containing three underlying dimensions were designated as a core of variables to be used to determine the effects of adding more dimensions. Each added dimension contained three variables, and one hundred factor analyses were run for each of the patterns 3-3-3, 3-3-3-3, and 3-3-3-3-3. The samples were randomly drawn by SAMPLER with N set at 500; factors were selected in each case by a COLMLDGS routine which chose those factors most like the population factors. Means and standard errors were calculated for the highest rotated loadings, for the unrotated loadings in the first column of the principal axis solution, and for the eigenvalues.

Number of Variables Loading on a Factor. The factor pattern of the previously mentioned core variables, which loaded in a 3-3-3 pattern, was altered by adding more variables.
These additional variables loaded on just one of the underlying dimensions and yielded patterns of 4-3-3, 5-3-3, and so on up to a final pattern of 9-3-3. These patterns permit one to observe how the altered factor and the untouched underlying dimensions are affected by doubling and tripling the number of variables loading on that factor. With N = 500, SAMPLER drew one hundred random samples for each of the above factor patterns; factor analyses and rotations were obtained for each of the random samples. The highest rotated factor loadings were obtained by appropriate COLMLDGS routines, and their means and standard errors were computed. Means and standard errors were also computed for all of the eigenvalues and for the unrotated loadings in the first column of the principal axis solution.

Results

Number of Underlying Dimensions. Table 15 gives the averages of each of the first six eigenvalues for the three, four, and five underlying dimension situations. The values obtained do not contradict the general rule of thumb which suggests that there are as many significant underlying dimensions as there are eigenvalues greater than unity. It should be noted, however, that the sixth eigenvalue for the five-factor situation is quite close to unity, and that the drop between the fourth and fifth eigenvalues is larger than that between the fifth and sixth.

Table 16 gives the standard errors for the entries in Table 15. There does not appear to be any appreciable increase in standard error as a result of the presence of more underlying dimensions.

Table 15. Mean Eigenvalues for Three, Four, and Five Underlying Dimensions

Magnitude of        Number of Factors
Eigenvalues      Three    Four    Five
(1)              1.98     2.07    2.18
(2)              1.57     1.67    1.73
(3)              1.27     1.36    1.43
(4)              0.90     1.10    1.20
(5)              0.81     0.97    1.07
(6)              0.75     0.89    0.99

Table 16. Mean Standard Errors of Eigenvalues for Three, Four, and Five Factor Situations

                    Number of Factors
Eigenvalue       Three    Four    Five
(1)              .082     .087    .097
(2)              .072     .086    .086
(3)              .067     .066    .070
(4)              .040     .059    .059
(5)              .032     .034    .031
(6)              .031     .034    .031

Unrotated Loadings. The means of the first three loadings in the first column of the principal axis solution are presented in Table 17. No appreciable change has been found in the magnitude of the loadings as a result of adding dimensions.

Table 17. Means of Unrotated Loadings for Three, Four, and Five Underlying Dimensions

Position of         Number of Factors
Loadings         Three    Four    Five
(1)              .552     .561    .528
(2)              .645     .674    .630
(3)              .712     .730    .693

Table 18 gives the standard errors which correspond to the loadings in Table 17. No meaningful pattern appears to emerge from these data.

Table 18. Standard Errors of Unrotated Loadings for Three, Four, and Five Underlying Dimensions

Position of     Number of Underlying Dimensions
Loadings         Three    Four    Five
(1)              .048     .050    .063
(2)              .066     .068    .054
(3)              .049     .050    .043

Rotated Loadings. Table 19 contains the means of the highest rotated loadings for each of the core variables. Although a rather small decrease in the magnitude of loadings seems to be the rule as the number of underlying dimensions increases, in only one case does a really sharp drop occur: variable b of Factor III. An inspection of the actual factor analyses revealed that this variable occasionally pulled away from its factor to load more highly on one of the other four underlying dimensions.

Table 19. Means of Rotated Loadings for Three, Four, and Five Underlying Dimensions

                         Number of Underlying Dimensions
                          Three    Four    Five
Core Factor I    (a)      .579     .535    .530
                 (b)      .830     .815    .801
                 (c)      .832     .816    .797
Core Factor II   (a)      .628     .624    .611
                 (b)      .732     .719    .715
                 (c)      .719     .723    .715
Core Factor III  (a)      .774     .760    .723
                 (b)      .631     .633    .544
                 (c)      .664     .642    .629

Table 20 gives the standard errors corresponding to the values of Table 19.
Loading b of Core Factor III has a rather large standard error, something that would be expected considering the rather sharp drop which occurred in the mean value of this loading after the fifth underlying dimension was added. The presence of added underlying dimensions is accompanied by an increase in the corresponding standard errors, except for the two highest loadings when five underlying dimensions were present.

Table 20. Standard Errors of Rotated Loadings for Three, Four, and Five Underlying Dimensions

                          Number of Underlying Dimensions
                            Three    Four    Five
  Core Factor I     (a)      .060    .096    .107
                    (b)      .020    .039    .035
                    (c)      .020    .040    .036
  Core Factor II    (a)      .053    .056    .066
                    (b)      .030    .037    .071
                    (c)      .031    .035    .039
  Core Factor III   (a)      .029    .032    .069
                    (b)      .063    .076    .157
                    (c)      .042    .069    .093

Number of Variables Loading on a Factor

Eigenvalues. The mean values of the eigenvalues are given in Table 21. When the seventh variable was added to the first core factor, making a total of 13 variables loading on three dimensions, the fourth eigenvalue exceeded unity and remained above that level. But a sharp drop is noted between the third and fourth eigenvalues, sharper than that between the fourth and fifth. This difference continues for the 8-3-3 and 9-3-3 patterns.

Table 21. Means of Eigenvalues for Nine through Fifteen Variables Loading on Three Underlying Dimensions

                                 Number of Variables
  Rank of Eigenvalue     9     10     11     12     13     14     15
        (1)            1.98   2.20   2.39   2.49   2.67   2.75   2.90
        (2)            1.57   1.59   1.59   1.68   1.68   1.67   1.72
        (3)            1.27   1.28   1.30   1.31   1.32   1.34   1.34
        (4)             .90    .94    .96    .98   1.00   1.04   1.05
        (5)             .81    .85    .88    .90    .92    .96    .97

The standard errors for the entries in Table 21 are given in Table 22. The size of the first eigenvalue increases regularly as variables are added, and a corresponding increase in its standard error is noted. The standard errors of the fourth and fifth eigenvalues have not increased.

Table 22. Standard Errors of the First Five Eigenvalues for Nine through Fifteen Variables Loading on Three Underlying Dimensions

                                 Number of Variables
  Rank of Eigenvalue     9     10     11     12     13     14     15
        (1)            .082   .113   .112   .113   .150   .164   .166
        (2)            .072   .066   .081   .072   .081   .081   .090
        (3)            .067   .069   .066   .068   .066   .077   .064
        (4)            .040   .043   .044   .040   .044   .046   .043
        (5)            .032   .033   .035   .032   .030   .034   .033

Unrotated Loadings. The means of the first four loadings from the first column of the principal axis solution are given in Table 23. No appreciable change is noted among these unrotated loadings as a result of adding variables which belong to the same underlying dimension.

Table 23. Means of the Highest Unrotated Factor Loadings for Four Variables of the First Factor

                                 Number of Variables
  Position of Loading    9     10     11     12     13     14     15
        (1)             --    .59    .59    .59    .57    .56    .55
        (2)            .55    .58    .57    .57    .58    .57    .56
        (3)            .65    .64    .65    .67    .65    .64    .65
        (4)            .71    .71    .72    .72    .70    .69    .68

The standard errors corresponding to the entries in Table 23 are given in Table 24. The standard errors of the first two unrotated loadings are not affected by the increased number of variables, but the standard errors of the third and fourth variables definitely decrease as more variables are added. It should be noted that although the unrotated loadings for these variables are not affected by the increased number of variables, the rotated loadings do show a decrease in magnitude (Tables 23 and 25).

Table 24. Standard Errors of the Highest Unrotated Factor Loadings for Four Variables of the First Factor

                                 Number of Variables
  Position of Loading    9     10     11     12     13     14     15
        (1)             --   .043   .038   .043   .040   .043   .045
        (2)            .048   .053   .046   .048   .045   .045   .048
        (3)            .066   .049   .039   .039   .031   .031   .032
        (4)            .049   .039   .034   .030   .030   .029   .033

Rotated Loadings. Table 25 contains the mean values of the rotated loadings for each of the core variables.
For the most part, increasing the number of variables seems to have only a minor effect upon the magnitude of loadings, but there are two notable exceptions: variables b and c of Factor I. For these variables a rather uniform drop is noted as each additional variable is placed in the factor pattern. This is in contrast to what occurred under the other method of altering the factor pattern, in which the number of variables was built up to 15 by adding more underlying dimensions.

Table 25. Mean Values of Rotated Factor Loadings for Nine Core Variables

                                  Number of Variables
                        9     10     11     12     13     14     15
  Factor I     (a)    .579   .585   .577   .559   .569   .545   .558
               (b)    .830   .790   .756   .750   .737   .721   .719
               (c)    .832   .811   .783   .764   .752   .732   .728
  Factor II    (a)    .628   .631   .628   .629   .620   .624   .604
               (b)    .732   .733   .725   .720   .711   .707   .704
               (c)    .719   .726   .724   .701   .710   .707   .689
  Factor III   (a)    .774   .757   .736   .760   .748   .747   .750
               (b)    .631   .620   .634   .627   .618   .605   .609
               (c)    .664   .649   .644   .641   .628   .631   .626

Table 26 contains the standard errors for the entries in Table 25. Standard errors do not seem to be much affected by the increased number of variables, although their magnitudes do tend to increase slightly. As the number of variables loading on Core Factor I increases, the standard errors of variables b and c of Factor I increase regularly, and by a considerably larger percentage than those of the other variables.

Table 26. Standard Errors of Rotated Factor Loadings for Nine Core Variables

                                  Number of Variables
                        9     10     11     12     13     14     15
  Factor I     (a)    .060   .061   .060   .061   .064   .068   .068
               (b)    .020   .023   .029   .027   .032   .035   .040
               (c)    .020   .025   .026   .027   .026   .028   .034
  Factor II    (a)    .053   .051   .054   .054   .060   .050   .058
               (b)    .030   .035   .035   .036   .036   .037   .044
               (c)    .031   .030   .032   .040   .036   .036   .037
  Factor III   (a)    .029   .033   .027   .027   .032   .036   .031
               (b)    .063   .062   .050   .058   .065   .062   .066
               (c)    .042   .053   .056   .056   .066   .065   .060

CHAPTER V

DISCUSSION

The discussion will center on the most important findings, which (1) predict the sample size required to assure stable factor patterns, (2) provide evidence that loadings do behave as correlations, and (3) indicate that the mean of the standard errors of the most significant loadings is at a minimum when the number of factors rotated equals the number of underlying dimensions.

PROBLEM I: SAMPLING ERROR

Sample Size and the Stability of Factor Patterns

Eigenvalues. When the eigenvalues obtained by factor analyzing a sample correlation matrix are close to the population eigenvalues, the resultant factors are most likely to be similar to the population factors. A sample size of 400 was necessary before the means of the eigenvalues for 100 factor analyses were reasonably close to the population values. Sample sizes 25 and 100 produced four or more eigenvalues greater than unity; at sample size 400 the fourth eigenvalue was 0.99, just below unity, a value conforming with the rule of thumb which suggests that the number of underlying dimensions is equal to the number of roots greater than unity. (It will be remembered that this experiment did have just three underlying dimensions.) The standard errors of eigenvalues appear to be small enough at sample size 400 to assure that the second and third eigenvalues will not cross; but perhaps this is not the most important consideration, since individual variables could cross without the eigenvalues themselves crossing. More importantly, there should be a high probability that the obtained eigenvalues will be similar in size to the population eigenvalues.
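The sampling experiment behind these eigenvalue results can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the thesis's SAMPLER procedure: instead of drawing from the 5948 actual questionnaire responses, it draws multivariate normal samples from a hypothetical 3-3-3 population in which each of nine variables loads 0.7 on exactly one factor. For that population the correlation matrix has eigenvalues 1 + 2(0.49) = 1.98 (three times) and 1 - 0.49 = 0.51 (six times), so exactly three exceed unity. The names `population_correlation` and `replicate_eigenvalues` are hypothetical.

```python
import numpy as np


def population_correlation(pattern):
    """Correlation matrix implied by an orthogonal factor pattern: off-diagonal R = L L'."""
    L = np.asarray(pattern, dtype=float)
    R = L @ L.T
    np.fill_diagonal(R, 1.0)  # unit variances: uniqueness makes up 1 minus the communality
    return R


def replicate_eigenvalues(R, n, replications=100, seed=0):
    """Eigenvalues (largest first) of the sample correlation matrix for each random sample."""
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    eigs = np.empty((replications, p))
    for i in range(replications):
        X = rng.multivariate_normal(np.zeros(p), R, size=n)
        S = np.corrcoef(X, rowvar=False)
        eigs[i] = np.linalg.eigvalsh(S)[::-1]  # eigvalsh returns ascending order
    return eigs


# Hypothetical 3-3-3 pattern: each of nine variables loads 0.7 on exactly one factor.
pattern = np.zeros((9, 3))
for f in range(3):
    pattern[3 * f:3 * f + 3, f] = 0.7

R = population_correlation(pattern)
eigs = replicate_eigenvalues(R, n=400)
means = eigs.mean(axis=0)                    # mean of each ranked eigenvalue over samples
standard_errors = eigs.std(axis=0, ddof=1)   # spread over samples, used as the standard error
share_with_three = np.mean((eigs > 1.0).sum(axis=1) == 3)
print(np.round(means, 2), np.round(standard_errors, 3), share_with_three)
```

Rerunning with `n=25` or `n=100` and comparing `means` and `standard_errors` with the values at 400 mirrors, in miniature, the sample-size comparison discussed here.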
The data suggest that a sample size of 400 is probably necessary before one can be reasonably confident that the resultant eigenvalues will be sufficiently close to the population values.

Unrotated Loadings. Sample size 25 does not produce unrotated loadings whose means center on the population values, but sample sizes of 100 and larger do (see Table 3). As sample size increases, two important changes take place: (1) standard errors decrease, and (2) the means of the resultant loadings move closer to the population values. Hence one way of assuring that sample values will be close to population values might be to pick a sample size which will result in a sufficiently small standard error. For example, if it should be decided that the maximum standard error tolerable for any of the various levels of loadings is 0.10, then a sample size of at least 400 appears necessary: the results of Table 4 show that the standard error of the low group, which approximates zero correlations, is 0.104, while the standard errors of loadings in the other two groups were much smaller; loadings averaging 0.40 had a mean standard error of .073, and those averaging 0.50 had an average standard error of .047. Since the probability that low loadings may become significantly large is greater than the probability that larger loadings will become insignificantly small, it follows that differences in interpretation are most likely to result from low loadings becoming large.

Rotated Loadings. As in the case of unrotated loadings, a sample size of 100 was sufficient to bring the means of the rotated loadings quite close to the population values; this result would be expected, since the rotated solution is merely a transformation of the principal axis solution. The real determinant of how close a given loading is likely to be to the population value is, once again, the standard error.
If one desires, for instance, to be confident at the 0.05 level that the resultant loadings are not more than ±0.16 away from the population values, a sample size of 400 is necessary (Table 6). If more precision is desired, it may be obtained by an appropriate increase in the sample size. But it appears that a sample size of 400 is necessary to consistently produce sample factor patterns that resemble the population factor pattern. Although using a sample size substantially smaller than 400 is likely to yield an interpretive text which is significantly different from the one that would be written for the population factor pattern, the slightly more accurate loadings obtained by increasing the sample size beyond 400 are not likely to result in interpretations that would produce a different text.

Stability of Rotated versus Unrotated Loadings. Table 9 indicates that random pairs of unrotated loadings are less congruent than are rotated pairs at the smaller sample sizes. This difference is probably due to (1) the nature of the Coefficient of Congruence and (2) the manner in which the rotated factors were selected. The CC is sensitive to sign changes, and these were found to occur quite frequently, particularly for sample size 25. Also, the block method of factor selection allowed the factor most like the population factor to be chosen from any of the three columns representing the rotated factors; this was not done for the unrotated loadings, which were taken from the positions in which they occur in the population factor pattern. For the larger sample sizes, the probability that the unrotated loadings will not conform to the population pattern has been virtually eliminated, and rotation becomes nothing more than a mechanical process. About the same degree of congruence is noted for the rotated and unrotated loadings at the larger sample sizes.
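The sign sensitivity of the Coefficient of Congruence is easy to see in a short sketch. The code assumes the usual (Tucker) definition: the sum of cross-products of two loading columns divided by the square root of the product of their sums of squares. The text does not restate the formula here, and the loading columns below are hypothetical. Reflecting a factor (reversing every sign, which leaves its interpretation unchanged) reverses the sign of the coefficient, so a sample factor that matches the population factor perfectly in substance can still produce a strongly negative CC.

```python
import math


def coefficient_of_congruence(x, y):
    """Tucker's coefficient of congruence: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


population = [0.58, 0.83, 0.83, 0.05, 0.10, 0.02]  # hypothetical population factor column
sample = [0.55, 0.80, 0.85, 0.12, 0.04, 0.08]      # hypothetical sample factor column
reflected = [-v for v in sample]                    # same sample factor, signs reversed

print(round(coefficient_of_congruence(population, sample), 3))
print(round(coefficient_of_congruence(population, reflected), 3))
```

Because the unrotated columns were compared in their population positions without such sign matching, reflections of this kind lower their average congruence at small sample sizes.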
Prediction of the Average Standard Error

An important question is whether the actual standard errors of factor loadings can be predicted from the sample size. Hamburger (1966) finds that $1/\sqrt{N}$ is a good prediction of the average standard error for the sample sizes and correlation matrices he investigated. The results of the present study indicate that for rotated loadings of substantial magnitude, the average standard error is consistently, though only slightly, larger than $1/\sqrt{N}$ (Table 9). Hamburger's suggested rule of thumb for predicting the standard error appears appropriate for the largest unrotated loadings (Table 4) and the largest rotated loadings (Table 6), but yields values which are much too small to predict accurately the standard errors of loadings of lesser magnitude. Upon correction for the average loading size using the formula $\sigma = (1 - r^2)/\sqrt{N - 1}$, the average of the resultant standard errors for rotated loadings of significant magnitude is approximately twice the size of the standard errors that would be expected if the loadings were behaving exactly as correlations (Table 9). A similar comparison for unrotated loadings shows that the average of the resultant standard errors is approximately 50 per cent greater than the corrected expected values. The low-level unrotated loadings, which approximate zero loadings, are also about twice the corrected expected values and thus similar to the largest rotated loadings.

Figures 1 and 2 indicate that once the sample size is large enough to assure the appearance of the same factor pattern, the standard error does decrease in inverse proportion to the square root of the sample size. Hence it appears possible to predict the average standard error for loadings of a given magnitude, a finding considerably more valuable than merely predicting the average standard error for all loadings considered together.
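The reference values in this section, and the empirical corrections reported for them, can be written as small functions. This is a sketch under stated assumptions: `hamburger_se`, `correlation_se`, `predicted_loading_se`, and `rescale_se` are hypothetical names; the `ratio` argument stands in for the observed-to-expected ratios of Tables 7 and 8, for which only the rough summaries given in the text are used (about 2 for rotated and about 1.5 for unrotated loadings of significant magnitude); and `rescale_se` assumes the $1/\sqrt{N}$ decline, which, per Figures 1 and 2, holds only once N is large enough to reproduce the population pattern.

```python
import math


def hamburger_se(n):
    """Rule-of-thumb average standard error of a loading: 1/sqrt(N)."""
    return 1.0 / math.sqrt(n)


def correlation_se(r, n):
    """Standard error expected for a correlation of size r: (1 - r^2) / sqrt(N - 1)."""
    return (1.0 - r * r) / math.sqrt(n - 1)


def predicted_loading_se(r, n, ratio):
    """Correlation-based SE scaled by an observed-to-expected ratio.

    ratio: roughly 2.0 for rotated and 1.5 for unrotated loadings of significant
    magnitude, following the text's summary of Tables 7, 8, and 9.
    """
    return ratio * correlation_se(r, n)


def rescale_se(se, n_from, n_to):
    """Carry a standard error from one sample size to another, assuming SE ~ 1/sqrt(N)."""
    return se * math.sqrt(n_from / n_to)


for r in (0.0, 0.5, 0.7):
    print(r, round(correlation_se(r, 400), 4), round(predicted_loading_se(r, 400, 2.0), 4))
print(round(hamburger_se(400), 4))
```

For a loading of 0.7 at N = 400, `predicted_loading_se(0.7, 400, 2.0)` is about 0.051, just above $1/\sqrt{400} = 0.05$, which is in line with the observation that rotated loadings of substantial magnitude have standard errors slightly larger than $1/\sqrt{N}$.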
For all sample sizes and levels of loadings, Tables 7 and 8 show the ratios of the standard errors of the resultant factor loadings to the values expected when the loadings behave as correlations. These ratios may be used to determine roughly the magnitude of the standard error for a given loading level and a specified sample size. Such information may also be obtained from Figures 1 and 2.

Factor Loadings as Correlations

Since the standard error of a correlation coefficient r with sample size N is given by $\sigma = (1 - r^2)/\sqrt{N - 1}$, loadings, if they are behaving as correlations, must be expected to follow this relationship. Higher loadings must be expected to have lower standard errors, and zero loadings should have standard errors of approximately $1/\sqrt{N}$. The results indicate that higher loadings do have lower standard errors, but the standard errors were somewhat larger than those expected for correlations. As discussed in the previous section, Figures 1 and 2 show that the standard errors of loadings are inversely proportional to the square root of the sample size once the sample size has reached a level which assures repetition of the population factor pattern. Tables 7 and 8 indicate that the ratios of the obtained standard errors to the expected standard errors are approximately equal for a given magnitude of loading.

PROBLEM II: STANDARD ERROR AND THE NUMBER OF ROTATED FACTORS

Underlying dimension theory suggests that it is important to rotate exactly as many factors as there are underlying dimensions. It has been suggested that rotating too few factors will leave some variables' highest loadings wandering unpredictably among the rotated factors. Similarly, if too many factors are rotated, groups of variables whose highest loadings normally would be found on the same factor will unnecessarily be divided to provide loadings for the extra factor(s); furthermore, it cannot be predicted which factor(s) will contribute variables to the superfluous factor(s).
But when the correct number of factors has been rotated, this unpredictability disappears and the highest loadings are always able to group together in the appropriate pattern. It is logical to expect that as the number of rotated factors departs further from the true number of underlying dimensions, the resultant factor pattern will become less appropriate, and as the factor patterns become less appropriate, the standard errors also increase. Hence, if a graph is drawn with the standard error on the vertical axis and the number of rotated factors on the horizontal axis, a U-curve should result, with the point at the very bottom of the "U" representing the standard error for the correct number of rotated factors.

All of the experiments described in Chapter III did yield U-curves (Tables 12 and 14). The "U" was considerably flatter for the case of four underlying dimensions than for either case of three, but perhaps this is to be expected, since certain conditions may accentuate the differences in standard error between the correct and incorrect number of factors. It may be that either the number of underlying dimensions or the extent to which the factor pattern is unifactorial is the prime controlling factor; both might logically be expected to play a significant role in determining the shape of the "U". Since the presence of more underlying dimensions means that more factors will have been rotated just before and after the correct number, the severity of the situation, in terms of the percentage of variables which must load in a false pattern, is diminished, because the percentage of factors that are not able to develop properly is smaller. Hence the number of variables exhibiting an unusually high standard error should be smaller, and their effect on the average standard error is likely to be less.
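The decision rule suggested by the U-curve can be sketched as follows, assuming the average standard error of the highest loadings has already been computed for each number k of rotated factors. The numbers are hypothetical, not the values of Tables 12 or 14, and `is_u_shaped` and `best_number_of_factors` are illustrative names. Checking for an interior minimum before taking the smallest value guards against a curve that is still falling (or already rising) at the edge of the range of k tried, where the minimum would only mark the end of the search rather than the bottom of the "U".

```python
def is_u_shaped(values):
    """True if values fall to an interior minimum and rise (non-strictly) after it."""
    k = values.index(min(values))
    if k == 0 or k == len(values) - 1:
        return False  # minimum at the edge of the range tried: no bottom of the "U" observed
    falling = all(a >= b for a, b in zip(values[:k], values[1:k + 1]))
    rising = all(a <= b for a, b in zip(values[k:], values[k + 1:]))
    return falling and rising


def best_number_of_factors(avg_se_by_k):
    """Number of rotated factors with the smallest average standard error."""
    return min(avg_se_by_k, key=avg_se_by_k.get)


# Hypothetical averages of the standard errors of the highest loadings, keyed by k.
avg_se = {2: 0.091, 3: 0.048, 4: 0.067, 5: 0.080}
curve = [avg_se[k] for k in sorted(avg_se)]
if is_u_shaped(curve):
    print("suggested number of factors:", best_number_of_factors(avg_se))
else:
    print("no clear U-curve; widen the range of k")
```

A flat "U", as reported for four underlying dimensions, would still pass this check; how deep the minimum must be before it is trusted is left open here, as it is in the text.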
The U-curve may also be flattened if the extent to which the factor pattern is unifactorial is low, for then it will be easier for certain variables, those loading high on two or more factors, to pull away, thus shifting their highest loadings from the correct grouping. Thus rotating the wrong number of factors when unifactorial structure is missing should result in higher standard errors for a greater number of variables than in situations possessing unifactorial structure. In these experiments the number of underlying dimensions probably exerted more influence on the shape of the U-curve than a low degree of simple structure, because all factor patterns did approach unifactorial structure.

The means of the averages of the highest loadings (bottom line of Table 13) show a progressive increase as the number of rotated factors becomes larger. This is not unexpected, because as more axes are available to be positioned through likely groups of points, less error will occur: the points will lie closer to the axes, and the distance between the origin and each point's projection onto the axis, which is the value of the loading, will be greater. Unfortunately there is no maximizing process which would find the largest loadings occurring when the correct number of factors has been rotated; instead, the more factors rotated, the larger the loadings become. But the fact that the standard error is at a minimum when the correct number of factors has been rotated is an extremely important finding, for it can be used to determine the number of underlying dimensions in appropriate situations.

PROBLEM III: CHANGES IN THE FACTOR PATTERN

Number of Underlying Dimensions

Eigenvalues. A general rule of thumb suggests that the number of significant factors in a body of variables is equal to the number of eigenvalues greater than unity.
In this experiment the means of the eigenvalues for 100 factor analyses did conform to this rule, but it appears that the rule is actually of little practical value. For the five-factor solution, the standard error of the sixth eigenvalue is 0.031 (Table 16), and since the mean value of this eigenvalue is 0.99 (Table 15), over 40 per cent of the factor analyses must have had at least six eigenvalues greater than unity. Thus such a rule cannot reliably predict how many factors should be rotated for the entire population.

Unrotated Loadings. The highest loadings of the unrotated solution for Core Factor I (Table 17) indicate that adding dimensions to a body of variables does not produce much change in the loadings of the existing dimensions. The standard errors (Table 18) also appear to be unaffected. These results could be expected for two reasons: (1) the component analysis being used extracts a maximum amount of variance on successive factors, and (2) the structure of the factor patterns is basically unifactorial. Since the factors are basically orthogonal to each other, it is not likely that adding a factor will contribute much to the previous factor structure, and the variance accounted for by the additional dimensions will be extracted as new factors.

Rotated Loadings. As might be expected from the above discussion, adding dimensions did not, for the most part, affect the magnitude of the rotated loadings (Table 19). The one loading that did show a significant drop in magnitude was found to have a significantly high loading on one of the added dimensions, and thus it did not conform to the unifactorial structure exhibited by the other variables. In general, variables having relatively high loadings on more than one dimension were found to have higher standard errors (Table 20), and thus it appears that unifactorial structure also plays a role in determining the stability of factor patterns.
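The rule of thumb, and the five-factor estimate in the Eigenvalues paragraph above, can be checked against the tabled values with a short sketch. The eigenvalues are the means of Table 15. `fraction_above_unity` assumes that the sixth eigenvalue is approximately normally distributed across samples with the mean (0.99) and standard error (0.031) of Tables 15 and 16, an assumption the text does not state. With these rounded inputs the normal approximation gives roughly 37 per cent; the figure is sensitive to the rounding of the mean (a mean of 0.994 would give roughly 42 per cent), so the sketch supports the order of magnitude of the estimate rather than its exact value. `successive_drops` lists the gaps between adjacent eigenvalues, the information a rule based on sharp breaks would use. All function names are hypothetical.

```python
import math

# Mean eigenvalues from Table 15, in rank order, keyed by the number of underlying dimensions.
TABLE_15 = {
    3: [1.98, 1.57, 1.27, 0.90, 0.81, 0.75],
    4: [2.07, 1.67, 1.36, 1.10, 0.97, 0.89],
    5: [2.18, 1.73, 1.43, 1.20, 1.07, 0.99],
}


def count_above_unity(eigenvalues):
    """Number of eigenvalues strictly greater than 1 (the rule of thumb)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def successive_drops(eigenvalues):
    """Gap between each eigenvalue and the next one, rounded to two decimals."""
    return [round(a - b, 2) for a, b in zip(eigenvalues, eigenvalues[1:])]


def fraction_above_unity(mean, se):
    """P(X > 1) for X ~ Normal(mean, se): an assumed normal sampling distribution."""
    z = (1.0 - mean) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for dims, evs in TABLE_15.items():
    print(dims, count_above_unity(evs), successive_drops(evs))
print(round(fraction_above_unity(0.99, 0.031), 3))
```

The counts agree with the number of underlying dimensions in every column, which is the sense in which the mean eigenvalues "conform to this rule"; the last line shows why individual analyses need not.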
Number of Variables Loading on a Factor

Eigenvalues. Since the variables chosen for this experiment contained only three underlying dimensions, it would be expected that the number of eigenvalues greater than unity might be only three. The means of the fourth eigenvalue were slightly greater than unity for the 13-, 14-, and 15-variable situations. Is this a contradiction of the general rule of thumb, or is there a logical explanation? It will be noted that as variables are added, all of the eigenvalues increase somewhat. As expected, the first eigenvalue increases most rapidly, since it is to this dimension that the added variables belong (Table 21). The second, and particularly the third, eigenvalues remain much less affected. Since there is a slight increase in all eigenvalues, it is logical that those initially near unity will eventually surpass this value. It might be wiser to suggest using a rule which requires looking at sharp breaks between groups of eigenvalues to determine the number of factors. One might also seek to determine on which eigenvalue the greatest influence of an added variable is manifested.

Unrotated Loadings. The magnitudes of the unrotated loadings on the altered factor do not appear to be affected by an increased number of variables loading on it (Table 23), but the standard errors of the third and fourth variables (Table 24) show a regular decrease as the number of variables increases. These are the most highly correlated variables and thus are the prime determinants of their underlying dimension. As more variables are added to their dimension, the probability that the nature of the dimension will change is diminished.

Rotated Loadings. Although the probability that the nature of the dimension represented by the first eigenvalue will change is diminished by the presence of additional variables loading on that dimension, the exact position of the rotated axis placed through the group of points representing these variables may become less stable.
For an individual factor analysis, each added variable provides an opportunity for additional wobble in the placement of the axis, and this should result in a gradual increase in the standard errors of the individual rotated variables as the number of variables present on that dimension increases (Table 26). It is also noted that the loadings which were initially highest, b and c of Factor I, steadily decrease in size as variables are added (Table 25). If the added variables were to load randomly on either side of these prime determinants, it would be logical to expect that the position of the axis would not change much from the original situation. But if most of the added variables fall on only one side of these two highly correlated variables, the placement of the axis will be shifted in that direction and the projections of the two points representing the highly correlated variables onto the axis