AN EMPIRICAL STUDY OF SAMPLING
ERROR IN FACTOR ANALYSIS

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
ALLAN LINDSAY LANGE
1969

This is to certify that the
thesis entitled

An Empirical Study of Sampling Error

in Factor Analysis

presented by

 

 

Allan L. Lange

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Education


ABSTRACT

AN EMPIRICAL STUDY OF SAMPLING ERROR
IN FACTOR ANALYSIS

By
Allan Lindsay Lange

The major purpose of this study was to empirically determine the
statistical information necessary to make meaningful decisions about
sample size and the number of factors. The study also examined how the
stability of sample factor patterns might be affected by certain changes
in the population factor pattern.

The data was drawn from nearly an entire freshman class at
Michigan State University; the students' responses to 41 items, which
inquired into their social, political, and economic views, were recorded
on a five-point scale (strongly agree - agree - uncertain - disagree -
strongly disagree).

From a population of 5948 responses to a fixed set of 12 variables
with a known factor pattern, 100 random samples were drawn for each of
the sample sizes 25, 100, 400, 800, 1200, and 1600. Factor analyses were
performed, and the means and standard errors were computed for all the
eigenvalues, for the highest rotated loadings of each variable, and for
the unrotated loadings in the first column of the principal axis solution.

The average of all the standard errors for middle- and high-
level rotated loadings was found to be slightly larger than 1/√N,
while the average for all unrotated loadings was slightly less than for
rotated loadings. Higher loadings consistently had smaller standard
errors than lower loadings, and in this respect both unrotated and
rotated factor loadings behave like correlations. A sample size of 400
appears necessary to consistently produce sample factor patterns that
resemble the population factor pattern. Although using a sample size
substantially smaller than 400 is likely to yield an interpretive text
which is significantly different from the one that would be written to
the population factor pattern, the slightly more accurate loadings
obtained by increasing the sample size beyond 400 are not likely to result
in interpretations that would produce a different text.

Number of Factors

Experiments were conducted using two unifactorial factor patterns,
one of three underlying dimensions and one of four. For each pattern and
with N=400, several groups of 50 random samples were drawn and factor
analyses performed. For each group, a different number of factors was
rotated, and means and standard errors of the highest loadings were computed.

All results indicate that the average standard error of the highest
loadings is at a minimum when the correct number of factors has been rotated,
and thus a way is suggested for determining the number of significant
underlying dimensions in a set of variables.

Changes in the Factor Pattern

 

Factor patterns were manipulated in two ways: (1) the number of
variables was increased from 9 to 15 by adding variables to just one of
the factors, thus leaving the number of underlying dimensions unchanged,
and (2) the number of variables was increased from 9 to 15 by adding two
additional underlying dimensions, each containing three variables.

Increasing the number of dimensions from 3 to 5 approximately
doubled the average magnitude of standard errors for rotated loadings;
no such increase was detected for the unrotated loadings. Building up the
number of variables without increasing the number of underlying dimensions
did not produce a significant change in the size of the standard errors
for either rotated or unrotated loadings, and thus it appears that
factorial stability is more dependent upon the number of underlying
dimensions than on the number of variables.

AN EMPIRICAL STUDY OF SAMPLING ERROR IN FACTOR ANALYSIS

By
Allan Lindsay Lange

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
College of Education
1969

This THESIS for the DOCTOR OF PHILOSOPHY degree

By

Allan Lindsay Lange

has been approved

Charles F. Wrigley Thesis Chairman, Guidance Committee
William A. Mehrens Chairman, Guidance Committee

Irvin J. Lehmann Reader, Guidance Committee

Clessen Martin Reader, Guidance Committee

Lee S. Shulman Reader, Guidance Committee

ACKNOWLEDGEMENTS

The author is indebted to the members of his guidance committee,
Professor Charles F. Wrigley (Thesis Chairman), Professor William A.
Mehrens (Chairman), Professor Irvin J. Lehmann, Professor Clessen Martin,
and Professor Lee S. Shulman for their advice and counsel during the
course of this study. The author is especially indebted to his major
professor, Charles F. Wrigley, for his personal interest, perceptive
criticisms, and generous grants of time. To Mr. Richard Rogers, Mr.
James Deatherage, Mr. Layton Price, and Mrs. Hella Lange go sincere thanks
for the many hours spent in preparation of computer programs.

Appreciation and thanks are extended to the members of the Office
of Evaluation Services and in particular to Professor Willard G.
Warrington for providing office space and data used in this study. The
author is also indebted to the Computer Institute for Social Science
Research for its generous grant of computer time.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter I: PROBLEM
    Statement of the Problem
    Purpose of the Study
    Studies on Sampling Error
    Studies Relative to the Number of Factors

Chapter II: PROBLEM I: SAMPLING ERROR
    Method
        General Procedure
        Procedure for Problem I
        Data
        Determination of Underlying Dimensions
        Sampling Procedures
        Data Analyses
    Results
        Sample Size and Standard Error
        Loadings as Correlations
        Uniformity of Change Among Standard Errors
        Stability versus Type of Loading

Chapter III: PROBLEM II: STANDARD ERROR AND THE NUMBER OF ROTATED FACTORS
    Method
        Procedure for the First Approach
        Procedure for the Second Approach
    Results

Chapter IV: PROBLEM III: CHANGES IN THE FACTOR PATTERN
    Method
        Number of Underlying Dimensions
        Number of Variables Loading on a Factor
    Results
        Number of Underlying Dimensions
        Number of Variables Loading on a Factor

Chapter V: DISCUSSION
    Problem I: Sampling Error
        Sample Size and the Stability of Factor Patterns
        Prediction of the Average Standard Error
        Factor Loadings as Correlations
    Problem II: Standard Error and the Number of Rotated Factors
    Problem III: Changes in the Factor Pattern
        Number of Underlying Dimensions
        Number of Variables Loading on a Factor

Chapter VI: SUMMARY
    Sampling Error
    Number of Factors
    Changes in the Factor Pattern
    Questions Fostered by the Study

Bibliography
Appendices

LIST OF TABLES

Table
   1  Population Values and Means of Obtained Eigenvalues for
      100 Factor Analyses
   2  Mean Standard Errors of Eigenvalues for 100 Factor Analyses
   3  Population Values and Obtained Means of Unrotated Loadings
      for 100 Factor Analyses
   4  Mean Standard Errors of Unrotated Loadings for 100 Factor Analyses
   5  Population Values and Group Averages of Obtained Means for
      High and Low Rotated Loadings
   6  Average Standard Errors for High and Low Rotated Loadings
   7  Ratios of Obtained Standard Errors to Expected Standard Errors
      for Unrotated Loadings
   8  Ratios of Obtained Standard Errors to Expected Standard Errors
      for High and Middle Rotated Loadings
   9  Means of the Standard Errors of the Highest Rotated Loadings
      and Various Expected Values
  10  Average Coefficients of Congruence for Rotated and
      Unrotated Loadings
  11  Means of Obtained Loadings for the Block and Individual
      Methods of Selection
  12  Mean Standard Errors of Obtained Loadings for the Block and
      Individual Methods of Factor Selection
  13  Population Values of the Highest Loadings for the Correct
      Number of Rotated Factors and Means of Obtained Loadings
      for a Varied Number of Rotated Factors
  14  Average Standard Error for All Highest Loadings According
      to the Number of Factors Rotated
  15  Mean Eigenvalues for Three, Four, and Five Underlying Dimensions
  16  Mean Standard Errors of Eigenvalues for Three, Four, and
      Five Factor Situations
  17  Means of Unrotated Loadings for Three, Four, and Five
      Underlying Dimensions
  18  Standard Errors of Unrotated Loadings for Three, Four, and
      Five Underlying Dimensions
  19  Means of Rotated Loadings for Three, Four, and Five
      Underlying Dimensions
  20  Standard Errors of Rotated Loadings for Three, Four, and
      Five Underlying Dimensions
  21  Means of Eigenvalues for Nine through Fifteen Variables
      Loading on Three Underlying Dimensions
  22  Standard Errors of the First Five Eigenvalues for Nine through
      Fifteen Variables Loading on Three Underlying Dimensions
  23  Means of the Highest Unrotated Factor Loadings for Four
      Variables of the First Factor
  24  Standard Errors of the Highest Unrotated Factor Loadings for
      Four Variables of the First Factor
  25  Mean Values of Rotated Factor Loadings for Nine Core Variables
  26  Standard Errors of Rotated Factor Loadings for Nine Core Variables

LIST OF FIGURES

Figure
   1  Standard Errors of Unrotated Loadings for Six Sample Sizes
   2  Standard Errors of Rotated Loadings for Six Sample Sizes

CHAPTER I
PROBLEM

Factor analysis is a statistical technique used to identify the
underlying dimensions in a domain of variables. Since the statistical
information needed to make meaningful decisions about sample
size and the number of factors is not available, this study has been
designed to provide such information in a form which meets the needs of
the average researcher.

The need for this information has become greater in recent years,
for with the advent of modern, high-speed computers, factor analysis has
become widely used in psychological research. The existence of "packaged
programs" which will factor analyze any correlation matrix fed in has
made it relatively simple for the researcher to obtain mathematically
complex analyses of data without becoming an expert programmer. Because
calculations consumed so much time in the precomputer era, only the most
important material warranted factor analysis, but today the computer can
quickly provide a factor analysis of any correlation matrix, even if it
be of only slight potential significance. Hence, though groups of tests
are still frequently analyzed, attention has shifted towards the actual
items of tests and other variables that might previously have been
omitted. Since the scope of factor analysis expanded with the availability
of computers, it is now particularly important that the researcher be
provided with guidelines regarding (1) the number of cases needed to
assure stable, replicable results, and (2) a means for determining the
number of underlying dimensions in a body of variables.

If research is to be of value, certainly its results must be
replicable, but the philosophical rule expressed by Ockham's Razor must
also be kept in mind since it is usually impractical for the researcher
to attempt to assure replicability by gathering all possible data.
Collecting data from every member of a population is an arduous task and
frequently impossible; even if the entire population could be surveyed,
the expense can seldom be justified. Massive collection of data to
assure replicability is usually unnecessary. In most cases random sampling
is an efficient substitute. For example, a questionnaire should not
be given to 6,000 people if a factor analysis run on a small, randomly
selected portion of that number would yield nearly identical results.
Hence, the researcher should gather only enough data to assure a level of
reliability that meets the investigation's requirements.

Although it has not been traditional to investigate the stability
of factor analysis by considering the reliability of eigenvalues and
loadings, such an approach is an easy matter today with the aid of
extremely fast computers. This approach requires a large number of
factor analyses to be run, thus permitting the calculation of the desired
standard errors. The data for each factor analysis is obtained by either
a Monte Carlo technique or, if data on an entire population is available,
by the multiple random sampling of persons. Specific information regarding
those controllable aspects of the experimental design which have the most
effect on the stability of factor patterns can be obtained by varying the
sample size or by changing the factor pattern for each group of factor
analyses run. For example, one could double the size of the sample and
note what effect this might have on the size of standard errors. It would
also be possible to change the number of underlying dimensions or to keep
the same number of underlying dimensions but to increase the number of
variables loading on those dimensions and note what effects these changes
might have on the standard errors.
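
The multiple random sampling approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: it tracks a single correlation rather than a full factor analysis, the population is synthetic, and every name in it (`make_population`, `empirical_standard_error`, the seed, the population size) is hypothetical rather than taken from the study.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_population(size, rho, rng):
    """Synthetic (hypothetical) population of (x, y) pairs correlated near rho."""
    pairs = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def empirical_standard_error(population, sample_size, n_samples, rng):
    """Standard deviation of r over repeated random samples of persons."""
    estimates = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # without replacement
        xs, ys = zip(*sample)
        estimates.append(pearson_r(xs, ys))
    return statistics.stdev(estimates)


rng = random.Random(1969)
population = make_population(6000, 0.40, rng)
for n in (25, 100, 400):
    se = empirical_standard_error(population, n, 100, rng)
    print(f"N={n:4d}  empirical SE of r = {se:.3f}")
```

Replacing `pearson_r` with a routine that returns an eigenvalue or a loading would give the corresponding standard error for that quantity; the sampling loop itself would not change.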

Knowing which of the controllable aspects of the experimental
design most critically affect the standard error of eigenvalues and
loadings allows the experimenter to efficiently plan his research to yield
reliable factor analyses. If the reliability of a factor pattern obtained
by sampling is too low, it is highly probable that this factor pattern
will be different from the total-population factor pattern and thus not
give the desired information. Since it is assumed that a variable's
greatest influence is exerted on the factor on which its highest loading occurs,
most investigators interpret the results in terms of the highest loadings.
If, for example, in an investigation the sample size is 100 and a given
variable's highest loading is 0.40 with a normally distributed standard
error of 0.20, and if, for the purposes of interpretation, it has been
decided to disregard loadings smaller than 0.25, then the variable's
highest loading will be ignored about 23% of the time. If the investigator
knows how the standard error of loadings is affected by increasing the
sample size, he is in a position to decide beforehand how much data
should be gathered.
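
The 23% figure in the example above follows from the normal distribution: the cutoff of 0.25 lies (0.25 - 0.40)/0.20 = -0.75 standard errors below the loading. A minimal check, assuming a normal sampling distribution as the text does and ignoring the negligible chance that the sample loading falls below -0.25 (the function names are hypothetical):

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_below_cutoff(loading, standard_error, cutoff):
    """Chance that a sample loading falls below the interpretation cutoff."""
    return normal_cdf((cutoff - loading) / standard_error)


p = prob_below_cutoff(0.40, 0.20, 0.25)
print(f"{p:.3f}")  # about 0.227, i.e. roughly 23% of samples
```

Halving the standard error to 0.10 in the same call moves the cutoff to 1.5 standard errors below the loading, which makes the loading far less likely to be disregarded.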

Knowledge about the standard error of loadings should also help
solve a basic problem of factor analysis -- that of identifying the
number of underlying dimensions in a body of variables. If for clarity
of interpretation the factor analyst rotates the initially obtained
solution, it logically follows that he should rotate as many factors as
there are significant underlying dimensions in his material. Since a
loading is actually a correlation between the variable and a factor,
factor theory suggests that the specific underlying dimensions act in a
way that draws the highest loadings of appropriate variables to a given
factor according to the correlational pattern. If one attempts to rotate
too many factors, it is possible that some of the highest loadings will
be forced to occur on superfluous dimensions. For example, it is known
that rotating as many factors as there are variables will usually produce
a pattern in which each factor contains only one highest loading;
uniqueness is forced if too many factors are assumed. If too few factors
are rotated, the highest loadings will necessarily be forced to load in
a pattern that does not accurately represent the underlying dimensions.
When the wrong number of factors is rotated, the factors on which the
variables load are at least partly determined by chance, since the influence
exerted by each underlying dimension cannot be properly exercised. Hence,
it can be theorized that the standard error of individual loadings will
be at a minimum when the correct number of factors is rotated.

 

Statement of the Problem

The present study addresses itself to three important problems.
The first and third problems are concerned with the stability of factor
patterns as a function of input quantities, quantities over which some
control may usually be exercised by the experimenter. The second problem
examines rotation, an aspect of the output, and tests a method of
determining the number of factors that should be rotated.

Problem I. The first problem will be concerned with the effect
that varying the sample size has upon the standard error of loadings and
eigenvalues. This problem has been divided into several subproblems which
ask (1) if there is a predictable relation between sample size and standard
error, (2) if loadings are behaving as correlations, (3) if changes among
the standard errors of loadings are uniform as sample size varies, and (4)
if rotated loadings are more stable than unrotated ones.

Problem II. The second problem will consider the relation between
the correct number of factors and the standard error of rotated loadings.
More specifically, it will test the hypothesis that the standard error of
rotated loadings is indeed at a minimum when the correct number of factors
has been rotated.
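
If the hypothesis of Problem II holds, it implies a simple decision rule: rotate several candidate numbers of factors, average the standard errors of the highest loadings for each, and take the candidate with the smallest average. The sketch below shows only that selection step. The standard errors in it are made-up numbers for illustration, not results from this study, and the function name is hypothetical.

```python
import statistics


def best_number_of_factors(se_by_k):
    """Return the candidate k whose average standard error is smallest."""
    averages = {k: statistics.fmean(ses) for k, ses in se_by_k.items()}
    best_k = min(averages, key=averages.get)
    return best_k, averages


# Made-up standard errors of the highest loadings, keyed by the number of
# factors rotated. Illustrative values only, not data from the study.
illustrative = {
    2: [0.09, 0.11, 0.10],
    3: [0.05, 0.06, 0.05],
    4: [0.08, 0.07, 0.09],
}

best_k, averages = best_number_of_factors(illustrative)
print(best_k)  # 3 for the illustrative values above
```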

Problem III. The final problem examines the effect of the factor
pattern on the magnitude of the standard error and/or size of loadings.
The number of factors and the number of variables loading on a factor
will be varied to see what changes occur in the size and standard error
of eigenvalues, unrotated loadings, and rotated loadings.

Purpose of the Study

The problems considered in this study are designed to provide
much-needed information about the size of standard errors of those
quantities normally used for interpretation of the number and nature of
underlying dimensions in a body of variables. Without such information it is
impossible to ascertain replicability and meaningfulness of results.
Certainly it is invalid to assume that a factor analysis is obtained
solely to determine the loadings of variables on factors which
characterize only the unique group of individuals that participated in the study.
In most situations, the investigator must be able to generalize from the
sample of data to a larger universe of more or less equivalent data that
could have been gathered.

But how does an investigator know when his sample size is large
enough to insure valid generalization? Harman (1967) suggests that for
a fixed set of variables, the measure of consistency of factors from
sample to sample is a classical problem in the theory of statistical
sampling. He further points out, however, that little progress has been
made toward the solution of the sampling problem in factor analysis and
therefore suggests that an empirical approach would seem appropriate.
Such an approach will be used in this study. It will not attempt to be
definitive nor will it suggest that the specific results are generalizable
to other sets of data. But it is hoped that this study will be recognized
as presenting an approach of considerable heuristic value for the future
solution to similar quests.

Studies on Sampling Error

 

The literature prior to 1963 contains many conjectures about the
standard error of factor loadings, but researchers in the precomputer era
were hindered from backing up their contentions with empirical evidence.
Since 1963, several studies--all based on Monte Carlo techniques--have
focused on the standard error of factor loadings. All of these studies,
which have considered rotated or unrotated loadings, raise questions
which warrant further examination.

Unrotated Case. Joreskog (1963), using a Monte Carlo technique,
examined the sampling errors of individual loadings on unrotated common
factors. For N = 100, 200, and 300, he compared unrotated sample common
factor matrices to the population factor matrices and found the mean square
deviation from the population values somewhat less than 1/√N, the
approximate standard error of a zero correlation. This value decreased only
slightly with sample size and there were no consistent differences in the
relative size of the standard errors for normal and skewed populations.
When using a 10-variable three-factor matrix exhibiting unifactorial
structure, differences between the unrotated sample and population factors
could not be confidently matched with any population factor. For given
loadings the standard errors ranged from .42 for one of the zero loadings
to .15 for one of the non-zero loadings; sampling errors did not decrease
sharply with N. Since Joreskog attributed the large sampling errors to
near equality in the size of factors--which resulted in an instability
in the positions of the principal axes--it appears that more information
relating the standard error of eigenvalues and sample size is needed.

Rotated Case. Joreskog (1963) rotated the above described factor
loadings and found the standard errors to be somewhat smaller than 1/√N.
The sampling errors of the non-zero loadings tended to be smaller than
those for the zero loadings. This difference appeared to be proportional
to the sampling error of the Pearson correlation coefficient, which in
turn is proportional to 1 - r², but the proportionality was not uniform
for all variables.
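
The two reference quantities used in this comparison can be tabulated directly. The sketch below assumes the common large-sample approximation (1 - r²)/√N for the standard error of a Pearson correlation, written with √N rather than √(N - 1) so that it reduces to the 1/√N of the text when r = 0. The function names and the choice of r = 0.60 are illustrative assumptions.

```python
import math


def se_zero_correlation(n):
    """Approximate standard error of a zero correlation: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n)


def se_correlation(r, n):
    """Large-sample approximation (1 - r^2) / sqrt(N) for a correlation r."""
    return (1.0 - r * r) / math.sqrt(n)


# Sample sizes used in this study; r = 0.60 is an arbitrary illustration.
for n in (25, 100, 400, 800, 1200, 1600):
    print(f"N={n:5d}  1/sqrt(N)={se_zero_correlation(n):.3f}  "
          f"(1 - r^2)/sqrt(N) at r=0.60: {se_correlation(0.60, n):.3f}")
```

Under this approximation a quantity with r = 0.60 has a standard error 64% as large as that of a zero correlation at every sample size, which is the sense in which larger loadings would be expected to be more stable if loadings behave like correlations.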

Hamburger (1965), also using a Monte Carlo method, generated sample
matrices while investigating the sampling fluctuations of rotated loadings.
He used four different population factor matrices with varied degrees of
simple structure, and each matrix contained 12 variables and had four
common factors. After generating forty sample correlation matrices for
each factor pattern, 20 of N=100 and 20 of N=400, these sample correlational
matrices were factored by the principal axis method using squared multiple
correlations (R²) as communalities. The standard errors of factor loadings
were somewhat larger than the standard errors of the correlations in samples
of these sizes, but for N=100 the standard errors were less than twice as
large as for N=400. It was also noted that the standard errors tended to
be less for patterns exhibiting good simple structure than for poor, but
an interesting question does arise because reversals did occur.

Browne (1965) performed a study in which he generated sample values
of several population matrices, extracted factors, and rotated the results.
This was done for each of several methods of extracting factors and for
each method a comparison between the results and the population loadings
was made. Although Browne found that Lawley's Maximum Likelihood Method
gave the smallest sampling errors, .081, three other principal-components
procedures yielded average errors only slightly larger. For N=100,
Thomson's iterative method showed .087, principal factors with R²
communality estimates showed .089, and the weighted principal factor method
showed .088. The centroid method was somewhat larger, .107. For all of these
methods, sampling errors tended to decrease with the size of the loadings.

All three of the above studies suggest that the sampling errors of
rotated factor loadings are close to those of correlations, about 1/√N,
and that sampling errors tend to decrease for larger loadings. It is
obvious that in the case of rotated loadings some control for the number
of factors rotated is also needed. These data reflect the situations
where the number of factors rotated reflects the number of underlying
dimensions in the correlation matrices. Further investigation should also
be made into the proportionality of the size of loadings to the quantity
(1 - r²). Since all loadings did not follow this relationship, it should
be determined whether some of the loadings are consistent for all sample
sizes and, if so, the defining characteristics of such loadings.

Studies Relative to the Number of Factors

 

 

No problem is perhaps as puzzling and bothersome as the one of
deciding the number of factors that are present in a body of variables.
The problem would not be too serious if rotating too few factors meant
merely overlooking a psychological dimension, or rotating too many meant
only some of the factors would break down and be rendered uninterpretable.
But since it appears that the estimation of loadings on one factor cannot
be accomplished independently of the estimation of loadings on others, the
importance of estimating the correct number of factors cannot be
overemphasized.

Some methods--such as Rao's (1955), Lawley's (1953), and Joreskog's
(1963)--do not estimate the number of factors at all but rather estimate
the uniqueness of the variables on the basis of a certain number of factors.
Changing the estimate on the number of factors results in a communality
estimate change as well as different factor loadings. But such methods
are of less importance to the average researcher, for the uniqueness of
individual variables is not what is usually sought.

The number of factors assumed also has influence on the rotational
process. Merrifield and Cliff (1963) found that when using the varimax,
it is important that the number of factors to be rotated be specified
correctly. If the varimax method requires the correct specification of
the number of factors, it is reasonable to assume that other rotational
procedures may also be affected by a failure to do so. Older literature
suggests that solutions possessing simple structure will be invariant with
respect to the number of factors rotated--present information contradicts
this.

There have been many proposals for deciding on the number of
factors and such decision-rules, according to Levonian and Comrey (1966),
generally have been based on the concept of either statistical significance
or minimal rank. It would seem pertinent to look at the criticisms as
well as the possible usefulness of some of the more important ones.

Cattell (1958) criticized the criterion of statistical significance
by stating that the determination of the number of real common factors
should not be dependent on the number of variables or subjects the
investigator happened to use. This seems to be an appropriate criticism, but
a statistical procedure may help to determine the number of factors.
Suppose, for instance, that the average standard error of all variables
is found to be at a minimum when the correct number of factors is rotated;
such information can then be used to make a meaningful decision about
the number of factors.

A criticism of the minimal rank criterion by Tryon (1961) pointed
out that the minimal rank of the population correlation matrix, and hence
the number of real common factors, can never be determined, while the
minimal rank of the sample correlation matrix is always equal to the order
of the matrix. This is, of course, a true statement, but it would appear
possible to ascertain that within limits a sample correlation matrix has
the same rank as the population matrix. One could, for instance, continue
to double the sample size until the rank became stable.

Because of the difficulties with minimal rank and statistical
significance, some investigators have departed from the tradition of
pinpointing the number of factors and preferred to specify only the
maximum and minimum bounds on the number of factors. Guttman (1954) feels
that the lower limit is the number of non-negative latent roots of the
correlation matrix whose main diagonal contains the squared multiple
correlation of each variable with the remaining (n - 1) variables. Kaiser
(1960) has also argued for use of all characteristic roots greater than
unity. Horn (1965) pointed out, though, that these criteria have been
shown to apply only when it is assumed that we are dealing with a
population of persons and a sample of tests. Tucker (1964), applying Monte
Carlo techniques, investigated this type of psychometric question and
found that the various rules concerning the number of factors present in
a battery were not reliable estimates of the number of major factors
present in his artificial factor matrices. Browne (1965) also
investigated the number-of-factors rules of thumb and found that accepting the
number of characteristic roots greater than unity as the number of factors
gave good results in some cases but not in others. Since this rule has
not proved entirely satisfactory, it is necessary to look also at the
mathematical approaches.
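
The rule of counting characteristic roots greater than unity can be sketched as follows. The Jacobi rotation method used here to obtain the roots is simply a convenient self-contained choice and is not the procedure used in any of the studies cited; the function names and the example correlation matrix are likewise hypothetical.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Characteristic roots of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that makes the (p, q) element vanish after rotation.
                if diff == 0.0:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def count_roots_above_unity(correlations):
    """Number of characteristic roots greater than 1.0."""
    return sum(1 for root in symmetric_eigenvalues(correlations) if root > 1.0)


# Hypothetical correlation matrix: two uncorrelated pairs of variables.
example = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
print(symmetric_eigenvalues(example))
print(count_roots_above_unity(example))
```

Applied to sample rather than population correlation matrices, the count can vary from sample to sample, which is consistent with the mixed results reported for this rule above.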

The number-of-factors question has been approached mathematically
by Lawley (1953), Rao (1955), and Joreskog (1963). Each has offered a
statistic to test hypotheses regarding the number of common factors in
a given correlation matrix. Lawley's and Joreskog's methods have been
tested using a Monte Carlo procedure and found to give the correct number
of factors in a majority of cases. But since these are tests for only a
specific matrix, their value for generalization to an entire population
is questionable. An appropriate procedure should enable generalization
to a population--using knowledge about the size of sampling error is one
possible method.

CHAPTER II

PROBLEM I: SAMPLING ERROR

The data, means of determining the number of underlying dimensions,
sampling procedure, and data analysis are described more fully in this
chapter than in subsequent chapters since these quantities either remain
constant or experience only minor changes for problems two and three.

Method

General Procedure

 

A core of nine variables loading on three underlying dimensions
in a 3-3-3 pattern was included in all factor analyses run for the three
problems. The magnitude of changes in the size of loadings and of
standard errors among these variables will be used to decide whether the
different sources of variation are responsible for instability among
factor patterns.

 

Procedure for Problem I

This problem, using a fixed sample of variables, considers the
change in standard error which will result by varying the sample size.
For each of the sample sizes 25, 100, 400, 800, 1200, and 1600, one
hundred random samples were drawn, correlation matrices computed, and
factor analyses performed. A domain of twelve variables loading on three
factors in a 5-3-4 pattern was used. Varimax rotations were obtained
for all of the principal axes solutions, and the means and standard
errors of eigenvalues, unrotated loadings, and rotated loadings were
determined. Fifty pairs of factor solutions were randomly selected for
each sample size and an average "coefficient of congruence" computed.
17 ‘

13

Data

 

Since the focus of this problem is on the amount of sampling
error as a function of sample size, it is necessary to obtain, as fully
as possible, the responses of an entire population, for knowing popula-
tion values reveals the accuracy of a generalization made from an indi-
vidual sample. To satisfy the design of the problems under consideration,
it is necessary that the set of variables to which the subjects respond
contain both the desired number of underlying dimensions and the desired
degree of simple structure. As a requirement for the data to be factor
analyzed, it is necessary that each of the variables be responded to
on the basis of some continuum such as best to worst, most to least,
strongly agree to strongly disagree, etc. Since principal component
analysis will be used, standard deviations of the variables should be
approximately equal. Data considered to meet these requirements suffi-
ciently were found in the files of the Office of Evaluation Services at
Michigan State University.

Nearly the entire Michigan State University freshman class of 1967
responded to 41 items inquiring into the students' social, political, and
economic views; these responses were recorded on a five-point scale
(strongly agree-agree-uncertain-disagree-strongly disagree). For N=5948,
means, standard deviations, and correlations between items were computed.
The standard deviations were mostly in the range 0.95 to 1.15 and the
correlations ranged from -.30 to +.55.

 

 

Determination of Underlying Dimensions

Those variables belonging to the same underlying dimensions were
determined by factor analyzing a group of 41 variables and rotating the
principal axis solution using the varimax criterion and the Kiel-Wrigley
criterion (1960), the latter being set at 3. This means that 2, 3, . . . ,
n factors were rotated until some factor failed to have at least 3
variables whose highest loadings occurred on that factor. Those groups of
variables whose highest loadings were always found on the same factor, no
matter how many factors were rotated, were considered to form underlying
dimensions. Since many other researchers have chosen factor patterns
containing 12 variables and 3 underlying dimensions to investigate the
problem of sampling error, such a factor pattern is also used in the
present experiment. Comparisons to other results should then be more
meaningful. Finally, the factor analysis of the population correlation
matrix revealed an almost unifactorial structure: only three of the
variables showed more than a minimal amount of their influence divided
on two or more factors.
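
The stopping rule described above can be sketched in a few lines. The Python fragment below is only an illustration and is not the program used in the study: the function names and the toy loading matrices are hypothetical, and judging the "highest" loading by absolute value is an assumption that the text does not state.

```python
def highest_loading_counts(loadings):
    """Count, for each factor, the variables whose highest loading falls on it.

    `loadings` is a list of rows (one per variable); each row holds that
    variable's rotated loadings on every factor.
    """
    n_factors = len(loadings[0])
    counts = [0] * n_factors
    for row in loadings:
        # Assumption: "highest" is judged by absolute value.
        best = max(range(n_factors), key=lambda p: abs(row[p]))
        counts[best] += 1
    return counts


def meets_criterion(loadings, min_vars=3):
    """True if every factor carries at least `min_vars` highest loadings."""
    return min(highest_loading_counts(loadings)) >= min_vars


# Hypothetical two-factor solution with a clean 3-3 pattern.
two_factor = [
    [0.80, 0.10], [0.70, 0.05], [0.60, 0.20],
    [0.10, 0.75], [0.15, 0.65], [0.05, 0.70],
]
# Hypothetical three-factor solution: the third factor attracts one variable.
three_factor = [
    [0.80, 0.10, 0.05], [0.70, 0.05, 0.10], [0.30, 0.20, 0.60],
    [0.10, 0.75, 0.05], [0.15, 0.65, 0.10], [0.05, 0.70, 0.20],
]

print(meets_criterion(two_factor))    # criterion met, rotation would continue
print(meets_criterion(three_factor))  # criterion failed, rotation would stop
```

With the criterion set at 3, rotation would proceed past the two-factor solution and halt at the three-factor one, where a factor holds fewer than three highest loadings.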
Sampling Procedure

Placing the responses from the entire population for the 12
selected variables in the core storage of a Control Data Corporation 3600
computer allowed the computer to quickly draw random samples for the
desired sample size. No subject's responses could appear more than once
in a given sample. SAMPLER, a FORTRAN routine (Appendix A) which permits
specification of population size, number of samples, desired sample size,
and the number of variables to be sampled for each subject, was used to
draw the random samples.
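
The sampling scheme can be illustrated with a short modern sketch. This is not the SAMPLER routine of Appendix A; it is a hypothetical Python analogue in which the function name `draw_samples`, the seed values, and the randomly generated stand-in responses are all assumptions made for illustration.

```python
import random


def draw_samples(population_size, n_samples, sample_size, seed=None):
    """Return `n_samples` lists of subject indices.

    Each list is drawn without replacement, so no subject can appear
    more than once within a given sample.
    """
    rng = random.Random(seed)
    subjects = range(population_size)
    return [rng.sample(subjects, sample_size) for _ in range(n_samples)]


# Stand-in population: 5948 subjects, 12 five-point responses each.
# (The real responses are not reproduced here; these are random fillers.)
filler = random.Random(0)
population = [[filler.randint(1, 5) for _ in range(12)] for _ in range(5948)]

samples = draw_samples(len(population), n_samples=100, sample_size=400, seed=1)

# Responses of the first sample, one row per selected subject.
first_sample = [population[i] for i in samples[0]]
```

Sampling indices rather than response rows keeps the "no subject twice" rule independent of whether two subjects happen to give identical answers.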

Data Analyses

 

Factor Analysis Program. The factor analysis program (A. Williams,
1967) of the Computer Institute for Social Science Research (CISSR) at
Michigan State University performed all factor analyses. This versatile
routine, which computes eigenvalues, principal axis factor loadings, and
either or both of the quartimax and varimax rotations, also includes
provisions for specifying the type of communality desired if a correlation
matrix is calculated from raw data and for specifying the number of
factors to be rotated. It is also programmed to use the Kiel-Wrigley
criterion (1960) and thus specify the minimum number of variables that
should have their highest loadings on any of the factors. Once the
minimum number of variables has been specified, rotation (with all rotated
solutions printed out) will continue until fewer than the
specified number of highest loadings occur on a factor. In this study,
unities were inserted as communalities for all factor analyses since
these are commonly used by many investigators and thus should provide a
less controversial entry for the communalities. Unities also were con-
sidered to be most appropriate, because they represent the simplest
situation and this should be examined first.

Methods of Rotation. Since the purpose of the problem under
consideration is not to make comparisons among the various rotational
procedures, and since little difference was noted among such procedures by
other investigators, only one rotational method was used, but a check
was still made to see if the quartimax solution would be similar to the
varimax. Differences between corresponding loadings for the two methods
were not detected until the thousandths place, but it is possible that
the results obtained might not be so nearly equal if a less unifactorial
structure were used.

Rotation is usually carried out to reduce the complexity of the
factorial description of the variables. Since the quartimax provides
a rotation that tends to increase the larger factor loadings and decrease
the smaller ones for each variable of the original factor matrix, it is
concentrating on the rows of the factor matrix. According to Harman
(1960), the object of the quartimax method is to determine an orthogonal
transformation, T, which will carry the original factor matrix, A, into
a new factor matrix, B, for which the variance of the squared factor
loadings is a maximum. The formula which will yield this maximum is

$$Q = \sum_{j=1}^{n} \sum_{p=1}^{m} b_{jp}^{4}$$

where b represents the rotated factor loading, p indexes the
factors 1, 2, . . . , m, and j indexes the variables
1, 2, . . . , n.

In contrast to the quartimax, the varimax, which attempts to
approximate simple structure more closely, concentrates on simplifying
the columns or factors of the factor matrix. To achieve a "normal"
varimax criterion, the loadings in each row of the factor matrix are divided
by the square root of the communality for each row, respectively. The
computing procedure for a varimax solution is quite similar to that
employed for a quartimax, except the varimax requires that

$$V = n \sum_{p=1}^{m} \sum_{j=1}^{n} \left( b_{jp}/h_{j} \right)^{4} - \sum_{p=1}^{m} \left( \sum_{j=1}^{n} b_{jp}^{2}/h_{j}^{2} \right)^{2}$$

be maximized instead of Q. Here b, p, and j are the same as was
mentioned in the preceding paragraph and $h_{j}^{2}$ represents the
communality of variable j.
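
As a rough illustration of the two criteria, the sketch below evaluates Q and the normal varimax V for a small two-factor loading matrix before and after a 45-degree rotation away from simple structure. It is not the CISSR program; the function names and the toy matrix are hypothetical, and the communalities are assumed to be the row sums of squared loadings over the two factors shown.

```python
import math


def quartimax_criterion(B):
    """Q: sum of the fourth powers of all rotated loadings."""
    return sum(b ** 4 for row in B for b in row)


def normal_varimax_criterion(B):
    """V: n * sum (b/h)^4 minus, for each factor, the squared sum of (b/h)^2."""
    n = len(B)
    m = len(B[0])
    # Assumption: communality = sum of squared loadings over the factors given.
    h2 = [sum(b * b for b in row) for row in B]
    v = 0.0
    for p in range(m):
        scaled = [B[j][p] ** 2 / h2[j] for j in range(n)]  # (b_jp / h_j)^2
        v += n * sum(s * s for s in scaled) - sum(scaled) ** 2
    return v


def rotate(B, degrees):
    """Orthogonal rotation of a two-factor loading matrix."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[x * c + y * s, -x * s + y * c] for x, y in B]


# Hypothetical loadings with perfect simple structure (2-2 pattern).
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
blurred = rotate(simple, 45)  # every variable now splits across both factors

print(quartimax_criterion(simple), quartimax_criterion(blurred))
print(normal_varimax_criterion(simple), normal_varimax_criterion(blurred))
```

Both criteria are larger for the simple-structure matrix than for its rotated counterpart, which is the sense in which a rotation program seeks the orientation that maximizes them.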

Factor Selection Program. The program used in this study was
COLMLDGS (Appendix B). The assumption behind this program is that
variables which have their highest loadings in the same column of the
rotated factor matrix belong to the same underlying dimension. If a
group of variables is known to form an underlying dimension, the column
in which this dimension is located can be determined by computing the
linear sums of the loadings representing those variables in each of the
various columns and selecting the largest. COLMLDGS also provides a
punched output of the selected loadings, the eigenvalues, and the first
row of the principal axis solution. This method was deemed sufficient
for the identification of factors since, for the most part, the sample
sizes used were so large that the population correlation matrix was
closely approximated and also because of the high degree of simple
structure found in the rotated solution. The sample sizes 25 and 100
often failed to produce a factor pattern similar to the population
factor pattern, and thus the value of the results for those two sample
sizes is questionable.
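
The column-selection idea behind COLMLDGS can be sketched as follows. This is a hypothetical Python reading of the description above, not the routine in Appendix B: the function names and the example matrix are invented for illustration, and plain signed sums are used because the text speaks of "linear sums" (a reflected factor would need separate handling).

```python
def column_sums(loadings, block):
    """Sum the loadings of the variables in `block` within each column."""
    n_factors = len(loadings[0])
    return [sum(loadings[j][p] for j in block) for p in range(n_factors)]


def select_column(loadings, block):
    """Index of the column whose summed loadings over `block` is largest."""
    sums = column_sums(loadings, block)
    return max(range(len(sums)), key=lambda p: sums[p])


# Hypothetical rotated solution: six variables, three factors.
rotated = [
    [0.10, 0.78, 0.05],
    [0.05, 0.71, 0.12],
    [0.20, 0.64, 0.08],
    [0.75, 0.10, 0.02],
    [0.69, 0.04, 0.15],
    [0.08, 0.11, 0.72],
]
block = [0, 1, 2]  # variables known to form one underlying dimension

selected = select_column(rotated, block)
selected_loadings = [rotated[j][selected] for j in block]
print(selected, selected_loadings)
```

Here the first three variables are located in the second column, and their loadings in that column are the ones that would be carried forward for means and standard errors.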

Factor Comparison Program. A factor comparison program, COMPARE,
was written to individually compare either rotated or unrotated factors
for any two separate factor solutions. This method, called the
"Coefficient of Congruence" by Tucker (1951) and the "Coefficient of
Similarity" by Barlow and Burt (1954), outwardly resembles the Pearson
product-moment correlation coefficient, but it does not produce a true
correlation since the factor loadings used in the formula are not deviates
from their respective means and the summations are over the number of
variables rather than the number of individuals (Harman, 1960, p. 285).
Recommended by Wrigley and Neuhaus (1955) and Pinneau and Newhouse (1964),
the formula for this method, which shall be referred to as the
Coefficient of Congruence (CC), is

$$CC_{ij} = \frac{\sum_{k=1}^{m} a_{ki} b_{kj}}{\left( \sum_{k=1}^{m} a_{ki}^{2} \, \sum_{k=1}^{m} b_{kj}^{2} \right)^{1/2}}$$

where a and b refer to the factor loadings, i and j refer to the two
factors to be compared, and k refers to the variables (1, 2, . . . , m)
in each factor.
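
A minimal sketch of the coefficient, assuming each factor is supplied as a plain list of loadings over the same variables in the same order. The function name `congruence` and the example loadings are hypothetical; this is not the COMPARE program itself.

```python
import math


def congruence(a, b):
    """Coefficient of congruence between two columns of factor loadings.

    Unlike a correlation, the loadings are not centered on their means.
    """
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator


# Hypothetical loadings of the same six variables on a factor in two solutions.
factor_sample_1 = [0.78, 0.71, 0.64, 0.10, 0.04, 0.11]
factor_sample_2 = [0.74, 0.69, 0.70, 0.05, 0.12, 0.08]

print(congruence(factor_sample_1, factor_sample_2))
```

The coefficient equals one when the two columns are proportional and zero when their cross-products cancel, so values near one indicate that the two samples reproduced essentially the same factor.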

Standard Error Formulas. The standard errors of obtained loadings,
both unrotated and rotated, and eigenvalues were computed by the formula

$$\sigma = \left[ \sum x^{2} / (N-1) \right]^{1/2}$$

where x is in deviation form and N is the number of factor analyses.
The formula for computing the standard error of the correlation
coefficient is

$$\sigma_{r} = \frac{1 - r^{2}}{(N-1)^{1/2}}$$

where r is the correlation coefficient. This formula is considered to
be an approximation to the corresponding correlation coefficient sigma
in the population from which the sample of N has been randomly drawn.
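
The two formulas can be sketched in Python as follows. The function names and the five example loadings are hypothetical, and the divisor N - 1 in the first function is an assumption about the formula used for the replications.

```python
import math


def empirical_standard_error(values):
    """Standard deviation of one statistic across repeated factor analyses."""
    n = len(values)
    mean = sum(values) / n
    # Deviations from the mean, squared and summed; N - 1 divisor assumed.
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def expected_se_correlation(r, n):
    """Approximate standard error of a correlation r from a sample of size n."""
    return (1 - r ** 2) / math.sqrt(n - 1)


# Hypothetical loadings of one variable across five replications.
loadings = [0.61, 0.58, 0.66, 0.63, 0.57]
print(empirical_standard_error(loadings))

# Expected standard error of a correlation of .60 with N = 101.
print(expected_se_correlation(0.60, 101))
```

In the study the first function would be applied across the 100 factor analyses run at each sample size, while the second supplies the benchmark against which loadings are later compared.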

Results

The results will be reported relative to the subproblems which
ask (1) if there is a predictable relation between sample size and
standard error, (2) if loadings are behaving as correlations, or if
changes among the standard errors of loadings are uniform as sample size
varies, and (3) if rotated loadings are more stable than unrotated ones.

Sample Size and Standard Error

 

Since the basic question being considered here is whether a pre-
dictable relation exists between sample size and standard error, this
question will be treated separately for eigenvalues, rotated loadings,
and unrotated loadings.

Eigenvalues. A comparison between the magnitude of the obtained
eigenvalues for each sample size and the population values is given in
Table 1. Obviously the two smallest sample sizes, 25 and 100, do not
yield values close to the population values: large eigenvalues tended to
be much larger and small eigenvalues were considerably smaller. As
sample size increases, the values quickly approach those of the population;
it may also be noted that for the sample sizes 25 and 100, more than the
first three eigenvalues were greater than unity, although only three
underlying dimensions are contained in the body of variables.


Table 1. Population Values and Means of Obtained
Eigenvalues for 100 Factor Analyses

Rank of                          Sample Size
Eigenvalue      25    100    400    800   1200   1600   5948
   (1)        3.24   2.78   2.66   2.66   2.66   2.67   2.65
   (2)        2.09   1.75   1.63   1.61   1.60   1.60   1.61
   (3)        1.61   1.40   1.34   1.33   1.32   1.32   1.32
   (4)        1.26   1.11   0.99   0.95   0.95   0.93   0.91
   (5)        1.01   0.95   0.90   0.88   0.87   0.87   0.86
   (6)        0.79   0.83   0.83   0.82   0.82   0.82   0.82
   (7)        0.62   0.74   0.76   0.77   0.77   0.76   0.76
   (8)        0.47   0.65   0.71   0.72   0.73   0.73   0.75
   (9)        0.36   0.57   0.65   0.67   0.68   0.68   0.68
  (10)        0.26   0.49   0.69   0.62   0.63   0.64   0.66
  (11)        0.18   0.41   0.51   0.54   0.54   0.54   0.55
  (12)        0.10   0.32   0.41   0.43   0.43   0.43   0.44

Table 2 shows the standard errors for the entries in Table 1.

Table 2. Mean Standard Errors of Eigenvalues
for 100 Factor Analyses

Rank of                      Sample Size
Eigenvalue      25    100    400    800   1200   1600
   (1)        .470   .303   .164   .111   .086   .062
   (2)        .251   .162   .095   .073   .049   .041
   (3)        .170   .114   .071   .059   .044   .036
   (4)        .125   .094   .048   .033   .031   .028
   (5)        .126   .063   .032   .030   .025   .025
   (6)        .115   .060   .037   .029   .027   .026
   (7)        .088   .054   .034   .024   .017   .018
   (8)        .089   .046   .024   .024   .019   .018
   (9)        .055   .038   .034   .024   .019   .016
  (10)        .052   .045   .031   .023   .021   .015
  (11)        .047   .041   .035   .029   .021   .015
  (12)        .037   .044   .041   .026   .017   .015

Quadrupling the sample size did not fully halve standard errors in most
cases. For all sample sizes except 1600, reversals did occur: in some
cases smaller eigenvalues had larger standard errors than some of the
larger eigenvalues.
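
The quadrupling comparison can be checked directly from the tabled values. The sketch below uses the standard errors of the first eigenvalue as read from Table 2; the function name `quadrupling_ratios` is hypothetical. A ratio of 0.50 would mean that quadrupling the sample size exactly halved the standard error.

```python
sizes = [25, 100, 400, 800, 1200, 1600]
# Standard errors of the first eigenvalue, as read from Table 2.
se_first_eigenvalue = [0.470, 0.303, 0.164, 0.111, 0.086, 0.062]


def quadrupling_ratios(sizes, standard_errors):
    """Ratio SE(4N) / SE(N) for every N whose quadruple was also sampled."""
    lookup = dict(zip(sizes, standard_errors))
    return {
        (n, 4 * n): lookup[4 * n] / lookup[n]
        for n in sizes
        if 4 * n in lookup
    }


for pair, ratio in quadrupling_ratios(sizes, se_first_eigenvalue).items():
    print(pair, round(ratio, 2))
```

For this row the ratios are about 0.64 (25 to 100), 0.54 (100 to 400), and 0.38 (400 to 1600), so the standard error was fully halved only in the last step.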

Unrotated Loadings. Table 3 gives the group averages of the
three highest, three middle, and three lowest unrotated loadings found
in the first column of the principal axis solution. Each figure used to
compute a group average is in itself an average of a particular loading
on 100 factor analyses. Only the sample size 25 failed to give loadings
that closely approximated the population values. The average standard
errors of these loadings are given in Table 4. Increasing the sample
size results in a rapid decrease in standard error; it should be noted
that as the average loading size decreases, the standard error increases.
For the group of low loadings, which approximate zero loadings, quadrupling
the sample size appears to halve the standard error, but this was not the
case for the medium and high loadings, although such a rule could still
be used to make a rough approximation of the standard errors for these
groups.

Table 3. Population Values and Obtained Means of
Unrotated Loadings for 100 Factor Analyses.

Size of                       Sample Size
Loadings      25    100    400    800   1200   1600   5948
High        .530   .611   .613   .619   .621   .617   .619
Middle      .337   .419   .418   .420   .423   .427   .427
Low         .069   .081   .072   .072   .075   .076   .072


Table 4. Mean Standard Errors of Unrotated Loadings
for 100 Factor Analyses.

Size of                   Sample Size
Loadings      25    100    400    800   1200   1600
High        .261   .091   .047   .031   .022   .019
Middle      .347   .128   .073   .048   .033   .026
Low         .363   .196   .104   .077   .059   .049

 

Rotated Loadings. Since there was not a large difference in the
magnitude of the highest and lowest rotated loadings for which means
and standard errors were calculated, two comparison groups were formed
by grouping the four highest loadings and then the four lowest loadings
together. The averages of these groups of loadings are shown in Table 5.

Table 5. Population Values and Group Averages of Obtained
Means for High and Low Rotated Loadings

Size of                       Sample Size
Loadings      25    100    400    800   1200   1600   5948
High        .651   .714   .756   .758   .761   .761   .764
Middle      .496   .521   .527   .537   .538   .536   .537

 

Table 6. Average Standard Errors for High
and Low Rotated Loadings

Size of                   Sample Size
Loadings      25    100    400    800   1200   1600
High        .235   .107   .035   .024   .019   .016
Low         .207   .152   .078   .049   .045   .037

 

The average standard errors for the entries in Table 5 are given
in Table 6. Increasing the sample size results in a rapid decrease in
standard error: for the higher loadings, quadrupling the sample size more
than halved the standard error, but for the lower loadings the standard
error was not fully halved by quadrupling the sample size except when
going from 400 to 1600.

Loadings as Correlations

If factor loadings are behaving as simple correlations, the
standard errors of loadings should conform to
$\sigma_{r} = (1 - r^{2})/(N - 1)^{1/2}$,
which is the expected standard error of a correlation coefficient. But
it would not be unreasonable to consider loadings as "behaving" as
correlation coefficients if (1) for the same sample size, loadings of
substantially different magnitudes have substantially different standard
errors with larger loadings having smaller standard errors, and (2) the
ratios of the obtained standard errors to the expected standard errors,
for a given magnitude of loading, are approximately equal.
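
The second condition can be made concrete with a small calculation. The sketch below forms the ratio of obtained to expected standard error for the group of low unrotated loadings, using the obtained standard errors from Table 4 and the population loading from Table 3. Which value of the loading and which divisor (N or N - 1) entered the published ratios is not stated, so both choices here are assumptions and the results should only be expected to fall near the published figures.

```python
import math


def expected_se(r, n):
    """Expected standard error of a correlation r for sample size n."""
    return (1 - r ** 2) / math.sqrt(n - 1)


def se_ratio(obtained, r, n):
    """Obtained standard error divided by the expected standard error."""
    return obtained / expected_se(r, n)


sizes = [25, 100, 400, 800, 1200, 1600]
low_obtained = [0.363, 0.196, 0.104, 0.077, 0.059, 0.049]  # Table 4, low group
low_loading = 0.072                                        # Table 3, population

ratios = [se_ratio(se, low_loading, n) for se, n in zip(low_obtained, sizes)]
mean_ratio = sum(ratios) / len(ratios)

print([round(x, 2) for x in ratios])
print(round(mean_ratio, 2))
```

Under these assumptions the ratios for the low group stay close to 2 at every sample size, which is the kind of constancy that condition (2) asks for.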

Unrotated Loadings. Using the entries of Table 4 and the formula
in the preceding paragraph, Table 7 presents the computed ratios of the
obtained standard errors to the expected standard errors. The average
of the ratios obtained for the lowest group is 2.01; ratios for the
individual sample sizes are all quite close to this figure. For the
middle and high groups, the ratios decline except for sample size 100.
Examining the columns for sample sizes 100 through 1600 reveals that a
perfect rank-ordering exists between the ratios and the three sizes of
loadings.

Table 7. Ratios of Obtained Standard Errors to
Expected Standard Errors for Unrotated Loadings

Size of                   Sample Size
Loadings      25    100    400    800   1200   1600
High        2.12   1.47   1.52   1.41   1.22   1.24
Middle      2.12   1.56   1.78   1.66   1.37   1.30
Low         1.82   1.98   2.08   2.20   2.03   1.96

Figure 1 portrays, for each of the three loading sizes, the
relationship between obtained standard errors and sample size. A perfect
rank ordering exists between the size of loadings and the obtained standard
error for each of the six sample sizes. It thus appears that the same
general type of influence which a correlation's magnitude exerts upon
the standard error of a correlation coefficient is also exerted by
unrotated factor loadings upon their standard errors.

 

 

[Plot: standard error (vertical axis) against sample size (horizontal axis)]

Figure 1. Standard Errors of Unrotated Loadings
for Six Sample Sizes.

Rotated Loadings. Figure 2 portrays the relationship between
sample size and the obtained standard errors for the two loading sizes.
The rotated loadings exhibit a perfect rank-ordering of group sizes
for the five largest sample sizes.

 

[Plot: standard error (vertical axis) against sample size (horizontal axis)]

Figure 2. Standard Errors of Rotated Loadings
for Six Sample Sizes.

Table 8 gives the ratios of those standard errors that were actually
obtained to the standard errors that would be expected if the loadings were
behaving exactly as correlations. With the exception of the first two
sample sizes, the higher loadings yielded ratios that were quite close.
The middle group of loadings also centered near one value, 2.08, with the
exception of the sample size 25 which was considerably lower.

Table 8. Ratios of Obtained Standard Errors to
Expected Standard Errors for High and
Middle Rotated Loadings

Size of                   Sample Size
Loadings      25    100    400    800   1200   1600
High        1.76   2.14   1.63   1.60   1.53   1.52
Middle      1.54   2.08   2.16   1.94   2.19   2.07

 

 

Table 9 gives the means of the standard errors obtained for all
the highest rotated loadings, the value of the standard error which would
be expected in the case of $1/\sqrt{N}$, and the value of the standard error if
the rotated loadings were actually behaving as correlations. This table
has been included because some investigators suggest using $1/\sqrt{N}$ to
predict the average standard error of rotated factor loadings. Comparing the
obtained values with the expected values, those in the bottom line of
Table 9, it becomes clear that the formula
$\sigma_{r} = (1 - r^{2})/(N - 1)^{1/2}$ grossly
underestimates the obtained values.


Table 9. Means of the Standard Errors of the Highest
Rotated Loadings and Various Expected Values.

                                      Sample Size
                            25    100    400    800   1200   1600
Obtained values           .225   .135   .059   .038   .033   .027
1/(N)^(1/2)               .200   .100   .050   .035   .029   .025
(1 - r^2)/(N - 1)^(1/2)   .116   .058   .029   .020   .017   .014
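
The middle row of Table 9 can be reproduced with a few lines of arithmetic, which also shows how far the obtained values sit above the one-over-root-N rule. The variable names are hypothetical; the obtained values are those read from the first row of Table 9.

```python
import math

sizes = [25, 100, 400, 800, 1200, 1600]
obtained = [0.225, 0.135, 0.059, 0.038, 0.033, 0.027]  # Table 9, first row

# Rule-of-thumb prediction: one over the square root of the sample size.
rule_of_thumb = [1 / math.sqrt(n) for n in sizes]

# How many times larger the obtained standard error is than the prediction.
excess = [o / p for o, p in zip(obtained, rule_of_thumb)]

for n, p, e in zip(sizes, rule_of_thumb, excess):
    print(n, round(p, 3), round(e, 2))
```

Every obtained value exceeds its prediction, by a modest margin for the larger samples, so the rule serves as a slightly optimistic estimate of the average standard error of the highest rotated loadings.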

 

Uniformity of Changes Among Standard Errors

As mentioned in the previous section, increasing the sample size
decreases, without exception, the standard errors for all levels of
loadings. Although the ratios of obtained standard errors to expected
standard errors, even at a given loading size, were not identical for
all sample sizes, the standard errors of the sample sizes 400, 800, 1200,
and 1600 do decrease at a rate directly proportional to $1/\sqrt{N}$ for each
of the three levels of loadings and for both rotated and unrotated
loadings. These data have shown that the change in standard error is
uniform for the larger sample sizes, but erratic for smaller sample sizes.

Stability versus Type of Loading

For equal magnitudes of loadings, the standard errors of rotated
and unrotated loadings do not appear to be different. Another way to
investigate the stability of loading types is to look at the results of
running a congruence test for random pairs of factor analyses. The
Coefficient of Congruence (CC) was obtained for 50 pairs for each sample
size and for both rotated and unrotated loadings. Though for the smallest
two sample sizes the average CC of the rotated loadings was higher,
virtually no differences existed for the other sample sizes.


Table 10. Average Coefficients of Congruence for
Rotated and Unrotated Loadings.

                          Sample Size
              25    100    400    800   1200   1600
Rotated     .884   .947   .990   .996   .997   .996
Unrotated   .538   .916   .993   .998   .998   .999

 

 

CHAPTER III

PROBLEM II: STANDARD ERROR AND THE NUMBER OF ROTATED FACTORS

In examining the relation between the standard error and the number
of factors rotated, this problem uses SAMPLER, several COLMLDGS routines
modified slightly to meet the present problem's requirements, and
the same factor analysis program used in the previous problem.

Method

 

Since the hypothesis under examination is that the standard errors
of the highest rotated loadings of each variable will be at a minimum
when the correct number of factors is rotated, it was necessary to first
determine a body of variables for which the number of underlying dimen-
sions is known: the same variables used in Problem I were considered
appropriate. Two distinct approaches seem available for testing this
hypothesis. The first makes use of knowledge of the true factor pattern,
and the second provides a means of determining the correct number of
factors to rotate when nothing is known about the body of variables.

Knowing the true factor pattern makes it possible to vary the
number of factors rotated for groups of factor analyses, to use the COLMLDGS
routine of the previous chapter to identify those sample factors most
closely resembling the population factors, and to then compute the standard
errors of the individual variables' highest loadings. If the average
standard error is at a minimum when the correct number of factors has
been rotated, the hypothesis must be considered to have been supported.

Unlike the above approach, the second method considers each
variable individually and seeks its highest rotated loading wherever it may
occur. Thus, if the average of the standard errors of all the variables'
highest loadings is at a minimum when the correct number of factors has
been rotated, this approach can be used to identify the number of
dimensions contained in a body of variables. Since the results from Chapter II
indicate that standard errors decrease as the magnitude of loadings
increases, any contrasts in standard errors brought about by rotating a
different number of factors will be more significant because everything
possible is being done to assure that only the largest loadings will be
selected. If standard errors are indeed at a minimum when the correct
number of factors has been rotated, it appears that an objective,
statistical procedure is available for identifying the number of
underlying dimensions. But this identification procedure should be used with
caution, for such a method may be dependent upon the degree of simple
structure of the population factor pattern, and thus any results obtained
here may only be applicable to those situations where a high degree of
simple structure is present in the rotated factor solution.

Procedure for the First Approach

For the set of twelve variables containing the three underlying
dimensions mentioned in Chapter II, the following steps were completed:
(1) 250 random samples of 400 subjects each were drawn, (2) correlation
matrices were computed and factor analyses performed, (3) k
factors were rotated, (4) factors most like the population factors were
chosen using COLMLDGS, and (5) means and standard errors were computed
for each variable. These five steps were repeated three times, each
time with k at a different value ranging from two to four. This method
can be called the "block" method since it requires that the variables
which approximate a population factor be located in the same column.

Procedure for the Second Approach

Two sets of twelve variables were chosen, one containing three
underlying dimensions (the same one used for the first approach), and one
containing four. For the set of variables containing three underlying
dimensions, the five steps of the first approach were completed with k
ranging from 2 to 4 but with one important change: the COLMLDGS routine
used to select the rotated factor loadings was modified to choose the
highest loading for each variable regardless of the column in which it
might appear. When compared to the first approach, this modification
of COLMLDGS should result in higher mean loadings for each of the variables
and thus, if anything, tend to make the standard errors more nearly
equal.

For the set of variables containing four underlying dimensions,
the five steps of the first approach were repeated five times with k ranging
from 2 to 6. Again, the COLMLDGS routine used was the type that selected
the highest loadings for each variable in each of the rotated solutions,
and the means and standard errors were computed using these values.
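
The difference between the "block" and "individual" selection rules can be seen in a small sketch. The functions and the toy matrix below are hypothetical stand-ins for the two COLMLDGS variants, and treating the highest loading as the one largest in absolute value is an assumption.

```python
def block_loadings(loadings, block, column):
    """Block method: take every variable's loading from one agreed column."""
    return [loadings[j][column] for j in block]


def individual_loadings(loadings, block):
    """Individual method: take each variable's highest loading, any column.

    Assumption: "highest" means largest in absolute value.
    """
    return [max(loadings[j], key=abs) for j in block]


# Hypothetical solution with one factor too many: the second variable has
# been pulled onto the extra (fourth) factor.
rotated = [
    [0.72, 0.10, 0.05, 0.08],
    [0.41, 0.12, 0.03, 0.58],
    [0.66, 0.05, 0.11, 0.20],
]
block = [0, 1, 2]

by_block = block_loadings(rotated, block, column=0)
by_individual = individual_loadings(rotated, block)
print(by_block)
print(by_individual)
```

When too many factors are rotated, the block method records the depressed loading left in the original column, while the individual method follows the variable to wherever its largest loading has moved.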

Results

First Approach.

 

Table 11. Means of Obtained Loadings for the
Block and Individual Methods of Selection.

                        Number of Factors Rotated
                  Two                 Three                 Four
Variable   Block  Individual   Block  Individual   Block  Individual
  (1)       .54      .56        .54      .53        .42      .58
  (2)       .55      .58        .59      .59        .49      .64
  (3)       .65      .66        .77      .76        .65      .81
  (4)       .70      .70        .79      .79        .67      .80
  (5)       .50      .50        .44      .44        .35      .59
  (6)       .56      .57        .63      .62        .63      .62
  (7)       .48      .46        .72      .71        .72      .73
  (8)       .49      .58        .75      .76        .74      .75
  (9)       .57      .59        .62      .62        .53      .59
 (10)       .53      .49        .73      .72        .72      .73
 (11)       .42      .49        .62      .61        .47      .65
 (12)       .47      .46        .58      .57        .62      .67

 

 

When the block and individual methods are used on the correct
number of factors (three), there is little difference among the means of
the selected loadings (Table 11). When these two factor selection
techniques are used on the rotated factor solutions which do not match the
number of underlying dimensions, the individual method generally has
the higher means.

The mean standard errors of all loadings for a given sample size
and method are presented in Table 12. As predicted, the standard error
is lowest when the correct number of factors has been rotated: in the
case of two rotated factors the standard error was nearly twice as large
and in the case of four rotated factors approximately three times as large.

Table 12. Mean Standard Errors of Obtained Loadings for the
Block and Individual Methods of Factor Selection

                      Number of Factors Rotated
                       Two     Three     Four
Block Method          .092     .059      .156
Individual Method     .091     .055      .073


Second Approach. The mean values of obtained rotated loadings
have been discussed in the previous section and are given in Table 11.
It should also be noted that the values obtained when the correct number
of factors have been rotated are not different from the population values,
and it is only in this event that the obtained values do approximate the
population values.

Table 13 contains the population values and the obtained means of
the twelve variables containing four, rather than three, underlying
dimensions. The population values have been inserted next to the column
containing four rotated factor solutions since it is this solution that
most closely approximates the population values.

Table 13. Population Values of the Highest Loadings for the
Correct Number of Rotated Factors and Means of Obtained
Loadings for a Varied Number of Rotated Factors.

                        Number of Factors Rotated
Variable    Two   Three   Four   Population   Five    Six
  (1)       .55    .55    .56       .53        .54    .59
  (2)       .69    .73    .81       .84        .84    .84
  (3)       .72    .73    .81       .83        .83    .83
  (4)       .37    .44    .56       .56        .70    .81
  (5)       .31    .39    .59       .59        .75    .83
  (6)       .37    .44    .64       .71        .70    .69
  (7)       .60    .61    .62       .63        .64    .67
  (8)       .52    .70    .71       .73        .70    .73
  (9)       .53    .75    .76       .77        .75    .73
 (10)       .51    .69    .72       .73        .73    .73
 (11)       .43    .58    .64       .65        .66    .75
 (12)       [values illegible in source]
 Mean       .51    .61    .68       .69        .71    .75

 

Table 14 gives the average standard error for all of the highest
rotated loadings at each of the specified number of rotations. The
correct number of underlying dimensions was four, and it was at this
number of rotated factors that the standard error was at a minimum.

Table 14. Average Standard Error for All Highest Loadings
According to the Number of Factors Rotated.

               Number of Factors Rotated
Standard       Two    Three    Four    Five    Six

 

CHAPTER IV
PROBLEM III: CHANGES IN THE FACTOR-PATTERN

Although there are many possible ways to alter the factor pattern,
the two variations investigated in this chapter are (1) those brought
about by increasing the number of underlying dimensions, and (2) those
brought about by increasing the number of variables loading on a factor
while keeping the same number of underlying dimensions. Certainly another
change could be extremely influential (varying the degree to which the
structure is unifactorial), but an in-depth discussion of this variation is
beyond the scope of the present study.

Method

Number of Underlying Dimensions. Although an increased number of
underlying dimensions might not give reason to expect much change in the
standard error of loadings in the principal axis solution, the presence
of these dimensions might cause considerable wobble in the placement of
rotated axes and thus increase the standard error of rotated loadings.
For this reason nine variables containing three underlying dimensions
were designated as a core of variables to be used to determine the effects
of adding more dimensions. Each added dimension contained three variables
and one hundred factor analyses were run for each of the patterns: 3-3-3,
3-3-3-3, and 3-3-3-3-3. The samples were randomly drawn by SAMPLER with
N set at 500; factors were selected in each case by a COLMLDGS routine
which chose those factors most like the population factors. Means and
standard errors were calculated for the highest rotated loadings, for the
unrotated loadings in the first column of the principal axes solution,
and for the eigenvalues.

Number of Variables Loading on a Factor. The factor pattern of
the previously mentioned core variables, which loaded in a 3-3-3 pattern,
was altered by adding more variables. These additional variables loaded
on just one of the underlying dimensions and yielded patterns of 4-3-3,
5-3-3, continuing to a final pattern of 9-3-3. These patterns permit
one to observe how the altered factor and the untouched underlying
dimensions are affected by doubling and tripling the number of variables
loading on that factor. With N = 500, SAMPLER drew one hundred random samples
for each of the above factor patterns; factor analyses and rotations were
obtained for each of the random samples. The highest rotated factor
loadings were obtained by appropriate COLMLDGS routines, and their means
and standard errors computed. Means and standard errors were also computed
for all of the eigenvalues and the unrotated loadings in the first column
of the principal axis solution.

Results "

Number of Underlying Dimensions. Table 15 gives the averages of
each of the first six eigenvalues for the three, four, and five underlying
dimension situations. The values obtained do not contradict the general
rule of thumb which suggests that there are as many significant underlying
dimensions as there are eigenvalues greater than unity. But it should be
noted that the sixth eigenvalue for the five-factor situation is quite
close to unity and that the drop between the fourth and fifth eigenvalues
is much greater than that between the fifth and sixth.

Table 16 gives the standard errors for the entries in Table 15.
There does not appear to be any appreciable increase in standard error
as a result of the presence of more underlying dimensions.
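
The rule of thumb can be checked directly against the mean eigenvalues of Table 15. The sketch below is illustrative; the helper name `count_above_unity` is an assumption, and the first entry of the three-dimension column is read as 1.98, which agrees with the nine-variable column of Table 21.

```python
# Mean eigenvalues from Table 15, keyed by the number of underlying dimensions.
mean_eigenvalues = {
    3: [1.98, 1.57, 1.27, 0.90, 0.81, 0.75],
    4: [2.07, 1.67, 1.36, 1.10, 0.97, 0.89],
    5: [2.18, 1.73, 1.43, 1.20, 1.07, 0.99],
}


def count_above_unity(eigenvalues):
    """Number of eigenvalues strictly greater than 1.0."""
    return sum(1 for value in eigenvalues if value > 1.0)


counts = {k: count_above_unity(v) for k, v in mean_eigenvalues.items()}
print(counts)  # {3: 3, 4: 4, 5: 5}
```

The counts agree with the number of underlying dimensions for the means, although the sixth mean in the five-dimension column misses unity by only 0.01.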

Table 15. Mean Eigenvalues for Three, Four,
and Five Underlying Dimensions

                        Number of Factors
                      Three    Four    Five

Magnitude      (1)    1.98     2.07    2.18
of             (2)    1.57     1.67    1.73
Eigenvalues    (3)    1.27     1.36    1.43
               (4)    0.90     1.10    1.20
               (5)    0.81     0.97    1.07
               (6)    0.75     0.89    0.99


Table 16. Mean Standard Errors of Eigenvalues for
Three, Four, and Five Factor Situations.

                        Number of Factors
                      Three    Four    Five

               (1)    .082     .087    .097
               (2)    .072     .086    .086
               (3)    .067     .066    .070
               (4)    .040     .059    .059
               (5)    .032     .034    .031
               (6)    .031     .034    .031


Unrotated Loadings. The means of the first three loadings in the
first column of the principal axis solution are presented in Table 17.
No appreciable change was found in the magnitude of the loadings
as a result of having added dimensions.

Table 17. Means of Unrotated Loadings for Three,
Four, and Five Underlying Dimensions.

                        Number of Factors
                      Three    Four    Five

Position       (1)    .552     .561    .528
of             (2)    .645     .674    .630
Loadings       (3)    .712     .730    .693


Table 18 gives the standard errors which correspond to the loadings
in Table 17. No meaningful pattern appears to emerge from these data.


Table 18. Standard Errors of Unrotated Loadings for Three,
Four, and Five Underlying Dimensions

                  Number of Underlying Dimensions
                      Three    Four    Five

Position       (1)    .048     .050    .063
of             (2)    .066     .068    .054
Loadings       (3)    .049     .050    .043


Rotated Loadings. Table 19 contains the means of the highest
rotated loadings for each of the core variables. Although a rather small
decrease in the magnitude of loadings seems to be the rule as the result
of increasing the number of underlying dimensions, in only one case does
a really sharp drop occur: variable b of Factor III. An inspection of
the actual factor analyses revealed that this variable occasionally pulled
away from its factor to load with a higher loading on one of the other
four underlying dimensions.

Table 19. Means of Rotated Loadings for Three,
Four, and Five Underlying Dimensions.

                         Number of Underlying Dimensions
                             Three    Four    Five

Core Factor I      (a)       .579     .535    .530
                   (b)       .830     .815    .801
                   (c)       .832     .816    .797

Core Factor II     (a)       .628     .624    .611
                   (b)       .732     .719    .715
                   (c)       .719     .723    .715

Core Factor III    (a)       .774     .760    .723
                   (b)       .631     .633    .544
                   (c)       .664     .642    .629


Table 20 gives the standard errors corresponding to the values of
Table 19. Loading b of Core Factor III has a rather large standard error,
something that would be expected considering the rather sharp drop which
occurred in the mean value of this loading after the fifth underlying
dimension was added. The presence of added underlying dimensions shows
an increase in the corresponding standard errors except for the two
highest loadings when five underlying dimensions were present.

Table 20. Standard Errors of Rotated Loadings for Three,
Four, and Five Underlying Dimensions.

                         Number of Underlying Dimensions
                             Three    Four    Five

Core Factor I      (a)       .060     .096    .107
                   (b)       .020     .039    .035
                   (c)       .020     .040    .036

Core Factor II     (a)       .053     .056    .066
                   (b)       .030     .037    .071
                   (c)       .031     .035    .039

Core Factor III    (a)       .029     .032    .069
                   (b)       .063     .076    .157
                   (c)       .042     .069    .093


Number of Variables Loading on a Factor

Eigenvalues. The mean values of the eigenvalues form Table 21.
As the seventh variable was added to the first core factor to make a
total of 13 variables loading on 3 dimensions, the fourth eigenvalue
exceeded unity and remained above that level. But a sharp drop is noted
between the third and fourth, sharper than that between the fourth and
fifth. This difference continues for the 8-3-3 and 9-3-3 patterns.

Table 21. Means of Eigenvalues for Nine through Fifteen
Variables Loading on Three Underlying Dimensions.

                           Number of Variables
               (9)    (10)   (11)   (12)   (13)   (14)   (15)

Rank     (1)   1.98   2.20   2.39   2.49   2.67   2.75   2.90
of       (2)   1.57   1.59   1.59   1.68   1.68   1.67   1.72
Eigen-   (3)   1.27   1.28   1.30   1.31   1.32   1.34   1.34
value    (4)    .90    .94    .96    .98   1.00   1.04   1.05
         (5)    .81    .85    .88    .90    .92    .96    .97

The standard errors for the entries in Table 21 are given in
Table 22. The size of the first eigenvalue increases regularly as vari-
ables are added, and a corresponding increase in the standard error is
noted. The magnitude of the standard errors has not increased for the
fourth and fifth eigenvalues.
Table 22. Standard Errors of the First Five Eigenvalues for
Nine through Fifteen Variables Loading on Three
Underlying Dimensions.

                           Number of Variables
               (9)    (10)   (11)   (12)   (13)   (14)   (15)

         (1)   .082   .113   .112   .113   .150   .164   .166
         (2)   .072   .066   .081   .072   .081   .081   .090
         (3)   .067   .069   .066   .068   .066   .077   .064
         (4)   .040   .043   .044   .040   .044   .046   .043
         (5)   .032   .033   .035   .032   .030   .034   .033


Unrotated Loadings. The means of the first four loadings from
the first column of the principal axis solution are given in Table 23.
No appreciable change is noted among these unrotated loadings as a
result of adding variables which belong to the same underlying dimension.

Table 23. Means of the Highest Unrotated Factor Loadings
for Four Variables of the First Factor.

                           Number of Variables
                9      10     11     12     13     14     15

         (1)    --    .59    .59    .59    .57    .56    .55
         (2)   .55    .58    .57    .57    .58    .57    .56
         (3)   .65    .64    .65    .67    .65    .64    .65
         (4)   .71    .71    .72    .72    .70    .69    .68


The standard errors corresponding to the entries in Table 23 are
given in Table 24. The standard errors of the first two unrotated loadings
are not affected by the increased number of variables, but the standard
errors of the third and fourth variables definitely decrease as more
variables are added. It should be noted that although the unrotated
loadings for these variables are not affected by the increased number of
variables, the rotated loadings do show a decrease in magnitude (Tables
23 and 25).

Table 24. Standard Errors of the Highest Unrotated Factor
Loadings for Four Variables of the First Factor.

                           Number of Variables
                9      10     11     12     13     14     15

         (1)    --    .043   .038   .043   .040   .043   .045
         (2)   .048   .053   .046   .048   .045   .045   .048
         (3)   .066   .049   .039   .039   .031   .031   .032
         (4)   .049   .039   .034   .030   .030   .029   .033


Rotated Loadings. Table 25 contains the mean values of the
rotated loadings for each of the core variables. For the most part,
increasing the number of variables seems to have only a minor effect
upon the magnitude of loadings, but there are two notable exceptions:
variables b and c of Factor I. For these variables a rather uniform
drop is noted as each additional variable is placed in the factor pattern.
This is in contrast to what occurred in the other method of altering the
factor pattern, the method in which the number of variables was built up
to 15 by adding more underlying dimensions.

Table 25. Mean Values of Rotated Factor Loadings
for Nine Core Variables.

                              Number of Variables
                     9      10     11     12     13     14     15

Factor I      (a)   .579   .585   .577   .559   .569   .545   .558
              (b)   .830   .790   .756   .750   .737   .721   .719
              (c)   .832   .811   .783   .764   .752   .732   .728

Factor II     (a)   .628   .631   .628   .629   .620   .624   .604
              (b)   .732   .733   .725   .720   .711   .707   .704
              (c)   .719   .726   .724   .701   .710   .707   .689

Factor III    (a)   .774   .757   .736   .760   .748   .747   .750
              (b)   .631   .620   .634   .627   .618   .605   .609
              (c)   .664   .649   .644   .641   .628   .631   .626


Table 26 contains the standard errors for the entries in Table 25.
Standard errors do not seem to be much affected by the increased number
of variables, although their magnitudes do tend to increase slightly. As
the number of variables loading on Core Factor I increases, the standard
errors of variables b and c on Factor I increase regularly and
considerably more in percentage than those of the other variables.

Table 26. Standard Errors of Rotated Factor Loadings
for Nine Core Variables.

                              Number of Variables
                     9      10     11     12     13     14     15

Factor I      (a)   .060   .061   .060   .061   .064   .068   .068
              (b)   .020   .023   .029   .027   .032   .035   .040
              (c)   .020   .025   .026   .027   .026   .028   .034

Factor II     (a)   .053   .051   .054   .054   .060   .050   .058
              (b)   .030   .035   .035   .036   .036   .037   .044
              (c)   .031   .030   .032   .040   .036   .036   .037

Factor III    (a)   .029   .033   .027   .027   .032   .036   .031
              (b)   .063   .062   .050   .058   .065   .062   .066
              (c)   .042   .053   .056   .056   .066   .065   .060

CHAPTER V
DISCUSSION

The discussion will center on the most important findings, which
(1) predict the sample size required to assure stable factor patterns,
(2) provide evidence that loadings do behave as correlations, and
(3) indicate that the mean of the standard errors of the most signifi-
cant loadings is at a minimum when the number of factors rotated equals
the number of underlying dimensions.

PROBLEM I: SAMPLING ERROR

Sample Size and the Stability of Factor Patterns

Eigenvalues. When the eigenvalues obtained by factor analyzing
a sample correlation matrix are close to the population eigenvalues,
the resultant factors are most likely to be similar to the population
factors. A sample size of 400 was necessary before the means of the
eigenvalues for 100 factor analyses were reasonably close to the
population values. Sample sizes 25 and 100 produced four or more
eigenvalues greater than unity; at sample size 400 the fourth eigenvalue
was 0.99, or just below unity, a value conforming with the rule of thumb
which suggests that the number of underlying dimensions is equal to the
number of roots greater than unity. (It will be remembered that this
experiment did have just three underlying dimensions.)

The standard errors of eigenvalues appear to be small enough at
sample size 400 to assure that the second and third eigenvalues will not
cross; but perhaps this is not the most important consideration, since
individual variables could cross without the actual eigenvalues doing so.
More importantly, there should be a high probability that the obtained
eigenvalues will be similar in size to the population eigenvalues. The
data suggest that a sample size of 400 is probably necessary before one
can be reasonably confident that the resultant eigenvalues will be
sufficiently close to the population values.

Unrotated Loadings. Sample size 25 does not produce unrotated
loadings whose means center on population values, but sample sizes of 100
and larger do (see Table 3). As sample size increases, two important
changes take place: (1) standard errors decrease and (2) the means of the
resultant loadings are closer to population values. Hence one way of
assuring that sample values will be close to population values might be
to pick a sample size which will result in a sufficiently small standard
error. For example, if it should be decided that the maximum standard
error tolerable for any of the various levels of loadings is 0.10, then
a sample size of at least 400 appears necessary: the results of Table 4
show that the standard error of the low group, which approximates zero
correlations, is 0.104, while the standard errors of loadings in the other
two groups were much smaller; loadings averaging 0.40 had a mean standard
error of .073 and those averaging 0.50 had an average standard error of
.047. Since the probability that low loadings may become significantly
large is greater than the probability that larger loadings will become
insignificantly small, it follows that differences in interpretation are
most likely to result from low loadings becoming large.

Rotated Loadings. As in the case of unrotated loadings, a sample
size of 100 was sufficient to bring the means of the rotated loadings
quite close to the population values; this result would be expected since
the rotated solution is merely a transformation of the principal axis
analysis. The real determinant of how close a given variable is likely
to be to the population value is, once again, the standard error. If
one desires, for instance, to be certain that at the 0.05 level the
resultant loadings are not more than ±0.16 away from the population
values, a sample size of 400 is necessary (Table 6). If more precision is
desired, it may be obtained by an appropriate increase in the sample size.
But it appears that a sample size of 400 is necessary to consistently
produce sample factor patterns that resemble the population factor
pattern. Although using a sample size substantially smaller than 400 is
likely to yield an interpretive text which is significantly different from
the one that would be written to the population factor pattern, the
slightly more accurate loadings obtained by increasing the sample size
beyond 400 are not likely to result in interpretations that would produce
a different text.
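
The ±0.16 criterion can be translated into a largest tolerable standard error. The sketch below assumes that sample loadings are roughly normally distributed about the population value, which the text does not state; the function name `largest_tolerable_se` is likewise an assumption.

```python
from statistics import NormalDist


def largest_tolerable_se(half_width, alpha=0.05):
    """Standard error at which a two-sided (1 - alpha) interval around the
    population value has the given half-width, assuming normality."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return half_width / z


se_limit = largest_tolerable_se(0.16)
print(round(se_limit, 3))  # about 0.082
```

Under this assumption, a loading stays within ±0.16 of its population value at the 0.05 level only if its standard error is no larger than about 0.08.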

Stability of Rotated versus Unrotated Loadings. Table 9 indicates
that random pairs of unrotated loadings are less congruent than are
rotated pairs at the smaller sample sizes. This difference is probably due
to (1) the nature of the Coefficient of Congruence, and (2) the manner in
which the rotated factors were selected. The CC is sensitive to sign
changes, and these were found to occur quite frequently, particularly for
sample size 25. Also, the block method of factor selection allowed the
factor most like the population factor to be chosen from any of the three
columns representing the rotated factors; this was not done for the
unrotated loadings, which were taken from the positions in which they
occur in the population factor pattern. For the larger sample sizes, the
probability that the unrotated loadings will not conform to the population
pattern has been virtually eliminated and rotation becomes nothing more
than a mechanical process. About the same degree of congruence is noted
for the rotated and unrotated loadings at the larger sample sizes.
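
The sensitivity of the coefficient to sign changes is easy to see in a small sketch. It assumes that the Coefficient of Congruence used is the usual ratio of the sum of cross-products to the geometric mean of the sums of squares; the loadings shown are illustrative values, not entries from the tables.

```python
import math


def congruence(a, b):
    """Coefficient of congruence between two columns of loadings."""
    cross = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return cross / norm


population = [0.55, 0.65, 0.71]        # illustrative population loadings
sample = [0.50, 0.70, 0.68]            # illustrative sample loadings
reflected = [-value for value in sample]  # same factor with its sign reversed

same_sign = congruence(population, sample)
sign_flipped = congruence(population, reflected)
print(round(same_sign, 3), round(sign_flipped, 3))
```

A factor that is reflected in one sample yields a coefficient of the same size but opposite sign, so frequent sign changes at small sample sizes pull the average congruence of unrotated pairs downward.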


Prediction of the Average Standard Error

 

 

An important question is whether the actual standard errors of
factor loadings can be predicted from the sample size. Hamburger (1966)
finds that 1/√N is a good prediction of the average standard error for
the sample sizes and correlation matrices he investigated. The results
of the present study indicate that for rotated loadings of substantial
magnitude, the average standard error is consistently, though only
slightly, larger than 1/√N (Table 9). Hamburger's suggested rule of
thumb for prediction of the standard error appears appropriate for the
largest unrotated loadings (Table 4) and the largest rotated loadings
(Table 6) but yields values which are much too small to accurately
predict the standard errors of loadings of less magnitude.

Upon correction for the average loading size using the formula
sigma = (1 - r²)/√(N - 1), the average of resultant standard errors for
rotated loadings of significant magnitude is approximately twice the
size of the standard errors that would be expected if the loadings were
behaving exactly as correlations (Table 9). A similar comparison for
unrotated loadings shows that the average of the resultant standard
errors is approximately 50 per cent greater than the corrected expected
values. The standard errors of the low-level unrotated loadings, which
approximate zero loadings, are also about twice the corrected expected
values and thus similar to those of the largest rotated loadings.

Figures 1 and 2 indicate that once the sample size is large enough
to assure the same factor pattern's appearance, the standard error does
decrease in inverse proportion to the square root of the sample size.
Hence it appears possible to predict the average standard error for
loadings of a given magnitude, a finding considerably more valuable than
merely predicting the average standard error for all loadings considered
together. For all sample sizes and levels of loadings, Tables 7 and 8
show the ratios of the standard errors of the resultant factor loadings to
the values expected when the loadings behave as correlations. These
ratios may be used to roughly determine the magnitude of standard error
for a given loading level and for a specified sample size. Such
information may also be obtained from Figures 1 and 2.
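
A rough sketch of how such a ratio might be used follows. It assumes the expected standard error of a correlation, (1 - r²)/√(N - 1), scaled by an observed-to-expected ratio; the function names are assumptions, and the ratio of 2 for near-zero loadings is taken from the discussion above rather than read from Tables 7 and 8.

```python
import math


def expected_se(r, n):
    """Standard error of a correlation r at sample size n."""
    return (1 - r * r) / math.sqrt(n - 1)


def predicted_se(r, n, ratio):
    """Expected standard error scaled by an observed-to-expected ratio."""
    return ratio * expected_se(r, n)


def required_n(r, target_se, ratio):
    """Smallest sample size whose predicted standard error is roughly at
    or below target_se (solves ratio * (1 - r^2) / sqrt(n - 1) = target)."""
    return math.ceil((ratio * (1 - r * r) / target_se) ** 2 + 1)


se_at_400 = predicted_se(0.0, 400, 2.0)
n_needed = required_n(0.0, 0.10, 2.0)
print(round(se_at_400, 3), n_needed)
```

With a ratio of 2 for loadings near zero, the predicted standard error at N = 400 is about 0.10, which is in line with the value of 0.104 reported for the low group of unrotated loadings.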

 

 

Factor Loadings as Correlations

Since the standard error of a correlation coefficient r with a
sample size N is given by sigma = (1 - r²)/√(N - 1), loadings, if they
are behaving as correlations, must be expected to follow this
relationship. Higher loadings must be expected to have lower standard
errors, and the standard errors of zero loadings should be approximately
1/√N. The results indicate that higher loadings do have lower standard
errors, but the standard errors were somewhat larger than those expected
for correlations. As discussed in the previous section, Figures 1 and 2
show that the standard errors of loadings are inversely proportional to
the square root of the sample size once the sample size has reached a
level which assures repetition of the population factor pattern. Tables 7
and 8 indicate that the ratios of the obtained standard errors to the
expected standard errors are approximately equal for a given magnitude of
loading.
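
The benchmark formula itself can be checked by simulation. The sketch below draws repeated samples from a bivariate normal population with a known correlation and compares the spread of the sample correlations with (1 - r²)/√(N - 1). It concerns ordinary correlations only, not factor loadings, and the normal population, the seed, and the function names are all assumptions made for the illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_se(r, n, reps, seed=0):
    """Standard deviation of the sample correlation over `reps` samples of
    size `n` from a bivariate normal population with correlation `r`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            ys.append(r * z1 + math.sqrt(1 - r * r) * z2)
        estimates.append(pearson(xs, ys))
    return statistics.stdev(estimates)


r, n = 0.5, 400
formula_se = (1 - r * r) / math.sqrt(n - 1)
empirical_se = simulated_se(r, n, reps=200)
print(round(formula_se, 4), round(empirical_se, 4))
```

For true correlations the two values should be close; the ratios reported in Tables 7 and 8 measure how far the standard errors of loadings depart from this benchmark.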

PROBLEM II: STANDARD ERROR AND THE NUMBER OF ROTATED FACTORS

Underlying dimension theory suggests that it is important to
rotate exactly as many factors as there are underlying dimensions. It
has been suggested that rotating too few factors will find some variables'
highest loadings wandering unpredictably among the rotated factors.
Similarly, if too many factors are rotated, groups of variables whose
highest loadings normally would be found on the same factor will
unnecessarily be divided to provide loadings for the extra factor(s);
furthermore, it cannot be predicted which factor(s) will contribute
variables to the superfluous factor(s). But when the correct number of
factors has been rotated, unpredictability disappears and the highest
loadings are always able to group together in the appropriate pattern.

It is logical to expect that as the number of rotated factors
becomes more distant from the true number of underlying dimensions, the
resultant factor pattern will become less appropriate. As the factor
patterns become less appropriate, the standard errors also increase.
Hence, if a graphical portrayal is made with the standard error
represented by the vertical axis and the number of rotated factors by the
horizontal, a U-curve should result with the point at the very bottom of
the "U" representing the standard error for the correct number of rotated
factors.

All of the experiments described in Chapter III did yield U-curves
(Tables 12 and 14). The "U" was considerably flatter for the case of
four underlying dimensions than for either case of three, but perhaps
this is to be expected since certain conditions may accentuate the
differences in standard error between the correct and incorrect number of
factors. It may be that either the number of underlying dimensions or
the extent to which the factor pattern is unifactorial is the prime
controlling factor. Both might logically be expected to play a significant
role in determining the shape of the "U". Since the presence of more
underlying dimensions means that more factors will have been rotated just
before and after the correct number, the severity of the situation in
terms of the percentage of variables which must load in a false pattern
is diminished because the percentage of factors that are not able to
properly develop is smaller. Hence the number of variables exhibiting an
unusually high standard error should be fewer and their effect on the
average standard error is likely to be less. The U-curve may also be
flattened if the extent to which the factor pattern is unifactorial is
low, for then it will be easier for certain variables, those loading high
on two or more factors, to pull away, thus shifting their highest loadings
from the correct grouping. Thus rotating the wrong number of factors when
a unifactorial structure is missing should result in higher standard
errors for a greater number of variables than in those situations
possessing unifactorial structure. In these experiments the number of
underlying dimensions probably exerted more influence on the shape of
the U-curve than a low degree of simple structure because all factor
patterns did approach unifactorial structure.

The means of the averages of the highest loadings (bottom line of
Table 13) show a progressive increase as the number of rotated factors
becomes larger. This is not unexpected, because as more axes are
available to be positioned through likely groups of points, less error
will occur: the points will lie closer to the axes, and the distance
between each point's projection onto the axis and the origin, which is the
value of the loading, will be greater. Unfortunately there is no
maximizing process which would find the largest loadings occurring when
the correct number of factors has been rotated; instead, the more factors
rotated, the larger loadings become. But the fact that the standard error
is at a minimum when the correct number of factors has been rotated is an
extremely important finding, for it can be used to determine the number of
underlying dimensions in appropriate situations.
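
The selection rule implied by this finding can be sketched as choosing the number of rotated factors at the bottom of the U-curve. The function name is an assumption, and the standard errors in the example are placeholder values invented for illustration; they are not taken from Tables 12 through 14.

```python
def best_number_of_factors(mean_se_by_k):
    """Return the number of rotated factors with the smallest mean
    standard error of the highest loadings."""
    return min(mean_se_by_k, key=mean_se_by_k.get)


# Placeholder values, invented only to illustrate a U-shaped pattern.
hypothetical_mean_se = {2: 0.071, 3: 0.048, 4: 0.060, 5: 0.066}

chosen = best_number_of_factors(hypothetical_mean_se)
print(chosen)  # 3
```

In practice each entry would have to be estimated from repeated samples rotated to that number of factors, so the rule presupposes that several samples are available.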

PROBLEM III: CHANGES IN THE FACTOR PATTERN

 

Number of Underlying Dimensions

Eigenvalues. A general rule of thumb suggests that the number of
significant factors in a body of variables is equal to the number of
eigenvalues greater than unity. In this experiment the means of
eigenvalues for 100 factor analyses did conform to this rule, but it
appears that this rule is actually of little practical value. For the
five-factor solution, the standard error of the sixth eigenvalue is 0.031
(Table 16), and since the mean value of this eigenvalue is 0.99 (Table 15),
over 40 per cent of the factor analyses must have had at least six
eigenvalues greater than unity. Thus such a rule cannot reliably predict
how many factors should be rotated for the entire population.
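
The share of analyses with a sixth eigenvalue above unity can be approximated from the mean and standard error quoted above. The sketch assumes that the sixth eigenvalue is normally distributed across samples, which the text does not state, and it uses the rounded table entries.

```python
from statistics import NormalDist

mean_sixth = 0.99   # mean of the sixth eigenvalue (Table 15)
se_sixth = 0.031    # its standard error (Table 16)

# Proportion of a normal distribution lying above 1.0.
share_above_unity = 1 - NormalDist(mean_sixth, se_sixth).cdf(1.0)
print(round(share_above_unity, 2))
```

With the rounded entries the normal approximation gives a share somewhat under 0.40; the figure of over 40 per cent may rest on unrounded values, which are not reported here. Either way, a large fraction of individual analyses would show six eigenvalues above unity.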

Unrotated Loadings. The highest loadings of the unrotated
solution for Core Factor I (Table 17) indicate that adding dimensions to a
body of variables does not produce much change in the loadings of the
existing dimensions. The standard errors (Table 18) also appear to be
unaffected. These results could be expected for two reasons: (1) the
component analysis being used extracts a maximum amount of variance on
successive factors, and (2) the structure of the factor patterns is
basically unifactorial. Since the factors are basically orthogonal to
each other, it is not likely that adding a factor will contribute much
to the previous factor structure, and the variance accounted for by the
additional dimensions will be extracted as new factors.

Rotated Loadings. As might be expected from the above discussion,
an added number of dimensions did not, for the most part, affect the
magnitude of rotated loadings (Table 19). The one loading that did show a
significant drop in magnitude had, it was discovered, a significantly
high loading on one of the other dimensions added, and thus it did not
conform to the unifactorial structure exhibited by the other variables. In
general, those variables having relatively high loadings on more than one
dimension were found to have higher standard errors (Table 20), and thus
it appears that unifactorial structure also plays a role in determining
the stability of factor patterns.

Number of Variables Loading on a Factor

Eigenvalues. Since the variables chosen for this experiment
contained only three underlying dimensions, it would be expected that the
number of eigenvalues greater than unity might be only three. The means
of the fourth eigenvalue were slightly greater than unity for the
13-, 14-, and 15-variable situations. Is this a contradiction of the
general rule of thumb or is there a logical explanation? It will be noted
that as variables are added, all of the eigenvalues increase somewhat. As
expected, the first eigenvalue increases most rapidly since it is to this
dimension that the added variables belong (Table 21). The second, and
particularly the third, eigenvalues remain much less affected. Since there
is a slight increase in all eigenvalues, it is logical that those initially
near unity will eventually surpass this value. It might be wiser to
suggest using a rule which requires looking at sharp breaks between groups
of eigenvalues to determine the number of factors. One might also seek
to determine on which eigenvalue the most influence of an added variable
is manifested.
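
The breaks between successive eigenvalues can be tabulated from the means in Table 21. The sketch below is illustrative only; the helper name is an assumption, and the first entry of the 13-variable column is read as 2.67.

```python
# Mean eigenvalues from Table 21 for the 13-, 14-, and 15-variable patterns.
table_21 = {
    13: [2.67, 1.68, 1.32, 1.00, 0.92],
    14: [2.75, 1.67, 1.34, 1.04, 0.96],
    15: [2.90, 1.72, 1.34, 1.05, 0.97],
}


def successive_drops(eigenvalues):
    """Differences between each eigenvalue and the next, to two decimals."""
    return [round(a - b, 2) for a, b in zip(eigenvalues, eigenvalues[1:])]


for n_vars, values in table_21.items():
    drops = successive_drops(values)
    # drops[2] is the third-to-fourth drop, drops[3] the fourth-to-fifth.
    print(n_vars, drops, drops[2] > drops[3])
```

In all three patterns the drop from the third to the fourth eigenvalue is several times the drop from the fourth to the fifth. The largest single drop, however, lies between the first and second eigenvalues, so a break rule would need to state which break is meant.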

Unrotated Loadings. The magnitudes of the unrotated loadings on
the altered factor do not appear to be affected by an increased number of
variables loading on it (Table 23), but the standard errors of the third
and fourth variables (Table 24) show a regular decrease as the number of
variables increases. These are the most highly correlated variables and
thus are the prime determinants of their underlying dimension. As more
variables are added to their dimension, the probability that the nature
of the dimension will change is diminished.

Rotated Loadings. Although the probability that the nature of the
dimension represented by the first eigenvalue will change is diminished
by the presence of additional variables loading on that dimension, the
exact position of the rotated axis placed through the group of points
representing these variables may become less stable. For an individual
factor analysis, each added variable provides an opportunity for
additional wobble in the placement of the axis, and this should result in
a gradual increase in the standard errors of the individual rotated
variables as the number of variables present on that dimension increases
(Table 26).

It is also noted that those loadings which were initially highest,
b and c of Factor I, steadily decrease in size as variables are added
(Table 25). If the added variables were to load randomly on either side
of these prime determinants, it would be logical to expect that the
position of the axis would not change much from the original situation.
But if most of the added variables fall on only one side of these two
highly correlated variables, the placement of the axis will be shifted in
that direction and the projections of the two points representing the
highly correlated variables onto the axis will be closer to the origin,
thus decreasing the size of those loadings. The latter is true for the
sample of variables used in this experiment.

CHAPTER VI
SUMMARY

The major purpose of the present study has been to empirically
determine the statistical information necessary to make meaningful
decisions about sample size and the number of factors. The study also
investigated how changing the number of underlying dimensions or the number
of variables loading on a factor affects the magnitude both of loadings
and standard errors.

Sampling Error

 

From a population of 5948 responses to a fixed set of 12 variables
with a known factor pattern, 100 random samples were drawn for each of
the sample sizes 25, 100, 400, 800, 1200, and 1600. Factor analyses were
performed, and the means and standard errors were computed for all of the
eigenvalues, for the highest rotated loadings of each variable, and for
the unrotated loadings in the first column of the principal axis solution.

The average of all the standard errors for middle- and high-level
rotated loadings was found to be slightly larger than 1/√N. Moreover,
when these loadings were divided into groups of highest and lowest
magnitudes, the standard errors of each group were found to decrease at a
rate proportional to 1/√N, but the ratios of the obtained standard errors
to the standard errors expected if loadings were behaving as correlations,
(1 - r²)/√(N - 1), were different for different levels of loadings.
The standard errors of loadings which averaged 0.75 were approximately 50
per cent larger than the expected values, while those of loadings
averaging 0.50 were about twice as large. For unrotated loadings, the
obtained standard errors were twice the expected values when the loadings
approximated zero correlations, and less than 50 per cent greater than the
expected values when averaging 0.60. Means of loadings centered on the
population values for all but sample size 25, and the two smallest sample
sizes, 25 and 100, did not often produce factor patterns identical to the
population factor pattern.

A sample size of 400 appears necessary to consistently produce
factor patterns that resemble the population factor pattern. Although
using a sample size substantially smaller than 400 is likely to yield an
interpretive text which is significantly different from the one that would
be written to the population factor pattern, the slightly more accurate
loadings obtained by increasing the sample size beyond 400 are not likely
to result in interpretations that would produce a different text.

Number of Factors

With N = 400, three groups of 50 random samples each were drawn
and factor analyses performed. For each group, a different number of
factors was rotated, and means and standard errors of the highest
loadings were computed. The highest loadings were sampled first by picking
those groups of variables most similar to the population factor patterns
and then by picking the highest loadings for each variable without
consideration of the population factor pattern.

The three experiments indicated, first, that the average standard
error of the highest loadings is at a minimum when the correct number of
factors has been rotated. Second, the means of the highest loadings
become steadily greater with an increase in the number of factors rotated.
Finally, the U-curve representing the standard errors flattens out as the
true number of factors increases. These results suggest a means for
determining the number of significant underlying dimensions contained in
a body of variables, and this method appears safe to use when the structure
of the population factor pattern is unifactorial.

Changes in the Factor Pattern

 

Factor patterns were manipulated in two ways: (1) the number of
variables was increased to 15 by adding additional underlying dimensions,
and (2) the number of variables was increased to 15 by adding to the
number of variables loading on just one of the factors and thus leaving
the number of underlying dimensions unchanged. One hundred random samples
with N = 500 were drawn for each of the factor patterns, and means and
standard errors were computed for the quantities previously mentioned.

As the number of underlying dimensions increased, the eigenvalue
representing the number of the additional factor, the fourth or fifth,
increased most rapidly. Building up the number of variables without
increasing the number of underlying dimensions produced a marked increase
in the first eigenvalue, while the others increased considerably less.
A general rule of thumb about eigenvalues, which suggests that the number
of eigenvalues greater than unity equals the number of significant
underlying dimensions, was found not to hold for the situation in which
the number of variables loading on a factor increased, because the fourth
eigenvalue became greater than unity when the 15th variable was added. It
appears that as more variables are placed in the factor analysis, all
lower rank eigenvalues tend to drift upward equally rapidly regardless of
whether the number of underlying dimensions has changed or the number of
variables loading on existing dimensions has been increased.

Increasing the number of dimensions from three to five
approximately doubled the average magnitude of standard errors for rotated
loadings; no such increase was detected for the unrotated loadings.
Increasing the number of dimensions did not affect the means of either
rotated or unrotated loadings.

Increasing the number of variables from nine to fifteen without
changing the number of underlying dimensions did not affect the means of
unrotated loadings and did not affect the means of most of the rotated
loadings. However, there was a decrease in the magnitudes of the two
rotated loadings which were initially highest, which suggests that the
position of the axis shifted because of the loading pattern of the
particular variables which were added.


Questions Fostered by the Study

The study fostered the following questions:

1. Would the empirical results of the present study agree with
those that might be obtained by using a Monte Carlo technique?

2. Are the present results generalizable to sets of similar data?

3. How would a change away from unifactorial structure affect the
standard error of a variable?

4. What other modifications of the factor pattern are possible,
and what will be their effects upon the standard errors and means of
variables? For example, do permutations in the order of variables have
an effect upon the size or standard errors of the loadings?

5. Can the standard errors of resultant loadings be accurately
predicted from the average correlation of variables on a factor?

6. Since the standard errors of loadings are higher than the
expected standard errors of true correlations, are the loadings acting as
partial correlations?

7. Do the standard errors of different orthogonal rotational pro-
cedures vary when the structure is not unifactorial?

8. Would the same standard errors for loadings be obtained if
values other than unity are used as communalities?
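Question 6 compares the observed standard errors with those expected for correlations. As a point of reference, a commonly used large-sample approximation for the standard error of a Pearson correlation is (1 - rho^2) / sqrt(N - 1). Whether this is the exact benchmark used in the study is an assumption, and the function name `approx_se_correlation` is hypothetical. The sketch tabulates the approximation at N = 500, the sample size used for the manipulated factor patterns.

```python
import math


def approx_se_correlation(rho, n):
    """Large-sample approximation to the standard error of a Pearson r."""
    return (1.0 - rho ** 2) / math.sqrt(n - 1)


# Larger population correlations give smaller approximate standard errors.
for rho in (0.0, 0.3, 0.7):
    print(f"rho = {rho:.1f}  approx SE = {approx_se_correlation(rho, 500):.4f}")
```

At rho = 0 the approximation reduces to 1 / sqrt(N - 1), which is close to 1 / sqrt(N) for samples of this size.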

BIBLIOGRAPHY


Barlow, J. A. & Burt, C. The identification of factors from different
experiments. Brit. J. Statist. Psychol., 1954, 7, 52-56.

Browne, M. A comparison of factor analytic techniques. Unpublished
master's thesis, University of Witwatersrand, Union of South
Africa, 1965.

Cattell, R. B. Extracting the correct number of factors in factor
analysis. Educational and Psychological Measurement, 1958,
18, 791-838.

Guttman, L. Some necessary conditions for common-factor analysis.
Psychometrika, 1954, 19, 149-161.

Hamburger, C. D. Factorial stability as a function of analytic rota-
tion method, type of factor pattern, and size of sample. Unpub-
lished doctoral dissertation, University of Southern California,
1965.

Harman, H. H. Modern factor analysis. Chicago: Univer. Chicago Press,
1960.

Harman, H. H. Modern factor analysis, Second edition. Chicago: Univer.
Chicago Press, 1967.

Horn, J. L. A rationale and test for the number of factors in factor
analysis. Psychometrika, 1965, 30, 179-185.

Joreskog, K. G. Statistical estimation in factor analysis. Stockholm:
Almqvist & Wiksell, 1963.

Kaiser, H. F. The application of electronic computers to factor analy-
sis. Educational and Psychological Measurement, 1960, 20, 141-151.

Kiel, D. F. & Wrigley, C. F. Effects upon the factorial solution of
rotating varying numbers of factors. American Psychologist,
1960, 15, 487.

 

Lawley, D. N. A modified method of estimation in factor analysis and
some large sample results. In Uppsala symposium on psychological
factor analysis. Uppsala: Almqvist & Wiksell, 1953. Pp. 35-42.

Levonian, E. & Comrey, A. L. Factorial stability as a function of the
number of orthogonally-rotated factors. Behavioral Science, 1966,
11, 400-405.

Merrifield, P. R. & Cliff, N. Factor analytic methodology. Review
of Educational Research, 1963, 33, 510-522.

Pinneau, S. R. & Newhouse, A. Measures of invariance and compar-
ability in factor analysis for fixed variables. Psychometrika,
1964, 29, 271-282.

Rao, C. R. Estimation and tests of significance in factor analysis.
Psychometrika, 1955, 20, 93-111.

Tryon, R. C. Salient dimensionality vs. the fallacy of "minimal rank"
in factor analysis. American Psychologist, 1961, 16, 167.

Tucker, L. R. A method for synthesis of factor analysis studies.
Personnel Res. Section Rep., No. 984. Washington, D.C.:
Dept. of the Army, 1951. Cited by S. R. Pinneau & A. Newhouse,
Measures of invariance and comparability in factor analysis for
fixed variables. Psychometrika, 1964, 29, p. 275.

Tucker, L. R. Recovery of factors from simulated data. Paper presented
at the meeting of the Psychometric and Psychonomic Societies,
Niagara Falls, Ontario, October 1964.

Williams, A. Factor analysis, factor A: principal components and
orthogonal rotations. Technical report no. 34, Computer
Institute for Social Science Research. 1967 (mimeo).

Wrigley, C. F. & Neuhaus, J. O. The matching of two sets of factors.
American Psychologist, 1955, 10, 418-419.

 

APPENDIX A

The Computer Program SAMPLER

      PROGRAM SAMPLER
      DIMENSION DATA(50), KFMT(10)
      DIMENSION N(1000), MPI(12), LPACK(6000)
      REWIND 30
      DO 50 I=1,6000
   50 LPACK(I) = 0
      READ 100, IPOP, JVAR, KSAMP, NSAMP
  100 FORMAT (4I5)
      READ 105, (KFMT(I), I=1,10)
  105 FORMAT (10A8)
C     PACK THE JVAR RESPONSES OF EACH RECORD INTO ONE WORD, BASE 8
      DO 200 I = 1,IPOP
      READ (30,KFMT) (MPI(J), J=1,JVAR)
      DO 150 J1 = 1,JVAR
  150 LPACK(I) = LPACK(I)*8+MPI(J1)
  200 CONTINUE
      PRINT 201, (LPACK(I), I=1,IPOP)
  201 FORMAT (1H0,8O16)
      DO 600 NS=1,NSAMP
      LPOS=0
C     DRAW KSAMP DISTINCT RECORD NUMBERS, REDRAWING ANY DUPLICATE
      DO 300 KS=1,KSAMP
  225 START=TIMEF(START)
      CALL RANFSET(START)
      M=(IPOP*RANF(-1))+1.0
      DO 250 JCOMP=1,LPOS
      IF (M.EQ.N(JCOMP)) GO TO 225
  250 CONTINUE
      LPOS=LPOS+1
      N(LPOS)=M
  300 CONTINUE
C     UNPACK EACH SAMPLED RECORD, LAST VARIABLE FIRST, AND WRITE IT
      DO 500 JCOMP = 1,KSAMP
      IT = N(JCOMP)
      DO 450 NT = 1,JVAR
      NT1 = JVAR-NT+1
      N1 = LPACK(IT)
      LPACK(IT) = LPACK(IT)/8
      N2 = LPACK(IT)*8
      PRINT 425, N1,N2
  425 FORMAT (2O16)
      DATA(NT1) = N1-N2
  450 CONTINUE
      WRITE (32,455) (DATA(J), J=1,JVAR)
  455 FORMAT (12F1.0)
      PRINT 460, (DATA(J), J=1,JVAR)
  460 FORMAT (1H0,12F3.0)
  500 CONTINUE
      REWIND 30
  600 CONTINUE
      ENDFILE 32
      REWIND 32
      END
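The packing step in SAMPLER stores one respondent's answers as successive base-8 digits of a single integer (`LPACK(I) = LPACK(I)*8 + MPI(J1)`) and later recovers them by repeated integer division by 8. The sketch below restates that idea in Python as an aid to reading the listing. The function names `pack` and `unpack` and the example record are hypothetical, and the sketch is not a translation of the whole program.

```python
def pack(responses, base=8):
    """Pack a sequence of small non-negative integers into one integer."""
    word = 0
    for response in responses:
        if not 0 <= response < base:
            raise ValueError(f"response {response} does not fit in one base-{base} digit")
        word = word * base + response
    return word


def unpack(word, n_vars, base=8):
    """Recover n_vars responses, peeling off the last variable first."""
    values = [0] * n_vars
    for position in range(n_vars - 1, -1, -1):
        word, digit = divmod(word, base)  # quotient keeps the remaining digits
        values[position] = digit
    return values


record = [1, 5, 3, 2, 4, 1, 1, 5, 3, 2, 4, 5]  # hypothetical five-point responses
packed = pack(record)
print(oct(packed))
print(unpack(packed, len(record)))
```

Twelve base-8 digits occupy 36 bits, so one record fits in a single integer word. The sketch unpacks a local copy of the word, so the stored value is left intact, whereas the listing overwrites `LPACK(IT)` as it unpacks.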


APPENDIX B

The Subroutine COLMLDGS

      SUBROUTINE COLMLDGS (NF,NV)
      DIMENSION SUM(5,5), FACTOR(15,5), EIGEN(15)
      DIMENSION PRINAX(15)
      REAL MAX(12)
   83 FORMAT (*3*6F8.4)
   84 FORMAT (*4*6F8.4)
   86 FORMAT (*1*6F10.4)
   87 FORMAT (*2*6F10.4)
   97 FORMAT (*5*6F10.4)
   98 FORMAT (*6*6F10.4)
      IF (NF.NE.5.OR.NV.NE.12) RETURN
      REWIND 45
      DO 3 I=1,12
    3 READ TAPE 45, PRINAX(I)
      READ TAPE 45, (EIGEN(I), I=1,12)
      CALL SKIPR (45,1)
      READ TAPE 45, ((FACTOR(I,J), J=1,5), I=1,12)
C     MAX(I) = HIGHEST ABSOLUTE LOADING OF VARIABLE I OVER THE FACTORS
      DO 90 I=1,12
      MAX(I)=ABS(FACTOR(I,1))
   90 CONTINUE
      DO 100 I=1,12
      DO 110 K=2,5
      IF (MAX(I).GT.ABS(FACTOR(I,K))) GO TO 110
      MAX(I)=ABS(FACTOR(I,K))
  110 CONTINUE
  100 CONTINUE
C     OUTPUT EIGENVALUES, HIGHEST LOADINGS, AND PRINCIPAL AXIS VALUES
      WRITE (62,86) (EIGEN(I), I=1,6)
      WRITE (62,87) (EIGEN(I), I=7,12)
      PUNCH 83, (MAX(I), I=1,6)
      PUNCH 84, (MAX(I), I=7,12)
      WRITE (62,97) (PRINAX(I), I=1,6)
      WRITE (62,98) (PRINAX(I), I=7,12)
      REWIND 45
      RETURN
      END
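The central loop of the subroutine finds, for each variable, the loading of largest absolute value across the factors. A minimal Python restatement of that step is given below as a reading aid. The function name `highest_abs_loadings` and the small factor pattern are hypothetical and are not taken from the study.

```python
def highest_abs_loadings(pattern):
    """For each variable (row), return the largest absolute loading."""
    return [max(abs(loading) for loading in row) for row in pattern]


# Hypothetical 4-variable, 3-factor pattern, for illustration only.
pattern = [
    [0.72, 0.10, -0.05],
    [-0.08, -0.65, 0.12],
    [0.15, 0.20, 0.58],
    [0.40, -0.45, 0.02],
]
print(highest_abs_loadings(pattern))
```

Taking the absolute value first means that a variable with a large negative loading is treated the same as one with a large positive loading, as in the listing's use of `ABS`.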
