This is to certify that the thesis entitled "The Reliability and Validity of the Courtis Test of Growth, Series S," presented by Robert Huyser, has been accepted towards fulfillment of the requirements for the M.A. degree in Education.

H. W. Sundwall, Major professor
Date: May 29, 1952


THE RELIABILITY AND VALIDITY OF THE COURTIS GENERAL DEVELOPMENT TEST

By
Robert Jay Huyser

A THESIS
Submitted to the School of Graduate Studies of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of
MASTER OF ARTS
Department of Education
1952


ACKNOWLEDGMENTS

The writer has received assistance from many persons both in conducting and in reporting this study. Indebtedness is expressed to Dr. Harry Sundwall, under whose helpful guidance this study was conducted, and to Dr. Leonard Luker and Dr. Alfred Dietzs, who served as members of the examining committee. Further indebtedness is acknowledged to Dr. Arthur DeLong, who made available much of the data used in this study; to the many students who agreed to act as subjects, several on their own time; and to my wife, Sarah, who assisted in the preparation of both the original and final copies of this thesis.


TABLE OF CONTENTS

CHAPTER
I. THE PROBLEM AND DEFINITIONS OF TERMS USED
   The problem
     Statement of the problem
     Importance of the study
   Definitions of terms used
     Differential testing
     Obtained score
     Ratio testing
     Reliability
     True score
     Validity
   Organization of the remainder of the thesis
II. REVIEW OF THE LITERATURE
   Literature concerning the Courtis Test
   Literature on the theoretical validity of the Courtis Test
   Studies on the reliability and validity of the Courtis Test
   Literature on theories of intelligence
   "Ratio" testing and the theories of intelligence
III. METHODOLOGY
   On the assessment of reliability
   On the assessment of validity
IV. THE MATERIALS USED AND GROUPS STUDIED
   Materials used
     The Courtis General Development Test
     The California Short-Form Test of Mental Maturity
     The Otis Quick-scoring Intelligence Scale
   Nature of the groups studied
     The experimental groups
V. PRESENTATION OF DATA
   Reliability
     Reliability of timed tests
     Intercorrelation among the subtests
     Validity of the Courtis Test
     The effect of practice on the Courtis Test
VI. SUMMARY AND CONCLUSIONS
   Summary
   Conclusions
BIBLIOGRAPHY


LIST OF TABLES

I. Reliability of California Short-Form Test of Mental Maturity
II. Frequency Distribution of California Mental Maturity and Otis IQ's for the Experimental Groups
III. Total and Subtest Reliabilities for the Courtis General Development Test, and Estimated Population Reliabilities
IV. Comparison of the Reliabilities of the Timed Tests When High and Low Scores Were Discarded and When These Scores Were Included
V. Reliability of the Timed Tests
VI. Subtest Intercorrelations for the Courtis Test
VII. Intercorrelations Among Individual Timed Tests Corrected for Attenuation
VIII. DR and Subtest Validity Coefficients Obtained for the Courtis Test
IX. Correlations of Timed Tests from the Courtis with the Criteria
X. Significance of Differences in Mean Developmental Ratios on Retest


LIST OF FIGURES

1. Excerpts from Each Part of the Courtis Test
2. Smoothed Distribution of California Mental Maturity and Otis IQ's for Experimental Groups


CHAPTER I

THE PROBLEM AND DEFINITIONS OF TERMS USED

Since the beginning of the testing movement, attempts have been made to improve psychological measurement. Many of the tests now in use have been criticized, and rightly so, for their inability to measure a single aspect or factor of the human personality satisfactorily. With few exceptions, the scores of traditional tests are influenced by a great many different factors. As a result, their meaningfulness has been questioned, and a few would claim them to be indices of so much that they cease to have genuine relationship to anything.

Dr. Stuart A. Courtis has proposed a new method of psychological measurement which is said to have promise as a remedy for some of these shortcomings. He has constructed a test using this method but has made no positive claims as to what the test measures. He has, however, provided a few interesting hypotheses which are discussed in a later chapter.

I. THE PROBLEM

Statement of the problem. The purpose of this study was to make certain tests of the reliability and validity of the Courtis General Development Test by means of statistical analysis of appropriate data.

Importance of the study. The Courtis General Development Test was copyrighted in 1930 by Stuart Appleton Courtis, Ann Arbor, Michigan. Since that time it has seen little application, except for a small number of research projects. It was the contention of the writer that one of the major reasons for this disuse might be the complete lack of acceptable reliability and validity determinations concerning this test.
The Courtis Test, in the writer's opinion, merits investigation because of its unusual construction and because of the promise it is said to offer for the improvement of educational measurement.

This unusual method of test construction is called "ratio" testing -- also referred to by Dr. Courtis as differential testing. With this technique, Dr. Courtis has attempted to "cancel out" the many factors which interfere with the precise measurement of individual differences. The theoretical aspects of this technique are reviewed in a later chapter.

Another important aspect of this new method addresses itself to the very nature of measurement. Dr. Courtis seems to have made an honest attempt to advance educational and psychological measurement from the more primitive ordinal type to the ratio type characteristic of the "exact" sciences. He has devised a new unit, called an "isochron", which is claimed to provide for an absolute zero point and for equality of units throughout the scale. These units, it is contended, are capable of being added, subtracted, multiplied, and divided as are the physical units of length, time, mass, and so forth.

It is of further interest to note that Dr. Courtis has presented data which seem to indicate that the test possesses considerable cross-cultural fairness. The test has been translated into several European languages and administered to a great many native school children. The average scores of these groups were found to be approximately the same. In fact, Courtis says, "The average ratio for an unselected thousand school children is independent of sex, age, grade, or language within the limits of ages 9 to 20 or 30."1 It is concluded, therefore, that the test may be useful for individual comparisons from one age or cultural group to another.

1 Stuart A. Courtis, "Differential Testing as a Method of Psychological Analysis," Address of Retiring Vice-President, Section Q, American Association for the Advancement of Science, Education, Boston, December 29, 1933, p. 27.

Certainly, if these claims can be supported and the usefulness of the test demonstrated, it seems safe to predict that the test will become accepted and widely used, and that many more tests will be constructed by applying these techniques.

Many of the difficulties met in educational and psychological research result from spurious correlations due to the fact that scores on tests are not determined by single factors. An example of such a difficulty can be seen in the theoretical controversy on the nature of intelligence.2

2 For discussion refer to Chapter II.

II. DEFINITIONS OF TERMS USED

Differential testing. As used in this study, the term differential testing will be synonymous with the expression ratio testing.

Obtained score. An obtained score is an actual score made by a subject on a given administration of a given test.

Ratio testing. The Courtis measurement technique which involves administering two tests to each individual and expressing the score as a ratio of the scores on the two tests, thus attempting to "cancel out" factors which interfere with the measurement process.

Reliability. For the purposes of this study, reliability will be defined as the stability of scores on repeated testing under similar conditions.

True score. A true score is a hypothetical score which has been defined as the mean score obtained from an infinite number of administrations of a test to one individual.
Validity. Validity will be considered to mean the relationship between the test scores and the various criteria chosen in this study. The portion of the study concerned with the establishment of the validity of the Courtis Test will be, essentially, a survey of relationships necessary for certain uses of the test.

THE ORGANIZATION OF THE REMAINDER OF THE THESIS

Chapter I has outlined and defined the problem under consideration and presented background material essential to a full understanding of the problem. Chapter II will contain a review of pertinent literature and will include an attempt to fit this problem into its appropriate place with respect to related knowledge and research. A discussion of the major methodological problems of reliability and validity determinations, as they pertain to this study, is advanced in Chapter III. Chapter IV will be concerned with a description of the groups studied and materials used. The conditions under which the study was made will be described in complete form. A report of the research conducted in this study will be presented in Chapter V. Hypotheses will be advanced and tested; conclusions will be presented where, in the writer's opinion, they are warranted. Chapter VI will include a summary of procedures and findings of the study. Additional problems in this area which were not considered in this study or which were raised by this study will be discussed.

CHAPTER II

REVIEW OF THE LITERATURE

A good deal of literature concerning theories of intelligence has been written; somewhat less is available with regard to reliability and validity of tests, and very little concerning "ratio testing". To the writer's knowledge, the only literature concerned with "ratio testing" has been written by Dr. Courtis.

Literature concerning the Courtis Test. The Courtis Test attempts the use of a basic idea in scientific research, i.e., the law of the single variable. According to the theory presented by Courtis, two tests are administered to each individual. Both tests, like traditional tests, are influenced by a great many variables. Now, by dividing the score on the second test by the score on the first test, all of these extraneous variables are said to "cancel out".1 The result is an index of the facet of the individual which is being measured. This assumes, of course, that the variables which influence the scores are multiplicative rather than additive and that they influence both tests equally.

1 S. A. Courtis, "Explanations Essential to Understanding," (unpublished folder), Detroit, Michigan, 1951, p. 4.
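The assumption may be stated schematically. The notation below is offered only as an illustration of the argument and is not Dr. Courtis's own formulation. Suppose the obtained scores on the two tests are

$$ S_1 = q_1 \cdot c, \qquad S_2 = q_2 \cdot c, $$

where $q_1$ and $q_2$ are the amounts of the trait tapped by the two tests and $c$ stands for the combined effect of all extraneous factors, assumed to act multiplicatively and to the same degree on both tests. Then

$$ \frac{S_2}{S_1} = \frac{q_2\,c}{q_1\,c} = \frac{q_2}{q_1}, $$

and the extraneous factors divide out. If the factors acted additively instead, the ratio $(q_2 + c)/(q_1 + c)$ would still depend on $c$; this is precisely the point at issue in the assumption stated above.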
Literature on the theoretical validity of the Courtis Test. The validity of a test is the degree of correspondence between scores made on the test and the "true" criterion, i.e., the trait or characteristic which the test was designed to measure. In the case of psychological tests, of course, it is rarely or never possible to measure the "true" criterion directly. A substitute criterion must be chosen.

The problem of finding an acceptable criterion for the Courtis is a difficult one, mainly because the trait being measured, "quality", has been only vaguely defined as to the behaviors expected of a person possessing a certain amount of the trait. Dr. Courtis has referred to quality as:

   ..... the cause within the nature of the organism of differences in growth when all other factors have been held constant .... The causes of differences in the achievements of different individuals OTBE* is called quality. It is a nature element.2

* Other things being equal.
2 S. A. Courtis, Maturation Units and How to Use Them, Detroit, Michigan, 1950, p. 63.

Dr. Courtis, in classifying the various factors dealt with in measurement, mentions Nature factors, Nurture factors, and Maturity factors. He describes and gives examples of nature factors as follows:

   NATURE FACTORS: age, sex, differences in individual status, health, aptitude, imagination, memory, initiative, etc. The general name that will be used for all nature factors is QUALITY. That is, two children who differ, let us say, in memory for numbers will be described as having memories of so many units of quality.3

3 S. A. Courtis, Toward a Science of Education, (Explanations and Interpretations to Accompany Maturation Units and How to Use Them), Detroit, Michigan, 1950, p. 17.

Dr. Courtis further describes the test as measuring an "element" which, by his definition, does not change.4

4 Ibid., p. 25.

Besides these more or less general descriptions of "quality," Dr. Courtis makes a few more specific suggestions as to what the test might measure:

   These DR's measure the relative rate at which the individuals learn or develop under uniform conditions. (They correspond somewhat to IQ's. If you use 44.7 as a divisor for the standard value for tests 3/2, 36.4 for tests 5/4, and 31.3 for tests 7/6, you will obtain for the most of the children IQ's comparable with those from intelligence tests.)5

5 S. A. Courtis, "Instructions for Giving the General Development Tests," (unpublished paper), p. 4.

In the same general vein is this description of the manner in which one would deal with persons possessing different DR's:

   ..... To a low DR speak slowly, wait between each idea until the individual has mastered it, use concrete illustrations, ... With high DR's do just the opposite. Talk in terms of abstract principles. Speak quickly and directly. Do not repeat, do not dominate. Let the individual state his needs and give him just what he wants and no more.6

6 Ibid., p. 5.

and:

   A developmental ratio corresponds roughly to an IQ, or a measure of brightness. . . . A person who is of average brightness will have a developmental ratio of 100; those who are more gifted by nature will have ratios higher than 100, and those less gifted will have lower ratios.7

7 S. A. Courtis, "The Interpretation of Scores in the General Development Tests," (unpublished paper), p. 14.

The resemblance of these and other descriptions to descriptions of differences in intelligence seems very pronounced. Therefore, a measure of intelligence is proposed as an appropriate criterion for assessing the validity of the Courtis Test. However, lack of demonstrable validity using this criterion does not preclude validity for another criterion.

A further test of validity will be made by comparing the scores on each subtest with the scores on each of the other two. This test is suggested by Dr. Courtis' claim that the three subtests measure the same quality.

Studies on the reliability and validity of the Courtis Test. A recent study concerning the reliability of the Courtis Test was conducted by Dr. Arthur R. DeLong.8 He reported mid-ratio test-retest correlations of .53 (N = 75) and .56 (N = 56) using data collected on college students.
8 Arthur R. DeLong, "How Does a Constant Disturbance Factor Affect the Developmental Ratios on the Courtis General Development Test," East Lansing, Michigan: Michigan State College Department of Elementary Education, 1952, p. 6.

The validity of the Courtis Test was investigated recently in a study by Rusch.9 Using the rank-difference technique on data obtained with high school freshmen (N = 140), a correlation between Courtis Mid DR and Kuhlman-Anderson IQ of .37 was found. This would provide for predictions of one score from the other which were seven per cent better than chance (the index of forecasting efficiency, 1 - sqrt(1 - r^2), is approximately .07 for r = .37). It may be assumed that this sample was relatively unbiased.

9 R. Rusch, "Psychology Seminar," (unpublished paper), Naperville, Illinois, 1951, p. 3.

In a more recent unpublished study, Jacobs10 compared scores on the Courtis Test with Wechsler-Bellevue IQ. These data, obtained on 44 residents of the Lansing Boys Vocational School, yielded the following correlations:

   Courtis score     Wechsler score      Correlation
   Median DR         Full-scale IQ           .47
   Median DR         Verbal IQ               .48
   Median DR         Performance IQ          .50
   Mid DR            Full-scale IQ           .37

10 J. Jacobs, "Correlation Between the Courtis Test and the Wechsler-Bellevue Intelligence Scale," (verbal report of findings), East Lansing, Michigan, 1952.

Literature on theories of intelligence. Since the time intelligence was first measured objectively by Binet in 1904, three distinct theories of intelligence have emerged. Spearman postulated a two-factor theory, stating that intelligence was made up of a general factor, "G", and specific "s" factors. He recognized elusive group factors resulting from overlapping specific factors. Thorndike theorized that thought consisted of associations and that the number of these associations or bonds that an individual had, or could have, determined his intelligence. Thurstone, on the other hand, proposed that factors could be isolated into "primary abilities," each of which would be unrelated to other "primary abilities."

Experimentation has failed to prove or disprove any of these theories. The data have been inconclusive. In administering intelligence tests made up of subtests, each designed to measure separate aspects of intelligence, the subtests are found to correlate from about .17 to about .50. Proponents of the "G" factor theory explain the "low" intercorrelations as due to differing experience backgrounds among testees. Proponents of the primary factor theory explain the "high" intercorrelations on the basis of "impurity" in the tests.

"Ratio" testing and the theories of intelligence. If "ratio" testing should prove to be an answer to "impurities" in testing, the way would be open for research to determine whether specific aptitudes are related and whether the theory of the general factor is tenable. This study will not be concerned with these questions directly; rather it will attempt only to determine whether the specific "ratio" tests being investigated are reliable and valid in some instances where validity would be assumed on the basis of the theory underlying the test. If the tests were demonstrated to be reliable and valid, the way would be open for the construction of tests of separate abilities which may be empirically shown to be pure. Tests of this type might prove fruitful for research concerning the organization of mental factors.

CHAPTER III

METHODOLOGY

This chapter contains a discussion of the important methodological problems pertinent to this study.
The approach selected will be outlined and the reasons for its use presented.

On the Assessment of Reliability. The reliability of a test refers to the consistency or stability attained in its measurements. As such, an index of reliability reveals the degree of confidence which may be placed in scores obtained with the test; i.e., it tells how closely a score may be expected to approximate some "true" score.

Statistically, the numerical value of a reliability coefficient corresponds exactly to the proportion of the score variance1 that is due to real differences among individuals in the trait measured by the test. The remainder of the variance is due to errors of measurement.2

1 Standard deviation squared.
2 E. F. Lindquist (Ed.), Educational Measurement, American Council on Education, Washington, D. C., 1951, p. 561.

The experimental and statistical procedures used to determine reliability determine what is to be considered true variance and what is to be called error variance. There are four more or less distinct methods of assessing reliability, each having variations. These are: (1) equivalent forms, (2) test-retest, (3) split-halves, and (4) analysis of variance among individual items.

The test-retest procedure is, in a sense, the most conservative of the above methods. The reason for this is that any real change in the trait measured between the two administrations of the test, or in the manner in which the individual responds to the test, is considered by this method to be error variance. This is quite important in some cases, while in others it is relatively unimportant. In the measurement of attitudes and other traits which may be comparatively ephemeral and unstable, reliability may be decidedly underestimated if much time elapses between the test and retest administrations.

However, in a test which measures a hereditarily determined trait, or any other trait that is highly stable and not subject to fluctuation, this procedure is considered to be quite acceptable and, in fact, avoids certain disadvantages of the other methods.

Another consideration is the nature of the test itself. Certain tests lend themselves to certain kinds of reliability determinations.3 In the writer's opinion, the most appropriate method for the Courtis Test is the test-retest approach.

3 Lindquist, op. cit., p. 577.

On the Assessment of Validity. The validity of a test is the degree to which a test measures whatever it was designed to measure. There are essentially two aspects of validity, namely, reliability and relevance. The reliability of a test can be thought of as placing a ceiling on the possible validity of a test. The other phase of validity, relevance, concerns the relationship between scores on the test and the actual trait which the test was designed to measure. It follows, then, that to assess the validity of a test, it is necessary to have some independent measure of the trait in question. This measure is referred to as a criterion.

The criteria chosen for this study will be various measures of intelligence. Independent estimates of intelligence are provided by the Otis Quick-scoring Mental Ability Test and the California Short-Form Test of Mental Maturity. It is, of course, possible for a test to be valid for measuring one trait and not valid for another. Therefore, to demonstrate lack of validity for one purpose does not, a priori, demonstrate lack of validity for some other purpose.
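The two relationships described in this chapter may be restated compactly; the formulas below are the standard classical-test-theory relations and are added only as a summary of the definitions above, not as part of the analyses reported later. If X is an obtained score, T the corresponding true score, and E the error of measurement, then

$$ \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}, $$

so that the reliability coefficient is the proportion of obtained-score variance attributable to true differences. The "ceiling" which reliability places on validity follows from the fact that a test can correlate with a criterion no more highly than it correlates with its own true scores:

$$ r_{xy} \le \sqrt{r_{xx}\, r_{yy}} \le \sqrt{r_{xx}}. $$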
CHAPTER IV

THE MATERIALS USED AND GROUPS STUDIED

The materials used in this study include, in addition to the Courtis Test, the California Short-Form Test of Mental Maturity, Advanced '50 S-Form, and the Otis Quick-scoring Mental Ability Test, Advanced, Gamma, Form Bm. The groups studied were composed of students in an undergraduate course in Child Growth and Development at Michigan State College. These groups, as well as the materials used to test them, will be described at greater length in this chapter.

I. MATERIALS USED

The Courtis General Development Test. The Courtis Test, as previously stated, is unique in its construction, the differential technique of measurement being its outstanding feature. Excerpts from the Courtis Test are presented in Figure 1 to illustrate the manner in which this idea has been applied. This test consists of three subtests, each of which contains two separately timed tests.

The first of these subtests is referred to as the "Cat and Dog" test. It is administered as follows: The subject's attention is directed to the first part of the test (part "a" in Figure 1). He is instructed to identify the animal which is like the key animal by underlining the appropriate choice and placing its identifying number in the parentheses following the four choices. The individual is then given a signal to start work on the test. After thirty seconds, and at thirty-second intervals thereafter during the test, he is instructed to circle the choice at which he is looking and to place an appropriate number beside the circled choice. The second portion of the test is administered in the same way except that the individual is instructed to identify the animal which is the opposite of the key animal.

A procedure similar to that described above is used in the two remaining subtests. In the first of these, the idea is applied to words (Figure 1, Parts "c" and "d"). The testee responds to the first portion of the word test by selecting the word which is the same as the key word. Again he locates his progress each thirty seconds when the examiner signals, "Mark 1," and so on. The scoring is the same as in the "Cat and Dog" test. In the second "thinking" portion of the word test the procedure is the same except that the testee selects an antonym of the key word.

Again the same procedure is used in the third test, where the subject matter is numbers (Parts "e" and "f", Figure 1). In the first part of the number test, the subject underlines the number which is identical to the key number. The second portion requires that he pick the number which is the reverse of the key number. The testee responds to the signals, "Mark 1," etc., as he did in the first two tests.

Figure 1. Excerpts from Each Part of the Courtis Test (Cat and Dog Test, Word Test, Number Test).

The test is scored by counting the number of responses in each thirty-second interval for each test. In order to increase reliability, the highest and lowest scores are crossed out.1 Then the remaining scores are added and the score on the second test is expressed as a per cent of the score on the first test. This percentage is referred to as a percentage of development. These percentages are then transmuted by means of a table into linear units called "isochrons."

1 S. A. Courtis, "Instructions for Giving the General Development Tests," (unpublished paper), p. 3.

The table of isochronic values was prepared by Dr. Courtis and contains log-log values obtained from the Gompertz growth curve.2 These units, according to Dr. Courtis, may be added, subtracted, multiplied, and divided as are the units used in the physical sciences. However, it seems appropriate to point out that the derivation and use of these units requires an assumption to the effect that the Gompertz curve adequately describes all growth and learning.3

2 For a discussion of the Gompertz curve see Croxton and Cowden, Applied General Statistics, p. 447.
3 John C. Flanagan, "Units, Scores, and Norms," Educational Measurement, (E. F. Lindquist, editor), Washington, D. C.: American Council on Education, 1951, p. 722.

After converting the percentage scores to isochronic units, each person's isochronic score is divided by the average isochronic score for the group of which he is a member. The ratio thus obtained is multiplied by 100, providing a number of the order of an IQ.4 Thus, if a person's isochronic score is average, he will have a differential ratio of 100, while scores below average will provide differential ratios below 100 and scores above average will be transmuted to differential ratios above 100.

4 S. A. Courtis, "The Interpretation of Scores in the General Development Tests," (unpublished paper), p. 4.

Three DR's (or QI's5, as Dr. Courtis has more recently called them) are obtained as a result of the scoring operations described above. These three values should be quite close together.6 If they are, the middle one is chosen; if one differs markedly, it is rejected and the other two averaged; and if the two extreme scores differ from the middle one by more than 10 points, the test is to be given a second time and the need for constant effort explained.7

5 Quality Index.
6 This statement assumes that each of the three tests measures the same thing.
7 S. A. Courtis, "Instructions for Giving the General Development Tests," (unpublished paper), p. 4.
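The sequence of scoring operations just described may be made concrete with a short sketch. The sketch is offered only as an illustration: the interval counts, the stand-in isochron table, the group average, and the function names are invented for the example and are not Dr. Courtis's published figures or procedures.

```python
# Illustrative sketch of the Courtis scoring arithmetic described above.
# The isochron table below is a made-up stand-in; the real values come
# from Dr. Courtis's Gompertz-based table.

from statistics import median

def interval_score(counts, discard_extremes=True):
    """Sum the five 30-second interval counts for one timed test.

    If discard_extremes is True, the highest and lowest counts are
    crossed out first, as the test directions prescribe."""
    counts = sorted(counts)
    if discard_extremes:
        counts = counts[1:-1]          # drop one low and one high interval
    return sum(counts)

def percentage_of_development(first_test, second_test, discard_extremes=True):
    """Score on the second ("thinking") test as a per cent of the first."""
    s1 = interval_score(first_test, discard_extremes)
    s2 = interval_score(second_test, discard_extremes)
    return 100.0 * s2 / s1

# Hypothetical isochron table: percentage of development -> isochrons.
ISOCHRON_TABLE = {50: 18.0, 60: 21.0, 70: 24.0, 80: 27.0, 90: 30.0}

def to_isochrons(percentage):
    """Look up the nearest tabled percentage (a crude stand-in for the table)."""
    nearest = min(ISOCHRON_TABLE, key=lambda p: abs(p - percentage))
    return ISOCHRON_TABLE[nearest]

def developmental_ratio(isochrons, group_mean_isochrons):
    """DR = individual isochron score over the group average, times 100."""
    return 100.0 * isochrons / group_mean_isochrons

# One subtest for one examinee: five interval counts on each timed test.
speed_counts = [11, 13, 12, 14, 12]      # e.g., the "seeing" half of the word test
thinking_counts = [7, 9, 8, 10, 8]       # the "thinking" (opposites) half

pct = percentage_of_development(speed_counts, thinking_counts)
iso = to_isochrons(pct)
dr = developmental_ratio(iso, group_mean_isochrons=24.0)   # assumed group mean

# With three subtest DR's in hand, the middle value is ordinarily reported.
mid_dr = median([dr, 98.0, 104.0])       # the other two DR's are invented here
print(round(pct, 1), iso, round(dr, 1), mid_dr)
```

Run on the invented counts above, the example yields a percentage of development near 68, which the stand-in table converts to 24 isochrons and, against the assumed group average of 24, to a DR of 100.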
The California Short-Form Test of Mental Maturity. The California Short-Form Test of Mental Maturity is constructed on the basis of the multiple factor theory of intelligence. It is composed of seven subtests, each designed to measure some aspect of intelligence. The following scores are provided: 1. Total Mental Factors, 2. Verbal, 3. Non-verbal, 4. Spatial Relations, 5. Logical Reasoning, 6. Numerical Reasoning, 7. Verbal Concepts.

Reliability coefficients for each of the above are presented in Table I. These reliabilities were obtained by the split-half method (corrected by use of the Spearman-Brown formula) using data obtained on 250 college freshmen. The standard deviation of the derived IQ's is given in the manual as 16 IQ points.8

8 Elizabeth T. Sullivan, Willis W. Clark, and Ernest W. Tiegs, "Manual, California Short-Form Test of Mental Maturity, Advanced, Grades 9 to Adult, 1950, S-Form." Los Angeles, California: California Test Bureau, 1950, p. 4.

TABLE I

RELIABILITY OF CALIFORNIA SHORT-FORM TEST OF MENTAL MATURITY

   Score                     Reliability Coefficient
   Total mental factors              .94
   Language                          .92
   Non-language                      .88
   Spatial relations                 .87
   Logical reasoning                 .85
   Numerical reasoning               .88
   Verbal concepts                   .92

9 Loc. cit.

The Otis Quick-scoring Mental Ability Test. The Otis Quick-scoring Mental Ability Test is constructed in accordance with the concept of general intelligence. As such, it combines a variety of items, all designed to measure general intelligence, to provide a single score.
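The split-half coefficients quoted for both criterion tests in this chapter are stepped up to full-length estimates by the Spearman-Brown formula, which for two halves of equal length is

$$ r_{\text{full}} = \frac{2\, r_{hh}}{1 + r_{hh}}, $$

where r_hh is the correlation between the two half-tests. A half-test correlation of .82, for example, would correspond to an estimated whole-test reliability of about .90; the formula itself is standard, but the figure .82 is merely an illustrative value, not one reported in either manual.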
The split-half reliability of the Otis Quick-scoring Mental Ability Test, corrected by the Spearman-Brown formula, is presented separately for each grade. These values are as follows: grade 10, .90; grade 11, .91; grade 12, .85. No exact information concerning variability in the standardization group is provided. However, the manual states, "'Gamma IQ's' found by this method tend to be somewhat less variable than ordinary IQ's."10

10 Otis, "Manual for Administering and Scoring the Otis Quick-Scoring Intelligence Scale," 1937, p. 4.

Nature of the groups studied. Two groups of college students were studied. The subjects were students in a course in Child Growth and Development given in the Division of Education at Michigan State College during the winter term of 1952. No attempt at random selection was made. Most of the students were in their junior year and were majoring in elementary education.

These groups appeared to be somewhat homogeneous and selected in IQ, but not significantly so with regard to scores on the Courtis Test; at least this was found to be true where comparisons with less selected populations11 were possible. Also, the information available concerning the performance of less selected groups on the tests used in this study makes possible some fairly reasonable predictions as to what might be expected had the study been done using other groups.

11 Anonymous, "Tabulations for Norms Based on Groups of Children Alike in SEX-AGE-GRADE," p. 3.

The Courtis Test was administered to this entire group twice during the winter term. The interval between the two administrations was about two months. On the last day of the term, the group was given the Otis test.

The experimental groups. Group I was made up of all the students on whom the above scores were available. This group contained seventy-four subjects. The scores made by these subjects were used in all reliability determinations. The following term, spring, 1952, as many members of group I as could be contacted were requested to take the California Short-Form Test of Mental Maturity. Fifty-seven persons responded to the request. These subjects constituted group II. The means and standard deviations on the criterion tests for this group were as follows: California Test of Mental Maturity, mean 112.7, S.D. 8.3 (IQ points); Otis Intelligence Scale, mean 112.4, S.D. 8.0 (IQ points).

The distributions of the Otis and the California Mental Maturity IQ's for the experimental groups are presented in tabular form in Table II and graphically in Figure 2.

TABLE II

FREQUENCY DISTRIBUTIONS OF CALIFORNIA MENTAL MATURITY AND OTIS IQ'S FOR THE EXPERIMENTAL GROUPS

   IQ           California Mental      Otis        Otis
   midpoints    Maturity, Group II     Group I     Group II
   137                 1                  0           0
   134                 0                  0           0
   131                 1                  0           0
   128                 2                  1           0
   125                 3                  4           5
   122                 2                 11           8
   119                 2                  7           6
   116                 5                  7           6
   113                18                  9           6
   110                 5                 12          10
   107                 6                 10           9
   104                 7                  5           5
   101                 2                  0           0
    98                 5                  4           4
    95                 0                  2           1
    92                 0                  0           0
    89                 0                  1           1
   N                  57                 75          57

Figure 2. Smoothed Distribution of California Mental Maturity and Otis IQ's for Experimental Groups.

CHAPTER V

PRESENTATION OF DATA

This chapter will be devoted to a presentation of a statistical treatment of the data which summarizes the findings of the study. A brief discussion will parallel the presentation of these data.
Reliability. The assessment of reliability was approached with two major purposes in mind: first, to evaluate the reliability of the test for use with college groups such as the experimental group, and second, to estimate the reliability which might be obtained if the test were used with an unselected population.

The obtained reliability coefficients for group I are presented in Table III. These coefficients are estimates of the test-retest correlation for groups similar to the experimental group. An estimate of the reliability of the test for unselected groups was obtained by the use of data believed to have been provided by Courtis.1 These estimates were obtained by adjusting the correlation to correct for curtailment of variance in the experimental group.2 Unfortunately, this procedure could not be applied to the reliability coefficients obtained for the number test or the Mid DR's, since no data were available from which to estimate the population variance on these scores. However, it would seem reasonable to expect that each value would be increased in similar proportion.

1 Anonymous, "Tabulations for Norms Based on Groups of Children Alike in SEX-AGE-GRADE," p. 3.
2 Lindquist, E. F. (Ed.), Educational Measurement, Washington, D. C.: American Council on Education, p. 595.

TABLE III

TOTAL AND SUBTEST RELIABILITIES FOR THE COURTIS GENERAL DEVELOPMENT TEST, AND ESTIMATED POPULATION RELIABILITIES

   Score                   Reliability Coefficient    Population Estimate*
   Total: Mid DR                  .460
   Subtests:
     Cat and Dog DR               .485                      .618
     Word DR                      .454                      .566
     Number DR                    .286

   *Estimates of the population variance were unavailable for Mid and Number DR's.

Reliabilities of this order are very low by comparison with the reliabilities of other available tests and are definitely below the level claimed to be desirable for all but the most crude comparisons.
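The adjustment for curtailment of variance used to obtain the population estimates in Table III may be written out; the formula below is the usual form of this correction (it assumes that the error variance is the same in the curtailed group and in the unselected population) and is given here only to make the computation explicit. If r is the reliability obtained in the curtailed group, s^2 the score variance of that group, and S^2 the variance in the unselected population, the estimated population reliability R satisfies

$$ 1 - R = \frac{s^2}{S^2}\,(1 - r), \qquad\text{that is,}\qquad R = 1 - \frac{s^2}{S^2}\,(1 - r). $$

Since s^2/S^2 is less than one for a selected group, R exceeds r, as the right-hand column of Table III shows.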
However, a comparison of the reliability of these scores with those found by the original method of scoring failed to support the hypothesis of increased reliability. The data, presented in Table V, appear somewhat contradictory and do not significantly favor either method over the other. Intercorrelations among the subtests. It is of interest to note that the reliability of the Mid Dr (Table I) is no greater than the av. erage of the subtest reliabilities, as would be expected if the three subtests actually measured one "element".u The intercorrelations among the subtests were obtained for each administration of the test. These correlations were found to be extremely low. However, it was recognized that this could have resulted from.the unreliability of the subtests rather than from lack of similarity in the functions measured. Therefore, in order to maximize reliability, the "high" and “low" scores were in- cluded and the ratios for both administrations averaged. This procedure provided scores which were estimated to be somewhat more reliable (cat and dog test, .HS; word test, .73; number test, .6“). The intercorrela- tions among these sets of scores were then found and corrected for atten» nation to provide an estimate of the relationships existing among the actual'Wraits" measured by the subtests. These values, presented in Table VI, are low enough to cast considerable doubt on the claimed similarity of u s. A. Courtis, "Differential Testing as a Method of Psycholog- ical Analysis," (address of retiring Vice-President Section Q, American Association for the Advancement of Science, Education), Boston, 1933, p. 26s 30 TABLE V COMPARISON OF RELIABILITIES OBTAINED WHEN VARIOUS SCORING METHODS WERE USED ON THE SUBTESTS . Raio “ ’1'gh DR (Usin; and Low Scores and Low Scores Subtest Isochrons _ Discarded Retained Cat and Dog Test .49 .47 .32 Word Test .45 .57 .58 Number Test .29 .38 .47 naambubmw.9th.4.gzru he I... a. a u 9.8. Season to» an $3333 one: coupon owonbw amp no hpaadpoaaea can .coapmscoppe hon weapooaaoo cHs .Ho>ma pace hog H on» us oocooauaswam now on. .Hopoa pace mom a cap as omen Scam pcoaoumao hapcmoanacmam on ow mm. fiasco pose a camecm can» nH as. mm. mm. eo. unmeasz nods muses .mn. mm. do. b0. muonezz apas mom can poo me. on. so. am. mesa; spas won ens ado ccospmscoppa HH e H HmweOHpenp . mwwoapesp mmuoe you scopes» -macaaea nudcasea empomnsoo umaaasea m.ma . u.mn oapem owoaobm Emma mHBmDOU mma mom mZOHB¢AHmmoomHBZH BmMBmDm H> quds an 32 the functions measured. A relatively small portion of the "true" var. iance in any one of the tests seems to be accounted for, or accompanied by, variation in either of the other two sets of scores. A similar procedure was used to estimate the true relationship among the functions measured by the individual timed tests. These findp ings are presented in Table VII and.seem to indicate that there is some- what more homogeneity among the timed tests than among the subtests. Validity of the Courtis Test. Validity coefficients were obtained with group II using all scores on the California Short-Form Test of Men- tal Maturity as well as the Otis Quick-scoring Mental Ability Test as criteria. validity coefficients for Mid DR and subtests using all cri— teria are presented in Table VIII. None of the DR validity coefficients were significantly different from zero. Of the five subtest coefficients which.were significant, two were negative. The correlation of each of the timed tests with the criteria was found and is presented in Table IX. 
A similar procedure was used to estimate the true relationship among the functions measured by the individual timed tests. These findings are presented in Table VII and seem to indicate that there is somewhat more homogeneity among the timed tests than among the subtests.

TABLE VII

INTERCORRELATIONS AMONG INDIVIDUAL TIMED TESTS CORRECTED FOR ATTENUATION

   Test     2      3      4      5      6
   1       .58    .68    .42    .73    .54
   2              .48    .48    .70    .59
   3                     .97    .70    .54
   4                            .76    .65
   5                                   .90

Validity of the Courtis Test. Validity coefficients were obtained with group II using all scores on the California Short-Form Test of Mental Maturity as well as the Otis Quick-scoring Mental Ability Test as criteria. Validity coefficients for Mid DR and subtests using all criteria are presented in Table VIII. None of the DR validity coefficients were significantly different from zero. Of the five subtest coefficients which were significant, two were negative.

TABLE VIII

DR AND SUBTEST VALIDITY COEFFICIENTS OBTAINED FOR THE COURTIS TEST

   (The values of this table are not legible in the available copy.)

The correlation of each of the timed tests with the criteria was found and is presented in Table IX. Of these 42 coefficients, seven were significantly different from zero. The scores from both administrations were averaged, and "high" and "low" scores were retained, to obtain improved reliability in the timed tests.

TABLE IX

CORRELATIONS OF TIMED TESTS FROM THE COURTIS WITH THE CRITERIA

   (The values of this table are not legible in the available copy.)

The effect of practice on the Courtis Test. It was hypothesized that if practice effects actually "canceled out" in the differential test, there would be no difference in mean scores if the Courtis Test were administered to the same group twice. This hypothesis was tested statistically by the application of a "t" test for significance of difference in means. The results of this operation are presented in Table X.

TABLE X

SIGNIFICANCE OF DIFFERENCES IN MEAN DEVELOPMENTAL RATIOS ON RETEST

   Subtest        Mean DR,            Mean DR,             Difference    "t"
                  Administration I    Administration II
   Cat and Dog        73.5                78.8                 5.3        4.4
   Word               62.0                69.9                 7.9        7.9
   Number             75.0                81.1                 6.1        8.2

   "t" (d.f. = 73) must exceed 1.96 to be significant at the 5 per cent level and 2.58 for significance at the 1 per cent level.

In all subtests the mean scores were higher on retest, and all differences were found to be highly significant (beyond the 1 per cent level of significance). In the light of this evidence it seems reasonable to reject the hypothesis that practice effects "cancel out".
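The "t" values in Table X are evidently those for correlated (repeated) measures, since the degrees of freedom equal one less than the number of subjects in group I. In that form, with d the retest gain of each of the n = 74 subjects,

$$ t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 73, $$

where d-bar is the mean gain and s_d the standard deviation of the individual gains.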
CHAPTER VI

SUMMARY AND CONCLUSIONS

I. SUMMARY

This study has had as its major focus the problem of determining the reliability and validity of the Courtis General Development Test. It was believed that this test merited study because of its new method of construction and because of its claimed culture-fairness.

Dr. Courtis' theory of differential measurement was reviewed in order to draw attention to its main features and to provide a background for the hypotheses which were to be presented and tested later in the study. The problems of assessing reliability and validity were reviewed, and the methods chosen for this study were discussed in this context.

The Courtis General Development Test was fully described, as was the recommended procedure for administering and scoring the test. The Otis Quick-Scoring Mental Ability Test and the California Test of Mental Maturity, which were chosen as criteria for the validity study, were reviewed.

All of the above tests were administered to two groups of college students, all of whom were enrolled in a course in Child Growth and Development given in the Division of Education of Michigan State College, winter term, 1952. Group I contained seventy-four persons, while group II was composed of fifty-seven. No attempt was made to select the subjects randomly. The performance of both groups on both intelligence tests was presented as a part of the description of the groups.

Test-retest reliabilities were obtained for the Courtis Mid DR, each subtest, and for the individual timed tests. These are presented in Tables III and IV. Possible reasons for the surprisingly low reliabilities found for the test were discussed.

Validity coefficients found for DR's and subtests on the Courtis Test were presented in Table VIII. Out of forty coefficients, five were significantly different from zero.

The hypothesis that there would be no practice effect, i.e., no difference in means on retest, was tested. This null hypothesis was rejected on the basis of a "t" test showing all differences to be significant beyond the 1 per cent level of significance.

II. CONCLUSIONS

The major conclusions which seem to follow from this study are:

1. The reliability of the Courtis Test is too low for individual comparisons of any kind on the college level, and probably too low for comparisons of this type at any educational level.

2. Only in group comparisons could score differences be meaningful (with the probable exception of the number test, which fails, with the present scoring procedure, to show enough stability for even the crudest of comparisons).

3. A major reason for the low reliability of the ratio scores seems to have been found in the low reliabilities of the timed tests.

4. The hypothesis that discarding "high" and "low" scores would increase the reliability of the timed tests was not supported. Evidence was presented which favored the inclusion of extreme scores when computing the scores on the timed tests.

5. The hypothesis that all three subtests measure the same trait was not supported to the degree necessary for comparison with other measures.

6. The validity of the test was found to be in serious doubt. An important reason for this, undoubtedly, is the low reliability of the test.

7. The portion of this study dealing with the relevance aspect of validity has tended to show very little, if any, evidence of validity. However, in view of the unreliability of the test, the select nature of the group, and the possible questionability of the choice of criterion, these results are held to be inconclusive. An unequivocal answer to this question should wait, in the writer's opinion, until the reliability of the test is improved.

BIBLIOGRAPHY

A. BOOKS

Anastasi, Anne, Differential Psychology. New York: Macmillan Company, 1937. 615 pp.

Courtis, Stuart A., Maturation Units and How to Use Them. Detroit, Michigan: Stuart A. Courtis, 9110 Dwight Ave., Detroit 14, Michigan, 1950. 148 pp.

________, Toward a Science of Education (Explanations and Interpretations to Accompany Maturation Units and How to Use Them). Detroit, Michigan: Stuart A. Courtis, 9110 Dwight Ave., Detroit 14, Michigan, 1950.

Croxton, Frederick E., and Dudley J. Cowden, Applied General Statistics. New York: Prentice-Hall, Inc., 1939. 944 pp.
Lindquist, E. F., editor, Educational Measurement. Washington, D. C.: American Council on Education, 1951. 819 pp.

________, Statistical Analysis in Educational Research. Boston: Houghton Mifflin Company, 1940. 266 pp.

Thorndike, Edward L., E. O. Bregman, M. V. Cobb, Ella Woodyard, and Staff, The Measurement of Intelligence. New York: Bureau of Publications, Teachers College, Columbia University, 1925. 616 pp.

Thurstone, Louis L., The Reliability and Validity of Tests. Ann Arbor, Michigan: Edwards Brothers, Inc., 1939. 113 pp.

________, The Vectors of Mind. Chicago, Illinois: University of Chicago Press, 1935. 266 pp.

Spearman, Charles E., The Abilities of Man. New York: Macmillan Co., 1927. 415 pp.

B. TEST MANUALS

Courtis, Stuart A., "Explanations Essential to Understanding." Detroit, Michigan: Stuart A. Courtis, 1951.

________, "Instructions for Giving the General Development Tests." Detroit, Michigan: Stuart A. Courtis, 1951.

________, "The Interpretation of Scores in the General Development Tests." Detroit, Michigan: Stuart A. Courtis, 1951.

Otis, Arthur S., "Manual, Otis Quick-Scoring Mental Ability Tests, Gamma Test: Forms Am and Bm." Yonkers-on-Hudson, New York: World Book Company, 1937. 6 pp.

Sullivan, Elizabeth T., Willis W. Clark, and Ernest W. Tiegs, "Manual, California Short-Form Test of Mental Maturity, Advanced, Grades 9 to Adult, 1950 S-Form." Los Angeles, California: California Test Bureau, 1950. 20 pp.

C. PERIODICAL ARTICLES

Courtis, Stuart A., "What is a Growth Cycle," Growth, I (May, 1937).

D. UNPUBLISHED MATERIALS

(Anonymous), "Tabulations for Norms Based on Groups of Children Alike in Sex-Age-Grade." 3 pp.

Courtis, Stuart A., "A New Point of View in Psychological Measurement." Unpublished paper presented to the Michigan Academy of Science, Arts and Letters, Psychology Section, East Lansing, Michigan, March 23, 1951. 5 pp.

________, "Differential Testing as a Method of Psychological Analysis." Address of retiring Vice-President, Section Q, American Association for the Advancement of Science, Education, Boston, December 29, 1933.

________, "The Inside Story of the New Deal in Educational Measurements." Ann Arbor, Michigan, 1934.

DeLong, Arthur R., "How Does a Constant Disturbance Factor Affect the Developmental Ratios on the Courtis General Development Test." Unpublished paper read before the meeting of the Michigan Academy of Science, Arts, and Letters, Psychology Section, Ann Arbor, Michigan, April 11, 1952. 7 pp.

Rusch, R., "Psychology Seminar." Unpublished paper, Naperville, Illinois, 1950.

E. VERBAL COMMUNICATION

Jacobs, James, Verbal report of findings in a study conducted at Boys Vocational School, Lansing, Michigan, May, 1952.