ABSTRACT

THE TEACHING AND ASSESSMENT OF COGNITIVE STRUCTURE THROUGH THE DIAGRAMMATIC REPRESENTATION OF STRUCTURES OF KNOWLEDGE

By

Jean L. Dyer

One of the basic learning problems in education is how the structure of a discipline can best be transmitted to the cognitive structure of each student. In the present study three aspects of this problem were investigated. First, structure of knowledge was operationally defined. Second, a test which systematically examined the cognitive structures of students was developed. Third, the effects of different modes of presenting structure of knowledge were compared.

Structure of knowledge was defined as the organization of a given area of knowledge, where organization referred to the relationships between elements within that area. Elements were defined as the basic units within a structure, with the type of elements in any structure depending upon the subject matter area itself. Some typical elements are concepts, principles, events, and facts. Relationships were defined as the connections between elements. The following classification of relationships was developed: descriptive, causation, multiple causation, temporal, logical, quantitative, functional, and composite (an interaction of any of the preceding).

Diagrams (Venn diagrams, tables or matrices, lists, and graphs) were used to represent the structure of any area. All structures could be represented by graphs (graph theory), for there is a one-to-one correspondence between graphs and the definition of cognitive structure, with points representing elements and lines representing relationships.

Cognitive structure was defined as an individual's organization of knowledge in a certain subject matter area at a given time, where organization again referred to relationships between elements. However, the knowledge structure that the individual has acquired may not coincide with the structure of the subject matter.
Using diagrams as representations of structure of knowledge, a procedure for constructing tests which systematically tested each of the specified structural relationships was presented. Transfer structures, covering material that was logically implied by two or more given structures, were also explained. Items were scored to reflect the number of relationships correctly understood. In many cases patterns of item responses were scored rather than using the traditional method of scoring each item independently.

Two different modes of representing structure of knowledge were examined, verbal and diagrammatic. It was expected that diagrams would result in higher performance on acquisition and retention because diagrams would serve as "perceptual blueprints": separating relevant from irrelevant structural relationships more clearly than verbal statements, organizing material during acquisition and retention, representing material in a rather stable form for storage, and aiding retrieval of information.

Three treatments were used to present the structure of knowledge in a 7,000 word passage on reliability: a diagram (D), a verbal (V), and a non-review (NR) treatment. The D treatment presented diagrams representing the reliability structure in the following manner: 19 small diagrams, six substructure diagrams which integrated these smaller diagrams, and a diagram which connected all of these substructure diagrams, i.e., the total structure of the passage. These diagrams were placed within the passage following material relevant to the comprehension of each diagram. The V treatment was identical to the D treatment except that the three levels of structural representations were in verbal rather than diagram form. The NR treatment consisted of the reliability passage without either the verbal or the diagram representations of structure. All Ss were given a diagram interpretation program before the administration of the treatment passage.
After the passage Ss were given a test over the reliability content. Ss (n = 156) were randomly assigned by sex to the three treatments. A control group (n = 78), which did not receive any treatment but took the test on the reliability passage, was also used in making some comparisons. The three major dependent variables were performances on a test covering all structural relationships, on a test covering transfer relationships based upon structure, and on a typical multiple-choice achievement test. One week later Ss were again given the tests and also given a questionnaire pertaining to the experimental materials.

The central hypothesis of the study was that the D treatment would facilitate learning of the structural and transfer relationships on both acquisition and retention more than either the V or NR treatments. However, no differences were expected among the treatment conditions for the typical achievement test. A retention drop for all dependent variables across treatments was expected. Performance on certain sequential dependencies among the structural relationships was expected to be highest for the D treatment. Certain correlations were also expected between some variables; in particular, time spent reading the reliability passage and the major dependent variables, structure and transfer, and the dependent variables on acquisition and retention. The primary exploratory question involved the type of structural relationships which were easy or difficult for the Ss.

The major dependent variables showed no significant differences among the experimental treatments. Except for transfer, the control group's performance was lower than that of the experimental treatments. Time spent reading the reliability passage was greatest for the D treatment and least for the NR treatment. Since the prerequisite item for all the sequential dependencies was not passed by any S, an analysis of these patterns was not possible.
Time was not correlated with the major dependent variables, and structure was more highly related to achievement than to transfer. However, achievement, structure, and transfer were correlated for acquisition and retention.

A structural analysis of the Ss' structural relationships indicated that certain substructures were more difficult than others, and that all Ss had difficulty with certain types of relationships. In particular, precision in definitions was lacking and causal relationships were confused or incomplete. Knowledge of transfer relationships was consistent with the knowledge of structural relationships. The more difficult substructures often resulted in less consistency in Ss' cognitive structures from acquisition to retention.

The unexpected similarity among experimental treatments was explained by inadequate comprehension of the reliability passage with one reading (as indicated by questionnaire data and absolute performance levels), and by presentation of diagrams too soon in the learning process. Because of inadequate comprehension the underlying theoretical position was not adequately tested.

Performance on the items was related to two factors, the chance level of the items (format) and the number of relationships tested by the item (information load). These variables were used to explain correlation patterns and performance on items identified in the sequential dependency patterns. The differences among time spent reading the reliability passage suggested methodological implications for future research of a similar nature.

Despite the general negative results of the experimental treatments, the difficulty Ss had with certain structural relationships supported the usefulness of a structural analysis in testing and in diagnosing learning problems. The similarity between Ss' comprehension and confusion of structural relationships, despite different versions of the reliability passage, suggested the need for investigating the ability of individuals to understand different types of relationships.

THE TEACHING AND ASSESSMENT OF COGNITIVE STRUCTURE THROUGH THE DIAGRAMMATIC REPRESENTATION OF STRUCTURES OF KNOWLEDGE

by

Jean L. Dyer

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1969

ACKNOWLEDGMENTS

I wish to thank Dr. Lee Shulman, Dr. John Hunter, Dr. Clessen Martin, and Dr. John Wagner for their cooperation and suggestions in developing the study. I am particularly grateful to my chairman, Dr. Shulman, who gave me the freedom to explore some rather complex areas within educational psychology. In addition, I wish to thank Dr. Hunter, who encouraged me in my initial explorations and whose constructive ideas clarified my own thoughts and provided me with the precision to operationalize the basic concepts in the study.

I would also like to thank my friends and associates who helped me in many ways, especially Jay Powell, who willingly discussed many of the technical aspects of the study with me. Of course, the final words of appreciation go to my husband, Fred, who experienced every success and failure of the study with me over three 'long' years. His suggestions and encouragement were a vital part of the success and completion of every aspect of the project.

TABLE OF CONTENTS

Chapter

I. STRUCTURE OF KNOWLEDGE AND COGNITIVE STRUCTURE
    Structure of Knowledge
    Representation of Structure of Knowledge with Diagrams
    Cognitive Structure
II. TESTING COGNITIVE STRUCTURE
    Testing Cognitive Structure Relationships
    Comparison of This Approach with Other Approaches to Representing and Testing Cognitive Structure
III. RATIONALE FOR DIAGRAMMATIC PRESENTATION OF STRUCTURE OF KNOWLEDGE
    The Role of Diagrams in Learning
    Individual Differences in Understanding Diagrams
    Organization of Diagrams within a Passage
    Overall Sequence of a Passage
IV. PILOT STUDY, EXPERIMENTAL MATERIALS, AND HYPOTHESES
    Pilot Study
    Experimental Materials
    Tests
    Additional Pilot Study Results
    Hypotheses
V. PROCEDURE AND RESULTS
    Procedure
    Major Results
    Minor Results
VI. ANALYSIS OF THE COMPREHENSION AND RETENTION OF STRUCTURAL RELATIONSHIPS
    Structure
    Transfer
    Summary and Interpretation
VII. DISCUSSION OF MAJOR RESULTS AND CONCLUSIONS
    Time
    Achievement, Structure, and Transfer
    Sequential Dependencies
    Relationships among Main Variables
    Implications of the Study

LIST OF TABLES

Table

1. Distribution of Subjects
2. Means and Standard Deviations for the Main Dependent Variables for Each Treatment
3. Correlation Coefficients among the Main Dependent Variables for All Subjects
4. Subjects' Rankings of Substructure Difficulty by Treatment and Group
5. Correlation Coefficients among Main Dependent, Substructure, Background, and Questionnaire Variables for All Subjects
6. Percentage of Subjects with Perfect Substructures on Acquisition and Retention
7. Substructure 1: Acquisition-Retention and Consistency Percentages for Structures 20, 21, and 22a
8. Substructure 2: Acquisition-Retention and Consistency Percentages for Structure 2
9. Substructure 3: Acquisition-Retention Percentages for Structure 6 - Memory Item
10. Substructure 3: Consistency Percentages for Structure 6
11. Substructure 3: Acquisition-Retention Percentages for Structures 4 and 5
12. Substructure 3: Consistency Percentages for Structures 4 and 5
13. Substructure 4: Acquisition-Retention and Consistency Percentages for All Items
14. Substructure 5: Acquisition-Retention Percentages for Structure 19 - Parallel Forms and Tests
15. Substructure 5: Consistency Percentages for Structure 19 - Parallel Forms and Tests
16. Substructure 6: Acquisition-Retention and Consistency Percentages for Structure 7-1st
17. Transfer, Substructure 1: Acquisition-Retention and Consistency Percentages for Structure 22b
18. Transfer, Substructure 2: Acquisition-Retention and Consistency Percentages for Structure 3
19. Transfer, Substructure 5: Acquisition-Retention and Consistency Percentages for Structure 17
20. Transfer, Substructures 3 and 6: Acquisition-Retention Percentages for Structure 7-2nd
21. Transfer, Substructures 3 and 6: Consistency Percentages for Structure 7-2nd
22. Transfer, Substructures 3 and 6: Acquisition-Retention Percentages for Structure 8
23. Transfer, Substructures 3 and 6: Consistency Percentages for Structure 8
24. Transfer, Substructures 3 and 6: Acquisition-Retention Percentages for Structures 9, 10, 11, and 12
25. Transfer, Substructures 3 and 6: Consistency Percentages for Structures 9, 10, 11, and 12
26. Guttman Reproducibility Coefficients on Structure and Transfer
27. Rank Correlations: Actual Difficulty with Format and Information on Structure and Transfer
28. Analysis of Variance for Time and Errors
29. Scheffe Multiple Comparisons on Time
30. Analysis of Variance for Achievement
31. Analysis of Covariance for Achievement
32. Scheffe Multiple Comparisons on Achievement for Six Treatments and Control
33. Analysis of Variance for Structure
34. Analysis of Covariance for Structure
35. Scheffe Multiple Comparisons on Structure for Six Treatments and Control
36. Analysis of Variance for Transfer
37. Analysis of Covariance for Transfer
38. Analysis of Variance for Substructures on Acquisition - Six Treatments and Control
39. Analysis of Variance for Substructures on Retention - Six Treatments
40. Scheffe Multiple Comparisons on Substructure-Acquisition for Six Treatments and Control
41. Means and Standard Deviations on Substructure-Retention for Six Treatments
42. Responses to the Diagram and Verbal Questionnaire Items
43. Correlations among Aptitude Scores and Main Dependent Variables

LIST OF APPENDICES

Appendix

Pilot Questionnaire
Treatment Questionnaire
Diagram Interpretation Program
Outline of Reliability Passage
Reliability Passage
Diagrams and Corresponding Verbal Statements
Test and Test Analysis
Order of Test Items
Guttman Dependencies
Analysis of Variance for Time and Errors
Analysis of Variance and Covariance for Achievement
Analysis of Variance and Covariance for Structure
Analysis of Variance and Covariance for Transfer
Analysis of Variance for Substructures
Questionnaire Items Unique to Diagram and Verbal Treatments

CHAPTER I

STRUCTURE OF KNOWLEDGE AND COGNITIVE STRUCTURE

Many educational psychologists use rather broad and vague concepts which lack the precision necessary for fruitful application to classroom learning. Two rather common, yet vague, concepts in the literature today are structure of knowledge and cognitive structure. Cognitive structure refers to what a person knows, whereas structure of knowledge refers to what experts have decided he should know. Diagramming is proposed as a method of clarifying these two concepts, since it provides a method of representing structures of knowledge and a model for testing an individual's cognitive structure. In addition, diagrams provide new approaches to curriculum development and to the assessment of learning.

Novak (1966, p. 249-253) presented the following model of the educational process whereby the conceptual structure of a given discipline is transmitted to the student by various media such as books, teachers, and films, with the student storing and integrating this information within his own cognitive structure.

[Diagram: Novak's model. Under "Discipline": Conceptual Structure of the Discipline. Connecting media: "Programming", Experimentation, Books, Films, Slides, Other Sources, Teacher. Under "Student": Cognitive Structure of the Student.]

The purpose of this study was to investigate all three aspects of this model.
It attempted (1) to define conceptual structure of a discipline (structure of knowledge) and to develop a method of representing this structure, (2) to develop a test specifically of cognitive structure, and (3) to determine the effects upon cognitive structure of various modes of presenting structure of knowledge.

Structure of Knowledge

Many of the new social studies and mathematics curricula emphasize structure of knowledge. This trend is partly the result of Bruner's (1963) stress upon designing curricula that reflected the basic structure of a field of knowledge. According to Bruner (1963), "to learn structure is to learn how things are related (p. 7)" . . . "it implies learning the underlying principles, attitudes, and/or regularities of a subject (Bruner, 1966a, p. 249)." Structure was conceived as the most economical representation of a discipline; namely, the rules or propositions which generate it (Bruner, 1966b, p. 201-203).

Morrissett (1967) defined structure of knowledge similarly:

    the arrangement and interrelationships of parts within a whole. A structure can refer to the relationship of concepts to each other; for example the concepts, "economic system" and "political system" may be related to each other in a structure called a "social system." Conversely, a concept may itself have a structure. The concept "economic system" can also be thought of as a structure having component concepts such as "money" and "spending" which are structurally related to each other (p. 4).

By citing "economic system" and "social system" as structures, where economic system was part of the structure called a social system, Morrissett showed that structures could have different levels of abstraction and/or complexity. But he did not clarify the meanings of interrelationships (or arrangement) and parts. For example, he distinguished theory from structure, a theory being a
general statement about relationships among facts, where these facts have been organized into concepts. A theory is a structure of concepts; it states a relationship - often a causal relationship among the concepts (p. 5).

In essence Morrissett defined a theory as a specific type of structure, with a specific type of part (facts organized into concepts) and a specific relationship (causal). However, later he treated structure and theory as two separate entities within any curriculum, theory implying more than just a structure. It seemed that the meaning of structure of knowledge needed to be clarified by defining what is meant by "part" and what is meant by "arrangement" or "interrelationships".

The present definition of structure of knowledge represents an attempt to clarify and expand upon Bruner's and Morrissett's approaches. Structure of knowledge is defined as the organization of a given area of knowledge, where organization refers to the relationships between elements within that area.

Elements

In this definition, element might mean what is usually termed a concept, a principle, an event, a fact, an object, a theory, or a substructure, which could be larger or smaller than a theory. The type of element in a structure depends upon the particular subject matter area. The criterion for determining the elements of a given structure is subjective, not yet based upon a mathematical or experimental procedure.

The lowest level of an element is an event, fact, or object, where the elements are not abstractions, e.g., President Johnson talked with Premier Kosygin on June 25, 1967. Concepts, being abstract, are at a higher level. Some examples of basic concepts from statistics would be mean, variance, and correlation; from science, energy, matter, neutron, and electron. Principles are at the next level of complexity. Some principles in physics would be the laws of magnetism, the gas laws, and buoyancy principles.
Gagné's (1966) distinction between concepts and principles is used. For Gagné concepts refer essentially to equivalence classes of objects or object-qualities. One type of concept, concept by observation, is learned through observation of positive and negative instances and another type, concept by definition, is learned through verbal communication. Principles, on the other hand, are "composed of two or more concepts having an ordered relationship between them (Gagné, 1966, p. 98)". But this distinction is not always clear, as Gagné himself pointed out in referring to concepts by definition:

    The other [type] is a concept by definition, which is in a formal sense the same as a principle. It is a combination of simpler concepts and is typically learned by human beings via verbal statements that provide the cues to recall of component concepts and to their correct ordering (p. 90).

This distinction between concepts by definition and principles is particularly unclear when the mode of presentation is only that of written material, which is the way many concepts are learned, as Gagné pointed out.

Gagné (1966) made two other major distinctions between concepts and principles. Gagné stated that the behavioral criteria for knowing concepts and principles were different; for a concept it is identification while for a principle it is demonstration. Concept testing involves a choice from a number of alternatives; principle testing involves a situation where performance reflects identification of component concepts and the operation relating them to each other. Considering the previously stated similarity between concepts by definition and principles, this seems to be an arbitrary criterion. The other major distinction was that of mediation; a concept representing a single mediator and a principle representing a sequence of mediators. However, this criterion is also unsatisfactory, for complex concepts could easily involve a sequence of mediators.
Even though Gagné's distinction between concepts and principles was inadequate, his classification of concepts into two types, concepts by observation and concepts by definition, was used.

Returning to the original discussion of structure and the term "element", a certain structure might be an explanation of a concept or principle. Other structures might involve comparison of principles or concepts, implications of principles or concepts, combination or transformation of principles into more inclusive ones, etc. When elements are substructures such as theories, the structure or organization involves a hierarchical sequence of interrelationships. There could also be many generalizations which form substructures not complex enough to be called theories and even other substructures more complex than theories.

Relationships

Relationships are the connections between elements in a body of knowledge. The relationships given below are intended to be exhaustive of the entire class of relationships. However, this may not be the case.

Descriptive: is most apparent in definitions or characterizations of things, e.g., characteristics of types of rocks, criteria of good pop art, advantages and disadvantages of two theoretical positions.

Causation: of the form "A causes B"

Multiple Causation: of the form "(A, B, C, . . .) causes (W, X, Y, . . .)"

Temporal: of the form "A precedes B"

Logical: refers to the logical connectives - and, or, not, if then, etc. One important subset here is the subset-set or inclusion-exclusion relationship, e.g., classification of rocks, trees, or mammals; types of statistical tests, etc.

Quantitative: mathematical relationships such as equality, inequality, proportionality, mathematical functions, order, addition, etc.

Functional: excludes mathematical functions. Refers to relationships that express purpose, use, action, direction, transformation, etc., e.g., a computer processes data, a skillet is for cooking food, Edmonton, Alberta is northwest of East Lansing, Michigan, John is the father of Bill, etc.

Composite: any complex interaction of the relationships listed above, e.g., temporal-causal.

A closer examination of different subject matter areas shows that within and across areas different relationships may exist. Some concepts have a subset-set type of relationship, e.g., in matrix theory the identity matrix is a subset of the set of all diagonal matrices and in history the set of explorations to America could be divided into subsets according to the country for which the explorer sailed, to the period of time, or perhaps by geographical areas of exploration on the mainland. In some areas terms are related in the sense that one describes the other; a psychological learning theory might be described as behaviorist, neo-behaviorist, or cognitive and in geometry triangles can be described as equiangular, equilateral, similar, congruent, etc. Another common relationship is that of logical implication, e.g., an equilateral triangle logically implies an equiangular triangle and vice versa, and much unsystematic or error variance in test scores implies low reliability.

Range of Application of the Present Definition

This definition of structure of knowledge is quite broad, ranging from structures which are simple, with only two elements and one relationship, to complex structures involving many elements and many relationships, including entire disciplines. Structures can also vary in degree of abstraction of elements, i.e., from events or facts to theories. Bourne (1966) reviewed concept learning studies which varied the type of rule (relationship) that combined the defining attributes of a concept, e.g., conjunctive, disjunctive, relational, joint denial, conditional, etc. Rules showed different degrees of difficulty. These
results would imply that cognitive structure relationships also vary in degree of difficulty, reflecting perhaps differences in degree of abstraction and/or complexity.

The present definition is not limited to what might be called the "basic" structure of a discipline, where basic refers to the most important ideas, the underlying principles, and/or regularities of a discipline. Since the basic structure of any discipline is to a certain extent determined by experts in that field itself, different structures could be identified, distinguished primarily by a choice of different elements and in some cases by different relationships among these elements. But given a passage of material encompassing a smaller content area than an entire discipline, it is assumed that the same structure would be identified by different individuals.

Representation of Structure of Knowledge with Diagrams

The preceding definition and explanation of structure of knowledge implies that these structures can be represented by diagrams which illustrate the elements and their relationships. In other words, the structure of a subject matter or of a passage can be "seen", just as the structure of a crystal or of a building can be "seen." Diagrams or graphics may be classified into four types: Venn diagrams, tables or matrices (n x m array), lists, and graphs, that is, the graphs of graph theory (Harary, Norman, and Cartwright, 1965). This classification can represent all of the previously listed relationships. In fact, even by using only graphs all structures could be represented, simply by letting points represent elements and lines represent relationships. Berlyne (1965), in discussing situational thoughts and transformational thoughts, which are similar to cognitive elements and relationships respectively, has also stated that graph theory could represent all structures.
For Berlyne "a node [a point] can stand for a situational thought and a branch [connecting line] for a transformational thought leading from one situational thought to another (p. 200)." A more extensive analysis of Berlyne's approach is given later.

Generally types of diagrams other than graphs are used because some diagrams lend themselves well to representation of particular relationships. A useful correspondence between diagrams and relationships is the following:

Venn diagrams: subset-set
Tables or Matrices: descriptive, causal, logical, multiple causation
Lists: subset-set, descriptive (outline is one type of list)
Graphs: causal, multiple causation, logical, quantitative, functional, composite (time lines in history and flow charts in chemistry are specific examples of graphs)

A series of examples will illustrate how a diagram could represent the structure of a given area. Below is a paragraph on sinking of land followed by a diagram representing the structure of that passage.

Geological subsidence or sinking of lands results from tapping the earth for oil or gas. Near Long Beach, California the land above the Wilmington oil field sank until it had become a bowl up to 26 feet deep over an area of 22 square miles. The slow subsidence of land ruined buildings, cracked pavements, twisted railroad tracks, and wrecked bridges. The explanation for such phenomenon is as follows. Liquid or gas is generally drawn from a stratum of porous rock whose pores are filled with the fluid under pressure. If the rock is well consolidated (if its grains are well cemented together) it will usually continue to support the weight of the rock and earth on top after the fluid is withdrawn.
However, if the fluid-holding rock is a poorly consolidated, easily-molded sandstone, once the supporting pressure of the fluid has been withdrawn from its pores, the pressure of the overburden compacts the rock, and the ground above subsides by the amount by which the rock is compressed. Other factors besides the mechanical strength of the fluid-containing rock may contribute to subsidence. For example, subsidence is more likely if soft, clayey material (which is easily compacted) is present in or next to the fluid stratum.

[Diagram: a Venn diagram labeled "Compactible material in or next to the fluid stratum", containing "Poorly consolidated rocks" and "Soft, clayey material", joined by "(and)" with "Pressure of land above oil or gas field (oil or gas removed)"; an arrow leads from the pair to "Sinking".]

This structure is represented by a combination of a Venn diagram and a graph, representing subset-set and multiple causation relationships respectively. The subset-set relationship consists of the two types of compactible material (poorly consolidated rocks and soft, clayey material) in or next to the fluid stratum. The subsidence or sinking of land is the joint effect of compactible material and the pressure of land above the oil or gas field. In this case the arrow (→) represents "causes".

An example of a matrix or chart is given below.

Types of Rocks

                                              Igneous   Sedimentary   Metamorphic
Molten rock which has cooled and hardened        X
Rock grains locked together by pressure
  and cementing material                                     X
Rock with changed mineral content                                          X

Here the diagram represents the description of the formation of each type of rock. Note that the pattern of checks (X) indicates non-overlapping formation processes.

Several structures could be diagrammed and then interrelated. For example, the following four structures were taken from an article for elementary students justifying why America was named after Amerigo Vespucci rather than Columbus.
Time Line
  1492  Vespucci to Spain; Columbus 1
  1493  Columbus 2
  1499  Vespucci 1
  1501  Vespucci 2
  1507  Name America

Location of Water Route to India
  Straight west of Europe
  Vespucci: further south than West Indies
  Vespucci: south of Amazon River
  Vespucci: further south than southern Argentina

Name of New World
  No name
  China-India
  America

Causal Sequence: Why New World Named after Vespucci
  Vespucci: interest and knowledge in geography and cosmography
  doubted Columbus's reports that he had found China and India
  first sail to new world
  kept accurate maps of trip
  thought water route to India further south than where he had been (Amazon River)
  wanted second trip to find route
  second trip
  maps of second trip
  Vespucci: first to question if land was Asia or India
  Vespucci: first to assert land was a new continent
  Name new continent "America" after Vespucci

These four substructures are not independent of one another and can be related quite meaningfully on a time dimension. A larger structure is thereby created where the substructures constitute the elements and these are connected by a temporal relationship (see page 15).

In summary, diagramming a given area requires systematic identification of the basic elements and the relationships among them, followed by selection of an appropriate diagram form to represent the structure.

This form of representation can be classified as a symbolic, not an ikonic, mode (Bruner, 1964, 1966b, p. 202, 2.52; Bruner, Oliver, and Greenfield, 1966). For Bruner the ikonic mode refers to summary images that "stand for" the thing represented, where the image must possess parts similar to parts of the object, thereby being isomorphic with distinctive features of the thing imaged. The symbolic mode refers to a set of symbolic or logical propositions, where no degree of correspondence with the thing signified is required. Diagrams, because they are not isomorphic with the thing represented, are part of the symbolic mode.
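The correspondence between graphs and structures (points for elements, lines for relationships) can be sketched in code. The following is only an illustration and not part of the procedure described in this study: the class name `Structure`, its methods, and the relationship labels "subset of" and "causes" are hypothetical, and the elements are taken from the subsidence passage earlier in this chapter.

```python
from collections import defaultdict


class Structure:
    """A structure of knowledge: elements (points) joined by labeled relationships (lines)."""

    def __init__(self):
        # Maps a relationship label to the set of (source, target) element pairs it connects.
        self.relations = defaultdict(set)

    def relate(self, label, source, target):
        """Record that `source` stands in relationship `label` to `target`."""
        self.relations[label].add((source, target))

    def elements(self):
        """Return every element that takes part in at least one relationship."""
        return {e for pairs in self.relations.values() for pair in pairs for e in pair}

    def causes_of(self, effect):
        """Return, in alphabetical order, all elements recorded as causing `effect`."""
        return sorted(s for s, t in self.relations["causes"] if t == effect)


# Hypothetical encoding of the subsidence diagram.
MATERIAL = "compactible material in or next to the fluid stratum"
PRESSURE = "pressure of land above the oil or gas field"

subsidence = Structure()
subsidence.relate("subset of", "poorly consolidated rocks", MATERIAL)
subsidence.relate("subset of", "soft, clayey material", MATERIAL)
# Multiple causation: two elements jointly lead to the same effect.
subsidence.relate("causes", MATERIAL, "sinking")
subsidence.relate("causes", PRESSURE, "sinking")

print(subsidence.causes_of("sinking"))
```

Keeping the relationship label separate from the pair of elements mirrors the distinction drawn in the text between elements and the type of relationship that connects them.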
Previous Use of Diagrams to Represent Structure of Knowledge

This methodological procedure of graphically representing structure is not entirely new in the field of educational psychology, although it does not appear to have been applied in a systematic fashion to educational problems. The neo-behaviorist literature has many diagrams illustrating verbal habit family hierarchies, conditioning processes, and mediation. Goss (1961), a neo-behaviorist, referred to n x n tables and tree diagrams as ways of representing conceptual schemes.

[Figure (page 15): the four Vespucci substructures combined into one larger structure along the time dimension. The text of the diagram is not recoverable from the source.]

[Figure fragment: a graph whose surviving labels are the points "C," "D," and "E" and the lines "Concord to Danbury," "Albany to Elmira," "Boston to Albany," and "Concord to Albany."]

On a larger scale Senesh (1967) discussed the structure of the disciplines of economics, political science, sociology, anthropology, and geography. In all cases he represented or illustrated the respective structure with a diagram, which could be classified as a form of graph. The basic concepts were represented by boxes and the relationships among these concepts by lines. In general, the lines did not represent a specific relationship among basic concepts but functioned only to illustrate that a connection or connections existed.
Thus Senesh's approach is similar to, but not as systematic as, the diagram procedure presented here.

Cognitive Structure

Novak's (1966) model included the cognitive structure of the student as well as the structure of the discipline. Cognitive structure is a currently popular, although not new, term used by cognitive theorists in attempting to explain acquisition of knowledge. Ausubel (1963, p. 26) defined cognitive structure as "an individual's organization, stability and clarity of knowledge in a particular subject matter field at any given time." "Organization," "stability" and "clarity" as Ausubel used them appeared to be dimensions by which one could characterize various structures rather than being defining attributes. Ausubel always seemed to be referring to the degree of organization, the degree of stability and the degree of clarity, rather than saying that cognitive structures were organized, clear, and stable. Ausubel's definition is difficult to operationalize because he did not elaborate on the meanings of these terms.

Reitman (1965) attempted to clarify cognitive structure and thought processes through a computer simulation model using the list processing language IPL-V. His approach to cognitive structure was that "we may regard the whole problem of cognitive structure as a matter of sets of cognitive elements interconnected together in complex networks. Relations supply the connective tissue that ties the individual sets together into a network (p. 91)." Reitman tended to limit examples of cognitive elements to properties or names of objects, although this limitation was not implied by the term "cognitive element."
He stated that a relation could be any connection or linkage among elements ranging from "functional relations (one thing depending upon another) to similarity and equivalence relations (one thing like or unlike another in some respect) and even to systems of interconnections of the sort we find in family relations (p. 96)."

Goss (1961) presented the notion of conceptual schemes which function as mediators organizing particular items or objects. A conceptual scheme was defined as "one or more sets of categories or two or more variables that stand in ordinal, classificatory, or functional [mathematical] relationships to each other (p. 42)."

All of these approaches are similar in that they refer to some form of organization or relationship between things variously referred to as cognitive elements, categories, or variables.

In the present study cognitive structure is defined similarly to structure of knowledge. More specifically, it is an individual's organization of knowledge in a certain subject matter area at a given time, where organization refers to the relationships between cognitive elements. Cognitive elements and relationships have been explained previously in the discussion of the terms "element" and "relationship" as they applied to structure of knowledge. However, cognitive elements and cognitive relationships refer to what the individual has acquired, which may not agree with the structure of the subject matter. Colloquially speaking, cognitive structure refers to what the person knows, whereas structure of knowledge refers to what experts have decided he should know. An individual's cognitive structure is composed of many interrelated substructures, varying with the content which has been acquired and retained by that individual.

This definition of cognitive structure is more inclusive and flexible than Reitman's and Goss's and reduces the ambiguity of Ausubel's definition.
With this definition, "stability" and "clarity" refer to properties of cognitive structure. Goss's (1961) usage of the terms "category" and "variable" is similar in scope to that of "cognitive element," although he referred to functional relationships only in the mathematical sense, whereas the present definition includes additional relationships. Reitman's (1965) concept of relations which tie elements into a network implies the concept of organization as used here. Although he did not specify the precise meaning of elements and relationships, the terms are similar to those in the present definition. As discussed later, Reitman's method of representing cognitive structure differs from the present one.

Berlyne's (1965, pp. 114-115) distinction between situational thoughts and transformational thoughts is also similar to the distinction between cognitive elements and relationships, although Berlyne was primarily concerned with the role of these two types of thoughts (each composed of an implicit response and its feedback stimulus) in directed thinking rather than with the product of learning or thinking. A situational thought represents "an external stimulus situation (p. 114)" and a transformational thought represents an operation which changes one situational thought into another. For Berlyne these transformational responses or thoughts are transformations in the mathematical sense as well. For each transformation there is a corresponding domain and range, such that a transformation applied to an element in the domain yields an element in the range. In other words, applying a transformation to a situational thought yields another (or the same, depending upon the transformation) situational thought.

The present approach is similar to Berlyne's. Stating that an individual "knows that A and B are related by X" is another way of saying that he can apply transformation X to A to obtain B.
For example, if an individual "knows" that "systematic factors cause systematic variation," then when he applies the transformational thought of "cause of" to the situational thought of "systematic variation" he obtains "systematic factors" and not vice versa. It is also assumed that the individual can operate in a reverse fashion, i.e., he can get from "systematic factors" to "systematic variation" by applying the transformational thought "result of." Additional comparisons with Berlyne's approach are given later.

In this chapter definitions of structure of knowledge and of cognitive structure were presented with an explanation of how diagrams could be used to represent both types of structures. A comparison with other definitions of structure of knowledge and cognitive structure was also given. In the next chapter a procedure for systematically testing for cognitive structure relationships will be presented and will then be compared with traditional testing procedures.

CHAPTER II

TESTING COGNITIVE STRUCTURE

Having defined structure of knowledge and cognitive structure, a procedure for testing cognitive structure based on these definitions will be presented. It will then be compared with other methodological approaches.

Testing Cognitive Structure Relationships

Defining both structure of knowledge and cognitive structure in terms of relationships between elements and then diagramming the structure of subject matter provides a guide for testing whether a student has learned these relationships and, if not, which relationships he has learned. It is not assumed that an individual automatically stores a graph or matrix in his head, although through visual imagery he might use such a form in recall and thought processes. The diagrams which represent the structure of an area serve as blueprints for constructing test items which examine the relationships between elements within an individual's cognitive structure.
Knowledge of elements is not directly tested with the present procedure.

In general, multiple choice tests (Ausubel and Fitzgerald, 1961; Ausubel, Robbins and Blake, 1957; Ausubel and Youssef, 1963, 1965; Fitzgerald and Ausubel, 1963) have been used to measure cognitive structure. The results were then interpreted by using the total score from the test. But the total score as such did not reflect structure, nor its organization, stability and clarity. At least no attempts were made to illustrate such relationships.

The method of testing for cognitive structure developed in the present study was based on the diagrams which represented the structure of the subject matter. Test items were constructed which systematically examined all of the relationships between elements which were represented in the diagram. Data were scores which reflected the nature of structural relationships attained by the individual. The scores did not reflect simply the number of correct items.

An individual's cognitive structure may not correspond with the structure of a given subject matter area, i.e., he may not have learned and retained the same relationships between elements that exist within the subject matter itself. His cognitive structure may differ from the corresponding knowledge structure in the following ways:

a. all the structure could be absent
b. part of the structure (elements and/or relationships) could be present and the rest absent
c. a different structure could exist, i.e., different relationships and/or different elements present
d. a combination of b and c could exist

Essay questions could be used to test for cognitive structure, but such items have the undesirable property that students are reluctant to put down ideas because they are afraid the idea is wrong. Yet often these ideas are the ones in which a researcher is most interested. However, an objective test requires the student to answer each question.
Therefore the recognition method was used, where the elements were given in the item stem and/or alternatives and a correct response indicated knowledge of the proper relationship between elements. Since only relationships are systematically examined this approach has some limitations, reflecting only part of the individual's cognitive structure; but it does ensure that all of the structure of knowledge is examined. Such a test can be used to determine how well the structure of the material or passage has been transmitted to the student's cognitive structure. If items are constructed to yield enough information, it is possible to "diagram" an individual's cognitive structure from the test results.

In such a test each item or group of items is analyzed for the number of relationships tested and is then scored accordingly. In the typical achievement test, the fact that two individuals have the same score does not necessarily mean that they have the same cognitive structure or that they have answered items in the same way. For example, two people could have answered 50 out of 100 items correctly yet have comprehended non-overlapping segments of the material. To a teacher or researcher, inferences for those two individuals would be quite different. Items on a cognitive structure test are often scored on the basis of patterns of responses, different patterns reflecting different cognitive structures, rather than by scoring each item independently. No attempt has been made to quantify the degree of structure.

A series of examples will show how items testing cognitive structure relationships can be constructed and how the items are analyzed for the number of relationships tested. The concept of reliability has been used in each of these examples. A brief explanation of some of the terms used in the items will be presented to clarify confusions that might be created by the item analysis itself.
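The ways in which a cognitive structure may differ from the corresponding knowledge structure, and the observation that equal total scores can conceal non-overlapping knowledge, can be sketched with sets of relationships. This is an illustration under stated assumptions and not the scoring procedure of the present study: the function name `compare`, the triple format (element, relationship, element), and both students are hypothetical. The triples borrow the reliability concepts explained in the pages that follow.

```python
def compare(knowledge, cognitive):
    """Compare a student's relationships with those of the subject matter.

    Both arguments are sets of (element, relationship, element) triples.
    Returns three sets: relationships present as intended, relationships
    absent from the student's structure, and relationships the student
    holds that are not in the subject matter.
    """
    present = knowledge & cognitive
    absent = knowledge - cognitive
    different = cognitive - knowledge
    return present, absent, different


# Hypothetical structure of knowledge with four relationships.
knowledge = {
    ("systematic factors", "cause", "systematic variation"),
    ("unsystematic factors", "cause", "unsystematic variation"),
    ("unsystematic variation", "lowers", "reliability"),
    ("reliability coefficient", "measures", "reliability"),
}

# Two hypothetical students, each holding exactly half of the structure.
student_1 = {
    ("systematic factors", "cause", "systematic variation"),
    ("unsystematic factors", "cause", "unsystematic variation"),
}
student_2 = {
    ("unsystematic variation", "lowers", "reliability"),
    ("reliability coefficient", "measures", "reliability"),
}

present_1, absent_1, different_1 = compare(knowledge, student_1)
present_2, absent_2, different_2 = compare(knowledge, student_2)

# Same number of correct relationships, but no relationship in common.
print(len(present_1), len(present_2), present_1.isdisjoint(present_2))
```

A total score would treat the two students alike; the sets show that what each has comprehended does not overlap.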
Reliability refers to the consistency with which a test measures whatever it purports to measure, or the degree to which a test may be depended upon to yield similar test results under similar circumstances. Since no perfectly reliable test exists, variability among scores earned by an individual over repeated testing will occur. The two major types of variation in test scores that can occur are systematic and unsystematic variation (which are caused by systematic and unsystematic factors, respectively). Systematic variation is characterized by a systematic change in scores while unsystematic variation is characterized by random fluctuations in scores. The greater the unsystematic variation, the lower the degree of reliability of a test. Thus in order to increase the reliability of a test, unsystematic factors need to be controlled. In addition, unsystematic factors can be classified into two types, varying and constant. The basic distinction between these two types is that they have different effects upon score variation when tests are given on different occasions.

The quantitative measure of the degree of reliability of a test is called the reliability coefficient. It expresses the extent to which scores on any type of test can predict scores on a similar type of test. When unsystematic variation is great over a series of similar tests the reliability coefficient is low. Thus the reliability coefficient can be classified as one type of correlation coefficient. There are three ways of estimating the reliability coefficient for a test: (a) test-retest: estimation from the correlation coefficient between scores on repetitions of the same test, (b) parallel forms: estimation from the correlation coefficient between scores on parallel (comparable) forms of a test, and (c) internal consistency: estimation from the correlation coefficient among comparable parts of a test.
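Because the reliability coefficient is one type of correlation coefficient, a test-retest estimate can be illustrated with a short computation. The sketch below assumes the Pearson product-moment correlation, which the passage does not name explicitly; the function name `pearson_r` and the two sets of scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores of five students on two administrations of the same test.
first_testing = [12, 15, 9, 20, 17]
retest = [13, 14, 10, 19, 18]

print(round(pearson_r(first_testing, retest), 2))
```

The same function would serve for a parallel forms estimate by passing scores on the two forms instead of two administrations of one form.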
The series of examples of items given below illustrates how an individual's knowledge of relationships among these basic concepts can be examined.

A simple application type question could test for the relationships among the methods of estimating reliability coefficients and their characteristics (defining attributes) as represented by the following chart. The answers directly reflect whether a student has the correct relationships among the methods of estimating reliability coefficients and the defining characteristics of each. Clarity of subjects' cognitive structure is checked by requiring the same response to the first and last parts of the item.

                                    Test-     Internal      Parallel Forms   Parallel Forms
                                    Retest    Consistency   Immediate        Delayed
Type of Test      Identical           X           X
Administered      Similar                                        X                X
Occasion          Same                            X              X
                  Different           X                                           X
Times Test        Once                            X
Administered      More than once      X                          X                X

Application Question (i.e., application, because the examples are different from the text)

The following methods are sometimes used for estimating reliability coefficients:

(a) test-retest
(b) internal consistency
(c) parallel forms - immediate
(d) parallel forms - delayed

For each of the following situations determine which reliability method is described and place that letter (a, b, c, d) on the line preceding the corresponding statement.

____ The group form of the Stanford-Binet intelligence test was given to the sixth grade class on the opening day of school.
____ The Iowa Tests of Basic Skills were administered to all transfer students; form A on the first day of school and form B two weeks later.
____ The teacher gave the same pre and post test on one chapter in the text.
____ Both forms of a personality scale were administered to a group of nurses upon their graduation.
____ A final examination was given to all students during the last class period of the day.
Below is an example of scoring a pattern of responses with the corresponding analysis of the patterns, which is in turn related to diagrams representing the structure of a certain area. The small circles (o) in the diagrams indicate the relationships which are tested in the item itself; the correct pattern "tfffft" (a true and f true) reflects each of these relationships as indicated by the analysis. The small checks (x) are the relationships reflected by the pattern "tfftff" (a true and d true), also indicated by the analysis.

[Figure: two diagrams. The first relates the continuum of degree of unsystematic variation (high to low) to the continua of correlation coefficient on parallel tests, reliability coefficient, and reliability (each low to high). The second is a matrix relating reliability, reliability coefficient, and correlation coefficient to "quantitative index" and "degree of unsystematic variation in scores." The positions of the circles and checks are not recoverable from the source.]

Question over these two structures:

If we found a low degree of relationship between tests B and C (which are parallel tests) we would expect this to be reflected in quantitative indices (index) such as

a. low correlation coefficient
b. high correlation coefficient
c. low reliability
d. high reliability coefficient
e. high reliability
f. low reliability coefficient

Analysis

If a true (b false for consistency) - 3 points
  (1) quantitative index
  (2) two continuums; unsystematic variation and correlation coefficient
If f true (d false for consistency) - 3 points
  (1) quantitative index
  (2) two continuums; unsystematic variation and reliability coefficient
If both a and f true - 1 point - equality of correlation coefficient on parallel tests and reliability coefficient.
Consider c and e combinations when a and f true.
  If both c and e false - 3 points
    (1) reliability not a quantitative index
    (2) infer continuums correct - reliability
  If c true and e false - 2 points
    (2) continuum correct, but quantitative index wrong
  If c and e both true - inconsistent
  If c false and e true - no points

If a true and f false, f could be false because of two reasons - continuum or quantitative index. Check by alternative d.
  If d marked true, it is because of continuum, but incorrect, yet has quantitative index - 1 point
  If d marked false - no points - cannot infer about continuum and quantitative index is wrong.
Consider c and e combinations.
  If c true and e false - 2 points for continuum
  If c false and e false - 1 point for index
  If c false and e true - no points

If a false and f true, a could be false because of either continuum or index. Check by alternative b.
  If b true - 1 point for index (continuum wrong)
  If b false - no points (cannot infer about continuum, index wrong)
Consider c and e combinations.
  If c true and e false - 2 points for continuum
  If c false and e false - 1 point for index
  If c false and e true - no points

If both a and f marked false, reason is either continuum or quantitative index. Check by alternatives d and b.
  If d true - 1 point for index
  If d false - no points
  If b true - 1 point for index
  If b false - no points
c and e combinations follow same pattern as outlined in the previous section.

Using this analysis we then have the following consistent patterns. The remaining patterns are inconsistent, scored -1. (A blank indicates "false.")

[Table of consistent response patterns and their scores. The entries are not recoverable from the source.]

If a student responded with the appropriate pattern of responses the diagram of his cognitive structure would be identical to the preceding diagram (relationships indicated with circles only). He would also be given 10 points for such a pattern, it reflecting knowledge of 10 appropriate relationships among elements. If he had responded with "tfftff" we would then examine the situation where a is true and f is false. Under that analysis f could be false for two reasons, which could be checked by examining the response to alternative d. If d is marked true (as is the case) both parts of the alternative must be true from the student's point of view (both continuum and quantitative index).
But, in fact, only the quantitative index relationship is correct, therefore allowing for only one point. Then c and e combinations need to be examined. This analysis shows that if such a pattern had been given only five relationships were correct in the student's cognitive structure, yielding a score of five points for that pattern. Note that such an analysis gives the number of correct relationships and also tells which of these relationships are appropriate.

There is a broader classification of response patterns that can be used. In general, response patterns can be classified as consistent or inconsistent. If a pattern is inconsistent a student has contradicted himself. Consistent patterns can be of three types: (a) perfect, meaning that there is an isomorphic relationship between the student's cognitive structure and the structure of the subject matter, (b) incorrect but without contradictions, i.e., the student has answered consistently but wrongly throughout the given set of items, and (c) partly correct, i.e., a subset of items has been answered correctly, and the student has not contradicted himself on the other items (an answer may reflect an omission of a relationship or confusion of relationships but not a contradiction). An example of a contradiction would be where a student stated that internal consistency and parallel forms-delayed tests yielded the highest reliability estimates and then later stated that internal consistency and test-retest estimates yielded lowest reliability indices. A student would be "confused" if he stated that systematic factors caused unsystematic variation and unsystematic factors caused systematic variation. From both a teacher's and researcher's viewpoint this type of information could be quite valuable, indicating different types of problems for different students.
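The pattern analysis for the six-alternative reliability item can be written as a small scoring routine. This is a sketch under stated assumptions rather than the scoring key of the study: the function name `score_pattern` is hypothetical, the point values follow the analysis given for that item, and the inconsistency rule (marking both the "low" and the "high" alternative for the same concept true) is an assumption, since the full table of consistent patterns is not available here.

```python
def score_pattern(pattern):
    """Score a six-letter response pattern such as "tfffft" (alternatives a to f).

    Returns the number of correct relationships credited, or -1 for a
    pattern treated as inconsistent.
    """
    a, b, c, d, e, f = (letter == "t" for letter in pattern)

    # Assumed rule: low and high cannot both be true for the same concept.
    if (a and b) or (d and f) or (c and e):
        return -1

    points = 0
    # Correlation coefficient: a is correct (3 points); b shows the index only (1 point).
    points += 3 if a else (1 if b else 0)
    # Reliability coefficient: f is correct (3 points); d shows the index only (1 point).
    points += 3 if f else (1 if d else 0)
    # Equality of the two coefficients is credited only when both are correct.
    if a and f:
        points += 1

    # Reliability (alternatives c and e) is not a quantitative index.
    if c:
        points += 2                        # continuum correct, index wrong
    elif not e:
        points += 3 if (a and f) else 1    # both false: index relationship credited
    # c false and e true earns no points.
    return points


print(score_pattern("tfffft"))  # the correct pattern discussed in the text
print(score_pattern("tfftff"))  # the partly correct pattern discussed in the text
```

Under these rules the correct pattern earns 10 points and the pattern "tfftff" earns 5, matching the two totals reported in the text for this item.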
Testing for Transfer

It is also possible to generate a transfer structure, a structure that is not presented in the material itself, yet is logically implied from what is given. This is done by critically examining the relationship between two or more substructures. Below are two such substructures which yield a transfer structure because of their specific relationships. Thus transfer questions can be generated which go beyond the material presented and are not merely new applications of an idea or principle.

[Figure: two substructures and the transfer structure derived from them.
Substructure 1: a matrix of systematic and unsystematic factors against the constant and varying classification.
Substructure 2: methods of estimating reliability coefficients (test-retest, internal consistency, parallel forms-immediate, parallel forms-delayed) against time of testing (same occasion, different occasion) and number of times the test is administered (once, more than once).
Transfer structure: the four estimation methods against systematic factors and unsystematic factors (constant, varying).
The positions of the marks are not recoverable from the source.]

A transfer question drawn from this structure is given below.

Transfer Item

Suppose we want to determine the reliability of a newly constructed test called the "Student Happiness Test." Two comparable forms, A and B, were prepared. The decision was made to use all four basic methods of estimating reliability coefficients rather than just one or two. Mark each of the following statements "true" or "false" on the basis of this information. Assume that similar students and group testing were used for all estimation methods.

a. IC and PFI would give the highest reliability indices.
b. TR and PFD would give the lowest reliability indices.
c. IC and PFI would give the lowest reliability indices.
d. TR and PFD would give the highest reliability indices.
e. PFD and PFI would give the lowest reliability indices.
f. IC and TR would give the highest reliability indices.
g. PFD and PFI would give the highest reliability indices.
h. IC and TR would give the lowest reliability indices.
This item was constructed on the basis that constant unsystematic factors cannot be distinguished when testing is done on the same occasion, thus giving higher reliability estimates for the internal consistency and parallel forms-immediate procedures. The four basic methods of estimating reliability coefficients are discriminated by the time dimension, type of test, and number of testings. The above alternatives confound only time and type of test, checking to see if the student has connected time of testing in both preceding structures to generate the transfer structure relating estimation methods and type of factors.

Comparison of Present Method with Traditional Test Construction

A cognitive structure test differs from the usual achievement test in both the construction process and the final product. Before constructing a typical achievement test the writer usually lists the topics which should be covered with their appropriate weightings. By contrast, an outline of a cognitive structure test is based on the structural analysis of the material, which is therefore dependent upon the elements and relationships identified. Topics and elements are different, although the two domains do overlap. Not all topics would be considered structural elements and not all elements would necessarily be topics. Topics which are discussed but not included in a structural analysis would not be structural elements. It is also possible that the test writer might omit some topics which are structural elements because of personal preference (subjectivity is not necessarily avoided with a structural analysis either), some topics lending themselves very easily to questions, section headings in the text itself (which are not necessarily reflective of structure), etc.

Weighting of topics with the usual achievement test is determined by some arbitrary criterion.
Weighting of elements and relationships (structures) on a cognitive structure test is determined by the structural analysis of the material, the weighting depending primarily on the interaction with other substructures. In other words, a structure which is crucial to understanding the material, which is at the "heart" of the material, will probably interact with other structures more often than one which is not as central. It will thus be given more weight because the test is constructed to systematically examine all the interrelationships between elements.

When we consider the actual construction of the tests, different processes are also involved. In the multiple choice achievement format, the items have been written to cover important topics, but the crucial problem (for reliability and validity purposes) is to construct good alternative distractors, such that item difficulty and item discrimination indices fall within appropriate ranges. But these statistical criteria tend to avoid the issue of what type of content, not format, actually makes one alternative better than another. It might be that good distractors are those which contain "confusable" relationships of elements, ones that would be easily detected by a structural analysis of material. But this is only a hypothesis and not one that has been supported by empirical evidence. As stated previously, items on a cognitive structure test are written to test for relationships. Confusable relationships are not necessarily included to mislead the student, but rather to determine adequately how he has connected ideas within his own cognitive structure. If a student contradicts himself, he obviously has learned and retained something different from a student who responds consistently.

The scoring procedure for these two types of tests also differs. On achievement tests the answer made to a given item is scored independently of the answer on any other item and these scores are then added.
In a sense this procedure reflects the number of things a student knows. It also leads to the possibility that students may get an answer right by chance. Admittedly this chance is not "pure," but the probability of getting an appropriate answer is higher on most achievement tests than on a cognitive structure test. On a cognitive structure test, the total score reflects the number of correct relationships existing for the student. These scores, as mentioned previously, are determined in part by a pattern of answers, which means that a student is less likely to get an answer totally correct by chance alone, i.e., the probability that a student has five items correct is less than the probability that he has one item correct. Not all items involve patterns of responses even though they are scored for the number of correct relationships, e.g., one item may reflect five relationships and another may reflect two. The total score may also be broken into parts reflecting the basic substructures within the passage.

Traditional test analysis is not directly applicable to this type of test because of the differential weighting given to each item and the items which are scored by patterns. Thus, in general, the usual item difficulty and discrimination indices cannot be used. Where sequential dependencies exist between substructures as shown by the structural analysis, answers should reveal a Guttman scale pattern, one set of relationships being a prerequisite for another. This does not necessarily imply a 50-50 split on difficulty with such items. Rather one would look for patterns within students, expecting either correct-incorrect (+-), correct-correct (++), or incorrect-incorrect (--) sequences, not incorrect-correct (-+) ones. Thus the emphasis would be on patterns which are characteristic of a group of students or perhaps the ones which are not characteristic.
The difficulty of such items might range from 0 to 100% and still be consistent with a Guttman scale pattern. Item discrimination indices would not apply to individual items which are scored in terms of patterns. The subscore for a pattern could be used instead, but the weightings of such items with the total score would be greater than the usual achievement test items.

An implicit assumption has been that most achievement tests do not test for structural relationships. Many writers may have such an analysis in mind when they write questions but they probably do not apply it in a systematic fashion. In other, perhaps extreme, cases the structure of the material may not be central at all. On such a test a student may respond correctly based on partial knowledge of the structure or on rote memorization. It is not claimed here that a test based on a structural analysis avoids all problems of test construction, but it does systematically and thoroughly test for knowledge of structure, which could be called mastery. Achievement tests constructed in the usual manner examine only a sample of all possible structural relationships. Thus, depending upon the sample, an individual could do well on the test without knowing large blocks of material. However, a test based upon a structural analysis examines all relationships. In order to perform well, a student must have grasped correctly all the interrelationships between ideas, and probably done some additional relating himself; partial comprehension will not suffice. Thus it could be assumed that unless a special "set" is given to emphasize structure, ordinary reading would not necessarily result in good performance on such a test although good performance might be expected on most achievement tests.
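
The scoring ideas discussed in this section can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the scoring procedure actually used in the study: the item names, the weights, and the rule that a pattern earns credit only when every item in it is answered correctly are all hypothetical. The sketch shows (a) a total score that counts relationships rather than items, with some items scored as a pattern, and (b) a tally of the four possible sequences on a prerequisite pair, where the incorrect-correct (-+) sequence is the one a Guttman scale pattern does not expect.

```python
from collections import Counter

# Hypothetical scoring key. Each unit lists the items scored together and the
# number of relationships the unit reflects (its weight). A unit with several
# items is scored as a pattern.
SCORING_KEY = [
    {"items": ["q1"], "relationships": 2},
    {"items": ["q2", "q3", "q4"], "relationships": 5},
]


def total_score(answers, key=SCORING_KEY):
    """Return the number of relationships credited to one student.

    `answers` maps an item name to True (correct) or False (incorrect).
    Assumption: a unit is credited only if all of its items are correct.
    """
    score = 0
    for unit in key:
        if all(answers.get(item, False) for item in unit["items"]):
            score += unit["relationships"]
    return score


def guttman_patterns(students, prerequisite, dependent):
    """Count the sequences ++, +-, -- and -+ on a prerequisite/dependent pair."""
    counts = Counter()
    for answers in students:
        first = "+" if answers[prerequisite] else "-"
        second = "+" if answers[dependent] else "-"
        counts[first + second] += 1
    return counts


# Four hypothetical students.
students = [
    {"q1": True, "q2": True, "q3": True, "q4": True},
    {"q1": True, "q2": True, "q3": False, "q4": True},
    {"q1": False, "q2": False, "q3": False, "q4": False},
    {"q1": False, "q2": True, "q3": True, "q4": True},
]

scores = [total_score(s) for s in students]
patterns = guttman_patterns(students, prerequisite="q1", dependent="q2")
unexpected = patterns["-+"]  # sequences inconsistent with the prerequisite order
```

In this sketch the second student answers two of the three pattern items correctly but receives no credit for the pattern, which is the sense in which a fully correct answer is less likely to occur by chance. The fourth student produces the one (-+) sequence on the pair.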

Comparison of This Approach with Other Approaches to Representing and Testing Cognitive Structure

Neo-behaviorist

This approach to cognitive structure differs from the neo-behaviorist's complex network of S-R associations primarily in that the neo-behaviorists are representing a process and the diagrammatic approach is representing a product. In the neo-behaviorist approach there is a stochastic chain representation of behavior where a given stimulus (covert or overt) leads to a certain response (covert or overt) with a certain degree of probability. With this model the neo-behaviorists give explanations of such behavioral sequences as problem solving, concept attainment, and creativity. The diagrammatic approach to modeling cognitive structure is not a model for such processes. It represents their outcome and does not assume that any one approach best explains the processes themselves.

Assuming then that the basic difference between the two approaches is process versus product, there still remains the question of whether the product is essentially one of S-R associations. A comparison of the basic elements of each approach will answer this question. With the neo-behaviorists the basic unit is "S-R" where "S" represents "stimulus" (covert or overt) and "R" represents "response" (covert or overt), and "-" represents the fact that they are connected with a certain degree of probability so that when S appears R tends to follow, unidirectionality implied. With the diagrammatic approach the basic unit is "O/#" where "O" and "#" represent cognitive elements and "/" represents the type of relationship that exists between them. "O" and "#" do not represent respectively a stimulus and a response, and there is no unidirectionality implied. The symbolization could be "#*" with a different relationship specified. There is no beginning and end implied with this conception of cognitive elements.
"/" represents a specific relationship that holds between "O" and "#", this relationship being a learned connection based on the subject matter area from which the elements were also learned. It does not represent a stochastic situation but rather that within a given subject matter two ideas are connected in a certain manner. When applying the diagrammatic approach to testing for cognitive structure, the test checks whether the student does have the correct relationship between these ideas. Thus the diagrammatic approach displays a huge, complicated network of associations, but these associations are not between stimuli and responses, nor is the basic association that of sequential dependency.

In Berlyne's (1965) extension of the neo-behaviorist position, his distinction between structures of a system of symbolic responses as being either bare or stochastic is similar to the previous distinction between process and product.

Bare structure is defined by specifying the responses that are associated as alternative next steps, with each represented stimulus situation at which a subject might arrive. In the case of a transformational hierarchy, this means specifying the alternative transformational responses that can branch out from each situational thought and specifying the new situational thought to which each would lead. Stochastic structure is defined by specifying the bare structure together with the probability of each response that can lead away from a given represented situation (p. 303).

Berlyne stated that structures could be represented by graphs, a bare structure being represented by a nonevaluated graph consisting simply of a set of nodes and branches (points and lines). Bare structure is similar to cognitive structure in that cognitive structure implies specifying all of the possible relationships between cognitive elements that an individual possesses.
After testing an individual's cognitive structure in a certain area, only part of the possible set of interconnections has been examined. Some relationships to cognitive elements might be stronger than the ones examined, which would be indicated by Berlyne's bare structure, but the present emphasis is only with testing to see if certain parts of an entire cognitive structure or bare structure actually exist for an individual. Berlyne also referred to certain types of structures being quite important in directed thinking, e.g., transitive structures and habit family hierarchies. These structures imply representation by certain types of graphs, completely connected and tree respectively, which is different from the conception of cognitive structure presented here. The present emphasis is not upon what types of relationships are important in directed thinking, but rather upon what types of relationships have been learned from a passage.

Piaget and Reitman

Piaget (Flavell, 1963, p. 17-19, 164-236) has postulated logico-mathematical structures as models of cognitive structures. Cognitive structures refer, in this case, to the organizing properties of intelligence. They are created through assimilation of and accommodation to the environment and are inferable from the behavioral content or data which they determine. For Piaget these structures vary with age, and he has represented developmental stages by various logico-mathematical structures which express the essence of these organizational properties. Thus certain cognitive structures are implied from behavior.

If Piaget says that the classificatory behavior of the eight-year old indicates that he possesses the 'grouping of logical classification,' he means that the child's thought organization in the classificatory area has formal properties (reversibility, associativity, composition, tautology, etc.) very much like those which define this logico-algebraic structure.
The latter has certain specific and definable properties; we infer from his behavior that the child's cognitive structure has similar properties (Flavell, 1963, p. 169).

Many of these structures can be represented by some type of diagram as Flavell has illustrated (p. 180-195). The present approach differs from Piaget's although it is similar in that graphics are used to model or represent structure. Piaget is concerned with modeling the broad area of intelligence rather than certain subject matter areas. It is not postulated here that one type of structure, such as the class of logico-mathematical ones, completely models a given area, but that several types of structural diagrams could be used. The present development does not pertain to developmental theory nor does it postulate that cognitive structure is characterized by structural properties such as reversibility. Such a classificatory scheme might result from the study but it would not be derived from a theory of logico-mathematical structures.

Although Reitman's (1963) primary emphasis was that of simulating thought processes, he did use diagrams to represent cognitive structures. For example, he included various forms of graphs illustrating family relations. But because Reitman was simulating thought processes with the list processing language IPL-V, he usually represented cognitive structures in list form, indicating the individual's organization of a problem. Reitman's hierarchical lists are related to Ausubel's (1963) assumption that each individual's cognitive structure is organized hierarchically according to the principle of progressive differentiation, going from broad, inclusive concepts to specific, less inclusive concepts.

Educational approaches

As stated previously this method of investigating cognitive structure has not been used to any extent in the educational field. Ausubel, as mentioned before, used multiple-choice achievement tests to test for cognitive structure.
However, he did not, at least from the reported evidence, construct his tests on the basis of a structural model. Nor did he report in his conclusions whether the varying relationships that subjects had learned might have differed from or coincided with the structure of the material.

In Gagné's acquisition of knowledge studies (Gagné, 1963; Gagné and Bassler, 1963; Gagné, et al., 1962; Gagné and Paradise, 1961) a definite hierarchical pattern of learning prerequisites for a final task was shown, i.e., students exhibited successful learning up to a point and then failed the remaining steps. Yet Gagné's analysis reflected a sequential dependency in learning of topics, not how the topics were structurally related, i.e., no basic relationship between topics was given other than the fact that one was a prerequisite for the other. This is not to say, however, that sequential dependencies do not exist between substructures.

Hartmann (1942, p. 204) graphically distinguished between a person who knew twenty discrete facts and one who knew five facts in all their permutations and combinations.

[Diagram not reproduced: twenty discrete facts versus five fully interrelated facts.]

However, he did not use this approach in studying cognitive structure.

T. Johnson (1968) reviewed many of the methodological approaches to cognitive structure. His own methodological procedure was based on Ausubel's definition of cognitive structure although the exact correspondence between the two was not clear. Latent partition analysis, a form of factor analysis of categorizations yielding a partitioning of stimulus items into a set of latent categories, was used on concepts from the subject matter areas of teacher behavior and physics. Then a multi-dimensional scaling procedure was applied showing the distances among the latent categories. From these latent categories and inter-category distances inferences were made about how the Ss perceived teacher behavior and physics.
Johnson's approach is similar to the present approach in that he treated cognitive structure as a product, not a process. But the procedure did not specify why the stimulus items were grouped together, i.e., the relationships. It was also an attempt to measure how a subject naturally groups concepts (by forcing him into one category sorting, where more than one might be possible) rather than to systematically attempt to test for knowledge relations based upon subject matter. Having individuals sort stimuli into categories is basic to other procedures attempting to measure cognitive complexity (Scott, 1962; Zajonc, 1960).

P. Johnson (1967) examined relationships between concepts in Newtonian mechanics by using a verbal association measure. The formal constraints among these concepts within the subject matter itself were compared with high and low achievers' associations. Again, this procedure only yielded information regarding what concepts were related by a student. It did not reveal how the student perceived them as being connected. Both of the preceding approaches are valuable for instructional guides, but neither provides a way of testing subject matter knowledge.

In this chapter a procedure for testing cognitive structure relationships was explained and compared with traditional achievement test procedures. Transfer structures, covering material that was logically implied by two or more given structures, were also explained. A comparison with other approaches to testing cognitive structure was given. In the next chapter the role of diagrams as perceptual blueprints within learning will be examined.

CHAPTER III

RATIONALE FOR DIAGRAMMATIC PRESENTATION OF STRUCTURE OF KNOWLEDGE

Novak's (1966) model dealt with the transmission of the structure of a subject matter to an individual's cognitive structure and implied that an individual's cognitive structure in some degree approaches the structure of a body of knowledge.
Ausubel (1963) and Goss (1961) both emphasized the importance of existing cognitive structure in the learning and retention of additional information. Ausubel (1963) specifically stated that "when we deliberately attempt to influence cognitive structure so as to maximize meaningful learning and retention we come to the heart of the educative process" (p. 26).

The main purpose of this study was to examine the effects of different modes of presenting the structure of knowledge upon individuals' cognitive structure immediately after presentation and one week later. The structure of any area can be represented with language, diagrams, or both. Usually language has been used, yet long-term retention and transfer might be enhanced by other modes of presentation. It was expected that the structure of the material would be more clear if diagrams of that structure were presented with the corresponding written exposition of the passage.

The Role of Diagrams in Learning

Sheffield (1961) examined the role of perceptual responses in the learning of sequential tasks. Perception was referred to as a process of interpretation or "filling-in" of sense data, this interpretation depending in general upon past experience. For example, a block of ice which is presented only visually is perceived as cold because in the past it has been sensed cutaneously while being sensed visually. One perceives an airplane just from the distinctive sound made by its motor. A wristwatch is "transparent" to a watch repairman because from the brand and model he can "fill-in" all the internal parts. An important feature of these perceptual responses is that "they permit complete representation of a distinctive stimulus object, even though all of the various stimulus aspects of the object may never be sensed simultaneously" (Sheffield, 1961, p. 16).
Other characteristics of perceptual responses are (a) a complete perceptual response can be elicited by a conditioned stimulus in the absence of any of the stimulus aspects of the perceived object and (b) a perceptual pattern can serve as a stimulus as well as a response.

Sheffield specified a type of perceptual mediation which did not require acquisition of a sequence of responses and stimuli but rather that of a pattern. This he called "perceptual blueprinting." Essentially an individual acquires and stores a pattern or blueprint. Behavior is then matched to this memory image or perceptual blueprint. Just as an architect refers back to his blueprint of a house many times as he attempts to match his perception of his overt product with his perception of this blueprint, an individual uses a memory image in a similar way. It functions as a "blueprint," i.e., the learner manipulates his behavior until his product matches this perceptual blueprint stored in memory. Thus a perceptual blueprint may cue off a sequence of responses. It is important to note that an actual blueprint has advantages over one in memory because the former is exact and unchanging.

Sheffield also stated some other relevant hypotheses.
(1) Most adults are able in some degree to match their overt behavior with their perceptual memories. As such a memory image or perceptual blueprint can serve as a static complex which may be referred to constantly in guiding overt sequences of responses.
(2) There is less interference among perceptual responses than among overt responses. The explanation for this was really not given except that it was assumed that perceptual units are static, rather stable patterns and can be reinstated as fairly complete units without interference from separate parts.
(3) Symbolic behavior can reinstate a perceptual response or unit.
(4) Perceptual units can organize sequences, especially if there is a natural or inherent rather than an imposed organization to a given task. As such they can simplify or aid memory.

The basic assumption in the study was that diagrams in written material function as actual blueprints. When a person stores this pattern in memory, he provides himself with what might be called a memory image or perceptual blueprint and uses this to guide future behavior. Assuming then that diagrams function as blueprints, they would provide a static blueprint which would guide organization of material during acquisition.

Smith and Smith's (1966) review of the role of non-verbal textbook design also supported the position that non-verbal presentations "provide a stable, spatially organized visual framework or background for the more highly articulated and more temporally organized verbal presentation" (p. 331). Although diagrams are not entirely non-verbal they do present information in a spatial form. Sheffield (1961) also stated that perceptual units were more effective as organizers of material when there was a natural rather than an imposed organization. Diagrams do represent the "natural" organization of the material and therefore should enhance learning.

From Sheffield's position it can be assumed that when an individual perceives a diagram while reading a passage and the diagram follows explanatory material, the perceptual response will trigger a mediation sequence about the material directly related to it. If the diagram precedes the related material and if it introduces new, unfamiliar terms, the perceptual response might not cue off any relevant mediation. Thus in the first case the diagram would serve as a review or integrator; in the second, it would serve only as an introduction to new terms and would not evoke any relevant content.

The organizational advantages of diagrams during acquisition would also apply to memory.
Sheffield also hypothesized that there was less interference among perceptual responses than overt responses in memory. This, of course, did not imply that no interference occurs, but it was expected that perceptual blueprints or memory images of the diagrams would be retained with little interference. Thus the diagrams would facilitate retention to a greater extent than only verbal presentation of the material.

When an individual is tested for recall of knowledge through an achievement test, a test item (verbal stimulus) would reinstate the relevant memory image or perceptual blueprint. This in turn would act as a stimulus, in pattern form, to evoke the relevant sequence of mediators necessary to answer the question. Thus diagrams would act as a blueprint, guiding the student's responses and serving as a fairly stable referent. A test can be viewed as a problem of information retrieval. Since perceptual blueprints function as organizational aids in memory and are subject to less interference than other types of memory traces, they would provide a means for retrieving more information. But if the material had not been correctly acquired and distortion had occurred, they would serve as aids for retrieving incorrect information.

Many of these ideas were only tentative. Sheffield provided some support for his position in a study which involved learning two different mechanical assembly tasks (Sheffield, Margolius and Hoehn, 1961). The material was presented by film which utilized a form of perceptual blueprints. The tasks were divided into sub-assembly units. At the end of each sub-assembly presentation a series of "stills" displaying the parts of the sub-assembly were presented in rapid succession (ikonic level of representation). Each part "jumped" into its proper place in the proper sequence.
This technique was an attempt to provide a static blueprint of the material and to guide performance by perceptual memory on a criterion test of assembling the motor. It yielded higher performance scores than a presentation omitting the blueprints. However, this interpretation was not completely clear because repetition effects were uncontrolled.

Another aspect of diagrams, not specified by Sheffield, is that they separate relevant and irrelevant information. Morrison (in Bruner, 1966a, p. 263) viewed translating the sentence "the wind is blowing from the east" into a diagram (an arrow, ikonic mode) as useful at 30 miles an hour, because "it is a noiseless version of the original statement, containing all the information and only the information relevant to the problem" (p. 264). The role of relevant and irrelevant information has been studied in concept attainment studies. In summarizing these factors Archer (1966) concluded that (a) increasing the amount of irrelevant information decreases the speed of concept identification, (b) inclusion of redundant relevant information offsets the effects of large amounts of irrelevant information, and (c) concept identification will be facilitated when relevant information is obvious and irrelevant information is minimized. The ultimate condition is where irrelevant information does not exist. Although reading material to attain concepts is different from the usual, more refined concept identification task, it was assumed that the effects of irrelevant information in written material would be similar. Diagrams, as representations of structure emphasizing important relationships and concepts, focus on the relevant information and indicate what is necessary for comprehension of that particular structure. As such they would enhance the learning process.
In summary, it was assumed that during acquisition diagrams would (a) evoke related content if presented after the material necessary for their comprehension, (b) introduce new terms if presented before explanatory material, (c) organize the material in a static, perceptual form and (d) separate relevant from irrelevant information. In memory they would provide rather stable, resilient memory traces and serve an organizational function as well. During recall they would act as a readily available stimulus guiding the retrieval of information. With these advantages it was hypothesized that students exposed to diagrams would acquire the structure of the material better than students not exposed to diagrams. Students exposed to diagrams would also perform better on transfer to new tasks based upon the structure of the material. It was expected that the superiority of the diagram presentation would occur on both immediate and later recall.

Individual Differences in Understanding Diagrams

All the preceding major assertions have assumed that the student is able to understand and relate diagrams to the written passage. Vernon (1952, 1953a, 1953b) showed that charts and graphs, which illustrated quantitative data, and pictures were not understood or related appropriately to written text unless subjects, including adults, had training. Graphs were best understood when they related directly to the text. Malter (1948) also concluded that training was necessary and that labels should be used to explain the meaning of graphic symbols such as arrows and dashes. Due to the unfamiliar notation and symbolic nature of the diagrams used in the present study a training program on interpretation of diagrams was constructed.

Of course, individual differences for thinking in spatial forms and retaining such information will cause variations on criterion tests even though individuals have been trained to the same criterion in the use of diagrams.
A mental image has been defined as "a more or less complete representation of the attributes of an object or event once experienced but not now present to the senses, together with recognition of its pastness" (English and English, 1958). That people differ greatly in their ability to form mental images is well accepted (Lovell, 1964, p. 97; McKellar, 1957, p. 19), although for most people visual imagery seems to be clearer and to arise more frequently than other types (Lovell, 1964, p. 97). Sheffield (1961) assumed that most individuals could guide their behavior by perceptual memories. Visual images would be similar to perceptual memories but images would involve creation as well as recall. Assuming that diagrams were understood, the students possessing a capacity for visual imagery should retain the material longer and perhaps learn it easier than students not possessing visual imagery.

Tests for visual imagery are not very reliable nor valid (Woodworth, 1938). Lovell (1964, p. 98-99) stated that visual imagery is clearly connected with mathematical ability in some way but the form of the relationship is unknown. Perhaps students accustomed to symbolic notations, e.g., mathematicians, would be at an advantage with diagrams. Thus although visual imagery might aid individuals in using diagrams, it was not possible to find appropriate measuring instruments to detect people in this category.

Organization of Diagrams Within a Passage

Because diagrams were to be used in the experimental materials, the organization of these diagrams within a passage was important. It was assumed that the most appropriate placement of diagrams within a passage is determined by both the size and the function of the diagram. Since diagrams are related to the content of written materials, the size of the diagrams is to some extent dependent upon the amount of information they encompass.
Large diagrams could represent the structure of an entire passage, and small diagrams could represent substructures. If a diagram is placed before a passage it functions as an overview; if placed after, it functions as a review.

Sheffield mentioned the advantages of diagrams as reviews. Ausubel's analysis of reviews and overviews also corresponds to Sheffield's position, although Ausubel considered only verbal modes of presentation of material. For Ausubel (Ausubel and Youssef, 1965) both reviews and overviews achieve their effects partly by repetition. Repetitions of the material serve as consolidators of information, as feedback mechanisms to test correctness of knowledge previously acquired, and as sensitizers to the full meaning of the material. Reviews consolidate information and enhance material which is highly available (Ausubel, 1966), while overviews prefamiliarize the learner with certain key terms (Ausubel, 1963, p. 214). This analysis corresponds to the perceptual response analysis of Sheffield, overviews being relatively ineffective when many new terms are introduced but not yet fully explained, making much of the introduction meaningless for the students. The preceding statement assumes that reviews and overviews are not of the name or list type, e.g., "This chapter will cover (has covered) S, U, and V," but that they do cite basic concepts and their interrelationships. Diagrams are not necessarily classified as a type of advance organizer (Ausubel, 1963). Advance organizers relate and compare concepts the individual already possesses to the new material, thereby activating appropriate, rather stable pre-existing, subsuming concepts in an individual's cognitive structure for the new material. Diagrams do not purposely integrate the new with the old; however, they could be used in this way.
Experimental studies have found advantages for reviews, have revealed little on the utility of overviews (because of poor experimental designs), and have not provided an adequate comparison of the relative effects of reviews and overviews. Christenson and Stordahl (1955) obtained no facilitating effects with reviews or overviews, but the experimental material was short and highly familiar to the subjects and the same test was used for pre- and post-testing. Reynolds and Glaser (1964) investigated massed and spaced review treatments by repeating technical terms on a specified ratio basis in the context of a condensed version of the original topic. The spaced review yielded higher performance than did massed review, and both types of review were better than the no-review group. Merrill and Stolurow (1966) found that a hierarchical review did not take more time than when a review was absent, but did result in higher test scores. These studies used verbal reviews and overviews, not diagrammatic ones, but the repetition functions should be similar for diagram forms.

From another viewpoint individuals possess a limited capacity for processing information (Miller, 1956; Posner, 1965). Placing the entire structure either at only the beginning or only the end of a passage would probably exceed the individual's coding and processing capacity. Since placement of a large diagram before a passage would probably exceed the individual's processing capacity, introduce unfamiliar terms, not evoke relevant mediation, and not consolidate material, and since no special advantages for overviews have been found in experimental studies, this procedure was eliminated from the experimental materials in the present study. Having only one large review diagram was avoided in the present study because the sudden introduction of several complex, interconnected diagrams would prevent adequate coding even though the terms would be familiar.
Placing small diagrams within the material itself would facilitate coding, but would not provide a picture of the total structure of the material. Therefore small diagrams representing substructures were placed within the material and a large review diagram comprised of these substructures was placed at the end. The small diagrams were placed after the corresponding verbal passage, thereby functioning as a review. This analysis was supported by one pilot subject who had the small diagrams before the corresponding verbal passage, yet who did not examine the diagrams until the relevant material had been read anyway.

Overall Sequence of a Passage

According to Ausubel (1963) the overall sequence of topics in a passage should be arranged by the principles of progressive differentiation and integrative reconciliation. Newton and Hickey (1965) found that learning was facilitated when subconcepts, used in defining another concept, were placed with the concept rather than separated from it. Gagné (1963) applied the question "what would the individual have to be able to do in order that he can attain successful performance on this task, provided he is given only instructions?" in order to arrive at his hierarchical sequence of tasks.

All of the preceding methods imply a highly organized text, with no detached items, the place of each part determined by a broader scheme, and the hierarchical interrelationship of parts being rather clear. In general, a familiar (often meaning rather inclusive concepts) to unfamiliar (specific or technical) sequence is implied with prerequisites placed as such. These principles were used as guidelines in construction of the written material. Within this broad sequence, rules or generalizations were presented first followed by relevant illustrations and examples of them.

In this chapter the role of diagrams as perceptual blueprints in learning, memory, and recall was presented.
The appropriate placement of diagrams within passages of material was also discussed. In the next chapter the results of the pilot study will be described, and the hypotheses for the main experiment will be given.

CHAPTER IV

PILOT STUDY, EXPERIMENTAL MATERIALS, AND HYPOTHESES

First the pilot study will be outlined, then the experimental materials will be described, including pilot study revisions, and finally the hypotheses will be given.

Pilot Study

Fourteen Ss (6M, 8F), volunteer students from a senior undergraduate learning course at Michigan State University, participated in the pilot study. Although the sample for the main experiment was drawn from a senior undergraduate learning course at the University of Alberta, the differences between these two samples were assumed to be minimal.

The Ss were assigned to three experimental treatments, differing in mode of presentation of the structure of a passage on reliability of measurement. One presentation mode involved diagrams of the structure (D), one had verbal statements instead of these diagrams (V), and the other was a passage without diagrams or verbal statements, i.e., no-review (NR). A complete description of these materials is given later. Four dependent variables were examined: the existence of cognitive structure relationships, transfer based on these relationships, achievement measured by a typical achievement test, and material incidental to the structure.

Most Ss (8; 4M, 4F) received the diagram treatment because it was expected that Ss would have difficulties with this new mode of presentation. Both sexes were distributed as evenly as possible across both the experimental treatments and the progressive revisions of the materials. Those Ss who received the diagram version of the reliability passage also had the diagram interpretation training program.
The pilot study was used to clarify instructions; to revise the diagram interpretation training program, the three versions of the reliability passage, and the four tests; to give time estimates for the materials; and to determine placement of a formal break within the materials. Ss' answers to questions (Appendix A) and their spontaneous comments about each of the materials were used to produce several revisions. The experiment was explained to all Ss after they had participated.

Experimental Materials

Diagram Interpretation Program

Goss (1961) mentioned that presentation modes could often be acquired somewhat independently of the content itself. For example, the lines partitioning a Venn diagram represent subdivisions of a larger class, and this can be learned somewhat independently of specific content. Such principles of representation, as well as the relationship between verbal material and graphic representations of it, were emphasized in the training program (Appendix C, final form). It included the three types of diagrams used in the reliability passage: Venn diagrams, tables or matrices, and graphs. Both elements and relationships were included, although more emphasis was placed on relationships. In some cases Ss were asked to interpret diagrams without the corresponding verbal material being presented.

A branching programmed learning format was used for the training, with questions given after examples of the various diagrammatic forms. Questions always presented a two-choice situation, one choice correct and one choice incorrect. Students were instructed to mark their answers and then turn to a specific page, which in turn presented feedback and elaboration on their answer. If they answered incorrectly they were not instructed to answer again but were given corrective information and then instructed to proceed to another page. The content was different from the reliability passage itself.
Since Ss marked their answers to each question, a record of the number of errors for each student was available.

The interpretation training program required two major revisions. Some Ss had difficulty interpreting the small diagrams within the reliability passage which represented systematic factors and constant and varying unsystematic factors. These diagrams involved overlapping of classifications and interpretation of a pattern of checks. To facilitate this interpretation a section on overlap (intersection) of classes was included in the Venn diagram section of the training program. In addition a passage on the importance of a pattern of checks was inserted in the chart section of the training program.

Reliability Essay

The passage was a rather non-statistical, yet technical, treatment of reliability based on Ghiselli (1964). The only statistical concept presented was that of correlation, introduced through scatter diagrams. No formulas or calculations were given. An outline of the passage is given in Appendix D. The material was not as highly structured as areas such as chemistry or mathematics, yet Ghiselli's treatment of types of score variation yielded nicely to a structural analysis. The topic of reliability was chosen because it was an area which the experimenter had studied in some depth and was at an appropriate difficulty level for the Ss. The passage dealt with the importance of reliability and of types of score variation, developed a definition of reliability, and examined practical ways of estimating reliability coefficients.

The 7,000 word passage, without diagrams or verbal statements, is presented in Appendix E (final form). Placement of diagrams or verbal statements is indicated in the right hand margin. In the diagram version 19 small diagrams were presented after the paragraphs explaining the subject matter structure they represented.
The maximum number of relationships in any small diagram was seven and the minimum was two. At the end was a large review diagram which combined these small diagrams into six substructures and also interrelated these substructures. The diagrams are presented in Appendix F in the same order as they appeared within the passage. The small diagrams within the passage had no explanations because the training program and reliability text provided sufficient guidance.

In the verbal statement condition, the diagrams were replaced with short, separate, single-spaced paragraphs stating very concisely the material of the diagram itself. These passages are also presented in Appendix F, each following the corresponding diagram. The same labels were used in the verbal statements and diagrams so the criterion test was not biased in favor of either. Both the verbal and diagram reviews were arranged in the same order, which did not deviate greatly from the reliability passage sequence.

The major revisions in the reliability passage resulting from the pilot work involved rewriting the sections on systematic factors and on constant and varying unsystematic factors; providing additional directions on reading the final review diagram and some diagrams within the passage; and including six additional major sub-diagrams and then placing these sub-diagrams within the reliability passage at appropriate review places. The sections on different types of factors gave more examples of each type, specified more clearly their similarities and differences, and provided additional summary statements. The six sub-diagrams in the final review diagram were numbered and a suggested order of proceeding through the diagrams was given.

In the first version of the diagram passage, 19 small diagrams were presented within the passage and then combined into six sub-diagrams for the final review.
Since Ss had difficulty interpreting this final review diagram because many combinations of the small diagrams had been made simultaneously, the six sub-diagrams (marked "Sub-D" in Appendix F) were placed at appropriate review points within the passage. For example, after small diagrams relating to (a) types of variation in scores, (b) score arrangement and types of variation, (c) relation of factors and variation, and (d) types of unsystematic factors had been presented, the sub-diagram which combined these four smaller diagrams was presented and integrated into the text in the following way: "At this point we can now briefly re-examine the types of factors and their respective outcomes with the diagram given below." Thus Ss saw the six sub-diagrams within the passage before they encountered them in the final review. Corresponding changes were made in the verbal passage to equate for repetition effects. With this change the verbal and diagram passages provided four presentations of the structure of the reliability passage: (1) the passage itself, (2) the 19 small verbal or diagram sections, (3) the six larger verbal or diagram review sections, and (4) the final verbal or diagram review. The straight (no-review) version had no summary or repetition of the structure.

Tests

Only one test (Appendix G, final form) was administered; it was originally divided into four parts measuring cognitive structure relationships, transfer based on those relationships, incidental material, and achievement.

Cognitive Structure Relationships

As stated before, only cognitive structure relationships, not elements, were tested. Elements were usually given in the item stem; the correct alternative indicated the relationship involved. In order to test systematically for all relationships, the items were predominantly multiple true-false, i.e., a multiple choice format where each alternative was marked true or false.
In some instances when a student gave incorrect answers it was possible to determine exactly how he had incorrectly related two elements. In other cases the testing procedure was not complete enough to determine exactly what relationships did exist, although the range could be narrowed to several possibilities. The scoring was based on the number of correct relations; one item yielded as many as six correct relations, while in another situation as many as five items yielded only one. The items were based upon the six main substructures that constituted the review.

In some cases patterns of responses were scored. As mentioned previously, the patterns could be classified as consistent or inconsistent. Inconsistent patterns and consistent, but wrong, patterns imply different things about cognitive structure. In the first case the subject has contradicted himself, while in the second case the student has learned something well with no contradictions but learned it wrong. In order to distinguish between these two situations, inconsistent patterns were scored minus one point; consistent but wrong patterns, zero points. This scoring procedure was used mainly to distinguish inconsistent from consistent but wrong patterns and not to indicate that one pattern was necessarily worse (from a learning viewpoint) than another. It was assumed that the one point difference between the two scores would not greatly affect the comparison among the total scores for individuals. The other consistent patterns were scored for the number of correct relationships they reflected. A complete analysis of each item is given in Appendix G. This analysis includes reference, by diagram, to the substructure being tested, the analysis of response patterns, and the score for each item.

Transfer

Transfer items tested new material which had as its prerequisite the availability of the correct relationships between elements presented in the reliability passage.
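The pattern-scoring rule described above for the structure items can be illustrated with a minimal sketch. The pattern labels and function names are hypothetical and do not come from the scoring key in Appendix G; the sketch also assumes that each response pattern has already been classified by the scorer.

```python
from enum import Enum


class Pattern(Enum):
    """Hypothetical labels for a classified response pattern."""
    INCONSISTENT = "inconsistent"          # the subject contradicted himself
    CONSISTENT_WRONG = "consistent_wrong"  # no contradiction, but learned wrong
    CONSISTENT = "consistent"              # any other consistent pattern


def score_pattern(pattern: Pattern, correct_relations: int = 0) -> int:
    """Score one response pattern under the rule stated in the text."""
    if pattern is Pattern.INCONSISTENT:
        return -1                 # minus one point
    if pattern is Pattern.CONSISTENT_WRONG:
        return 0                  # zero points
    return correct_relations      # number of correct relationships reflected


def total_structure_score(scored_patterns) -> int:
    """Sum the scores over (pattern, correct_relations) pairs."""
    return sum(score_pattern(p, n) for p, n in scored_patterns)
```

Under this rule an inconsistent pattern lowers a total by exactly one point relative to a consistent but wrong pattern, which is the one point difference the text assumes to be negligible in comparisons of total scores.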
The transfer topics were not mentioned in the reliability passage but required a thorough understanding of its structure to be answered correctly. These items were scored similarly and are also presented in Appendix G with a complete analysis of each. It was hoped that this type of transfer task would distinguish between those who had comprehended and retained the material and were able to use it as a basis for new information and those who had comprehended and retained the material but could use it only within the restricted limits of the text itself.

Achievement and Incidental

Ordinary achievement items were included to provide a baseline. These items were constructed from the no-review version of the reliability passage by a doctoral student majoring in tests and measurements. These items are included at the end of Appendix G. A few items covering incidental material, topics not relevant to the structural analysis of the reliability passage, were originally included but were eliminated after the pilot study because they served a function similar to the achievement items.

All items (achievement, structure, and transfer) were ordered on the test so that previous ones did not provide answers to later ones. However, it was difficult to entirely eliminate overlap of questions since most of the test was quite comprehensive. The order of the items and maximum score for each test are given in Appendix H.

Some minor test item revisions were found to be necessary from the pilot work. These consisted mainly of instructions for answering the items, grammatical changes, and clarification of item stems and alternatives.

Additional Pilot Study Results

The interpretation training program required an average of 28.4 minutes (range 25-31 minutes, n=8); the reliability passage, an average of 47.57 minutes (range 35-70 minutes, n=14); and the test, an average of 50.21 minutes (range 35-75 minutes, n=14), yielding an estimate of about two hours for the entire experiment.
Because of the small sample size, time estimates for each of the three treatments were highly unreliable and were not generalized to the final sample.

The formal break was placed preceding the section on constant and varying unsystematic factors (see Appendix D). This section, in the middle of the reliability passage, appeared to be quite difficult for most subjects. With the combined effect of increased difficulty and fatigue, Ss tended to skip the section or not attend to it as well as to preceding sections. It was expected that a break would increase attention and decrease fatigue.

Conclusions about differences between the experimental treatments from the pilot study were not warranted because of the small sample size and the progressive revisions that were made of the materials. However, on the final revisions of the materials the Ss in the Diagram treatment were performing at a higher level than the Ss in the Verbal treatment. The means and ranges for these two groups of Ss on the major dependent variables were as follows (Diagram treatment presented first): Structure: 78 (range 70-86), 68.5 (range 63-74); Transfer: 37 (no range), 23 (range 21-25); and Achievement: 23.5 (range 22-25), 20.5 (range 20-21).

After the pilot study was completed a questionnaire (Appendix B) was designed for the main experiment asking how Ss used the diagrams or verbal statements, whether they liked the passage, and what parts of the passage were difficult. These parts referred to the six basic substructures which had been identified by the structural analysis.

Hypotheses

The central hypothesis was that diagrams would facilitate acquisition of the structure of the material and transfer to new material following from that structure.
It was assumed that diagrams would serve as perceptual blueprints, separating relevant from irrelevant aspects more clearly than verbal statements, organizing the material during acquisition and retention, representing material in a rather stable form for storage, and aiding retrieval of information. The verbal passage controlled for repetition effects of the diagrams and was expected to have some advantage over the straight presentation of the material.

It was expected that diagram presentation would enhance both acquisition and retention of cognitive structure relationships compared to the verbal and no-review presentations. Since transfer required knowledge of structural relations, it was expected that the diagram group would also be superior on this variable at both testing periods. However, an interaction effect between the experimental conditions and testing time was expected for these two variables, with the diagram condition resulting in a smaller retention drop over time than the verbal and straight conditions. Performance under all three treatments was expected to drop with time.

Differences among the experimental conditions were not expected for the achievement variable, since analysis of the achievement test showed that complete knowledge of structural relationships was not crucial for good performance. No differential drop in performance over time among the three experimental conditions was expected on this variable, although all would drop.

Substructures

Several Guttman scale patterns were identified among the substructures, and it was hypothesized that Ss' response patterns would also exhibit the same dependencies. In general these took the form that "A + B + ..." were prerequisites for correct performance on "Z". These Guttman dependencies (Appendix I) were:

A. Questions 4 and 5 jointly dependent upon question 6.
B. Question 7 (second part) dependent upon questions 6 and 7 (first part).
C.
Questions 8, 9, and 11 each dependent upon questions 6 and 7-first (7-second) and 15a and/or 19d.

All of these items centered on the various types of systematic and unsystematic variation. Questions 4 and 5 were application items; the rest were transfer items. It was expected that a higher proportion of Ss (of those who successfully completed the prerequisite items) in the diagram group would perform successfully on these criterion tasks than Ss in the other treatments.

Several questions were exploratory in nature. As stated before, six basic substructures constituted the "elements" of the total structure of the material:

Sb1 Degree of reliability and its practical importance
Sb2 Definition of systematic factors and variation and of unsystematic factors and variation
Sb3 Effects of all types of factors
Sb4 Definition of reliability, reliability coefficient, and correlation coefficient
Sb5 Parallel forms versus parallel tests
Sb6 Methods of estimating reliability coefficients

There was no reason to believe that the diagram group would be superior on all substructures on acquisition or retention, but exactly which substructures would be crucial was not hypothesized. Similarly the retention drop on each substructure was of interest.

Expected Correlations

Additional information about the relationships among the various variables was necessary, although it was difficult to specify in much detail the patterns that might appear. All variables were classified into three groups: background, experimental, and dependent.

Background
  Sex
  Age
  General Aptitude
  Quantitative Ability
  Verbal Ability
  University Major
  Tests and Measurements Background

Experimental
  Errors on Training Program
  Time on Reliability Passage
  Questionnaire

Dependent (Acquisition and Retention)
  Cognitive Structure Relationships
  Six Substructures
  Transfer
  Achievement

The expected relationships among these variables are explained below.
First consider the relationships among background variables and experimental variables. Little variation was anticipated in errors on the training program, so this variable would not be highly related to other variables. If there was a range of time spent on the reliability passage, it would probably be correlated with verbal ability. The only relationship expected between the questionnaire data and background variables was that having a tests and measurements course would be correlated with familiarity with the material on reliability.

The relationships among background variables and dependent variables were expected to be similar on both acquisition and retention. Verbal ability and general aptitude would probably be related to achievement scores. If quantitative ability or being a math or science major in any way reflects visual imagery, they would be related to cognitive structure relationships and transfer for the diagram treatment.

The experimental variables of training errors and passage reading time were expected to be related to performance on the dependent variables only if their ranges were large. For the questionnaire data several relationships were anticipated: correlation between subjective difficulty with substructures and actual difficulty as indicated by test scores, correlation between examination of the review passages and scores on the structure test, and correlation between familiarity with the topic of reliability and achievement, structure, and perhaps transfer.

The intercorrelations among the dependent variables themselves were expected to be positive and to be similar for acquisition and retention periods. Cognitive structure relations and transfer scores would be related. The substructure on definition of types of factors and variation (Sb2) would be related to the substructure on effects of these factors (Sb3).
Because of the scoring procedure there was a built-in dependency between each substructure score and the total structure score, so some relationships would exist. Since the substructure on the effects of types of factors (Sb3) was related to many of the transfer items, a relationship was expected between Sb3 and transfer. As explained previously, achievement was not expected to be highly correlated with structure and transfer. High stability coefficients within each dependent variable were expected, e.g., between transfer on acquisition and transfer on retention.

In the next chapter the experimental procedures will be described. This is followed by presentation of the major results.

CHAPTER V

PROCEDURE AND RESULTS

In this chapter the administrative procedures are described, followed by presentation of data relevant to both the major and minor hypotheses of the study. Chapter VI presents a structural analysis of the relationships which the subjects acquired on each substructure and on each transfer item. Discussion of all results is found in Chapter VII.

Procedure

Subjects

A total of 234 Ss (127F, 107M), all undergraduates enrolled in a senior learning course at the University of Alberta, participated in the study. This sample consisted of three groups: two groups (O and C) were students from the experimenter's own classes and the third group (R) was from another instructor's class. Groups O and R together (n=156) received the three experimental treatments, while Group C (n=78) served as an additional control group which did not receive any treatment but took the tests on achievement, structure, and transfer.

Groups O and C were formed in the following manner. Students from three of the experimenter's classes, those who could participate in a three hour evening experimental session, were assigned randomly by sex to the three experimental treatments.
One hundred and forty-two Ss comprised this original group, but there was an unexpected attrition of 27 Ss at the experimental session itself, leaving a total of 113 Ss who constituted Group O. All of the students who were in the experimenter's classes but did not participate in the experiment yet took the tests comprised Group C.

Due to the unexpected attrition of Ss from the original experimental group and the necessity for large sample sizes for the item analysis, students from another class were asked to participate (Group R, n=43). These Ss were also randomly assigned by sex to the experimental treatments. Subjects were assigned so that the number of Ss and the sex distribution of Ss per treatment were balanced for the total experimental sample (Groups O and R). Table 1 gives this distribution. Group R was given the experimental treatment and retention test one week after the corresponding experimental administration for Group O. It was decided to combine the two groups for statistical analysis if they appeared similar on the main dependent variables on both acquisition and retention periods. Otherwise, separate analyses would be required.

Treatments

All Ss in Groups O and R received the diagram interpretation program, followed by one of the three versions of the reliability passage (Diagram, Verbal, or No-Review), with the acquisition test at the end. All Ss progressed at their own rate and finished within a three-hour period. The test which measured cognitive structure relationships, transfer, and achievement was used for both acquisition and retention. Immediately after the retest one week later, Ss answered the questionnaire about the experiment. Group C took the test when Group O was administered the retest.
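The text states only that Ss were assigned randomly by sex so that numbers and sex distribution were balanced across treatments; the mechanics are not reported. One way such an assignment could be carried out is sketched below. The function name, the seed, and the round-robin dealing after shuffling are all assumptions made for illustration, not the procedure actually used.

```python
import random

TREATMENTS = ("D", "V", "NR")  # Diagram, Verbal, No-Review


def assign_by_sex(subjects, seed=0):
    """Assign subjects to treatments at random within each sex.

    subjects: list of (subject_id, sex) pairs.
    Returns a dict mapping subject_id to a treatment label.
    Hypothetical procedure: shuffle each sex separately, then deal
    subjects to the treatments in rotation so the groups stay balanced.
    """
    rng = random.Random(seed)
    assignment = {}
    for sex in sorted({s for _, s in subjects}):
        ids = [sid for sid, s in subjects if s == sex]
        rng.shuffle(ids)
        for i, sid in enumerate(ids):
            assignment[sid] = TREATMENTS[i % len(TREATMENTS)]
    return assignment
```

Dealing in rotation within each sex keeps the treatment sizes within one subject of each other for each sex, which is the kind of balance summarized in Table 1.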
Table 1
Distribution of Subjects

Treatment   Group   Acquisition       Retention
Diagram     O       38 (23F, 15M)     35 (21F, 14M)
            R       16 (6F, 10M)      16 (6F, 10M)
            Total   54 (29F, 25M)     51 (27F, 24M)
Verbal      O       38 (20F, 18M)     34 (20F, 14M)
            R       13 (7F, 6M)       10 (5F, 5M)
            Total   51 (27F, 24M)     44 (25F, 19M)
No-Review   O       37 (17F, 20M)     34 (15F, 19M)
            R       14 (8F, 6M)       13 (7F, 6M)
            Total   51 (25F, 26M)     47 (22F, 25M)
Control             78 (46F, 32M)

The administration of the experimental treatments differed slightly for Groups O and R. For Group O the three treatments were given in separate rooms by different administrators. Because the administrators only handed out the materials, differences among them were assumed to be negligible. Different rooms were used to avoid contamination effects from the anticipated differences in reading time for the three treatments. One week later Ss were given a surprise retest during the regular class period. However, for Group R the experimental treatments were given in one room (for convenience) and the Ss were accidentally told of the retest.

The written instructions for the experimental materials were the same for all treatments and are given below.

Diagram Interpretation Program

"In much written material diagrams or drawings are presented. In the following passage you will be introduced to various types of diagrams, each of which is directly related to the content in a written passage. It is the purpose of this instructional program to enable you to interpret these various types of diagrams and to relate them to the corresponding written passage.

"This instructional program is presented in a programmed learning format. After an example of a diagram you will be asked at least one question over it. Answer the question, mark your answer on the sheet itself, and then follow the directions which will tell you the number of the next page to read. In this manner you will progress through the entire program. Please follow the instructions carefully."
Reliability Passage

"You will now be asked to read a passage on the concept of reliability. Read it carefully for comprehension and read it only once. You will be asked to time yourself on this passage.

"On page x (exact page depended upon presentation mode) a "break" is indicated. When you reach this point in the passage, you may take about a 15 minute break in the hall outside this room. Please do not discuss the experiment with anyone during this period. Please indicate times of stopping and beginning again on page x where the break is indicated.

"After completing this passage you will be given a series of questions over it."

Test

"You will now be asked a series of questions over the passage on reliability. There is not a constant format for all of the questions; they are not all true-false, nor all multiple-choice. In general the test is objective.

"Since each question differs in the way in which you should answer it, please follow the instructions very carefully for each question. All answers are to be marked on the test itself.

"Please do not discuss the experiment with anyone else who has participated. Thank you very much for participating."

Major Results

The following major findings are presented: comparison of the experimental treatments on the main dependent variables of time, errors, achievement, structure, and transfer on acquisition and retention; the sequential dependencies among items; and the intercorrelations among time, errors, achievement, structure, and transfer. Data for each hypothesis are presented first for Group O and then for Group R. Differences and similarities between the two groups are mentioned. When appropriate, comparisons with the control group, Group C, are made. If a complete presentation of data is not in the text, it is located in an appendix.
The following notation will be used for the experimental treatments and dependent variables:

Groups O and R, No-Review treatment - O-NR and R-NR
Groups O and R, Verbal treatment - O-V and R-V
Groups O and R, Diagram treatment - O-D and R-D
Time - Tm
Errors - E
Achievement on acquisition and retention - A1 and A2
Structure on acquisition and retention - S1 and S2
Transfer on acquisition and retention - T1 and T2

Comparison of Treatments on Time, Achievement, Structure, Transfer, and Errors

Table 2 presents the means and standard deviations for the major dependent variables for each treatment. Separate analyses for each variable are presented after this table.

Table 2
Means and Standard Deviations for the Main Dependent Variables for Each Treatment

          O-NR    O-V     O-D     R-NR    R-V     R-D     C
Tm   M    33.22   41.38   46.53   36.36   34.92   40.98
     SD    5.41    9.98    8.57    8.81    6.07   10.98
A1   M    15.38   15.53   15.37   16.50   16.31   17.06   11.06
     SD    3.23    3.67    8.81    3.88    3.29    3.36    3.31
A2   M    14.77   14.74   15.23   15.00   15.50   16.86
     SD    3.89    3.61    3.27    3.82    4.67    3.08
S1   M    61.32   62.47   62.74   64.93   60.85   64.50   51.81
     SD   11.52    8.95   11.64    7.99   11.35   10.73   12.31
S2   M    60.24   63.38   61.09   66.46   61.70   63.25
     SD    9.79    8.15    9.78    7.24   10.01    7.83
T1   M    24.73   25.16   22.61   22.36   23.92   25.38   21.65
     SD    8.33    9.70    7.42    6.60    5.74    8.45    7.44
T2   M    24.03   25.41   23.29   24.07   28.20   26.44
     SD    7.26    6.68    6.39    8.53    6.78    6.86
E    M      .62     .66     .34     .29     .92     .38
     SD    1.11     .97     .53     .82    1.04     .62

Time. -- Analysis of variance for the time measurements indicated that the experimental treatments in Group O differed in average time to read the reliability passage. The F ratio of 26.60 (df 2, 110) was significant beyond the .001 level, and the treatments accounted for 32% of the total time score variance (Appendix J). Scheffé's multiple comparison technique showed that the D and V treatments each required more time than the NR treatment (at the .001 level), but that the D treatment did not differ from the V treatment (Appendix J).
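The percentage of variance accounted for by the treatments can be roughly recovered from a reported F ratio and its degrees of freedom. The sketch below assumes the index is the ratio of between-treatments to total sum of squares (eta squared); the text does not name the index used, and the function name is hypothetical.

```python
def variance_accounted_for(f_ratio: float, df_between: int, df_within: int) -> float:
    """Proportion of total variance attributable to treatments.

    Since F = (SS_between / df_between) / (SS_within / df_within),
    SS_between / SS_within = F * df_between / df_within, and so
    SS_between / SS_total = F * df_between / (F * df_between + df_within).
    Assumes a one-way analysis of variance.
    """
    weighted_f = f_ratio * df_between
    return weighted_f / (weighted_f + df_within)


# Group O time scores: F = 26.60 with df 2, 110
print(round(variance_accounted_for(26.60, 2, 110), 3))  # -> 0.326
```

The value of about .33 is close to the 32% reported for Group O, which suggests, without confirming, that an index of this kind was used.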
Differences in time had been expected because the treatment passages differed in length. However, differences among treatments on time for Group R did not occur (F = 1.29; df 2, 40; Appendix J). Since the order of the Tm means by treatment differed from that of Group O, the two samples were compared. A one-way analysis of variance with the six treatments (two groups, each with three treatments) was calculated. The F ratio was significant (F = 10.90, df 5, 150, p < .001), and 27% of the time score variance was accounted for by the treatments (Appendix J).

In comparing Groups O and R, Scheffé's multiple comparison technique showed that the following treatments differed in average time to read the reliability passage: O-D greater than R-V and R-NR, and R-D greater than O-NR (Appendix J). These differences would have been expected if the two samples were similar on time to read the passage. However, three other comparisons should also have been significant if the samples had been similar (O-V greater than R-NR, R-D greater than O-V, and R-V greater than O-NR). The basic difference between the two groups was that for Group R the NR treatment required more time than expected and the D treatment required less time than expected. There was less variance among the three treatment times for Group R than for Group O. This discrepancy can apparently be explained by the differences in administration procedures for the two groups. A further discussion of this is given later.

Since time spent reading the reliability passage was considered an important factor in learning and the two groups differed on this measure, the samples were analyzed separately rather than combined in the remaining analyses. Thus the attempt at increasing the sample size for statistical analyses failed, yielding instead a replication of the study.

Achievement.
-- There was no difference in mean A1 scores for the treatments within either Group O or R (F(O) = .06, df 2, 110; F(R) = .17, df 2, 40). Analyses of covariance with Tm as the covariate also indicated no significant differences (F(O) = .166, df 2, 109; F(R) = .095, df 2, 39). Complete data are given in Appendix K. The correlations between Tm and A1 for all treatments were generally low, ranging from -.397 to .456. The Kuder-Richardson 20 reliability coefficient for the A1 test was .64.

However, the means from Group R were generally higher than those from Group O, so a comparison of these groups as well as of the control group was made. Appendix K shows this analysis of variance. The treatments were significantly different (F = 15.98, df 6, 227, p < .001) and accounted for 29.7% of the total score variance. Scheffé's multiple comparison test showed that each treatment in the experimental groups (O and R) differed significantly from the control group C. However, none of the comparisons between treatments in O and R were significant. The comparison of these means is given in Appendix K. These results were as originally expected. However, the difference between the treatments and the control group was not as large as expected even though it was significant. On the average the treatment means indicated that the Ss were performing between chance (7.5) and the optimum difficulty level (19).

There were also no significant differences for mean scores on A2 for Groups O and R (F(O) = .20, df 2, 100; F(R) = .88, df 2, 36; Appendix K). The reliability of the A2 test was .56 (Kuder-Richardson 20). Analysis of covariance with A1 as the covariate also indicated no significant differences among the treatments for Groups O and R (F(O) = .36, df 2, 99; F(R) = .673, df 2, 35; Appendix K). The correlation between A1 and A2 ranged from .435 to .927.
When Groups O, R, and C were compared, the experimental groups maintained significantly higher scores than Group C (F = 11.80, df 6, 213; proportion of variance accounted for = 24.6%). Thus, contrary to expectation, little retention drop occurred on the achievement variable. Appendix K gives the analysis of variance and indicates the significant differences between mean comparisons. All experimental treatment means were significantly different from the Group C mean, as with acquisition, although the order of the means differed from acquisition.

Structure.--There were no differences in mean S1 scores for the three treatments within either Group O or R (F(O) = .15, df 2, 110; F(R) = .65, df 2, 40; Appendix L). Analysis of covariance with Tm as the covariate also indicated no significant differences (F(O) = .095, df 2, 109; F(R) = .584, df 2, 39; Appendix L). The correlations between Tm and S1 ranged from -.343 to .445.

However, the means for Groups O and R appeared to be higher than the Group C mean, so an additional comparison was made with these groups (Appendix L). The F test (F = 8.09, df 6, 227) was significant at the .001 level, with the treatments accounting for 17.6% of the variance. Comparison of the means showed that all except one of the experimental treatments scored higher than the control group. Thus the results on S1 were not as expected; the D treatment was not superior to the V and NR treatments. In general the experimental treatments were superior to the C group, but these differences were not as large as expected.

There were no significant differences in mean scores on structure retention (S2) for either Group O or R (F(O) = 1.02, df 2, 100; F(R) = .95, df 2, 36; Appendix L). Analysis of covariance with S1 as the covariate also indicated no significant differences among the treatments (F(O) = .125, df 2, 99; F(R) = .102, df 2, 35; Appendix L). The correlations between S1 and S2 ranged from .137 to .679.
When Groups O, R, and C were compared, the experimental treatments generally maintained significantly higher scores than Group C (F = 9.00, df 6, 213, p < .001) over the retention period (Appendix L). The treatments accounted for 20.2% of the total score variance. Again, the results were not as expected. The experimental treatment means did not differ on retention and little forgetting occurred, as indicated by the general superiority of the treatments in contrast to Group C at retention.

Transfer.--There were no significant differences in mean scores on transfer acquisition (T1) for either Group O or R (F(O) = .86, df 2, 110; F(R) = .35, df 2, 40; Appendix M). Analysis of covariance using Tm as the covariate also indicated no significant differences among treatments (F(O) = .528, df 2, 109; F(R) = .38, df 2, 39; Appendix M). The correlations between Tm and T1 ranged from -.209 to .466. In contrast to the previous pattern of results, there were no significant differences between the experimental treatments and Group C on T1, even though the mean score for Group C was lowest. Appendix M gives the analysis of variance on T1 for Groups O, R, and C.

There were no significant differences in mean scores for T2 for either Group O or R (F(O) = .60, df 2, 100; F(R) = .23, df 2, 36; Appendix M). Analysis of covariance with T1 as the covariate also indicated no significant differences (F(O) = .376, df 2, 99; F(R) = .574, df 2, 35; Appendix M). The correlations between T1 and T2 ranged from .047 to .669. In comparing the experimental treatments with Group C on T2, the F test was significant (F = 2.40, df 6, 213, p < .05; Appendix M), but the treatments accounted for only 6.3% of the total score variance. Scheffé's multiple comparison technique did not indicate any pairs of means to be significantly different at the .05 level of significance. Thus the significant contrasts must have existed in a combination of pairs rather than in single comparisons.
The expected differences in transfer among the experimental treatments did not occur. Because the experimental treatments did not differ from the control group, the transfer test appeared to be the most difficult of the three tests. This order of difficulty of tests was congruent with the theoretical formulation, but the experimental treatments were expected to be superior. The T2 means were generally higher than the T1 means, as indicated by the analysis of variance. However, the slight rise did not reveal any differences between pairs of means. Discussion of the unexpected results on the achievement, structure, transfer, and time variables is given later.

Errors.--As expected, there were no differences (F = 1.41, df 5, 150) among the experimental treatments on the number of errors on the diagram training program. Appendix J gives the analysis of variance on errors (E) for the experimental treatments. Although some Ss made three or four errors on the training program, the average number of errors for each experimental treatment was less than one. Thus the hypothesis that Ss would make few errors on the program was supported.

Guttman Dependencies

Several dependencies were expected to exist among the items on the structure and transfer tests, some items testing information which served as a prerequisite for successful performance on other items. Within this collection of items, one item (#6) was a prerequisite for all the remaining items (refer to Appendix I for this analysis). However, every S failed this item, eliminating half of the possible patterns. In general the majority of the remaining patterns on the dependencies were --. Since half of the possible patterns did not exist, an analysis of the remaining patterns would not have been meaningful because an adequate test of the hypothesis was impossible. Therefore such an analysis was not included in the study.

The complete failure on item 6 was not expected.
It appeared that something besides knowledge itself was a factor determining performance on the items. Two factors were investigated, the information load of the item and the item format. These two factors are discussed more thoroughly in Chapter VI.

Correlations among Time, Errors, Achievement, Structure, and Transfer

Correlations among the main dependent variables were originally computed separately for the experimental treatments. However, since very few differences among these correlations occurred across the six treatments, and the previous analyses had indicated that the treatments did not differ on mean scores for each of these variables, the correlational data were pooled across all treatments. Thus, highly stable indices of the relationships among these variables were obtained. Table 3 gives the correlation matrix for these main variables.

Since the sample size was so large for these correlations, correlation coefficients which were not high in absolute value were significant (any correlation coefficient greater than approximately .14 was significant at the .05 level). In order to distinguish between correlations which were significant and high and those which were significant and not as high, a rather arbitrary cutting point of .45 was used. Any correlation coefficient of this magnitude indicated that 20% of the variance on one variable could be accounted for by the other related variable.

Table 3
Correlation Coefficients among the Main Dependent Variables for all Subjects

        Tm      E        A1       A2       S1       S2       T1
E      -09
A1      07     -15
A2      10     -26**     68***
S1      10     -33***    39***    46***
S2      11     -23**     29***    46***    54***
T1     -04     -14       18*      21*      06       11
T2     -04     -23**     17*      32***    25**     30***    36***

*** p < .001   ** p < .01   * p < .05

Time spent reading the reliability passage did not correlate with achievement, structure, or transfer on acquisition or retention (Table 3).
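The link between the .45 cutting point and the 20% figure is the squared correlation coefficient. A minimal sketch of that arithmetic follows; the function name is illustrative, and the second value simply applies the same calculation to the A1-A2 coefficient from Table 3.

```python
def shared_variance(r):
    """Proportion of variance on one variable accounted for by another,
    given their correlation coefficient r (the squared correlation)."""
    return r * r


# Cutting point used to separate "high" correlations.
cutoff_share = shared_variance(0.45)
# Test-retest correlation for achievement (A1 with A2) from Table 3.
achievement_share = shared_variance(0.68)

print(f"r = .45 -> {cutoff_share:.1%} of variance")
print(f"r = .68 -> {achievement_share:.1%} of variance")
```

By the same calculation, a coefficient near the .14 significance threshold corresponds to about 2% shared variance, which is why a separate cutting point was useful.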
Errors on the diagram interpretation program were not expected to correlate with the reliability tests because a small range on the error scores was anticipated. In general the small variability did occur, but there were several significant negative correlations (Table 3). A negative correlation meant that a high number of errors on the diagram interpretation program was related to a low score on the reliability tests. This relationship is congruent with what one might expect if there were a general ability factor underlying performance on different tasks. However, none of these correlations were above the cutting point of .45.

As hypothesized, high test-retest correlations existed for achievement, structure, and transfer, with the structure and achievement test-retest correlations being the highest and greater than .45. Contrary to expectation, structure correlated more highly with achievement than with transfer. Each of the four correlations between structure and achievement was significant, but only two of the four structure-transfer correlations were significant. In fact, achievement and transfer were more consistently related than structure and transfer. Discussion of these relationships is presented in Chapter VII.

Minor Results

Experimental Treatment Comparisons on Substructures

In general few differences among experimental treatments were found on the six substructure scores (Appendix G). On acquisition a one-way analysis of variance with seven treatments was computed. Although the over-all F test was significant at the .05 level for substructures (Sb 1, 2, 3, 4, and 5; Appendix N), Scheffé's multiple comparison technique indicated only four significant differences between pairs of means. In each case the difference was between an experimental treatment and the control group (for Sb1, O-D was greater than C and O-NR was greater than C; for Sb5, O-D was greater than C and R-D was greater than C).
On retention only the six experimental treatments were compared. There were no significant differences on any of the substructures. The F tests and treatment means and standard deviations for each substructure are given in Appendix N.

Thus on acquisition and retention no substructure was learned better by any of the Ss in the three experimental treatments. Any differences that did occur were between the treatments and the control group. The similarity among the treatments was also supported by the structural analysis presented in the next chapter.

No definite predictions were made about Ss' performance on the substructures. However, in light of the generally poor performance by Ss on the total structure variable, and the similarity of the experimental treatments on total structure, differences between experimental treatments on substructures were not expected.

Questionnaire Responses

A questionnaire was given to all Ss during the retest period, one week after the initial administration of the reliability passage. Three questions were common to the three experimental treatments. The first asked if the content of the reliability passage was familiar; the second, if the Ss enjoyed reading the reliability passage; and the third asked the Ss to check which of the six substructures were difficult to understand. Other questions were given to the Ss in the D and V treatments. The data for these latter questions are discussed below, while the data on the common questions are presented later. Because responses to the questions were similar for both Groups O and R within each treatment, the data for both groups were combined.

The Ss in the D treatment answered additional questions on how they had studied the reliability passage. Ss indicated little trouble with the small diagrams within the passage, but about half of them indicated problems with the large review diagram.
On the review diagram Ss did not always examine the interconnections between the diagrams in a systematic fashion; in fact, two of them ignored the connections. No one method of using the diagrams while reading the passage predominated; all were used about equally. Only one-fourth of the Ss stated that they visualized diagrams while taking the test, and only one-fourth made any other definite associations between the test and the diagrams. Complete data are given in Appendix O.

The Ss in the V treatment also answered questions pertaining to their mode of studying the reliability passage. Again no method predominated, although repetition was the most frequently used. Very few of the Ss, less than one-fourth, instantly recognized that a review passage related to a test item. Complete data on the V treatment are also given in Appendix O.

Despite training on diagram interpretation, Ss indicated trouble in understanding the diagrams within the reliability passage. This confusion could possibly have been the result of inadequate diagram construction. For example, different sized cells within a matrix could have had different meanings for Ss, double-headed arrows might have been ambiguous, etc. These potential difficulties were pointed out by an S who was an engineer and had had previous experience with problems involved in understanding diagrams.

If Ss had mastered the material they probably would have been more aware of connections between the test items and the structural review passages and diagrams. Interference among the diagrams and passages as well as individual differences in imagery and ways of reading material might also explain some of the questionnaire results. However, the above evidence is interpreted mainly as additional support for the hypothesis that the Ss did not comprehend the reliability passage. More extensive treatment of this hypothesis is given in Chapter VII.
Relationships among the Major Dependent, Substructure, Background, and Questionnaire Variables

Originally correlation coefficients were computed separately for the six treatments. However, the correlations across treatments were quite similar. So, as with the previous correlational analysis among the major dependent variables, data for the remaining correlations were pooled across all subjects. Explanations of the variables will be presented first.

Two ability measures were available from the students' records: matriculation examination scores and aptitude test scores. Because two aptitude test scores were available, separate analyses were required. The aptitude tests will be discussed later; however, the matriculation examinations will be explained now. Matriculation examinations in subject matter areas are given to high school seniors for university entrance requirements. Since the matriculation exams were revised in 1960, it was not possible to use scores from matriculation tests which were administered prior to this date. Thus for 34% of the subjects matriculation scores were not available. For the remaining Ss common scores were available in English (Eng), Mathematics (Mth), Social Studies (SSt), and Chemistry (Chm).

As mentioned previously, several questionnaire items were common to all Ss. These pertained to the familiarity (Fa) of the passage, the attitude (Att) toward the passage, and subjective estimates of the difficulty of each of the six substructures (ED-f). The Ss' difficulty estimates on each substructure showed a rather consistent hierarchy (ranking) across both treatments and groups (Table 4). The rank orders were obtained from the number of individuals who checked the substructure as being difficult (Ss were allowed to check more than one substructure).
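The agreement between the two groups' difficulty rankings in Table 4 is summarized with Spearman rank-order correlations. A minimal sketch of that calculation follows, using the Diagram and Verbal ranks from Table 4. It assumes the classic sum-of-squared-differences formula with no correction for tied ranks; the thesis does not say which variant was used, so the function name and that choice are illustrative.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation from two lists of ranks,
    using 1 - 6 * sum(d^2) / (n * (n^2 - 1)) without a tie correction."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Ranks for Sb1..Sb6 taken from Table 4 (1 = easiest, 6 = most difficult).
diagram_group_o = [1, 4, 6, 3, 5, 2]
diagram_group_r = [1, 5, 6, 4, 2, 3]
verbal_group_o = [1, 4, 6, 3, 5, 2]
verbal_group_r = [1, 5, 6, 3.5, 3.5, 2]

print(f"Diagram: {spearman_rho(diagram_group_o, diagram_group_r):.3f}")
print(f"Verbal:  {spearman_rho(verbal_group_o, verbal_group_r):.3f}")
```

Under these assumptions the sketch gives about .657 for the Diagram treatment and .900 for the Verbal treatment, in line with the reported .656 and .90.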
Spearman rank order correlations between Groups O and R for each treatment were as follows: for D, r = .656; for V, r = .90, significant at the .05 level; and for NR, r = .972, significant at the .01 level. The Kendall coefficient of concordance among all treatments was .569 (significant at the .01 level). Therefore within each experimental treatment the two groups generally agreed on the difficulty of the substructures, and across treatments these orderings were also consistent. In all cases Sb1 was rated as the easiest and Sb3 was rated as the most difficult.

Table 4
Subjects' Rankings* of Substructure Difficulty by Treatment and Group

          Diagram       Verbal       No-Review
          O     R       O     R       O     R
Sb1       1     1       1     1       1     1
Sb2       4     5       4     5       4     4.5
Sb3       6     6       6     6       6     6
Sb4       3     4       3     3.5     2     2.5
Sb5       5     2       5     3.5     5     4.5
Sb6       2     3       2     2       3     2.5

* Ranking of 1 represents easiest substructure. Ranking of 6 represents most difficult substructure.

The other variables which were correlated were sex, enrollment in a measurement course (MC), age, the major dependent variable scores, and scores on the six substructures. Table 5 presents the correlation matrix for all of these variables.

Substructures

It was expected that the six substructures would correlate with the total structure score since each substructure score was part of the total score.

Table 5
Correlation Coefficients among Main Dependent, Substructure, Background, and Questionnaire Variables
[Correlation matrix not legible in the source.]
It was also predicted that Sb 3 would correlate with transfer since the original analysis had shown that this substructure was a prerequisite for many of the transfer items. Originally it was also hypothesized that the substructure scores would be more highly related to transfer than achievement. However, the previous analyses in Chapter V showed that achievement and structure were related and that structure was not related to transfer. Because of the built-in dependency between substructure and the total structure scores, it was clear that the original hypothesis would not be supported.
However, the relationship between Sb 3 and transfer might still be high, depending upon the correlational pattern between the other substructures and transfer.

Each substructure correlated significantly with the total structure score on retention and acquisition, as hypothesized. However, the strongest correlations were within time periods rather than across time periods, i.e., acquisition substructures correlated more highly with acquisition total structure than with retention total structure. The substructures were more highly related to achievement than to transfer, and Sb 3 did not correlate with transfer. The lack of relationship between structure and transfer can be largely attributed to the Ss' inadequate comprehension of the reliability passage and to the format and information load of the items. These factors are discussed in Chapter VII.

Time did not correlate with substructure but errors did, especially on acquisition. Significant negative correlations between errors and substructures meant that Ss who made many errors on the diagram training program did poorly on the substructures as well. This finding is consistent with the relationship between errors and the main dependent variables.

It was hypothesized that Sb 2 and Sb 3 would be related to each other on acquisition and retention. The relationships among the other substructures were predicted to be low. In general Sb 2 and Sb 3 were not highly correlated (two of the four correlations being significant and none greater than .45). Within each testing period, the inter-correlations among substructures were generally low (8 significant correlations out of 30 and none greater than .45). The lack of relationship between Sb 2 and Sb 3 can be explained by the contaminating factors of format and information load, which are discussed in Chapter VII.

It was also hypothesized that correlations between corresponding substructures over time (acquisition and retention periods) would be high.
For each substructure significant correlations did occur, supporting the hypothesis. Other correlations between acquisition and retention substructure scores were not significant, as expected. Thus, except for the lack of relationship between Sb 2 and Sb 3, the original hypotheses about the relationships among the substructures were supported.

Questionnaire.--The questionnaire items of familiarity, attitude, and subjective difficulty estimates will be discussed in that order.

It was hypothesized that familiarity with the content in the reliability passage would correlate with enrollment in a measurement course, and would correlate with achievement and structure but not with transfer. However, familiarity and enrollment in a measurement course were not correlated. Some Ss could have been exposed to reliability concepts in other courses, or the content of the reliability passage itself could have been quite different from the treatment of reliability in an undergraduate measurement course. Both of these reasons could apply to the present sample. Some students learn about reliability in learning and individual differences courses. The treatment of reliability in the experimental passage was more technical than is often given in undergraduate courses. These two explanations are somewhat contradictory, but students could have interpreted "familiarity" differently, thus making both explanations possible.

Familiarity correlated significantly, but positively, with A1, S1, and S2. A significant positive correlation meant that individuals who were not familiar with the content of the reliability passage scored higher on achievement and structure than those who were familiar with the content. This result was contrary to expectation. Perhaps Ss who thought they were familiar with the content did not study the passage as intensively as those who were unfamiliar with it, thus accounting for the unexpected results.
Different interpretations of "familiarity" by Ss could also have confounded the relationships.

It was hypothesized that attitude toward the reliability passage would not be correlated with enrollment in a measurement course and would also not correlate highly with the dependent variables. The first hypothesis was supported, but the second was not. Significant negative correlations occurred with A1, A2, and S1 and with 5 of 12 substructure correlations. A significant negative correlation meant that those who disliked the reliability passage scored lower on the dependent variables than those who enjoyed the passage. This result is consistent with two general ideas about the relationship between interest and performance: (a) interest increases motivation, thus increasing attention while learning and the final performance level, and (b) lack of success on learning and assessment tasks decreases perceived interest and motivation.

In general there were no significant correlations among the substructure difficulty estimates. Correlations among the difficulty estimates and the main dependent variables were generally non-significant as well. Of particular interest was the relationship between estimated and actual difficulty (determined by Ss' scores) on the six substructures. It was expected that those who cited a substructure as being difficult would perform poorly and those who understood the content would score high (this relationship being reflected in significant positive point-biserial correlations). However, this was not the situation. Only two of seventy-two correlations were significant for both acquisition and retention (one positive and one negative).

One possible reason for the lack of hypothesis support was that the estimation scale (a "yes" or "no") was not sufficiently sensitive to distinguish between the different levels of subjective difficulty that could have existed for the Ss. Thus the scale could have been broadened.
Another possibility would have been to follow up an initial indication of difficulty with a question which asked the S if he thought he had mastered the material even though he had found it difficult. The small variance on some of the difficulty estimates also attenuated the correlations.

Background Variables.--Sex, enrollment in measurement course, age, matriculation, and aptitude variables will be discussed in that order.

Sex, enrollment in measurement course, and age were not expected to correlate with the dependent variables. The results for sex supported this hypothesis; however, two exceptions occurred between enrollment in measurement course and A1 and A2. A significant negative correlation meant that Ss who were enrolled in a measurement course scored higher on achievement than those who were not enrolled. The absolute values of these correlations were not high, but they did indicate that the achievement test was more similar to the content in measurement courses than were the structure or transfer tests. Age correlated with errors on the diagram training program, indicating that the older Ss made more errors than the younger Ss.

Matriculation scores did not correlate with time spent reading the reliability passage, but Eng and Chm scores did correlate negatively with errors on the diagram training program, indicating that those Ss who scored highly on these matriculation exams made few errors. Matriculation scores were significantly related to achievement (except Eng and A1), and there was a tendency for matriculation scores to be related to structure (three out of eight were significant). However, matriculation scores were not related to transfer.

Although it would have been desirable to administer a common aptitude test to all Ss, this was not feasible since course requirements prevented taking additional time from class instruction. Therefore aptitude scores were taken from the students' records.
Unfortunately, the aptitude data were not complete: some Ss had only the American Council on Education Psychological Examination (ACE, 1948 edition), some Ss had only the Canadian Academic Aptitude Test (CAAT), and some Ss had no aptitude test. Reliability coefficients for both the ACE and CAAT range from .80 to .90. The validity of the ACE, especially the quantitative scores, has been questioned, and since the CAAT is a relatively new test very little validity data are available (Buros, 1965, 1959).

Both the CAAT and the ACE provide a quantitative (Q), a verbal (V), and a total (T) score. However, there were no norms comparing the ACE and the CAAT, and since they differed in certain respects (the ACE was developed by Thurstone and based on his theory of separate abilities, the verbal part being primarily linguistic in nature, while the CAAT was developed to meet the needs of the schools within Ontario), the scores on these two tests were treated separately. When the Ss within each experimental treatment were divided according to aptitude tests, only five divisions were large enough to use in statistical analysis. Correlations between aptitude and the dependent variables for these divisions are given in Appendix P.

It was hypothesized that quantitative ability would be related to the structure and transfer scores for individuals within the D treatment. This relationship did occur for the O-D treatment with the CAAT test. Time was expected to correlate negatively with verbal ability, and achievement was expected to correlate positively with verbal ability and general aptitude. None of these correlations occurred. Several other significant relationships existed, but it was difficult to interpret these findings because of the small sample size and the low validity of the aptitude tests themselves.

In general, few background variables correlated with transfer, leaving most of the transfer variance unexplained.
Evidently the traditional variables of sex, age, and aptitude do not adequately predict individuals' ability to transfer and apply knowledge to new situations.

In this chapter the major results and minor results have been presented, with discussion of the minor results. Discussion of the major results is in Chapter VII. Chapter VI presents a structural analysis of the substructure and transfer items with a discussion of the importance of such analyses for the classroom.

CHAPTER VI

ANALYSIS OF THE COMPREHENSION AND RETENTION OF STRUCTURAL RELATIONSHIPS

The analysis presented in this chapter will illustrate the type of diagnostic information that can be obtained from a test based upon diagrams as representations of structure. Such diagnostic information cannot usually be obtained from the ordinary classroom test. Therefore one purpose of this chapter will be to emphasize the advantages of tests derived from a structural representation of subject matter.

The other purpose of this chapter will be to present and discuss the differences between the structural relationships which the Ss learned and remembered, and those relationships which actually existed within the reliability passage used in the present study. Each substructure was analyzed for the relationships that existed for Ss on acquisition and retention. Responses which were the same on both acquisition and retention (referred to as consistent responses) were also examined. The transfer items were analyzed similarly. The data on the six substructures are presented first, followed by the transfer data, and finally an interpretive summary of the major findings.

The following presentation of data will be used throughout the analysis of structure and transfer items. Tables presenting both acquisition-retention and consistency data for the substructures and transfer are given.
When the correct response was not dominant within the acquisition-retention responses or within the consistent responses, the percentages for the dominant one(s) are presented after the percentages for the correct response. In all cases the results are presented in percentages. The detailed analysis of each item is not given in this chapter but can be found in Appendix G. The items for each analysis are numbered according to this appendix and are coded as St.

Since the O and R groups did not differ in performance on the substructures, the results are given for the three treatments with data pooled across groups. The results for the three treatments were not pooled in order to illustrate the similarity of Ss' responses across treatments.

Structure

In order to give a perspective on the relative difficulty of the relationships within each substructure, Table 6 gives the percentages of Ss who had all the correct relationships within each substructure on acquisition and retention. Using these percentages as a measure of difficulty, Sb 6 was the easiest, then Sb 1, Sbs 2 and 5, and Sbs 3 and 4. This order of difficulty was the same across treatments. Although the two easiest substructures, 1 and 6, were not exceptionally easy, i.e., not 80-90%, the remaining four substructures were extremely difficult, as indicated by the fact that in most cases no S had all the correct relationships.

Table 6

Percentage of Subjects with Perfect Substructure on Acquisition and Retention

          NR        V         D
Sb 1      28-32*    28-24     25-38
Sb 2      02-00     04-02     00-00
Sb 3      00-00     00-00     00-00
Sb 4      00-00     00-00     00-00
Sb 5      00-00     06-07     16-10
Sb 6      40-44     56-61     43-45

* Second figure is retention percentage.

Substructure 1: Degree of Reliability and its Practical Importance

A total of twelve relationships was tested with three true-false items (St 20, 21 and 22a). St 20 tested for six relationships, and St 21 and St 22a each tested for three relationships.
Table 7 gives the acquisition-retention and consistency results for the total substructure and each of these items. The overall difficulty level for this substructure was fairly low (20-30%). In order to be significantly greater than chance (at p = .05, chance = 12.5%), the percentage of correct responses should have been greater than 24%. For most treatments this was the case. The percentages for each part were higher than for the total, with St 22a being the highest. In general on St 20 and 22a these percentages were significantly greater than chance (chance = 50%, at p = .05, greater than 68%). However, the percentages for St 21 were below this level. Differences between treatments for each item did not occur. There was no consistent trend over retention; some figures were higher than acquisition and some were lower.

Table 7

Substructure 1: Acquisition-Retention and Consistency Percentages for Structures 20, 21, and 22a

                          NR        V         D
Acquisition-Retention
  Sb 1: Total             26-34*    28-23     25-38
  St 20                   65-76     75-58     74-85
  St 21                   63-53     62-57     53-60
  St 22a                  88-93     78-89     87-91
Consistency
  St 20                   75        82        73
    Correct               72        73        94
  St 21                   64        70        65
    Correct               56        58        55
  St 22a                  78        83        85
    Correct               100       92        95

* Second figure is retention percentage.

The level of most percentages on acquisition-retention indicated that this substructure was grasped by most of the Ss. Within this substructure the prediction relationship (St 22a) was grasped most clearly and the differences among traits of an individual (St 21) grasped least clearly. This trend could be explained, in part, by the nature of the items rather than by the nature of the relationship itself. The application situation in the true-false statement used for St 21 was more confusing than the situation described in St 22a. Thus the structural relationship which the Ss had reversed was differences among traits.
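The cutoffs quoted for this substructure (more than 24% against a chance level of 12.5%, more than 68% against 50%) can be approximated with a normal approximation to the binomial. The sketch below is only an illustration: the test actually used is not stated in this section, the function name is hypothetical, and the group size of 30 and the critical value of 1.96 are assumptions chosen because they roughly reproduce the quoted cutoffs.

```python
import math


def chance_cutoff(p_chance: float, n: int, z: float = 1.96) -> float:
    """Proportion that must be exceeded to differ from chance.

    p_chance: probability of a correct response by guessing.
    n:        number of Ss in the group (hypothetical; not given in the text).
    z:        critical value (assumed 1.96, i.e. two-tailed p = .05).
    """
    standard_error = math.sqrt(p_chance * (1.0 - p_chance) / n)
    return p_chance + z * standard_error


if __name__ == "__main__":
    # Chance levels named in the text: a single true-false item (50%)
    # and the total substructure (12.5%).
    for p in (0.5, 0.125):
        print(f"chance = {p:.1%}  cutoff = {chance_cutoff(p, 30):.1%}")
```

Under these assumptions the cutoffs come out close to the 68% and 24% figures; a different group size or a one-tailed test would shift them.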
Diagrammatically, a representation of their cognitive structures would be as follows. (The two different relationships on the last line indicate that about 50% of the Ss had the correct relationship between degree of reliability and differences among traits of an individual and the other 50% had it reversed.)

Cognitive Structure

  Reliability                                    Low           High
  Differences among Individuals on Same Test     Unstable      Stable
  Assignment of Individuals to Groups            Uncertain     Certain
  Prediction                                     Inaccurate    Accurate
  Differences among Traits of an Individual      Stable        Unstable
    or
  Differences among Traits of an Individual      Unstable      Stable

The consistency level for items 20 and 22a was significantly greater than the chance level of 50% (at p = .05, greater than 68%). However, as with acquisition-retention, St 21 percentages were generally not significantly different from chance. Differences between treatments on each item did not occur. The percentage of correct-consistent responses was generally quite similar to the proportion of correct responses that occurred on acquisition-retention. In all cases this correct response was dominant. Using consistency data as an indication of stability of Ss' cognitive structures, only one of the four sets of relationships within Sb 1 seemed to be rather unstable (differences among traits of an individual). This relationship was also the most difficult for the Ss. Whether this difficulty and lack of consistency was the result of the item itself or the underlying relationships could only be determined by writing other items testing the same relationships.

Substructure 2: Definition of Systematic Factors and Variation and of Unsystematic Factors and Variation

A total of twelve relationships were tested by a pattern type true-false item (St 2). Scores for response patterns could range from 12 to -1 (a score of -1 indicating a contradiction within the answer pattern itself).
Several scores were dominant on acquisition-retention (9, 3, and -1). These scores, and the maximum score of 12, are given separately in Table 8 but the remaining scores are grouped. Table 8 also presents the consistency data with the percentages of the two dominant consistent responses (scores of 9 and -1).

Table 8

Substructure 2: Acquisition-Retention and Consistency Percentages for Structure 2

                          NR        V         D
Acquisition-Retention
  Score 12                02-00*    04-02     00-00
         9                37-29     28-36     47-33
         3                15-17     18-28     21-23
        -1                45-50     44-28     28-36
    Others                03-04     06-07     13-10
Consistency
  Total                   40        43        41
         9                40        30        44
        -1                58        43        32

* Second figure is retention percentage.

Chance level on this item was .78% and percentages that were significantly greater than this level (at p = .05) had to be greater than 3.8%. Using this as a base, Ss who gave the correct pattern were, in general, responding at chance level. But Ss who responded with patterns that were scored either 9, 3, or -1 were definitely responding above the chance level. Acquisition and retention percentages were quite similar for each pattern. No differences between treatments occurred, i.e., if a pattern was low for one treatment it was also low for the other treatments.

Thus few Ss grasped all of the relationships (three on acquisition and one on retention had the correct pattern); yet dominant patterns did occur. The score of 9 indicated that Ss identified types of unsystematic variation and factors and systematic variation and factors correctly, but that they had variation as a cause of variation rather than factors alone as the cause of variation. This response indicated a problem in understanding causal relationships. The score of three indicated that only part of this 9 pattern held; Ss identified only one type of unsystematic variation and factors correctly. They had this type of unsystematic factor as a cause of variation but also had this type of unsystematic variation as a cause of variation.
With this pattern it was impossible to infer Ss' knowledge of systematic variation and factors. The -1 scores indicated a pairing of systematic and unsystematic factors and/or variation. If Ss had carefully examined the true-false items, this contradiction in terms should have been apparent. However, the contradictions might have been the result of test taking behavior. If a S was doubtful about the correct answer he might have marked several alternatives "true" in the hope of getting at least one correct (Ss did not know how the items were scored).

Diagrams of the cognitive structures of Ss corresponding to the three major response patterns are given below. The actual structure of the material is also presented.

[Diagram: Structure of the Material. Systematic factors and unsystematic factors (constant and varying) as cause of systematic variation and unsystematic variation (constant and varying); type of score pattern: orderly arrangement or complete lack of order.]

[Diagram: Cognitive Structure, Score of 9.]

[Diagram: Cognitive Structure, Score of 3 (only constant unsystematic illustrated).]

[Diagram: Cognitive Structure, Score of -1. (This general pattern was shown. Constant and varying unsystematic factors have been omitted from the diagram to clarify the Ss' general confusion.)]

Considering all possible response patterns that could occur, the chance level for the consistent responses was .78%. The percentage of consistent responses that did occur was significantly higher than this level (at p = .05, greater than 3.8%) for each treatment. The percentages for the two dominant responses, scores of 9 and -1, were generally quite similar to the proportions that occurred on acquisition and retention. Although Ss' cognitive structures did not correspond to the structure within the material itself, their cognitive structures were rather stable, as indicated by the consistency percentages. Generally speaking, those Ss who were consistent either had grasped most of the relationships or were confused.

Substructure 3: Effects of All Types of Factors

A total of 18 relationships were tested. One item, St 6, tested all of these and was a memory item. The other two items (St 4 and 5) were application items, together testing the same relationships. Analysis of the memory item will be presented first.

The memory item consisted of six descriptive parts which were to be identified as effects characteristic of systematic, constant unsystematic and/or varying unsystematic factors, or none of these factors. Thus the S could give one to three answers for each part. Table 9 gives the proportions on acquisition-retention of the correct answer, and when this response was not the dominant one the other common responses are also given. If a response was not dominant, no percentage is given for it. Each part of the item is identified by the correct answer for that part (S-Systematic, CU-Constant Unsystematic and VU-Varying Unsystematic).

Table 9

Substructure 3: Acquisition-Retention Percentages for Structure 6 - Memory Item

Part       Response    NR        V         D
Total                  00-00*    00-00     00-00
a S        S           67-52     48-44     57-51
           CU          14-21     17-27     17-27
           CU,S        08-15     14-04     13-09
b VU       VU          32-30     36-42     19-20
           None        29-13     16-17     20-13
           CU          20-25     18-09     08-25
           S           05-04     08-10     28-05
c CU,S     CU,S        12-12     07-07     07-08
           CU          58-28     37-31     32-39
           S           19-40     33-48     43-42
d S        S           48-31     34-29     29-34
           CU          20        18-21     25
           None        12-20     13-15     21-09
           VU          02-25     08-07     25-17
e VU       VU          65-72     72-57     51-48
           CU          20-11     16-16     16-16
           VU,CU       06-08     04-09     06
           S           05        16
f VU,CU    VU,CU       08-07     04-02     01-08
           VU          57-56     57-39     59-44
           CU          08-30     11-36     13-26

* Second figure is retention percentage.

Strictly speaking the chance level for each part of St 6 was 6.25% (1/2^4). However this was considered too conservative a chance level because Ss rarely answered with 'none of these'. Therefore a more appropriate base level of 12.5% (1/2^3) was considered. Yet a response bias occurred. Ss tended to respond with only one factor, i.e., the most common responses on each part were generally VU, CU, or S. Responses with more than one factor were quite infrequent. If the 12.5% level had been used for each part, significant differences would have occurred where the proportion of responses was probably at a chance level (Ss choosing between VU, CU, and S). Therefore on the parts where only one response was required, the chance level of 33% (1/3) was used in evaluating the proportion of correct responses. On the parts where two or more responses were required the chance level of 12.5% was used in evaluating the proportion of correct responses. The same rationale was applied to determine the significance of wrong answers - 33% for one response and 12.5% for two or more responses. This rationale was applied to all other items of this type as well.

No S received the maximum score for this item as a whole on acquisition or retention. Thus Ss were responding at chance level (1.4%) and did not understand all of the structural relationships tested by the item. The data showed no differences among the treatments, but rather consistent response patterns instead. In other words, if the item was missed by the majority of Ss in one treatment it was also missed by a majority of Ss in another treatment. In addition the answers which were most common for one treatment also tended to be most common across the other treatments.
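The chance levels used for the parts of St 6 follow from counting the answers a S could give to one part. The sketch below is a minimal illustration, assuming every mark/no-mark pattern over the available options is equally likely under guessing; the function and variable names are hypothetical.

```python
from fractions import Fraction


def pattern_chance(num_options: int) -> Fraction:
    """Chance of one specific mark/no-mark pattern over num_options options,
    assuming all 2 ** num_options patterns are equally likely."""
    return Fraction(1, 2 ** num_options)


# Four options (S, CU, VU, none of these): the strict level.
strict = pattern_chance(4)

# Dropping the rarely used "none of these" leaves three options.
without_none = pattern_chance(3)

# When Ss in effect pick exactly one of S, CU, VU, guessing hits 1 in 3.
single_choice = Fraction(1, 3)

for label, level in [("strict", strict),
                     ("without none", without_none),
                     ("single choice", single_choice)]:
    print(f"{label}: {float(level):.2%}")
```

The three values correspond to the strict level, the 12.5% base level, and the 33% level applied to parts that required only one response.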
Examination of each part of the item also indicated certain trends on acquisition and retention. Parts a and d both tested knowledge of the effects of systematic factors. However part a was easier than part d (percentages for part a were significantly greater than chance, while percentages for part d were at chance level, at p = .05, greater than 50% required). Both parts were memory parts. Thus one would assume that if a S knew the correct answer to part a he would also have answered part d correctly (d was an instance of the general case cited in part a). However, that was not the case.

Parts b and e both tested knowledge of the effects of varying unsystematic factors. As with the systematic factors, there was a difference in difficulty level for these two parts, part b being more difficult than part e (percentages for part e were significantly greater than chance). Both parts were memory parts. In this case part b was an instance of the general case cited in part e. Thus if part e was answered correctly, part b should have also been answered correctly. This was not the case. So with both systematic and varying unsystematic factors a similar trend existed. Ss, within the memory domain, had less difficulty with the general case than with a specific example of it. The general cases (parts a and e) were taken verbatim from the structural repetitions given within the test itself.

The remaining two parts of this item (c and f) tested, in a verbatim fashion, the students' knowledge of the overlap between the effects of two factors, CU-S and VU-CU respectively. In both instances the same pattern occurred. Very few Ss understood the overlap completely; they did not grasp what was common to both factors. Instead Ss responded with only one of the correct factors. For both parts Ss were responding at chance level (12.5%) with the correct response.
On part c both the CU and S responses were generally not significantly different from the chance level (at p = .05, greater than 50% required). On part f the proportion of VU responses was generally significantly greater than chance level, while the proportion of CU responses was at chance level.

Considering the responses to the verbatim parts of St 6 (a, c, e, f), Ss understood the unique effects of VU and S but not the overlap in effects which each of these two factors has with CU. As with Sb 2, Ss had trouble with causal relationships. The difference between the structure of the material and the students' cognitive structures can be briefly diagrammed as follows (here only two response patterns to these four parts are illustrated - acef: S, CU, VU, VU, and S, S, VU, VU).

[Diagram: Structure of the Material, with the factors as causes and the effects (S, CU-S, VU, VU-CU); Cognitive Structure, with the effects (S, CU, VU, VU) in one pattern and (S, S, VU, VU) in the other.]

In conclusion Ss tended to simplify the actual structure of the material both on acquisition and retention. This tendency may be, in part, strongly related to the causal relationship itself, i.e., that individuals usually think in terms of single rather than multiple causation.

The analysis of substructures 1, 2, and 3 illustrates the type of diagnostic information which can be obtained from a structure relationships test. Such information is not obtained from the ordinary achievement test because the items do not systematically test for all structural relationships. The confusions and errors that students make on such a test are revealed by the wrong alternatives they mark. However, rarely are such alternatives specifically designed to spot structural confusions that the students might have acquired. More important, perhaps, is the fact that most tests are not analyzed for the diagnostic information that they could provide.
However, a test based upon an analysis of the structure of the material is deliberately constructed such that one of the main objectives of the test is to provide diagnostic information about the learner for the tester, teacher, or researcher.

The value of this type of information is clearly illustrated by the cognitive structure of Ss on Sb 3. Sb 3 examined multiple causation relationships, a type of relationship found in many disciplines. Ss had difficulty with the relationships, and tended to simplify, not complicate, them. With this type of information a teacher could easily pinpoint learning problems. However a typical test on the same topic would have tested for only one or two of these relationships. If the student responded correctly, then the teacher would have assumed that the student "knew" the material. However, as indicated by the present structural analysis this conclusion probably would have been in error for most students. Since these relationships were prerequisites for many of the transfer items, a complete assessment of each of them is important for predicting and interpreting performance in transfer situations.

Table 10 gives the consistency percentages for St 6. In general the degree of consistency for each part of St 6 was not above chance. Generally 40% or less of the Ss had the same answers on acquisition and retention. Within each part, the same answer was dominant for all treatments. The only major exception was part c where there was a split between CU and S as consistent answers. For parts b, c, and d the proportion of dominant consistent responses was similar to their proportions on acquisition-retention. However, for parts a, e, and f the dominant response on acquisition-retention was weighted higher as a consistent response. The abbreviation "Con" will be used on all tables to stand for the percentage of consistent responses on each part or each item.
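As a concrete reading of the "Con" figures, the sketch below computes the percentage of Ss who gave the same answer on acquisition and retention, and then the share of those consistent answers taken by each response. The answer lists are made up for illustration and are not data from the study; the function name is hypothetical, and treating the response rows as shares of the consistent answers is an assumption about how the tables were built.

```python
from collections import Counter


def consistency(acquisition, retention):
    """Return (percent consistent, {response: percent of consistent answers}).

    acquisition, retention: answers of the same Ss, listed in the same order.
    """
    if len(acquisition) != len(retention):
        raise ValueError("both lists must cover the same Ss")
    if not acquisition:
        raise ValueError("at least one S is required")
    # Keep only the answers that did not change between the two testings.
    same = [a for a, r in zip(acquisition, retention) if a == r]
    con = 100.0 * len(same) / len(acquisition)
    counts = Counter(same)
    shares = {resp: 100.0 * k / len(same) for resp, k in counts.items()}
    return con, shares


# Hypothetical answers of five Ss to one part of an item.
acq = ["S", "S", "CU", "VU", "S"]
ret = ["S", "CU", "CU", "S", "S"]
con, shares = consistency(acq, ret)
print(f"Con = {con:.0f}%")
for resp, share in sorted(shares.items()):
    print(f"  {resp}: {share:.0f}% of consistent answers")
```

A low "Con" value with one response holding most of the consistent share would match the pattern described for parts a, e, and f.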
Degree of consistency for the parts to this item was not high, indicating rather unstable cognitive structures. This degree of instability and the degree of difficulty were not expected on this memory item. In general those responses which were dominant on acquisition-retention were also the most stable responses.

Table 10

Substructure 3: Consistency Percentages for Structure 6

Part       Response    NR     V      D
a S        Con         42     31     41
           S           80     56     80
           CU          12
b VU       Con         36     30     25
           VU          44     52     48
           None        27     19
c CU,S     Con         41     36     26
           CU          58     43     45
           S           34     52     48
d S        Con         34     27     33
           S           54     50     45
           VU          20     12
           None        15     12
e VU       Con         49     34     30
           VU          100    100    80
f VU,CU    Con         34     33     42
           VU          100    80     81

On the basis of these findings one would predict that students would do poorly on application type items where the same structural relationships were the basis for the required application. The next two items described (St 4 and 5) tested this prediction. St 4 tested for effects of the three factors on different occasions and St 5 tested for effects of these factors on the same occasion. Table 11 gives the acquisition-retention percentages for both St 4 and 5.

Table 11

Substructure 3: Acquisition-Retention Percentages for Structures 4 and 5

Part       Response    NR        V         D
St 4
Total                  48-36*    45-44     51-42
VU         VU          84-87     84-86     80-76
           S           08-05     10-09     10-11
           CU          06-07     04-02     06-08
CU         CU          62-47     60-62     59-63
           VU          20-12     18-22     21-20
           S           16-36     21-18     13-10
S          S           64-55     76-73     73-74
           CU          22-28     18-18     27-25
           VU          05-04     04-07
VU         VU          80-83     82-84     90-91
           CU          12-09     18-17     08-04
           S           04        02-02
St 5
Total                  02-02     00-00     00-00
CU,S       CU,S        04-02     00-00     00-04
           CU          52-36     32-41     40-39
           S           33-55     62-48     50-46
           VU          08-04     02-06     06-04
VU         VU          88-75     78-70     73-73
           CU          09-26     18-24     17-19
           S           04-02     05-04
CU         VU          40-25     44-22     24-21
           CU          24-37     37-32     38-40
           S           38-32     21-44     27-28

* Second figure is retention percentage.

The acquisition-retention percentages for the correct response to all relationships within St 4 were significantly greater than chance for each treatment (p = .05, greater than 4.9%, chance level = 1.2%). However the acquisition-retention percentages for the correct response on all parts of St 5 were not significantly different from chance for each treatment (chance = 1.4%). The high percentage of correct responses on St 4 was not expected, because Ss had found St 6, the memory item which tested for the same relationships, very difficult. However, the difficulty of St 5 was consistent with the results on St 6. As with St 6 the most common responses were either VU, CU, or S.

The percentages for the correct responses for each part of St 4 were significantly greater than chance for each treatment (at p = .05, greater than 50%, chance = 33%). Each part required only one response. This requirement alone may have increased the percentage of correct responses. However, the level of percentages for similar parts of St 6 were not as high. Perhaps the technical wording in the memory item tended to confuse Ss in classification, whereas in the application item the labels themselves and their apparent meaning served as strong cues to the appropriate classification.

For St 5 the first part was quite difficult (responses were at chance level of 12.5%). The correct answer for this part required both CU and S. In general Ss tended to respond with only one of these answers. This response pattern was consistent with results on St 6 (part c). The other two parts both tested for application of VU factors. However, the first part was easier than the second (first was significantly greater than chance but the second was generally not). An explanation for this is not given, but it does indicate that two application situations are not of equal difficulty for Ss even though the same principle is required in both. The double response requirement on St 5 and the response bias of the Ss could, in part, explain the low percentages that occurred. However, the complexity of the structure itself was probably another factor.
On the basis of these items it was difficult to determine exactly what structural relationships the Ss acquired. But the unique part of the VU and S effects were acquired by most of the Ss. The CU effects overlapped with VU and S effects, and Ss acquired only half of this overlap in each case.

Consistency data for the total patterns on both St 4 and St 5 was significantly greater than chance levels of 2.9% and 6.9% respectively (Table 12). Since the correct pattern was the most consistent one on St 4, separate figures are given for it. Two patterns were high on consistency for St 5 and the percentage figures are given for both of these patterns. Consistency data for the parts of each item is also presented.

Table 12

Substructure 3: Consistency Percentages for Structures 4 and 5

                          NR     V      D
St 4
  Total   Con             24     42     31
          VU, CU, S, VU   72     62     54
  Parts   VU              78     75     72
          CU              51     62     49
          S               62     64     64
          VU              70     84     82
St 5
  Total   Con             17     21     21
          CU, VU, S       50     36     33
          S, VU, CU       13     42     32
  Parts   CU, S           48     66     49
          VU              72     69     65
          VU              32     36     39

Consistency was generally higher for each part on St 4 and 5 than for the total pattern. This finding was similar to the consistency data on the memory item, St 6. Also the degree of consistency, in general, tended to be higher on these items than on the memory item. At least 50-60% of the Ss were consistent on St 4 and 5, while the memory item had at least 40%. The exception to this trend was the third part of St 5 where the consistency was lower. If difficulty is related to consistency, i.e., difficult items yielding fairly low consistent results, the relatively high consistency percentages on St 4, in contrast to St 6, can be explained. However St 5 was as difficult as St 6 but St 5 responses were more consistent. Another factor may have been the technical wording used in St 6. Again no differences between treatments occurred. If a pattern was low (or high) for one treatment it was also low (or high) for the others.
The consistency data was in accord with the basic acquisition-retention data on both the memory and application items, with the application items (especially St 4) being easier and more consistent. Both of these trends were contrary to expectations. Attempted explanations of these results have been given.

Substructure 4: Definition of Reliability, Reliability Coefficient, and Correlation Coefficient

A total of 21 relationships were tested by using six different items. Two items (St 14 and 15) tested for 16 of these relationships. St 13 was a pattern true-false item (two statements); St 14 was a pattern true-false item (six statements); St 15 was two independent true-false items; St 16 and 18 were true-false items; and St 19 was a matching item (one stem and two correct options). Acquisition-retention and consistency percentages are given in Table 13. If the correct response was not the majority response, the dominant responses are also given.

Two items (St 13 and 14) had very few Ss answering correctly. For St 13 Ss were responding significantly below the chance level of 25% (at p = .05, less than 10.6%), while for St 14 Ss were responding at the chance level of 1.6%. The error in St 13 was that Ss interpreted the correlation coefficient as a cause, as causing high or low relationships between tests rather than describing such relationships. The error in St 14 was that Ss stated that reliability was a quantitative index. The substructure itself made a distinction between the concept of reliability and the quantitative index of the reliability coefficient.

St 16 generally exhibited a 50-50 split between the two possible responses (a true-false item). This was a rather straightforward interpretation of the relationship between parallel and non-parallel tests and reliability. However, Ss responded at chance level.
Responses to St 18 (true-false item) indicated that Ss did not integrate the concepts of parallel tests and unsystematic variation with the degree of reliability. In fact, responses to St 18 were significantly below the chance level of 50% (at p = .05, less than 31%).

Table 13

Substructure 4: Acquisition-Retention and Consistency Percentages for All Items

Item       Score    NR        V         D
Acquisition-Retention*
Total               00-00     00-00     00-00
St 13      3        00-00     00-00     00-00
           2        76-85     76-73     83-86
           0        23-13     18-24     17-14
St 14      10       00-02     02-02     03-02
           9        80-74     74-75     76-79
St 15a     2        66-60     69-74     63-64
   15b     4        67-70     57-64     83-57
St 16      2        55-79     57-41     42-52
           0        45-21     43-49     58-48
St 18      1        20-15     29-12     17-09
           0        80-85     71-88     83-91
St 19fd    2        58-66     88-74     71-59
Consistency
St 13      Con      66        75        87
           2        93        83        94
St 14      Con      70        70        60
           9        97        94        100
St 15a     Con      61        79        71
           2        75        71        69
   15b     Con      66        52        59
           4        81        70        86
St 16      Con      64        59        61
           2        77        62        12
           0        57
St 18      Con      80        72        81
           0        89        87        97
St 19      Con      59        70        55
           2        76        97        84

* Second figure is retention percentage.

St 15 a and b checked if Ss knew the relationship of unsystematic variation to reliability and correlation coefficients, and if they knew the relationship of parallel and non-parallel tests to both of these coefficients. Generally speaking Ss were responding at chance level (50%) on both of these parts (at p = .05, greater than 68% required). However some treatments were above chance level: V for part a, and NR on retention and D on acquisition for part b.

St 19 tested for two aspects of reliability, its relationship to unsystematic factors and its strength. Using the chance level of 25% (probability that Ss would respond with both of these factors correctly, ignoring the other options in the matching item which were relevant to other stems), Ss were responding significantly above chance (at p = .05, greater than 39%). In general, most of the relationships (10-16 out of 21) within substructure 4 were acquired and retained at a fairly high level.
Again no differences among treatments existed. Diagrammatic representation of the subject matter and the Ss' cognitive structures is presented below. The question marks (?) indicate that differences between treatments existed. The major errors were related to the causal relationship and precision of definitions. The latter error would imply that Ss find it difficult to distinguish clearly among closely related concepts.

The degree of consistency was fairly high for each item. Degree of consistency was significantly greater than chance for Sts 13, 14, 18, and 19. For St 15a two of the three treatment percentages were significantly above chance level. For St 16 and 15b the responses were generally at chance level. In all cases the response that was most common on acquisition and retention was also the most consistent. Generalizing across all items on substructure 4, the cognitive structures of the Ss were rather stable.

[Diagram: Structure of the Material. A matrix with columns Rel., Corr. Coeff., and Rel. Coeff. and rows Quant. Index, Degree of Unsys. Var., Tests Par., and Tests Not Par.; and a chain linking unsystematic variation (high, low) as cause to the correlation coefficient on parallel tests, the reliability coefficient, and reliability (low, high).]

[Diagram: Cognitive Structure. The same matrix, with question marks in the parallel and non-parallel test rows; and a chain in which the correlation coefficient on parallel tests is shown as cause.]

Substructure 5: Parallel Forms versus Parallel Tests

A total of six relationships were tested by a matching item (St 19). The matching item was a memory one consisting of two parts, the characteristics of parallel tests and the characteristics of parallel forms of tests. The percentage of Ss who responded with the correct pattern and dominant incorrect patterns for both parts on acquisition and retention is given in Table 14.
In addition the percentage of correct patterns for each response to the parallel forms and parallel tests parts is given.

Table 14

Substructure 5: Acquisition-Retention Percentages for Structure 19 - Parallel Forms and Tests

                                          NR        V         D
Total                                     00-00*    06-06     18-10
Parallel Tests
  Response Pattern
    ab**                                  04-04     18-14     32-14
    c                                     14-19     17-16     12-13
    bc                                    18-16     12-09     09-17
    abc                                   29-25     19-29     24-26
    b                                     05-08     21-12     06-02
  Correct Response - Each Characteristic
    a                                     43-42     47-58     67-52
    b                                     69-61     73-62     73-68
    c                                     25-31     65-24     49-29
Parallel Forms
  Response Pattern
    bc**                                  27-22     30-33     43-29
    b                                     22-23     12-09     14-18
    c                                     15-17     19-18     14-28
    abc                                   07-09     16-16
  Correct Response - Each Characteristic
    a                                     71-66     66-67     70-62
    b                                     66-66     62-62     74-60
    c                                     68-63     70-72     70-61

* Second figure is retention percentage.
** Correct response pattern.

In contrast to the other substructures, definite differences between treatments appeared to exist on the total substructure, with the D treatment having the highest percentage of correct patterns followed by V and NR. Chance level for the total substructure was 1.5% with 5.75% required to be significantly higher than chance at the .05 level. Therefore the NR treatment was always below chance while the D treatment was always above chance, with the V treatment at chance level. However a Chi-square test showed that the only significant differences between treatments occurred on acquisition (Chi-square = 11.4, significant at p < .01). The order of treatment percentages was also repeated with the parallel tests part and to some extent with parallel forms.

For the parallel forms part most percentages were significantly above the chance level of 12.5% (at p = .05, greater than 23%). However, this was not the case for the parallel tests part. On retention all percentages were at the chance level, while on acquisition some treatments were above and some at the chance level.
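A chi-square test of treatment differences of the kind reported here can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the original analysis: the counts are hypothetical (this section reports only percentages), the table is assumed to be 3 treatments by 2 outcomes, and the helper names are invented. With 2 degrees of freedom the upper-tail probability of chi-square has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of chi-square with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Hypothetical counts: rows = NR, V, D; columns = correct pattern, other.
counts = [[2, 48], [9, 41], [16, 34]]
stat = pearson_chi_square(counts)
print(f"chi-square = {stat:.2f}, p = {p_value_df2(stat):.4f}")

# Under the same df = 2 assumption, the reported statistic of 11.4
# corresponds to p = exp(-5.7), which is below .01.
print(p_value_df2(11.4) < 0.01)
```

If the original table had a different shape, the degrees of freedom and the closed-form tail probability used here would no longer apply.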
A Chi-square test for the acquisition percentages on the parallel tests part showed that the treatments differed (Chi-square = 29.3, significant at p < .001).

The differences found between the treatments were consistent with the basic theoretical predictions behind the study. However, since the pattern was not replicated on any of the other substructures, the differences were probably a chance occurrence.

No wrong pattern was predominant within or across treatments on the parallel tests or forms parts, even though the correct pattern was not given by a great proportion of Ss. Within the parallel tests and parallel forms parts, the percentages of correct answers for each option were fairly high. Using 68% as the cutting point for significance (at p = .05) for the chance level of 50%, option b tended to be significantly higher than chance level for parallel tests, while the other options tended to be at the chance level. For the parallel forms part, options a and c tended to be significantly above chance. Responses on option b were generally at the chance level for both groups.

On the basis of these results it was difficult to generalize about the cognitive structures of the Ss. However, using the percentages for the NR treatment on each option, the differences between the structure of the material and the cognitive structure of the students can be diagrammed as follows:

Structure of the Material

                   Meet Stat.    Meas. Same    Always Sim.
                   Crit.         Trait         in Content
Parallel Forms                   X             X
Parallel Tests     X             X

[Diagram: Cognitive Structure. The same matrix (rows Parallel Forms and Parallel Tests; columns Meet Stat. Crit., Meas. Same Trait, Always Sim. in Content) marked according to the NR responses. The cell entries are not legible.]

Table 15 gives the consistency percentages for this substructure.

Table 15
Substructure 5: Consistency Percentages for Structure 19 - Parallel Forms and Tests

                      NR    V     D
Total                 19    16    16
  Correct             00    13    62
Parallel Tests        39    36    31
  Correct             03    12    39
  a                   72    70    61
  b                   68    73    64
  c                   78    65    66
Parallel Forms        27    48    31
  Correct             42    35    50
  a                   68    68    74
  b                   59    71    58
  c                   61    73    56

There were few consistent patterns for the total structure, although all percentages were above the chance level of 1.5%. Few of these consistent patterns were the correct ones. The degree of consistency was higher (significantly different from the chance level of 12.5%) for the two parts of parallel tests and parallel forms. Even higher percentages of consistent responses for each option occurred, with about half of these percentages above chance. The proportion of consistent-correct responses for parallel tests and parallel forms tended to vary with each treatment. Thus even though the absolute level of the consistency percentages was rather low, most of them were above chance, indicating that Ss' cognitive structures were rather stable for the relationships within this substructure.

Substructure 6: Methods of Estimating Reliability Coefficients

Twelve relationships were tested by a matching item (St 7-1st) which consisted of five parts. One set of relationships was tested twice (parts a and e of the item). The percentage of correct patterns on acquisition-retention and the percentage of consistent responses are given in Table 16.

All acquisition-retention percentages were significantly greater than the chance level of 3.1% (at p = .05, greater than 8.8%). In fact this substructure appeared to be acquired and retained the best of any of the substructures. As was true with the other substructures, the parts showed a higher proportion (on an absolute level) of correct responses than the total structure. Most of these percentages were significantly greater than chance (50%; at p = .05, greater than 68%). In general there did not seem to be much difference in difficulty between parts a and e. No consistent differences between treatments occurred. For this substructure the diagrams of the students' cognitive structures would be the same as the structure of the material.

Table 16
Substructure 6: Acquisition-Retention and Consistency Percentages for Structure 7-1st

                          NR       V        D
Acquisition-Retention
  Total                   40-41*   55-61    43-46
  Parts
    a                     60-75    75-80    69-73
    b                     69-77    97-88    86-79
    c                     76-89    90-93    88-89
    d                     78-78    80-88    74-82
    e                     72-76    74-86    90-76
Consistency
  Total                   39       46       54
    Correct               94       96       65
  Parts
    a                     78       86       82
      Correct             78       78       61
    b                     80       91       82
      Correct             79       92       71
    c                     84       90       85
      Correct             90       98       67
    d                     75       81       81
      Correct             91       95       71
    e                     72       72       77
      Correct             82       94       75

* Second figure is retention percentage.

In general the percentage of consistent patterns for the total structure was lower than the degree of consistency for each part. However, both the total structure and part percentages were significantly greater than chance (3.1% and 50% respectively). The percentage of consistent-correct total and consistent-correct part patterns was high across most treatments. These high consistency figures indicated that Ss' cognitive structures for this substructure were quite stable.

Transfer

Seven of the ten transfer items were based upon a relationship between Sbs 3 and 6. The remaining three transfer items were based on other substructures. The data for these three transfer items are presented first.

Transfer based on Substructure 1

This true-false transfer item (St 22b) was based upon the generalization of the importance of reliability to a correlational situation. The percentage of correct and consistent responses is given in Table 17.

This transfer item was fairly easy. Most acquisition-retention percentages were significantly greater than chance (chance = 50%; at p = .05, greater than 68%). The percentage of consistent responses was also fairly high, with the majority of these responses being the correct one. The relatively high level of performance by Ss on this transfer item was in accord with the level of performance by Ss on Sb 1.

Table 17
Transfer, Substructure 1: Acquisition-Retention and Consistency Percentages for Structure 22b

                          NR       V        D
Acquisition-Retention     70-55*   77-75    80-80
Consistency
  Total                   68       68       82
  Correct                 69       89       86

* Second figure is retention percentage.

Transfer based on Substructure 2

This transfer item (St 3) was based upon the effects of the simultaneous occurrence of unsystematic and systematic factors. It was a multiple true-false item yielding a pattern of responses. Table 18 gives the percentage of correct and dominant responses for acquisition-retention and consistency.

The proportion of correct responses was significantly greater than chance (chance = 6.25%; at p = .05, greater than 14%). However, the dominant response of -1 was also significantly greater than chance for all treatments, and the absolute level of this percentage was greater than that for the correct response. Thus Ss seemed to contradict themselves, marking as "true" two alternatives which could not logically both hold under the conditions of the question. The degree of consistency was also significantly greater than chance (6.3%), with the proportion of correct and contradictory patterns being similar to their corresponding proportions within the acquisition-retention data.

Table 18
Transfer, Substructure 2: Acquisition-Retention and Consistency Percentages for Structure 3

                          NR       V        D
Acquisition-Retention
  Correct                 16-23*   18-20    20-28
  -1                      70-62    67-62    60-56
Consistency
  Total                   62       54       55
  Correct                 14       19       24
  -1                      83       81       65

* Second figure is retention percentage.

This item was more difficult than originally expected, even though Ss were generally responding correctly above the chance level. The heavy proportion of contradictory response patterns indicated that Ss were confused. Very few Ss responded correctly to all relationships within substructure 2 itself, and contradictory patterns occurred on this substructure.
Thus poor performance on this transfer item was consistent with findings on Sb 2.

Transfer based on Substructure 5

This true-false transfer item (St 17) was based on an implication drawn from the fact that meeting statistical criteria is a characteristic of parallel tests. Table 19 gives the acquisition-retention and consistency percentages for this item.

Table 19
Transfer, Substructure 5: Acquisition-Retention and Consistency Percentages for Structure 17

                          NR       V        D
Acquisition-Retention     64-61*   59-64    73-67
Consistency
  Total                   55       72       55
  Correct                 73       65       72

* Second figure is retention percentage.

The proportion of correct responses was generally at chance level (50%) across treatments on acquisition and retention. The consistency percentages were generally also at chance (chance = 50%; at p = .05, greater than 68%). The proportion of consistent-correct responses was similar to the corresponding proportions within the acquisition and retention data. The percentage of correct responses on the item testing for the basic structural relationships used in this transfer item (Sb 5, St 19 parallel tests, part a) was also at chance level. Again the results supported the hypothesis that Ss would not perform highly on transfer items unless they had performed highly on the structural relationships which were the basis for the transfer.

Transfer based on Substructures 3 and 6

Six transfer items (St 7-2nd, 8, 9, 10, 11, 12) were based on the same structural relationships in Sb 3 and 6. The basic relationships tested, in one form or another, by these transfer items dealt with the transfer structure diagrammed in Appendix G.

Structure 7-2nd.--Item 7-2nd directly tested for each of the relationships within the transfer structure (parts a and e were duplications). Table 20 gives the acquisition and retention data for this item on both the total structure and each part.

Using the total response pattern as the criterion, this item was quite difficult (Ss responded at the chance level of .058%). Only one S answered with the correct pattern. Using 33% and 12.5% as the chance levels for parts ade and bc respectively, Ss responded at the chance level for both sets of parts. The most common responses made by the Ss to all five parts were rather evenly divided among VU, CU, and S.

The difficulty of the item and its separate parts may have been, in part, a result of response bias. The common tendency by Ss was to indicate only one factor. The fact that parts b and c required all three factors as the correct response reduced considerably the absolute proportion of Ss who answered the item correctly. This transfer item was an integration of Sbs 3 and 6. Sb 3 was not grasped by any of the Ss; however, Sb 6 was understood by the majority. The difficulty of this transfer item then could also reflect the difficulty Ss had with Sb 3.

Table 20
Transfer, Substructures 3 and 6: Acquisition-Retention Percentages for Structure 7-2nd

Parts            Response      NR       V        D
Total                          00-00*   00-02    00-00
a  VU            VU            24-21    20-25    13-17
                 S             43-49    39-53    51-43
                 CU            19-20    31-14    26-27
b  VU, CU, S     VU, CU, S     04-04    00-07    02-04
                 CU            34-47    35-20    18-47
                 VU            28-25    35-38    44-27
                 S             19-13    12-17    24-18
c  VU, CU, S     VU, CU, S     05-04    04-06    00-02
                 VU            19-08    21-15    25-17
                 CU            23-32    33-41    32-36
                 S             40-44    34-29    29-36
d  VU            VU            28-35    23-25    25-24
                 CU            38-17    23-30    35-40
                 S             33-32    45-29    33-28
e  VU            VU            28-34    35-32    13-23
                 CU            36-13    31-22    27-30
                 S             35-40    23-27    43-41

* Second figure is retention percentage.

On the basis of these results it was difficult to generalize about the cognitive structures of the Ss. However, using the percentages for the NR treatment, the cognitive structure of the students can be diagrammed as follows (structure of the material is indicated with circles "O"):

[Diagram: Cognitive Structure. Rows: Systematic Factors; Unsystematic Factors, Const.; Unsystematic Factors, Vary. Columns: Test-Retest; Internal Consistency; Parallel Forms, Immediate; Parallel Forms, Delayed. Cells are marked X (cognitive structure) and O (structure of the material); the cell positions are not legible.]

Consistency data for the total transfer structure and its parts is given in Table 21.
The degree of consistency for the total pattern was generally low in absolute terms. However, for the NR and V treatments, the degree of consistency was significantly greater than chance (.058%). The degree of consistency for each part was higher on an absolute level than for the total structure, but no consistent results regarding significance occurred. For some treatments the percentages were significantly above chance and for others they were not (chance for ade = 33%, chance for bc = 12.5%; greater than 50% and 23% required respectively at p = .05). The consistent answers for each part seemed, in general, to be divided among VU, CU, and S, parallel to the division that occurred with the acquisition-retention data.

Table 21
Transfer, Substructures 3 and 6: Consistency Percentages for Structure 7-2nd

Parts            Response      NR    V     D
Total                          10    04    00
a  VU            Con           52    36    54
                 VU            26    31    15
                 CU            18    17    26
                 S             54    45    59
b  VU, CU, S     Con           21    36    33
                 VU, CU, S     19    00    00
                 VU            17    50    38
                 CU            40    19    34
                 S             21    19    27
c  VU, CU, S     Con           25    26    34
                 VU, CU, S     15    11    00
                 VU            16
                 CU            37    23    22
                 S             47    23    62
d  VU            Con           27    48    27
                 VU            15    32    18
                 CU            36    20    56
                 S             45    43    25
e  VU            Con           37    50    23
                 VU            48    41    19
                 CU            17    23    22
                 S             31    31    60

The general trend of responses found on this transfer item was consistent with previous results. Ss' performance was higher within parts of the item than on the total structure itself. Performance was generally at chance level and stability of cognitive structure was also low. The generally low performance on this item was in accord with the high difficulty of Substructure 3.

Structure 8.--As stated before, St 7-2nd tested the Ss' knowledge of the types of score variation that could be distinguished in the four situations used for estimating reliability coefficients. St 8 used this same basic transfer structure, but also required Ss to integrate the definition of the reliability coefficient.
This transfer item asked Ss to list the types of factors which affect the reliability coefficients obtained from the four different estimation methods. Since the reliability coefficient had been defined in terms of unsystematic variation, systematic factors were eliminated. Table 22 gives the acquisition-retention data for this item.

Only one S responded correctly on all parts of the item (chance level of .17%). In general, each part was also difficult. Ss responded at the chance level of 12.5% for parts a and d and at the chance level of 33% for parts b and c. For each part Ss' responses were divided among VU, CU, and S, and the most common response was not the correct response. Again the acquisition-retention results for this item, the differences between the absolute difficulty levels of the parts and the whole structure, and the common wrong responses followed the same patterns that occurred on most other items of this type. The difficulty of the total item was consistent with the difficulty of Sb 3.

Table 22
Transfer, Substructures 3 and 6: Acquisition-Retention Percentages for Structure 8

Parts            Response      NR       V        D
Total                          00-02*   00-00    00-00
a  VU, CU        VU, CU        09-12    00-02    04-05
                 VU            37-10    53-40    37-33
                 CU            09-14    16-11    15-15
                 S             21-47    22-27    35-36
b  VU            VU            25-34    18-25    22-23
                 CU            14-21    16-14    24-22
                 S             36-23    60-50    39-40
c  VU            VU            08-13    14-09    08-04
                 CU            47-46    39-41    48-49
                 S             23-13    39-26    26-31
d  VU, CU        VU, CU        09-08    00-02    00-02
                 VU            35-45    52-51    54-55
                 CU            16-16    26-24    17-19
                 S             18-11    09-04    18-12

* Second figure is retention percentage.

Consistency data for St 8 is given in Table 23. The absolute level of the total consistency percentages was rather low, but the percentages were significantly greater than chance (.17%). The degree of consistency for parts a and d was significantly greater than chance (12.5%), but the percentages for parts b and c were at chance level (33%). The most consistent response within each part was usually not the correct response.
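
The chance levels quoted for the multi-part transfer items can be checked with a short calculation. The sketch below is illustrative only: it assumes that the parts of an item are answered independently and that the rounded figures of 33% and 12.5% stand for 1/3 and 1/8. The function name joint_chance_level is hypothetical and is not part of the original analysis. Under these assumptions the product of the part chance levels agrees with the .058% and .17% reported for St 7-2nd and St 8.

```python
from fractions import Fraction
from math import prod


def joint_chance_level(part_levels):
    """Chance probability of a fully correct pattern, assuming
    the parts are answered independently (an assumption made here)."""
    return prod(part_levels, start=Fraction(1))


# St 7-2nd: parts a, d, e at 1/3 each; parts b, c at 1/8 each.
level_7_2nd = joint_chance_level([Fraction(1, 3)] * 3 + [Fraction(1, 8)] * 2)

# St 8: parts a, d at 1/8 each; parts b, c at 1/3 each.
level_8 = joint_chance_level([Fraction(1, 8)] * 2 + [Fraction(1, 3)] * 2)

print(level_7_2nd, round(float(level_7_2nd) * 100, 3))  # 1/1728 0.058
print(level_8, round(float(level_8) * 100, 2))          # 1/576 0.17
```

Exact fractions are used so that the rounding to percentages happens only once, at the point of display.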
This could reflect the fact that the correct response did not occur very frequently for the parts on acquisition and retention. The most consistent response for each of these parts was also the most common response on acquisition-retention. As mentioned before, responses to all parts were divided among VU, CU, and S on acquisition and retention. However, only one or two of these responses were highly consistent. Generally speaking, the Ss were not using a very stable cognitive frame of reference.

Structures 9, 10, 11, and 12.--The remaining four transfer items each dealt with the same transfer structure and definition of reliability as did St 8. The format of these items differed from the other two transfer items. St 9 was a multiple true-false item; St 10 was a true-false item; St 11 was a multiple true-false item; and St 12 was three separate true-false items. Acquisition and retention data for all these items is given in Table 24.

Table 23
Transfer, Substructures 3 and 6: Consistency Percentages for Structure 8

Parts            Response      NR    V     D
Total*                         12    14    06
a  VU, CU        Con           36    43    46
                 VU, CU        06    00    05
                 VU            23    3     50
                 S             42    41    25
b  VU            Con           33    64    47
                 VU            50    17    11
                 S             26    71    62
c  VU            Con           38    41    32
                 VU            06    00    06
                 CU            73    77    51
                 S             12
d  VU, CU        Con           49    46    37
                 VU, CU        04    00    00
                 VU            57    73    80

* Three of these responses were S, VU, CU, VU. Four of these responses were VU, S, CU, VU. The remaining eight responses did not overlap.

Table 24
Transfer, Substructures 3 and 6: Acquisition-Retention Percentages for Structures 9, 10, 11, and 12

                 NR       V        D
St 9             17-15*   20-14    23-18
   0             51-65    48-57    42-57
  -1             31-20    32-29    35-26
St 10            60-46    69-66    43-52
St 11            13-15    13-24    15-17
   1             33-32    26-29    39-29
   0             17-27    29-20    15-25
  -1             36-25    32-25    32-35
St 12
  Total          47-55    48-43    38-35
  a              60-65    71-60    50-46
  b              78-75    72-73    67-62
  c              79-92    81-84    85-90

* Second figure is retention percentage.

Although based upon the same structural relationships, the transfer items varied in difficulty, with St 12 (total) being the easiest when the chance level was considered. St 12 was the only item which was significantly greater than chance (12.5%; at p = .05, greater than 24%). This high level of performance might have occurred because the item parts could be answered "true" or "false" on a common-sense basis. However, if Ss had been asked for the reason for their answer, and the reason then scored in terms of Ss' comprehension of unsystematic variation, the item might have been more difficult. (Percentages for each part of St 12 were significantly greater than chance.) So the apparent easiness of this item did not necessarily contradict the fact that Substructure 3 was difficult or that the other transfer items were difficult.

Consistency data for these four items is given in Table 25.

Table 25
Transfer, Substructures 3 and 6: Consistency Percentages for Structures 9, 10, 11, and 12

                 NR    V     D
St 9             48    41    49
  Correct        13    18    24
  0              61    60    55
St 10            66    62    67
  Correct        55    77    50
St 11            38    26    29
  Correct        17    07    13
St 12
  Total          51    49    57
  Correct        71    51    47
  a              43    41    29
  b              43    38    41
  c              49    40    51

The degree of consistency was significantly greater than chance for St 9, St 11, and St 12. However, the correct response was not predominantly consistent for St 9 and St 11. These two items were also difficult, as indicated by the acquisition-retention data. Except for St 12, the results on these transfer items were generally in accord with the difficulty and lack of consistency found on Substructure 3 and the other two transfer items (St 7-2nd and St 8). An explanation of the easiness and consistency of St 12 has been given.

Summary and Interpretation

In general, differences between treatments did not occur. In fact, when the correct response was not the dominant one, the same wrong response(s) was dominant across the treatments.
Of course, this rather striking similarity among treatments did not provide support for the major hypothesis of the study, i.e., that the different versions of the reliability passage would produce different performances on the structure and transfer tests. Another rather consistent result was that the percentages of correct and wrong responses were quite similar on acquisition and retention for all treatments. Thus the Ss' cognitive structures did not simplify or break down with time.

For most items and substructures the absolute level of correct responses was higher for the parts of the item or substructure than for the total item or substructure. However, when these percentages were "corrected for chance," this order usually did not occur. If Ss responded at a chance level for the total item they also did so on its parts. There was a tendency for difficulty to be related to consistency, i.e., the more difficult the item, the less consistent the responses from acquisition to retention. In other words, if a student did not grasp the structural relationships then his cognitive structure was apt to be more unstable than if he did understand all the relationships.

Transfer levels of difficulty and consistency were in accord with the related substructure levels of difficulty and consistency. This result did provide support for another major hypothesis, that high performance on transfer depended upon Ss' understanding the structural relationships used to generate the transfer items.

Ss tended to respond to classification type items (VU, CU, and S) with only one factor, even though two or three factors were required for the correct response. This tendency may have represented only a response bias, but it could also have represented a general cognitive tendency by Ss in the process of learning (as opposed to consolidation and review) to simplify cognitive structures. This simplification is in accord with an information-theory learning position, i.e.
, the effects of Ss' coding systems and channel capacity, and the consequent limits on Ss' ability to process new information.

Ausubel's theory of meaningful verbal learning would not directly predict these results, but simplification could be considered consistent with his viewpoint. Assuming that much of the technical material on reliability was new to the Ss, Ss would then have few relevant existing subsuming concepts for this material. However, in such situations Ausubel postulated that individuals use any concepts that might be appropriate, even though they are often inadequate. Assuming then that Ss used inadequate subsuming concepts, the very inadequacy of these concepts would probably lead to simplification of structural relationships. Inadequate subsuming concepts would then provide relevant structures for only part, not all, of the new material. On the other hand, if the Ss did not attempt to learn meaningfully, i.e., did not provide any subsuming concepts however inadequate, then Ausubel would postulate that Ss would learn rotely. In order for all relationships to be acquired in this way, repetitions of the material over time are required. This type of repetition was not allowed in the present study, so Ss could only master part, not all, of the relationships.

This interpretation of the simplification that occurred is also consistent with the "blueprint" theory of diagram presentation. As stated in Chapter III, it was assumed that when a S encountered a diagram a relevant mediation process would be "triggered." However, if the written material was acquired in a less than adequate meaningful way or was acquired rotely, the Ss then only understood part of the content. Thus the diagram would still evoke a mediation sequence, but the sequence would be inadequate or inappropriate.

Certain types of relationships were difficult for the Ss, primarily definitional and causal. As shown in Sb 4, Ss lacked precision in making definitions, e.g.
, reliability as a quantitative index, parallel and nonparallel tests as related to reliability, reliability coefficients, and correlation coefficients. Ss also had great difficulty with causal relationships: classifying descriptive factors as causes (as with systematic and unsystematic variation causing variation in Sb 2 and correlation coefficient causing unsystematic variation in Sb 4) and an inability to grasp multiple causation (VU, CU, and S each had two effects, some overlapping, in Sb 3).

Ss also tended to contradict themselves (-1 scores). Although such contradictions could be attributed to careless reading and test-taking behavior, the cause of such responses could be more complicated. Perhaps some Ss are unable to detect contradictions in definitions of concepts, in causal relationships, in descriptions of data, etc. The implications of these findings are discussed in the last chapter.

Several major implications for this type of analysis can be given. First, improvements in present test construction by teachers can be made using such procedures. Diagrams representing structure can be applied to all subject matter, if structure is defined as presented in this study. Such representations quickly pinpoint sequential dependencies and transfer material. Instead of the usual haphazard procedure for generating transfer items, an algorithm is provided by the diagramming procedure. The importance of diagnostic information for the teacher and researcher has already been emphasized.

Second, too often tests are used only to rank students within a class. This procedure is often a result of the tests themselves; they are not deliberately structured to yield diagnostic information. With the present emphasis in education upon individualized instruction, the need for ranking students is eliminated. Instead, information on what the student has and has not learned is required.
The type of test construction described in the present study provides one approach to this need.

Third, structure tests could provide information on the way in which individuals learn and retain structures. Several areas of investigation would be the number of trials necessary for all structural relationships to be acquired, the time at which transfer is readily made, and parametric data on the retention of structural relationships after acquisition, overlearning, and application. With the type of information provided by structure tests, rather extensive data on learning and retention processes could be obtained.

CHAPTER VII

DISCUSSION OF MAJOR RESULTS AND CONCLUSIONS

The major emphasis in this chapter is an explanation of the unexpected results that occurred for practically every major hypothesis. Then a brief presentation of the implications of the study is given.

Time

The major difference between Groups O and R was in the time required to read the reliability passage. The times for the experimental treatments in Group R varied less than the times for the treatments in Group O, with the D treatment requiring less time in the R group and the NR treatment in the R group requiring more time.

Differences in administrative procedures could account for these results. For Group O each treatment was administered in a separate room, while for Group R all treatments were given in the same room. Perhaps group pressures for Ss in Group R, similar to pressures in the Asch (1956) line judgement studies, could have made Ss in the NR treatment hesitant to turn in their papers early and could have influenced the Ss in the D treatment to read faster in order to turn in their papers with the majority of the Ss. Such pressures were not as great for individuals in Group O because all Ss in one room had the same version of the reliability passage, allowing a greater spread among the mean scores for each treatment than for Group R.
However, similar pressures could have existed for Ss in Group O. This could partially account for the time differences among the experimental treatments within Group O itself, the group pressure yielding a smaller variance on the time scores than might be expected if the Ss were reading on an individual basis.

Achievement, Structure, and Transfer

As expected, no differences existed in achievement among the experimental treatments, and all treatments scored higher than a control group which did not read the reliability passage. However, in light of the other results, the achievement data did not offer definitive evidence to support the original hypothesis.

One reason for this conclusion is that the differences between the means for the experimental and control groups, although significant, were not as large as one would have expected if the Ss had actually comprehended the material. According to test theory, most test items should be near the 50% difficulty level (adjusted for chance). Thus the mean score on a test should be approximately halfway between the chance and the maximum score. For the achievement test this score was 19 (halfway between 7.5 and 30). However, the obtained mean score of 15 for each treatment was lower than this, indicating that the test was too difficult for the Ss. In addition, the theoretical formulation would be supported only if all the hypotheses regarding the differences between the experimental treatments on achievement, structure, and transfer were supported. The structure and transfer hypotheses were not supported.

The structure and transfer data also implied that the Ss did not comprehend the material as well as had been anticipated. On the average the experimental treatments' scores on structure were higher than the control group's scores, although this difference was not as large as had been expected. However, no differences existed between the experimental treatments and the control group on transfer.
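
The expected mean quoted above for the achievement test follows from a one-line calculation. The sketch below is only an illustration of that arithmetic; the function name expected_mean is hypothetical, and the values 7.5 and 30 are the chance and maximum scores given in the text.

```python
def expected_mean(chance_score, max_score):
    """Score halfway between the chance score and the maximum score."""
    return (chance_score + max_score) / 2


target = expected_mean(7.5, 30)  # chance and maximum scores from the text
observed = 15                    # obtained mean for each treatment

print(target)             # 18.75, reported in the text as 19
print(target - observed)  # 3.75 points below the expected mean
```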
Thus it would appear that inadequate comprehension could account, in part, for the unexpected results.

Additional support for this interpretation was indicated by the questionnaire results. For both the V and D treatments Ss indicated little awareness of the relationships between test items and review sections, verbal or diagrammatic. If the structure of the material had been comprehended (it was in effect repeated three times), more Ss within the V and D treatments would have indicated awareness of a connection between the test and the reviews. Subjects' confusion about the diagrams, as stated on the questionnaire, also indicated lack of comprehension. Without a majority of the Ss attaining comprehension, a test of the theoretical formulation was really not obtained. Apparently one reading was not sufficient for the type of performance required by the test items.

Another consistent result was the absence of a retention drop on achievement, structure, or transfer. Again this was contrary to expectation. One reason for this could be that the unique format of the test increased the memory for some items for some Ss. Secondly, perhaps one week was an insufficient period of time for memory changes to occur with meaningful material. Third, if Ss were responding near chance level a retention drop would not be expected. These reasons probably do not adequately explain the stability of the scores, but at present no other explanations are available.

Although these results were unexpected, the pilot study results were as hypothesized. On the final revisions of the materials the D Ss were performing at a higher level than the V Ss on achievement, structure, and transfer. The average scores of the V and D pilot Ss on each of these three variables were higher than the highest average scores on these variables from all the six treatments in the present study. Perhaps these differences could be partially explained by the differences in time spent reading the reliability passage.
The pilot Ss who received the final versions of the materials took an average of 52 minutes to read the passage, compared to the high average of 47 minutes for the six treatments in the present study. In fact, the average time for all Ss in the pilot study was slightly higher than the highest average time for the six treatments. Even though time was not correlated with performance in the present study, perhaps if Ss had read the material more slowly differences among treatments would have occurred and time might have been correlated with performance.

Sequential Dependencies

The lack of support for the sequential dependencies hypotheses demanded further investigation. It appeared that two factors were influencing responses to the items: the format of the items and the number of relationships tested by the items (called information load). The structure and transfer items were ranked separately on both of these factors. The format rankings were based on the chance level of the items. For example, in a true-false pattern of six items, the probability of responding with the correct pattern would be (1/2)^6 or 1/64. These probabilities were then transformed to rank scores. The information load rankings were based on the number of structural relationships tested by each item. The number of relationships was taken from the test analysis (Appendix G). These numbers were also transformed to rank scores. It was expected that a high information load and low probability of chance success would be related to poor performance on the item.

The actual difficulty level of each item was the proportion of Ss who received the maximum possible score on the item. Item difficulties were chosen as the criterion because they seemed to be a good index of student performance and because ordering items on this basis produced a Guttman scale.
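
A Guttman scale of the kind mentioned above is usually summarized by a coefficient of reproducibility, 1 - errors / (subjects x items). The sketch below is a minimal illustration and not the procedure used in the study: it assumes pass/fail item scores and counts as an error every response that deviates from the ideal pattern implied by a subject's total score, which is only one of several counting conventions. The function name and the small data set are hypothetical.

```python
def guttman_reproducibility(responses):
    """responses: list of rows (subjects), each a list of 0/1 item scores.

    Assumed error convention: deviations from the ideal pattern in which
    a subject with total score s passes exactly the s easiest items.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    # Order items from easiest (most passes) to hardest.
    passes = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -passes[j])
    errors = 0
    for row in responses:
        score = sum(row)
        for position, j in enumerate(order):
            ideal = 1 if position < score else 0
            errors += int(row[j] != ideal)
    return 1 - errors / (n_subjects * n_items)


# Hypothetical scores for five subjects on three items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],  # passes a harder item while failing an easier one
    [1, 0, 0],
    [0, 0, 0],
]
rep = guttman_reproducibility(data)
print(round(rep, 3))  # 0.867
```

With this convention the third subject contributes two errors, so the coefficient is 1 - 2/15.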
The reproducibility indices obtained by plotting item difficulties against Ss' scores on S1, S2, T1, and T2 were generally high (Table 26).

Table 26
Guttman Reproducibility Coefficients on Structure and Transfer

          S1     S2     T1     T2
O-NR     .83    .79    .87    .85
O-V      .81    .79    .89    .86
O-D      .82    .80    .87    .86
R-NR     .84    .81    .84    .82
R-V      .80    .82    .82    .81
R-D      .80    .83    .88    .82

A reproducibility (rep) index of .85 is usually considered the criterion for separating scales from non-scales (Torgerson, 1963). Although some rep indices were below .85, none of them were below .79. Thus the items generally met the criterion of scalability.

The correlation between format (Fo) and information load (In) rankings was .726 for the structure items (n = 20, significant at the .01 level) and .747 for the transfer items (n = 10, significant at the .01 level). These correlations indicated that as the format of an item became more difficult the information load of the item also increased.

Rank correlations between the actual item difficulty and Fo and In for S1, S2, T1, and T2 were relatively high (Table 27). The consistently high relationships of Fo and In to item difficulty indicated that these factors might have had a greater influence upon student performance than knowledge acquired from the reliability passage. It also explains, in part, why performance was low.

Table 27
Rank Correlations: Actual Difficulty with Format and Information on Structure and Transfer

               S1                 S2
           Fo      In         Fo      In
O-NR     .68**   .51*       .73**   .58**
O-V      .61**   .53*       .59**   .51*
O-D      .61**   .50*       .72**   .32
R-NR     .59**   .51*       .64**   .33
R-V      .71**   .59**      .64**   .63**
R-D      .69**   .44*       .7 **   .63**

               T1                 T2
           Fo      In         Fo      In
O-NR     .93**   .87**      .80**   .95**
O-V      .92**   .79**      .93**   .86**
O-D      .92**   .86**      .91**   .86**
R-NR     .88**   .86**      .85**   .90**
R-V      .89**   .80**      .91**   .80**
R-D      .85**   .88**      .89**   .86**

** p < .01
 * p < .05
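A rank correlation of the kind reported in Table 27 could be computed as sketched below. The text does not name the coefficient that was used; Spearman's rho, computed as the Pearson correlation of average ranks, is assumed here, and the two rankings in the example are invented for illustration rather than taken from the study. The function names are likewise introduced only for this sketch.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two sets of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Invented rankings for five items: format rank versus rank of actual difficulty.
format_rank = [1, 2, 3, 4, 5]
difficulty_rank = [2, 1, 4, 3, 5]
rho = spearman_rho(format_rank, difficulty_rank)  # 0.8 for these invented rankings
```

Using the Pearson formula on ranks, rather than the shortcut based on squared rank differences, keeps the result correct when items are tied, which is likely when several items share the same chance level.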
In order to have adequately tested the hypotheses about sequential dependencies between content areas, the Fo and In factors should have been controlled. This could have been done by reconstructing items so that the chance level was constant or by transforming the item scores to correct for the Fo and In factors. But it is questionable that such a transformation would change the treatment effects in the present study, since additional evidence (achievement variable and questionnaire data) supported the idea that adequate comprehension was not obtained after one reading of the passage.

Relationships among Main Variables

Contrary to expectation, time was not correlated with achievement, structure, or transfer for any of the experimental treatments. In general the correlations were positive, in accord with the hypothesis, but were not significant. In light of the previous analysis, In and Fo appeared to be more important in determining Ss' performance.

Errors on the training program were related to performance on achievement, structure, and transfer, indicating that those individuals who made errors on the training program also performed poorly on the criterion tests. Although not anticipated, this relationship is congruent with what one might expect if there is a general factor underlying performance on different tasks.

Correlations of performance on initial testing and on retesting were generally high as expected, with the achievement correlations being the highest. The lower consistency of the structure and transfer scores was probably due to the confounding variables of information load and item format, which made the items more difficult and perhaps more subject to differential interpretation by Ss upon retesting. The correlation pattern also suggested that the Ss' knowledge or cognitive structure was clearer and more stable for topics covered by the achievement items than by the structure and transfer items.
This inference is consistent with the original formulation of the items. A test which thoroughly examines structural relationships and transfer based upon these relationships is more apt to reveal confusions within an individual's organization of knowledge than a test which is not so thorough.

The high correlations between achievement and structure were not expected; rather, relationships between structure and transfer had been predicted. It would appear that the format and information load of the structure and transfer items lowered the correlations between these two variables. The high relationship between achievement and structure might be explained in terms of similarity of content tested. Of the three tests, the achievement and structure tests were most similar, and thus individuals who grasped the content in one area would be apt to respond appropriately on both tests. However, the transfer test examined logically related, but slightly different, areas than the other two tests.

Another consistent finding was that when patterns existed between dependent variables, the pattern was not unique to a given treatment but rather was common to all treatments. In general the results for Group R were consistent with the results for Group O (thus the data were pooled). This lack of interaction between patterns and treatments also indicated that the treatments were ineffective within the time allowed. The versions of the reliability passage were objectively different, e.g., in the number and form of repetitions of the structure. However, the data indicated that Ss reacted similarly to each of them. In order for differences among the treatments to occur, more than one reading would be necessary.

Because of the results, it is not possible to say that the diagrams functioned as effective blueprints of the structure of the material. In fact, the data imply the contrary: that they were not used as blueprints and probably confused rather than aided the Ss.
However, they did not confuse the students to the extent that the Ss' performance was below that of the other Ss in the NR and V treatments.

Implications of the Study

Before discussing various implications of the experimental results, the importance of the diagramming approach to structure of knowledge and cognitive structure measurement will be considered. The experimental study itself was not specifically designed to investigate the usefulness of diagramming for curriculum planning. Nevertheless the basic approach appears to be fruitful and is perhaps one of the most important aspects of the entire study. The importance of clearly defining structure of knowledge has been discussed previously. Diagramming is one answer to this problem.

Diagramming also has potential for individualized instruction, in determining the content and objectives of a course as well as testing individuals' knowledge. Although constructing tests based on a structural analysis is difficult, such a procedure provides an alternative to present procedures, including tests based on Bloom's Taxonomy (Bloom et al., 1956). For example, the scoring procedure emphasizes dependencies between ideas, rather than ignoring possible correlations between items. Test constructors usually attempt to construct items which are independent rather than dependent. Such a procedure seems to be contrary to what is known about how individuals store and organize information. Diagramming also provides an absolute rather than a relative criterion for evaluating student achievement.

The systematic analysis (discussed in Chapter VI) of the structural relationships which Ss understood indicates the inadequacy of most achievement tests for providing diagnostic information about the individual, since this analysis showed that Ss had great difficulty with certain types of relationships.
Such diagnostic information is usually not provided by achievement tests because they are not constructed to yield such information. However, in order to understand most subject matter disciplines, students need to grasp relationships, such as the causal and definitional ones in the present study. Perhaps lack of such comprehension is one of the reasons students have trouble with subject matter.

Tests built upon a structural analysis might reveal certain aspects of cognitive processes, as Piaget's investigations have done. In both cases there has been an attempt to look beyond an individual's knowledge of "topics." Rather than saying an individual "knows science and mathematics," it can be said that he "understands classifications and causal relationships," independent of content. Of course, the difficulty Ss had with causal relationships in the present study was tied to specific content, and only an investigation of other content with causal relationships would provide generality to the findings. However, the results suggest that there is need for more intensive study of the connection between how individuals handle various types of relationships and their success in school.

The unexpected results of the experimental study suggest several avenues for future research. First, one reading of the reliability passage did not produce comprehension. In studies using rather complex and lengthy material, more than one reading appears to be necessary.

Second, large individual differences occurred on the criterion variables. However, few background variables correlated with these scores, making it very difficult to determine the factors related to performance on the tests. Visual imagery might have been one of these factors, yet no adequate imagery tests existed. Presently the variability within each experimental treatment remains unexplained.
Third, the difficulty that Ss had using the diagrams suggests that presenting diagrams within the material upon original learning may hinder, or at least not aid, understanding. Perhaps diagrams should be presented at later stages within the learning process. Such an approach would be partially supported by Travers, Heath, and Cohen (1968). In examining preferences for verbal, graphic, and symbolic modes of presentation (the graphic being most similar to diagrams), Travers et al. found that although students and teachers both preferred the symbolic mode, the teachers' preference for the graphic mode was consistently higher than the students' preference. This finding suggests that diagrams might be most beneficial when an individual has some competence within a content area.

Fourth, the stability of scores over the one-week retention period suggests a need for Ebbinghaus-type memory studies on meaningful material in order to provide parametric data on retention curves for such content.

Two other results suggest important methodological considerations for similar studies. Group pressure on time spent reading passages is one methodological factor. Although time did not correlate with the criterion variables in the present study, it might be an important correlate or criterion variable in other studies. The high and consistent correlations between item difficulties and item format and information load strongly suggest that performance on tests is determined by factors other than knowledge itself, and that these factors should be considered in the interpretation of achievement test performance.

Finally, the systematic structural analysis indicates some inadequacies with most learning theories. Although research on concept attainment has shown that certain types of relationships are more difficult than others (disjunctive versus conjunctive), a theoretical explanation of why this occurs has not been given by the cognitive or behaviorist learning theorists.
From the results of this study it would appear that such factors can be explored more carefully within both the behaviorist and cognitive positions. In behavioral terms, the following type of problem might be investigated: can the nature of the connection or relationships between verbal stimuli (concepts, words) explain why certain associations are more dominant or learned more quickly than others? In cognitive terms, the following question might be explored: does the nature of the relationship between potential subsuming concepts and new material determine the speed of learning and the strength of retention? If learning theories are to adequately explain the learning of subject matter, investigation of such questions would seem to be imperative.

LIST OF REFERENCES

Archer, E. J. The psychological nature of concepts. In Klausmeier and Harris (Eds.), Analyses of concept learning, New York, Academic Press, 1966, 37-49.
Asch, S. E. Studies of independence and conformity, a minority of one against an unanimous majority. Psychol. Monogr., 1956, 70, No. 9, Whole No. 416.
Ausubel, D. P. The psychology of meaningful verbal learning. New York, Grune and Stratton, 1963.
Ausubel, D. P. Early versus delayed review in meaningful learning. Psy. in the Schools, 1966, 3, 195-198.
Ausubel, D. P. & Fitzgerald, D. The role of discriminability in meaningful verbal learning and retention. J. educ. Psy., 1961, 52, 266-274.
Ausubel, D. P., Robbins, Lillian C., & Blake, E. Retroactive inhibition and facilitation in the learning of school materials. J. educ. Psy., 1957, 48, 334-343.
Ausubel, D. P. & Youssef, M. The role of discriminability in meaningful parallel learning. J. educ. Psy., 1963, 54, 331-336.
Ausubel, D. P. & Youssef, M. The effect of spaced repetition on meaningful retention. J. gen. Psy., 1965, 73, 147-150.
Berlyne, D. E. Structure and direction in thinking. New York, Wiley, 1965.
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., &
Krathwohl, D. R. Taxonomy of educational objectives: Cognitive domain. New York, David McKay, 1956.
Bourne, L. E. Human conceptual behavior. Boston, Allyn & Bacon, 1966, 73-79.
Bruner, J. S. The process of education. Cambridge, Harvard Univ. Press, 1963.
Bruner, J. S. Some theorems on instruction illustrated with reference to mathematics. In Theories of learning and instruction, NSSE Yrbk., 1964, 305-335.
Bruner, J. S. Notes on the plenary sessions, Appendix B. In Bruner (Ed.), Learning about learning, a conference report, Washington, US Govt. Printing Off., 1966a, 245-276.
Bruner, J. S. Theorems for a theory of instruction. In Bruner (Ed.), Learning about learning, a conference report, Washington, US Govt. Printing Off., 1966b, 196-211.
Bruner, J. S., Olver, Rose R., & Greenfield, Patricia M. Studies in cognitive growth. New York, Wiley, 1966.
Buros, O. K. Sixth mental measurements yearbook. Highland Park, New Jersey, Gryphon Press, 1965.
Buros, O. K. Fifth mental measurements yearbook. Highland Park, New Jersey, Gryphon Press, 1959.
Christensen, D. M. & Stordahl, K. E. The effect of organizational aids on comprehension and retention. J. educ. Psy., 1955, 46, 65-74.
English, H. B. & English, Ava C. A comprehensive dictionary of psychological and psychoanalytic terms. New York, McKay, 1958.
Fitzgerald, D. & Ausubel, D. P. Cognitive versus affective factors in the learning and retention of controversial material. J. educ. Psy., 1963, 54, 73-84.
Flavell, J. H. The developmental psychology of Jean Piaget. New York, Van Nostrand, 1963, 17-19, 164-236.
Gagne, R. M. The acquisition of knowledge. Psy. Rev., 1963, 69, 355-365.
Gagne, R. M. The learning of principles. In Klausmeier and Harris (Eds.), Analyses of concept learning, New York, Academic Press, 1966, 81-95.
Gagne, R. M. & Bassler, O. C. Study of retention of some topics of elementary nonmetric geometry. J. educ. Psy., 1963, 54, 123-131.
Gagne, R. M., Mayor, J. R., Garstens, Helen L. & Paradise, N.
E. Factors in acquiring knowledge of a mathematical task. Psychol. Monogr., 1962, 76, Whole No. 526.
Gagne, R. M. & Paradise, N. E. Abilities and learning sets in knowledge acquisition. Psychol. Monogr., 1961, 75, Whole No. 518.
Ghiselli, E. E. Theory of psychological measurement. New York, McGraw-Hill, 1964.
Goss, A. E. Acquisition and use of conceptual schemes. In Cofer (Ed.), Verbal learning and verbal behavior, New York, McGraw-Hill, 1961, 72-69.
Harary, F., Norman, R. Z. & Cartwright, D. Structural models, an introduction to the theory of directed graphs. New York, Wiley, 1965.
Hartmann, G. The field theory of learning and its educational consequences. In The psychology of learning, NSSE Yrbk., 1942, 165-214.
Johnson, P. E. Some psychological aspects of subject-matter structure. J. educ. Psy., 1968, 58, 75-83.
Johnson, T. J. A methodology for the analysis of cognitive structure. Paper presented at the meeting of the American Educational Research Association, Chicago, February 1968.
Lovell, K. Educational psychology and children. London, Univ. London Press, 1964, 96-99.
McKellar, P. Imagination and thinking. London, Cohen & West, 1957, 19-31, 51-72.
Malter, M. J. Children's ability to read diagrammatic materials. Elem. Sch. J., 1948, 49, 98-102.
Merrill, M. D. & Stolurow, L. M. Hierarchical preview versus problem oriented review in learning an imaginary science. Am. Educ. Res. J., 1966, 3, 251-261.
Miller, G. A. The magical number seven, plus or minus two. Psy. Rev., 1956, 63, 81-97.
Morrissett, I. The new social science curricula. In Morrissett (Ed.), Concepts and structure in the new social science curricula, New York, Holt Rinehart & Winston, 1967, 3-10.
Newton, J. M. & Hickey, A. E. Sequence effects in programmed learning of a verbal concept. J. educ. Psy., 1965, 56, 140-147.
Novak, J. D. The role of concepts in science teaching. In Klausmeier & Harris (Eds.), Analyses of concept learning, New York, Academic Press, 1966, 239-254.
Posner, M. I.
Memory and thought in human intellectual performance. Brit. J. Psy., 1965, 56, 197-215.
Reitman, W. R. Cognition and thought, an information processing approach. New York, Wiley, 1965.
Reynolds, J. H. & Glaser, R. Effects of repetition and spaced review upon retention of a complex learning task. J. educ. Psy., 1964, 55, 297-308.
Senesh, L. Organizing a curriculum around social science concepts. In Morrissett (Ed.), Concepts and structure in the new social science curricula, New York, Holt Rinehart & Winston, 1967, 21-48.
Scott, W. A. Cognitive complexity and cognitive flexibility. Sociometry, 1962, 25, 404-414.
Sheffield, F. Theoretical consequences in the learning of complex sequential tasks from demonstration and practice. In Lumsdaine (Ed.), Student response in programmed instruction, Washington, Natl. Acad. of Sciences, Natl. Res. Council, 1961, 13-32.
Sheffield, F. D., Margolius, G. J. & Hoehn, A. J. Experiments on perceptual mediation in the learning of organizable sequences. In Lumsdaine (Ed.), Student response in programmed instruction, Washington, Natl. Acad. of Sciences, Natl. Res. Council, 1961, 107-116.
Smith, K. U. & Smith, Margaret F. Cybernetic principles of learning and educational design. New York, Holt Rinehart & Winston, 1966, 329-352.
Torgerson, W. S. Theory and methods of scaling. New York, Wiley, 1963.
Travers, K. J., Heath, R. W. & Cohen, L. S. Cognitive preferences in mathematics. Paper presented at Annual Meeting of the American Educational Research Association, Chicago, Illinois, February, 1968.
Woodworth, R. S. Experimental psychology. New York, Henry Holt, 1938, 39-47.
Vernon, M. D. The instruction of children by pictorial illustration. Brit. J. educ. Psy., 1953a, 24, 171-179.
Vernon, M. D. The use and value of graphic material within a written text. Occupational Psy., 1952, 26, 96-100.
Vernon, M. D. Presenting information in diagrams. AV Comm. Rev., 1953b, 1, 147-158.
Zajonc, R. B. The process of cognitive tuning in communication. J.
Abn. & Social Psy., 1960, 61, 159-167.

APPENDICES

APPENDIX A
PILOT QUESTIONNAIRE

Diagram Treatment

A. Interpretation of Diagrams

Go back through the material on the interpretation of diagrams and cross out any sections which are badly written (not clear).
Did you have trouble remembering to mark your answer in this training program?  Yes  No
If so, would additional reminders to indicate your response help?  Yes  No

B. Reliability Passage

Go back through the reliability passage and cross out any sections which are badly written.
Did you examine the diagrams while reading?  Yes  No
If so, did you have any problem interpreting them in the reliability passage?  Yes  No
Did the review diagram present any problems?  Yes  No
Did you know where to start?  Yes  No
Did you understand the interconnections between diagrams?  Yes  No
Did you understand each of the sub-diagrams?  Yes  No
Additional comments:
Did the diagrams hinder your understanding of the material?  Yes  No
If so, would you have preferred a verbal statement instead?  Yes  No
In general, was the passage difficult to understand?  Yes  No

C. Test

Go back through the test and cross out any badly written items.
Were the instructions for each item clear?  Yes  No
If not, which instructions were not clear?
Did any items cue off the answer to another question?  Yes  No
If so, which ones?
In general did the items seem
   a. difficult
   b. some easy, some hard
   c. easy
Was there any particular type of item which seemed particularly difficult?
Did you use the diagrams in any particular way when reading the passage and/or taking the test?  Yes  No
If so, please briefly describe this process.
   Passage:
   Test:
If the diagrams helped you, did the small diagrams or the review diagram help you the most?  Small  Review  No difference

Verbal Treatment

A. Reliability Passage

Go back through the reliability passage and cross out any sections which are not clear.
Did the reliability passage seem disconnected in places?  Yes  No
If so, mark these places with a check.
In general, was the passage difficult to understand?  Yes  No

B. Test

Go back through the test and cross out any badly written items.
Were the instructions for each item clear?  Yes  No
If not, which instructions were not clear?
Did any items cue off the answer to another question?  Yes  No
If so, which ones?
In general, did the items seem
   a. difficult
   b. some easy, some hard
   c. easy
Was there any particular type of item which seemed particularly difficult?
Did you notice the small review passages placed within the text as well as at the end?  Yes  No
If so, did these help you while reading and/or taking the test?  Yes  No
If the passages helped you, did the small passages or large review help you the most?  Small  Review  No difference

No-Review Treatment

A. Reliability Passage

Go back through the reliability passage and cross out any sections which are not clear.
Did you use any particular process while reading the material in order to understand it?
In general, was the passage difficult to understand?  Yes  No

B. Test

Go back through the test and cross out any badly written items.
Were the instructions for each item clear?  Yes  No
If not, which instructions were not clear?
Did any items cue off the answer to another question?  Yes  No
If so, which ones?
In general, did the items seem
   a. difficult
   b. some easy, some hard
   c. easy
Was there any particular type of item which seemed particularly difficult?
How did you recall the material when taking the test?

APPENDIX B
TREATMENT QUESTIONNAIRE

Questions Common to All Treatments

1. Was the content of the reliability passage (in general) new to you?  Yes  No
2. Did you enjoy reading the reliability passage?  Yes  No
3. Which parts of the reliability passage were difficult to understand? Mark the ones which apply.
   ___ Importance of reliable tests
   ___ Distinction between systematic and unsystematic factors
   ___ Distinction between constant unsystematic and varying unsystematic factors
   ___ Methods of estimating reliability coefficients
   ___ Concepts of reliability coefficient and correlation coefficient
   ___ Parallel forms of a test versus parallel tests

Additional Questions - Diagram Treatment

1. Did you examine the small diagrams presented within the reliability passage?  Yes  No
   If you answered "yes" to the above question, did you have trouble interpreting these small diagrams within the passage?  Yes  No
   If so, why did you have problems?

2. Did you examine the large review diagram at the end of the reliability passage?  Yes  No
   If you answered "yes" to the above question:
   a. Did you have trouble interpreting the six sub-diagrams?  Yes  No
      If so, why?
   b. Did you examine the interconnections between the sub-diagrams?  Yes  No
      If so, did you examine the connections systematically OR in a more or less random, non-orderly fashion? (Underline the appropriate description.)

3. How did you use the small diagrams when you read the passage? (Mark the statements which apply.)
   ___ As a repetition of the previous material
   ___ To integrate the previous material
   ___ As a check on what you had previously learned (read)
   ___ To organize the material
   ___ As a way to remember the material in a spatially organized (visual) form, rather than or in addition to a verbal form
   ___ Other use(s) of diagrams

4. How did you use the diagrams when answering the test items (this applies to both the first and second testing periods)? Mark the ones which apply.
   ___ Visualized the related diagram to an item and "read off" the answer from it
   ___ Instantly recognized that a diagram had dealt with the topic covered by an item but did not visualize the diagram
   ___ Vaguely remembered that a diagram had dealt with the topic covered by an item
   ___ Did not recall while answering any of the questions that diagrams could have been specifically related to the items
   ___ Other use(s) of diagrams

Additional Questions - Verbal Treatment

1. Did you examine the small review passages presented within the reliability passage?  Yes  No

2. Did you read the large review section at the end of the reliability passage?  Yes  No

3. How did you use the review passages when you read the passage? (Mark the statements which apply.)
   ___ As a repetition of the previous material
   ___ To integrate the previous material
   ___ As a check on what you had previously learned (read)
   ___ To organize the material
   ___ As a way to remember the material in a verbal form
   ___ Other use(s)

4. How did you use the review passages when answering the test items? (This applies to both the first and second testing periods.) Mark the ones which apply.
   ___ Instantly recognized that a review passage had dealt with the topic covered by an item
   ___ Vaguely remembered that a review passage had dealt with the topic covered by an item
   ___ Did not recall while answering any of the questions that a review passage could have been related specifically to the items
   ___ Other use(s)

APPENDIX C
DIAGRAM INTERPRETATION TRAINING PROGRAM

Page 2

Suppose we read the following passage.

   Most achievement tests can be included in one of two categories, objective or essay. With an essay test a student is required to plan his own answer and express it in his own words, whereas an objective test requires him to choose among several designated alternatives.

Through the use of a Venn diagram we can easily represent this idea of two types of achievement tests.
[Venn diagram: an ellipse labeled "Types of Achievement Tests," partitioned into two parts, "Objective" and "Essay"]

The ellipse represents the entire class of achievement tests, i.e., all possible achievement tests. It was then divided or partitioned into two parts (of course, it could have been divided into more than two parts) representing the two types of achievement tests, objective and essay. With this type of diagram we have illustrated the idea of classification (smaller categories grouped under larger ones).

Please turn to the next page.

Page 3

Suppose we had the following Venn diagram.

[Venn diagram: "Objective Tests," partitioned into "Multiple Choice," "True False," and "Matching"]

Using this diagram we could then say that (choose one alternative)

a. All multiple choice tests are objective.
b. All objective tests are true-false.

If you chose alternative a, turn to page 5.
If you chose alternative b, turn to page 4.

Please remember to mark your choice throughout the passage.

Page 4

No, a true-false test is only one type of objective test. The diagram shows that the category of objective tests is rather broad, being partitioned or split into three types of tests: true-false, matching, and multiple-choice. Therefore not all objective tests are true-false ones. The three types of objective tests are nonoverlapping or distinct from one another, as indicated by the partition lines (refer back to page 3 if necessary).

Now go on to page 6.

Page 5

Yes, the Venn diagram represents the idea that we can classify objective tests into three different types: multiple-choice, true-false, and matching. These types are nonoverlapping or distinct from one another, as indicated by the partition lines (refer back to page 3 if necessary).

Now turn to page 6.

Page 6

Now examine this diagram.

[Venn diagram: "Achievement Tests," partitioned into "Objective" and "Essay"; Objective is divided into "Multiple Choice," "True False," and "Matching"; Essay is divided into "Short Answer" and "Long Answer"]

Using the above diagram it would be correct to say that

a. At the lowest level there are five types of achievement tests, three of which are essay and two of which are objective.
b.
If an individual wanted an essay achievement test, you could give him a short answer test.

If you chose a, turn to page 7.
If you chose b, turn to page 8.

Page 7

It is correct that at the lowest level of achievement tests there are five types, but these are grouped differently; three are objective (multiple-choice, true-false, matching) and two are essay (short and long answer), not vice versa. The darker line in the center of the ellipse separates objective from essay tests. In turn these two partitions or parts are then divided further by the use of lighter lines within each part. (Refer back to page 6 if necessary.)

Now proceed to page 9.

Page 8

Yes, the diagram shows that one of the two types of essay tests is short answer, the other being long answer. The broad category is first split into objective and essay, represented by the darker line in the center of the ellipse. Each of these types is then divided further, symbolized by the lighter lines within each of these larger parts: three types of objective tests and two types of essay tests. (Refer back to page 6 if necessary.)

Now turn to page 9.

Page 9

This basic idea of classification and separation into smaller categories could be extended and varied indefinitely and is not limited to the exact examples given here. For instance, consider the classification of rocks.

[Venn diagram: "Rocks," partitioned into "Igneous" (further divided into "Extrusive," "Intrusive," and "Plutonic"), "Sedimentary," and "Metamorphic," with each part subdivided into individual rock classes]

Here there are three great classifications of rocks: igneous, sedimentary, and metamorphic, with igneous then being split into three other basic types. Finally, at the finest level of classification there are 22 classes: six within metamorphic, seven within sedimentary, and nine within igneous.

Go ahead to the next page.
Page 10

Venn diagrams can also be used to indicate overlap among concepts. Suppose we look at some factors which are often a part of intelligence tests: verbal comprehension, general reasoning, and spatial orientation. We could define verbal comprehension as including cognition, meaningful material, and units of thought; general reasoning as involving cognition, meaningful material, and systems of relationships; and spatial orientation as involving cognition, figures, and systems of relationships.

The relationships between verbal comprehension and general reasoning can be represented as follows:

[Diagram: three adjacent rectangles labeled "Units of Thought," "Cognition and Meaningful Material," and "Systems of Relationships"; a bracket labeled VC spans the first two and a bracket labeled GR spans the last two]

Verbal comprehension and general reasoning overlap because they both involve

a. cognition and meaningful material
b. systems of relationships

If you answered a, turn to page 11.
If you answered b, turn to page 12.

Page 11

Yes, the brackets indicating verbal comprehension and general reasoning overlap in the rectangle marked cognition and meaningful material. The diagram also represents information about the other two areas: units of thought is not a part of general reasoning, and systems of relationships is not a part of verbal comprehension. (Turn back to page 10 if desired.)

Now turn to page 13.

Page 12

No, the brackets indicating verbal comprehension and general reasoning overlap in the rectangle marked cognition and meaningful material, not the one marked systems of relationships. In fact, systems of relationships is not a part of verbal comprehension, while units of thought is not a part of general reasoning. (If desired, refer back to page 10.)

Turn to page 13.

Page 13

Letting circles instead of rectangles (the type of shape used in a Venn diagram is not crucial to understanding the material represented by it) stand for these aspects of intelligence, we can represent the relationship among all three as follows:

[Venn diagram: three overlapping circles labeled VC, GR, and SO, with the overlap regions marked, including C1, C2, and C-S]
We can then say that all three overlap in the area of
a. cognition 2 (C2)
b. cognition and systems (C-S)

If you chose a, turn to page 14. If you chose b, turn to page 15.

Page 14

Yes, all three circles intersect in this area. Note that C1 represents the overlap between verbal comprehension and spatial orientation, while C-S (cognition-systems) represents the overlap between general reasoning and spatial orientation. Refer back to page 13 if necessary. Now turn to page 16.

Page 15

No, C-S represents the overlap between general reasoning and spatial orientation only (only two circles intersect in this area). Instead, C2 represents the overlap of all three: verbal comprehension, spatial orientation, and general reasoning (all three circles intersect here). Note also that C1 represents the overlap of verbal comprehension and spatial orientation. Refer back to page 13 if necessary. Now please turn to page 16.

Page 16

Other types of diagrams could represent the same material. Both the rock and test information could be represented by a "tree" graph. For example, with the achievement tests we could have

                   Achievement Tests
                  /                 \
          Objective                  Essay
         /    |     \               /     \
  Multiple  True-  Matching     Short     Long
   Choice   False               Answer    Answer

Here the lines represent the breakdown of a large class into smaller ones. This division occurs at each level of classification, proceeding from a broad, inclusive category to narrower, less inclusive ones.

In other terms, the names of tests can be represented by points, with lines between these points representing how these tests are related to the classification of tests at the next higher and/or lower level. So here we have points with lines between them, the lines arranged in a certain manner. Go to the next page.

Page 17

The basic idea of points with lines connecting them is what we might generally call a line graph.

[Figure: a small example of a line graph, points joined by lines.]
It can be used to represent many different things such as flow charts in chemistry, time lines in history, cause-effect sequences, etc. The preceding tree graph illustrating a way of classifying tests would be called a type of line graph. Since you can define both points and lines as you wish, i.e., have them represent a variety of concepts and relationships, this basic type of diagram is quite flexible. The following examples illustrate some ways in which it can be used. Turn to the next page.

Page 18

Consider the topic of geological subsidence or sinking of lands.

Geological subsidence or sinking of lands results from tapping the earth for oil or gas. Near Long Beach, California, the land above the Wilmington oil field sank until it had become a bowl up to 26 feet deep over an area of 22 square miles. The slow subsidence of the land ruined buildings, cracked pavements, twisted railroad tracks, wrecked bridges, sheared off oil wells, and did extensive damage to a power plant and the Long Beach Naval Shipyard, resulting in total damage of about $100 million.

The explanation for such a phenomenon is as follows. Liquid or gas is generally drawn from a stratum of porous rock whose pores are filled with the fluid under pressure. If the rock is well consolidated (if its grains are well cemented together), it will usually continue to support the weight of the rock and earth on top after the fluid is withdrawn. However, if the fluid-holding rock is a poorly consolidated, easily molded sandstone, once the supporting pressure of the fluid has been withdrawn from its pores the pressure of the overburden compacts the rock, and the ground above subsides by the amount by which the rock is compressed. Other factors besides the mechanical strength of the fluid-containing rock may contribute to subsidence.
For example, subsidence is more likely if soft, clayey material (which is easily compacted) is present in or next to the fluid stratum.

One way to diagram the relationship between subsidence and its causal factors is as follows, where an arrow represents the concept of cause.

[Figure: two arrows point to "Subsidence." One comes from "Pressure of land above oil or gas field." The other comes from "Compactable material in or next to the fluid stratum," which is drawn as a Venn diagram divided into "Poorly consolidated rocks" and "Soft, clayey material."]

We can say then that pressure of the overlying land is the only reason for the occurrence of subsidence.

True or False

If you answered true, turn to page 19. If you answered false, turn to page 20. (Please mark your answer to each question.)

Page 19

No, the two arrows pointing to the word "subsidence" indicate two major factors, not one, leading to geological subsidence. Pressure from the land above alone is not enough. Compactable material is the other factor. Note that in this larger diagram the classification of compactable materials into two types, clay and poorly consolidated rocks, is represented by a Venn diagram. Thus either type can contribute to subsidence. (Refer back to page 18.) Now turn to page 21.

Page 20

Correct. The two arrows pointing to the word "subsidence" indicate two major factors, not one. Note that in this diagram the classification of compactable materials into two types, clay and poorly consolidated rocks, is represented by a Venn diagram. Thus either type can contribute to subsidence, although we would say that in general there are two major factors causing subsidence: compactable materials and pressure. (Refer back to page 18 if necessary.) Please turn to page 21.

Page 21

Let us consider another situation: the various piston strokes (intake, compression, power, exhaust) in a gasoline engine. The sequence of events can be briefly described.

The "intake" stroke is downward with the intake valve open to let the gasoline-air mixture into the cylinder.
On the upward "compression" stroke the fuel mixture is compressed. The sparkplug ignites the fuel when the piston head is at the top of the compression stroke. The expanding gases then deliver a "power" stroke, pushing the piston downward. The upward "exhaust" stroke pushes the burned gases out the exhaust valve. The crankshaft changes the up and down (reciprocating) motion of the piston to turning (rotary) motion and vice versa.

We could diagram these cause-effect relationships as indicated on the next page, where each arrow essentially represents "causes."

Page 22

[Figure: cause-effect diagram of the four piston strokes, with arrows between the following labels: crankshaft rotation; exhaust valve closed; intake valve; "intake" downstroke of piston; fuel mixture drawn into cylinder; "compression" upward stroke of piston; intake valve closed; sparkplug arc; ignition of fuel; expansion and burning of gases; "power" downward stroke of piston; burned gases; exhaust valve open; "exhaust" upward stroke of piston; release of burned gases.]

We can then say that the rotation of the crankshaft has a direct part in causing every type of piston stroke except the
a. compression stroke
b. power stroke

If you answered a, turn to page 24. If you answered b, turn to page 23.

Page 23

Yes, this is correct because there is no arrow from the words "crankshaft rotation" to "power stroke," indicating that the direct cause for this stroke is elsewhere. (Refer back to page 22 if necessary.) Turn to page 25.

Page 24

No, the direct solid arrow from the words "crankshaft rotation" to "compression stroke" indicates that rotation of the crankshaft serves to push the piston upward, in this case on the compression stroke. There is no line (arrow) from "crankshaft rotation" to the words "power stroke," indicating another cause for this stroke. (Refer back to page 22 if necessary.) Turn to page 25.

Page 25

True-False question

We can say that the intake of the fuel mixture into the cylinder chamber was due to the open intake valve only. Refer back to page 22.
If you answered true, turn to page 26. If you answered false, turn to page 27.

Page 26

No, the multiple arrows or lines focusing on "Fuel mixture drawn into cylinder" mean that there are several joint reasons, not just one, for this event. Now turn to page 28.

Page 27

Yes, you have correctly interpreted the multiple arrows (lines) focusing on "Fuel mixture drawn into cylinder" as representing several joint reasons, not just one, for this event. Now turn to page 28.

Page 28

In the diagram representing the piston strokes of a gasoline engine, several arrows from different points converging on one point meant that all these conditions were necessary to cause this one event, not just one condition alone. If we had a situation where several independent conditions could cause the same event, i.e., each could cause it without the others being present, we would probably diagram this situation as follows:

[Figure: arrows from A, B, and C converge on D, with the word "(or)" written beside the arrows.]

Here A or B or C alone could cause D, as indicated by the word "or" in parentheses. Please turn to the next page.

Page 29

Suppose we look at this same type of diagram in the context of historical material, for instance the naming of North and South America after Amerigo Vespucci rather than Columbus. We might present in a written passage the position that Columbus believed he had found India or China while Vespucci was the first to realize that the body of land was really a new continent. Various events in Vespucci's life led up to his exploration of the new world, including an interest in and a study of geography and his conviction that there was a southern route to India if you only went far enough south. Several sequences will be examined here: (a) a time line of important events, (b) the name of the new world, (c) beliefs about the location of the water route to India, and (d) why the new world was named after Vespucci.

This diagram is simply a time line of events.
The relative distance between the horizontal lines represents the ordering of events, spaced on the vertical line which represents the continuum of time.

[Figure: vertical time line with the events, from top to bottom: Columbus 1; Vespucci's trip to Spain; Columbus 2; Vespucci 1; Vespucci 2; Name America.]

The "1" and "2" refer to trips to the new world. Go ahead.

Page 30

If we examine a sequence of names for the new world we find

No name ---> China-India ---> America

Here we can simply let the arrows stand for the word "then" or "to," i.e., at first it had no name, then China-India, and then America. A similar interpretation applies to the sequence dealing with people's beliefs about the location of a water route to India.

Straight west of Europe
---> Vespucci: further south than the West Indies
---> Vespucci: south of the Amazon River
---> Vespucci: further south than southern Argentina

With this diagram we could then say that the water route to India
a. was believed to be below southern Argentina and later south of the Amazon River.
b. was believed to be south of the West Indies, then south of southern Argentina.

If you answered a, turn to page 31. If you answered b, turn to page 32.

Page 31

You have apparently confused the order of events. The line or arrow represents the word "then." Thus if we have the sequence A ---> B, it means that A is followed by B, not preceded by B. Therefore in this diagram the direction of the arrow means that belief in the route south of southern Argentina followed the belief in the route south of the Amazon River. (Refer back to page 30 if desired.) Please turn to page 33.

Page 32

You have interpreted the diagram correctly. The line or arrow represents the word "then." Thus when we have the sequence A ---> B, it means that A is followed by B, not preceded by B. This is the situation concerning the belief in a route south of the West Indies and the belief in a route further south than southern Argentina (being respectively "A" and "B"). (Refer back to page 30 if desired.)
Please turn to page 33.

Page 33

The last sequence is that of why the new world was named after Vespucci.

Vespucci: interest and knowledge in geography and cosmography
---> V: doubted Columbus's reports that he had found China and India
---> V: first sail to new world
---> V: kept accurate maps, trip 1
---> V: thought water route to India further south than where he had been (Amazon River)
---> V: wanted second trip to find this route
---> V: second trip
---> V: maps of second trip
---> V: first to question if land was Asia or India
---> V: first to assert land was a new continent
---> Name new continent "America" after Vespucci

From the information in this diagram we could say that America was named after Amerigo Vespucci because
a. Amerigo Vespucci made very accurate and detailed maps of the new land on his second trip.
b. Vespucci was the first to assert that the land Columbus had found was a new continent.

If you answered a, turn to page 35. If you answered b, turn to page 34.

Page 34

Yes, an arrow leading directly from one event to another implies a causal relationship. This is not the case if another event intervenes between the two and there is no arrow directly connecting the first with the last event. (Refer back to page 33 if desired.) Now please turn to page 36.

Page 35

No, because of these accurate and detailed maps he began to question if this body of land was a new continent. This is indicated by the arrow directly relating these two events. (Refer back to page 33 if desired.) The reason for the naming was that Vespucci was the first to assert that this was a new continent (represented by the arrow directly relating these two events). If an event intervenes between two others, then the first event does not cause the last one unless there is a direct arrow connecting the two. Now please turn to page 36.

Page 36

Finally, we can interconnect all of these diagrams on the basis of time (see page 37).
Here the same diagrams are reproduced with dashed lines drawn among them to provide a direct connection with the time line.

From considering this more complex diagram we can say that
a. Columbus's first trip and Vespucci's belief that the southern route to India was south of the Amazon River coincided.
b. Vespucci's belief that the route to India was south of southern Argentina occurred on his second trip.

If you answered alternative a, turn to page 38. If you answered alternative b, turn to page 39.

Page 37

[Figure: the time line, the sequence of names for the new world, the sequence of beliefs about the location of the water route to India, and the causal sequence for the naming of America, reproduced side by side with dashed lines connecting each sequence to the time line.]

Page 38

No, Columbus's trip was before. The cross-connecting dashed lines, as well as the area between them, indicate events on each separate sequence which occurred at approximately the same time. Following the line directly across from "Columbus 1" (on the first sequence) to the "location of water route to India" sequence, we find that before Columbus's first trip the water route was believed to be straight west of Europe, but after the trip Vespucci concluded that it was south of the West Indies, and only later still did he believe it was south of the Amazon River. (Refer back to page 37 if necessary.) Proceed to page 40.

Page 39

Yes! The dashed lines and the area between them indicate events on each separate sequence which occurred at approximately the same time.
You correctly related the events on the "location of water route to India" sequence and the "causal sequence." (Refer back to page 37 if desired.) Proceed to page 40.

Page 40

The preceding examples have illustrated only a few variations of a line graph. Other concepts such as "equivalence," "implication," "greater than," etc. could all be represented by such lines.

We will now look at one more type of diagram. This type of diagram is what might be labelled a table. It is quite appropriate for showing descriptive relationships, characteristics of certain objects, etc. Consider the official languages of the countries in the Western world. ("Country" refers here even to a nation which is not independent.) We might divide the countries into the fairly large categories of North, South, and Central America. A table of the countries and corresponding languages could then be constructed with the columns being the languages and the rows being the countries, or vice versa. The result might look something like that on the next page. The "X" essentially represents the fact that a certain language is the official language of a certain country.

Page 41

[Table: official languages by country. Columns: Spanish, English, French, Portuguese, Dutch. Rows, grouped by region: North America (United States, Canada, Mexico); Central America ([illegible], Brit. Honduras, Honduras, Nicaragua, Costa Rica, El Salvador, Cuba, Dominican Rep., Haiti, Jamaica, Trinidad & Tobago, Puerto Rico); South America (Brazil, Paraguay, Argentina, Chile, Uruguay, Peru, Ecuador, Bolivia, Colombia, Venezuela, Brit. Guiana, Surinam, French Guiana). An X in a cell marks an official language of that country; the positions of the individual X marks are not legible.]

True-False question

The dominant language in Central and South America is Spanish, while in North America it is English.

If you answered true, turn to page 43. If you answered false, turn to page 42. (Please indicate your answer to each question.)
Page 42

No, note that of the three countries in North America two have English as an official language, while in South and Central America practically all of the countries list Spanish as the official language. This is indicated by the checks (X) for a language in correspondence with the classification of the countries. (Refer back to page 41 if necessary.) Proceed to page 44.

Page 43

Yes, this is indicated simply by the number of checks for any language while at the same time considering the classification of countries that is given. Proceed to page 44.

Page 44

True-False question

No country has more than one official language. (Turn back to page 41 for the table.)

If you answered true, turn to page 45. If you answered false, turn to page 46.

Page 45

This is incorrect. To determine this, simply check each row, i.e., each country, to see if one or more than one check appears. If this is done, it can be seen that Canada and Puerto Rico each have two official languages. (See page 41 if necessary.) Turn to page 47.

Page 46

Yes, Canada and Puerto Rico each have two official languages. As you know, all that is required to determine this is to count the number of checks for each row, i.e., each country. Go to page 47.

Page 47

Actually such a table reflects the historical heritage of many of these countries. If this diagram were related to other historical events, also in diagram form, we could have a more complete picture of the reasons for the official language(s) of each country. We're almost done. Proceed to the next page.

Page 48

As a last example of the table form of diagram, consider the types of rocks referred to before: igneous, sedimentary, and metamorphic. Each can be described in a very broad way as a product of a formation process. The following table summarizes this information.

                                            Types of Rocks
                                    Igneous   Sedimentary   Metamorphic
Molten rock which has cooled           X
  and hardened
Rock grains locked together by
  pressure and cementing material                  X
Rock with changed mineral content                                 X

The check (X) represents the fact that a given type of rock
a. can be defined in terms of certain characteristics
b. causes certain characteristics

If you answered a, please turn to page 50. If you answered b, please turn to page 49.

Page 49

No. The check represents the fact that each type of rock is defined by a certain characteristic, these being the end product of specific formation processes (not specified here). A rock type does not cause certain characteristics, but rather is defined in terms of them. Turn to page 51.

Page 50

Correct. The check essentially represents a descriptive or defining relationship, e.g., metamorphic rocks are ones with changed mineral content. The reason each type of rock has certain characteristics is that it is caused by a formation process (not described here) which has resulted in a certain end product. Turn to page 51.

Page 51

Only one check in each column means that the definitions of the rock types (refer back to page 48 if necessary)
a. overlap
b. do not overlap

If a, turn to page 52. If b, turn to page 53.

Page 52

No, the definitions do not overlap. If they did, this fact would be represented by more than one check in each column. In most cases the pattern of checks is quite important in interpreting tables, both the presence and omission of a check having implications for the subject matter at hand. Go to page 54.

Page 53

Yes, if the definitions did overlap then there would be more than one check in each column. In most cases the pattern of checks is quite important in interpreting a table, both the presence and omission of a check having implications for the subject matter at hand. Go to page 54.
Page 54

You have now concluded a program on the nature and interpretation of various diagrams which can be used to clarify and represent ideas presented in written material. Not all variations of these three basic types of diagrams have been presented, but these other variations would simply be extensions of the basic principles already presented. Of course, small diagrams can be connected to other small ones, resulting in a rather large, more complex diagram form. In general, the diagrams presented here have been rather simple in structure.

Thank you.

APPENDIX D

OUTLINE OF RELIABILITY PASSAGE

I. General definition of reliability
   A. Reliability applied to education
II. Importance of reliability in testing
   A. Differences among individuals, same test
   B. Assignment of individuals to groups
   C. Prediction
   D. Differences among traits of an individual
III. Systematic variation and unsystematic variation
IV. Systematic factors and unsystematic factors
V. More precise definition of reliability
   A. Parallel tests
   B. Correlation coefficient
   C. Reliability coefficient
   D. Relationship between correlation and reliability coefficient
VI. Unsystematic factors
   A. Varying unsystematic factors
   B. Constant unsystematic factors
   C. Comparison of varying and constant unsystematic factors
   D. Comparison of constant unsystematic and systematic factors
VII. Methods of estimating reliability coefficients
   A. Test-retest
   B. Parallel forms
   C. Internal consistency

APPENDIX E

RELIABILITY PASSAGE

One of the important aspects of any measuring instrument is its reliability. Reliability of measurement refers to the consistency with which an instrument measures whatever it purports to measure. Obviously not all measuring instruments are perfectly accurate or consistent. Error is unavoidably involved in any measurement, but the goal of measurement specialists is to reduce these errors to a minimum.
To the extent that these instruments deviate from yielding perfectly consistent measurements, i.e., their scores vary unsystematically, they are said to be unreliable. Thus the measurements from a wooden ruler are apt to be more reliable than those from a rubber one, since the latter measuring instrument fluctuates with the temperature, tension, etc., yielding inconsistent results.

D1

In the field of education, reliability usually refers to the consistency with which a test measures whatever it purports to measure. Generally this consistency reflects the degree to which the test may be considered stable or may be depended upon to yield similar test results under similar circumstances. Tests may be achievement, aptitude, or personality measures. These tests give us quantitative descriptions of individuals (a score) in terms of the extent to which individuals possess or manifest various traits or abilities. For instance, a high score means that an individual possesses more of a certain trait (for example, happiness) than an individual who has a lower score.

Ordinarily we are interested in these quantitative descriptions or scores because of their usefulness in permitting us to make comparisons among individuals on a given trait and within individuals on different traits, for predicting other types of behavior, and for evaluating the effects of various factors upon an individual's performance. As a consequence, when we measure an individual, we hope to obtain a score that will give us a precise characterization of him.

If we administer the same test several times to an individual, we may observe unsystematic variation or little self-consistency in his scores. For example, if we give a psychological test to an individual on several different occasions, he might obtain scores of 83, 65, 75, 89, 80. The degree of self-consistency among the scores earned by an individual is termed reliability of measurement or simply reliability.
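To make the idea of self-consistency concrete, the five scores quoted above can be summarized numerically. The following is only an illustrative sketch: the passage does not prescribe any particular statistic at this point, and the choice of the range and the standard deviation as measures of spread is an assumption made for the example.

```python
from statistics import mean, pstdev

# The five scores quoted in the passage for one individual on repeated testings.
scores = [83, 65, 75, 89, 80]

# Three simple summaries of how self-consistent the scores are
# (the choice of these summaries is an assumption, not part of the passage).
score_range = max(scores) - min(scores)  # gap between highest and lowest score
average = mean(scores)                   # typical level of the five scores
spread = pstdev(scores)                  # standard deviation of the scores around the mean

print(f"range = {score_range}, mean = {average:.1f}, standard deviation = {spread:.1f}")
# prints: range = 24, mean = 78.4, standard deviation = 8.1
```

A range of 24 points around a mean of about 78 shows how far a single testing could stray from the individual's typical level.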
When scores are not self-consistent it means that we cannot depend too much upon any single score earned by an individual, since on another application of the same test he might earn quite a different score.

Unreliable scores are of little value when we wish (a) to compare two or more individuals on the same test, (b) to assign individuals to groups or classes, (c) to predict other types of behavior, or (d) to compare different traits or abilities of an individual. Let us consider examples of these common uses of tests and the importance of reliability in these situations.

The extent to which we are willing to trust the difference between the scores earned by two individuals on a test as reflecting a real or stable difference between them in the trait being measured by the test is a function of the reliability of that test. Sometimes we wish to know whether one person is superior to another in the traits or abilities measured by a particular test. If we know that people vary in their scores from one repetition of the test to another by as much as 10 points and the difference between the scores of two persons is 30 points, we probably should be willing to conclude that one person is indeed superior to the other and that the difference would hold even if we tested them on another occasion. We should know that if we administered the test a second time to the same two people, the individual who was superior on the first occasion undoubtedly would be superior on the second. The one who was superior on the first application of the test might earn a score as much as 10 points lower on the second test, and the one who was inferior on the first test might improve his score by as much as 10 points, but there would still be at least a 10-point difference between the two individuals in their scores.
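The arithmetic behind this worst case can be written out as a short sketch. The function name `worst_case_gap` and the two example scores (80 and 50) are hypothetical; the passage specifies only a 30-point difference and a variation of up to 10 points.

```python
def worst_case_gap(higher: float, lower: float, max_variation: float) -> float:
    """Smallest gap that could remain on a retest.

    The higher scorer is assumed to drop by the full variation while the
    lower scorer gains the full variation.
    """
    return (higher - max_variation) - (lower + max_variation)


# Hypothetical scores chosen only so that they differ by 30 points.
first_person, second_person = 80, 50

gap = worst_case_gap(first_person, second_person, max_variation=10)
print(gap)  # prints: 10
```

A positive result means the ordering of the two individuals cannot reverse on a retest; a negative result would mean that it could.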
However, we should not be so willing to say that the one person is superior to the other if we found that people vary as much as 20 points from one application of the test to another. In this case, if we tested both persons twice, the individual who earned the higher score on the first application of the test might well earn the lower score on the second application.

D2

Reliability of measurement is an important consideration in terms of the precision with which individuals can be assigned to groups or classes. Suppose that pupils in a school are to be placed in reading sections on the basis of the scores they earn on a reading achievement test, with those earning scores of 60 and above being assigned to the accelerated section, those with scores of 50 to 59 to the average section, and those with scores of 49 and below to the retarded section. Now further suppose that variation in scores of as much as six points occurs when an individual takes the test a number of times. A pupil who earns a score of 55 on the test will be assigned to the average section. However, if he had taken the test on another occasion he might have earned a score as high as 61 and have been assigned to the accelerated section, or he might have earned a score as low as 49 and have been assigned to the retarded section. Because of the degree of unsystematic variation in individuals' scores, the degree of reliability of measurement of this test is insufficient to assign pupils to sections with very much certainty. On the other hand, if the variation among scores in subsequent administrations of the test is only one point, then a very large proportion of pupils can be assigned with a high degree of certainty.

D3

The accuracy of prediction from one variable to another is limited by the degree of reliability with which these variables are measured. Scores on tests often are used to predict other types of behavior.
For example, scores on intelligence tests commonly are used to predict success in academic work. If the particular intelligence test used in making predictions of this kind happens to be highly unreliable, then from the score earned by an individual on one occasion a high degree of scholastic success might be anticipated for him, but from the score he earned on another occasion just the opposite conclusion might be reached. It would, therefore, be difficult under such circumstances to make predictions with any satisfactory degree of certainty.

D4

The confidence we place in the differences among the scores earned by an individual on different tests is a function of the degree of reliability of those tests. In certain circumstances it is necessary to know on which of two traits an individual is superior. For example, as an aid to counseling a student we may wish to know whether he is superior in mechanical or in clerical aptitude. Suppose that when we apply tests of mechanical and clerical aptitude to the same individuals, scores on each test vary as much as 20 points from one application to another. If we administer both tests to a student on a single occasion and find that on the mechanical aptitude test his score is 70 and on the clerical aptitude test his score is 85, we cannot say with much certainty that his clerical aptitude is superior to his mechanical aptitude. However, if the variation in scores on repeated applications of the tests were only five points, we should be much more willing to draw this conclusion.

D5

Sub-D1

As stated previously, since no testing instrument is perfectly reliable, errors and variations in measurement will occur. There are two major types of variation in scores earned by an individual over repeated testing. Systematic variation is characterized by a systematic change in score, while unsystematic variation is characterized by random and unsystematic fluctuations in scores.
When scores exhibit unsystematic variation the test is not measuring accurately and has low reliability. It means that we cannot depend too much upon any single score earned by an individual, since on another application of the same test he might earn quite a different score. We must examine both types of variation so that we can differentiate unsystematic from systematic variation in order to further develop the concept of reliability.

D6

Systematic variations in scores are characterized by an orderly progression or pattern, with the scores obtained by an individual changing from one occasion to the next in some trend. Systematic changes appear as a regular increase or decrease in scores, or they may appear to follow some cycle. Suppose we examine the different scores obtained by an individual on successive applications of the same test. A trend might appear. Thus if we measure the height of an individual at different hours of the day, we are likely to find that from the morning to evening the values become smaller and smaller. We might attribute this phenomenon to a gradual sagging of the backbone. Similarly, if we administer the same arithmetic test over and over to the same individual, his scores may gradually increase. This would suggest that the series of testing situations operate as practice periods and the individual is gradually improving his skill in solving arithmetic problems.

Unsystematic variation, on the other hand, is characterized by a complete lack of order. The scores of an individual fluctuate from one occasion to the next in a completely haphazard manner. For example, if we have an individual react as quickly as possible in a specified manner to each stimulus in a series of stimuli, we shall find that some of his responses are more rapid than others. When we compare the times taken to respond to stimuli that occur early in the series with those that occur later, we may find that on the average they are the same.
We might attribute this variation in reaction-time scores to unsystematic moment-to-moment changes in the environmental conditions, in the smoothness of the operation of the reaction-time apparatus, in the individual's motivation, and in his attention.

D7

Various factors can influence a particular kind of variation in test scores. These factors can be classified as systematic or unsystematic according to whether their effects on test scores are systematic or unsystematic.

A systematic factor is one which produces systematic changes in scores. When systematic factors are at work, scores show a regular arrangement, an order. Learning, training, and growth produce regular and progressive increases in scores. Fatigue, forgetting, and old age result in regular and progressive decreases in scores. Mood and living habits may produce regular cyclical changes in scores.

An unsystematic factor is one which produces unsystematic changes in scores. Scores fluctuate in a random fashion and do not manifest any consistent pattern. Moment-to-moment variations in attention result in random fluctuations in reaction time. An inconsistent and balky pen sometimes permits the student taking an exam to write easily and on other occasions slows the speed of writing. The marks given to an elementary school pupil as he progresses through the various grades are sometimes higher and sometimes lower, depending upon whether the teacher to whom he happens to be assigned tends to be lenient or strict in the evaluation of pupils' performance.

D8

The factors which affect scores seem to be almost infinite in number and variety. An individual's performance is a function of the numerous qualities with which he was endowed at birth, elaborated upon by the process of maturation and by his numerous experiences, together with the many environmental influences operating upon him at any given moment.
The inferences we draw in attempting to explain variation in scores are a function of the knowledge we have about these factors. In some instances our inferences have quite substantial foundation because our knowledge is direct and extensive. In other instances our knowledge may be indirect and not complete, so that we are less sure of our inferences. Finally, we may have such limited knowledge about conditions that our inferences are little more than guesses.

In the following situation we can be fairly sure of the factors which are operating. Suppose we give an individual a test of knowledge of French vocabulary and find his score is zero. We then have him take an elementary course in French and retest him. Now his score is higher. He continues to take more and more courses in French and after he completes each course we again administer the test. Undoubtedly we shall find a continuous increase in his scores, and with a high degree of certainty attribute his increase in scores to the training to which he has been deliberately subjected.

At the other extreme, under some conditions inferences about operative factors may be guesses. Suppose we have before us determinations of the intelligence of a child from several different testings and note that they were substantially lower when the tests were administered during the summer months. We have no knowledge whatsoever about the state the child was in when he was tested nor of the conditions prior to or coincident with the different administrations of the test. There are a variety of inferences we might make to explain the variation in scores. One which might appear reasonable to us is that this child's performance on intelligence tests is influenced by the degree of intellectual stimulation he receives, so that during the summer months when he is away from school his scores are lower. This accounts for the changes but is only a guess.
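The contrast drawn so far between systematic and unsystematic variation can be made concrete with a small sketch. It is only an illustration under stated assumptions: the score lists are hypothetical, and the function `trend_strength` is an invented helper that correlates the occasion number with the score. Such a measure picks up a steady rise or fall, as with practice effects, but would not detect the cyclical kind of systematic change mentioned above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def trend_strength(scores):
    """Correlation between occasion number (1, 2, 3, ...) and score."""
    occasions = list(range(1, len(scores) + 1))
    return pearson_r(occasions, scores)


# Hypothetical scores of one individual on six successive occasions.
practice_scores = [50, 52, 54, 56, 58, 60]   # steady gain, as with practice
haphazard_scores = [54, 50, 57, 51, 56, 53]  # no order from occasion to occasion

print(round(trend_strength(practice_scores), 2))
print(round(trend_strength(haphazard_scores), 2))
```

With these made-up numbers the first value is at the top of the scale and the second lies much nearer zero, which mirrors the distinction between an orderly trend and haphazard fluctuation.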
Having examined the kinds of variation that occur in scores and the type of factors that cause them, we are now in a position to define reliability more precisely. We shall do so in terms of the extent of unsystematic score variation and the concept of parallel tests.

Let us refer to reliability as the extent of unsystematic variation in the quantitative description of the amount of some trait an individual possesses or manifests when that trait is measured a number of times. This definition follows from the fact that the problem of reliability of measurement arises out of the unsystematic variation in scores earned by an individual when we obtain a number of measurements indicative of the degree to which he possesses or manifests some particular trait or quality. Therefore, reliability of measurement pertains to the precision with which some trait is measured by means of specified operations.

Basic to any formal mathematical statement of reliability is the concept of parallel tests. In essence, reliability can be defined as the extent of unsystematic variation of an individual's scores on a series of parallel tests. Parallel tests refer to a number of operations or tests all of which follow from a particular definition of a trait, and therefore measure the same trait to the same degree. Certain statistical criteria must be met in order that a given trait is measured exactly the same. Theoretically, to ascertain the extent of unsystematic variation parallel tests are needed, i.e., a series of scores on the same trait. It is not necessary to always use the same device or test, nor do we have to deny usage of the same measuring device or test. All that is needed is tests or operations which evoke the same psychological processes.

D9

It is through the use of parallel tests that we are able to ascertain the extent to which we are measuring a trait reliably.
Suppose we have a series of parallel tests, k in number, and we have scores on all these k tests for one or more individuals. If we were measuring with perfect reliability then any given individual would obtain precisely the same score on all the k parallel tests. There would be no variation at all in his scores over the k tests. On the other hand, if we were measuring with less than perfect reliability then his scores would be different on the different parallel tests, the variation among his scores being completely unsystematic. The less the unsystematic variation the greater the reliability of measurement, and the greater the unsystematic variation the less the reliability of measurement.

We have seen that reliability of measurement refers to the extent of unsystematic variation in an individual's scores over parallel tests. The next task is to set up an index which gives a quantitative description of the extent of such variation. Such an index will be useful for comparing different tests so we can ascertain which gives us the most precise or stable scores, and will permit us to ascertain whether the reliability with which a test measures is sufficient for our purposes.

Theoretically the reliability coefficient is a quantitative index of the extent to which scores on any one parallel test can predict scores on any other. When the unsystematic variation in an individual's scores over parallel tests is great, this means that the prediction of scores on one parallel test from scores on another is poor. On the other hand, if there is no unsystematic variation at all among an individual's scores, then it means that we could predict perfectly an individual's score on one parallel test from his score on another. Casting reliability in terms of the coefficient of correlation between parallel tests provides a quantitative way of describing the precision of measurement.
Essentially, a correlation coefficient expresses the degree of correspondence or relationship between sets of scores. A correlation coefficient can be computed between sets of scores from any combination of tests; one may be an achievement test and another a personality measure. However, when defining the reliability coefficient we restrict the classification of tests that are correlated to parallel tests as defined previously. We then define the reliability coefficient as the correlation coefficient between parallel tests. When the correlation coefficient is low it means that an individual's scores over k parallel tests show a great deal of unsystematic variation, and when it is high it means that an individual's scores on k parallel tests are very nearly the same. Let us consider the concept of correlation further.

D10

Sub-D2

Suppose we have a set of scores from a group of individuals (A) on a couple of tests, represented by the symbols X and Y, which are as follows:

    Group A

    Persons    Scores on Test X    Scores on Test Y
    1A                2                   3
    2A                2                   2
    3A                4                   4
    4A                4                   5
    5A                6                   6
    6A                6                   6
    7A                8                   7
    8A                8                   8
    9A               10                   9
    10A              10                  10

If we plot the scores of each person on these two tests on a graph, where each point represents the two scores of an individual, one on Test X and the other on Test Y, we then have the scatter diagram in Figure 1.

[Figure 1. Group A. Scatter diagram with Test X on the horizontal axis and Test Y on the vertical axis, each running to 12 (page 228 of the original).]

If we use the same tests X and Y on another group of individuals (B), we might obtain the following set of scores and scatter diagram (Figure 2).

[Group B score table: persons 1B through 10B, with columns for scores on Test X and scores on Test Y. The individual values are illegible in the source.]

[Figure 2. Group B. Scatter diagram with Test X on the horizontal axis and Test Y on the vertical axis, each running to 12 (page 229 of the original).]

Following the same procedure with another group of individuals (C) we might obtain another set of scores and scatter diagram (Figure 3).

    Group C

    Persons    Scores on Test X    Scores on Test Y
    1C                2                   3
    2C                2                   8
    3C                4                   6
    4C                4                   4
    5C                6                   3
    6C                6                   5
    7C                8                   7
    8C                8                   1
    9C               10                   2
    10C              10                   6

[Figure 3. Group C. Scatter diagram with Test X on the horizontal axis and Test Y on the vertical axis, each running to 12 (page 230 of the original).]

With group A the order of individuals on one test is quite similar to their order on the other test. That is, if an individual scores high (low) on one test he scores high (low) on the other. With group B the order of the individuals on one test is practically the reverse of the order on the other test; if an individual scores high (low) on one test he scores low (high) on the other. However, with group C the order of individuals on one test is not at all similar to the order on the other test; if an individual from group C scores high (low) on one test, he could score either high or low (low or high) on the other test (refer back to the appropriate scatter diagrams for clarification if needed).

We can say then that there is a high relationship or correspondence between scores on the two tests, X and Y, for groups A and B, but a low relationship between them for group C. That is, for groups A and B if the score of an individual on one test is known his score on the second test can be predicted with a high degree of accuracy. But for group C such a prediction would be quite subject to error. The correlation coefficient is the quantitative measure which reflects this degree of relationship between sets of scores. It is symbolized by r_xy, where x and y refer to the two correlated tests. The correlation coefficient itself can range numerically from +1.00 to -1.00 (both termed high correlation coefficients), where +1 and -1 represent perfect relationships between two tests similar to that illustrated in the scatter diagrams of groups A and B respectively. On the other hand, a correlation coefficient of 0.00 reflects no relationship at all between two tests, such as that illustrated by the scores and scatter diagram of group C.
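The correlation coefficients for groups A and C can be checked with a short computation. The sketch below uses the ordinary Pearson product-moment formula, which the passage does not spell out and which is assumed here; the scores are those read from the Group A and Group C tables (the Test Y entry for person 4A is taken to be 5), and Group B is omitted because its values cannot be read in the source.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Test X scores are the same for persons 1 through 10 in both groups.
test_x = [2, 2, 4, 4, 6, 6, 8, 8, 10, 10]
group_a_y = [3, 2, 4, 5, 6, 6, 7, 8, 9, 10]
group_c_y = [3, 8, 6, 4, 3, 5, 7, 1, 2, 6]

r_a = pearson_r(test_x, group_a_y)
r_c = pearson_r(test_x, group_c_y)

print(f"Group A: r_xy = {r_a:.2f}")
print(f"Group C: r_xy = {r_c:.2f}")
```

On these figures the coefficient for group A comes out close to +1, while the coefficient for group C is small in size, not exactly zero but far from a dependable relationship.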
It will be recalled that the correlation coefficient between parallel tests is termed the reliability coefficient. As such, a high correlation coefficient means a high reliability coefficient and a low correlation coefficient means a low reliability coefficient.

Now let us assume that for groups A and C the tests X and Y represent parallel tests measuring the same trait. As stated before, there is a high correlation between X and Y for group A. In other words, there is little unsystematic variation in scores over the parallel tests for group A, resulting in a high correlation coefficient and, therefore, a high reliability coefficient. Given a specific score on test X for group A, the variation of scores on test Y is quite small (little unsystematic variation); therefore, the correlation between tests X and Y is high for group A. For example, if an individual in group A receives a score of 8 on test X, then he is apt to receive a score between 7 and 8 on test Y (see the scatter diagram in Figure 1, page 228).

However, the situation is different for group C on tests X and Y. Here there is a low correlation between scores; there is much unsystematic variation over the parallel tests, lowering the correlation coefficient and hence the reliability coefficient for this group. Given a specific score on test X, the variation of scores on test Y is quite large (much unsystematic variation); therefore, the correlation between tests X and Y is low for group C. For example, if an individual in group C receives a score of 8 on test X he may receive a score anywhere between 1 and 7 on test Y (see the scatter diagram in Figure 3, page 230).

D11

In order to develop our notions of reliability so that we can consider practical ways for measuring its extent, we shall have to examine in more detail the nature and effects of unsystematic factors. Previously we have distinguished between consistent trends in scores, which are attributable to systematic factors, and random fluctuations in scores, which are attributable to unsystematic factors.
We have defined reliability theoretically in terms of the extent of unsystematic variation in an individual's scores over repeated testing. Now we can separate the class of unsystematic factors into two types, varying unsystematic factors and constant unsystematic factors.

Sub-D3

Varying unsystematic factors refer to those whose effects are different for the same individual on different occasions and are also different for different individuals on the same occasion. Constant unsystematic factors refer to those whose effects are different for the same individual on different occasions but are the same for all individuals on the same occasion. Thus the difference between these two types of unsystematic factors lies in their effects in a single testing occasion. Let us examine varying unsystematic factors first.

The effects of varying unsystematic factors are different for the same individual on different occasions. Hence the score of an individual over a number of occasions is sometimes higher and sometimes lower as a result of varying unsystematic factors. In addition, the effects of these factors are different for different individuals on the same occasion, tending to increase the scores of some individuals on that occasion and to lower those of others. Some of these influences are in the testing situation itself and others are ascribed to the individual.

Let us first examine the different sources of varying unsystematic variation that are in the testing situation itself. For example, some persons may be fortunate enough to sit in comfortable seats while they are taking a test, whereas others may find themselves in uncomfortable seats. Those near the window work under conditions of good illumination whereas others may find themselves in the far corners, operating under the handicap of poor lighting. Because of their nearness to or distance from the test administrator, some hear the instructions clearly and others do not.
These situations illustrate that varying unsystematic factors have different effects on different individuals on the same occasion. When the measuring instrument is a rating procedure, differences among raters produce variations in scores. The individual who happens to be rated by a lenient rater is likely to receive a higher rating than one who is rated by a strict rater. Assuming the rating is done at different times, this situation illustrates that varying unsystematic factors have different effects on the same individual over different occasions.

Because the individual himself changes in unsystematic ways, he too is a source of random variation of this type. His motivation, fatigue, nervousness, interest, and distractibility may be to one degree on one occasion and to another degree on another occasion, thus having different effects on the same individual over many occasions. Different individuals taking a test at a given time also vary among themselves in these same respects. The motivation of some happens to be high at the time of testing, whereas that of others happens to be low; and some people happen to be rested, whereas others happen to be tired, illustrating varying unsystematic factors having different effects on different individuals on the same occasion.

D13

The other type of unsystematic factor, constant unsystematic, is unsystematic in its general influence over a number of testing occasions but yet operates in the same fashion for all individuals at a given time. When constant unsystematic factors are operating, the scores of all individuals on one occasion may be higher or lower than their scores on another occasion, with the scores varying in a non-orderly, random fashion. For example, we may have a speed test with a 10-minute time interval.
Sometimes through erroneous reading of the timing device the test administrator may shade the 10-minute interval by several seconds, while on other occasions he may unknowingly be several seconds too generous in his timing. On any given occasion the time, though in error, is the same for all individuals being tested. Sometimes when a test is administered the lighting may be poor throughout the entire testing room, and on another occasion it may be excellent. While from one testing session to the next there are variations in quality of illumination, on any one occasion it is the same for all individuals. When individuals are given a test in the morning they may all be fresh and rested and consequently earn higher scores on it than when they take the test in the evening and are all tired.

In these situations, on any one occasion constant unsystematic factors affect all individuals in the same manner, e.g., all had time cut short, all had poor lighting. Yet across many occasions the effects of constant unsystematic factors are different for the same individual, varying in an unsystematic fashion. For example, the lighting conditions over many occasions may be as follows: poor light, really bad light, excellent, almost dark, mediocre, horrible light, etc., with no predictable order as to the exact lighting condition on any given day.

D14

As stated before, the primary difference between constant and varying unsystematic factors lies in their effects in a single testing occasion; constant unsystematic factor effects cannot be detected on one occasion because they have the same effect on all individuals on that occasion, while varying unsystematic factors can be detected since they have different effects on different individuals, thus producing random variation in the scores. However, over different occasions both types of unsystematic factors have different effects on an individual's score.
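The difference between the two kinds of unsystematic factor on a single occasion can be sketched with a simple additive model. Everything in the sketch is an assumption made for illustration: the additive form itself, the three persons, and all the numbers. A constant effect is one value per occasion shared by everyone, while a varying effect has a separate value for each person on each occasion.

```python
# Hypothetical stable standing of three persons on the trait.
true_scores = {"P1": 60, "P2": 70, "P3": 80}

# Constant unsystematic effects: one value per occasion, the same for everyone
# (e.g. the timing or lighting on that day).
constant_effect = [-4, 3, -1, 5]

# Varying unsystematic effects: a separate value for each person on each occasion
# (e.g. seat comfort, momentary motivation).
varying_effect = {
    "P1": [2, -3, 1, 0],
    "P2": [-1, 2, -2, 3],
    "P3": [0, 1, 3, -2],
}


def observed(person, occasion, use_constant=True, use_varying=True):
    """Observed score under the assumed additive model."""
    score = true_scores[person]
    if use_constant:
        score += constant_effect[occasion]
    if use_varying:
        score += varying_effect[person][occasion]
    return score


def differences_on_occasion(occasion, **kwargs):
    """Differences between neighbouring persons' scores on one occasion."""
    people = sorted(true_scores)
    scores = [observed(p, occasion, **kwargs) for p in people]
    return [b - a for a, b in zip(scores, scores[1:])]


# On a single occasion, constant effects leave differences among persons untouched.
only_constant = differences_on_occasion(0, use_varying=False)
# Varying effects disturb those differences on the same occasion.
only_varying = differences_on_occasion(0, use_constant=False)
# Over occasions, constant effects still make one person's scores fluctuate.
p1_over_occasions = [observed("P1", k, use_varying=False) for k in range(4)]

print(only_constant, only_varying, p1_over_occasions)
```

In this toy model the constant effects cannot be seen by comparing persons within one occasion, yet they move a given person's score up and down from occasion to occasion, which is the sense in which they remain unsystematic.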
The fact that both types have different effects on an individual's score over different occasions is exactly what they have in common and why they are both labelled unsystematic.

This distinction between constant and varying unsystematic factors needs to be specified further. It is clear that both types of factors produce unsystematic, non-orderly variations in an individual's scores on different occasions. Therefore, if we administer a test to a single individual on many occasions, we cannot distinguish between the two types of factors on the basis of the scores alone. Suppose we administer a test to only one person at a time but to each person we administer the test a number of times. If we have n people and we administer the test k times to each person, we then have nk occasions on which the test has been administered. On each of these nk occasions the effects of varying unsystematic factors are different and also the effects of constant unsystematic factors are different. Hence we could not distinguish their effects and we could not ascertain which type of factor is determining variation among scores or whether both are at work. If the various testings of the subjects are randomly distributed among the k occasions, the constant factors operate in exactly the same manner as the varying factors, because the constant unsystematic factors do not have the same constant effects upon all individuals, i.e., each individual is tested on a different occasion rather than all being tested on the same occasion. Therefore, all individuals have equal likelihood of being tested under favorable and unfavorable conditions of constant as well as varying factors.

We might conclude that constant factors should be classified with systematic rather than with unsystematic factors, since they overlap in function, operating similarly on the same occasion, i.e., both having the same effect on all individuals on that occasion. However, they are different.
The reason for this separate classification is that constant unsystematic factors cause the scores of an individual to vary in a random and unpredictable fashion from occasion to occasion, whereas systematic factors produce systematic and predictable changes over occasions. Since from one occasion to another constant unsystematic factors operate in a random fashion for a given individual, they are classed as unsystematic. However, systematic factors have the same effects on an individual on different occasions. To illustrate this, if a systematic factor has a facilitating effect on one occasion it will also have a facilitating effect on the following occasions. For example, if tests are always given at the same hour in the morning and we assume that students are fresh at this time, then time of day is a systematic factor affecting everyone the same on one occasion and across occasions as well.

D15

Sub-D4

We have defined reliability as the extent of unsystematic variation of an individual's scores over a series of parallel tests. The quantitative index of this amount of variation is the reliability coefficient, i.e., the correlation coefficient over parallel tests. In a practical situation we rarely have parallel tests available, but we usually do have several means of administering non-parallel tests by which we can estimate the reliability coefficient. Any estimation procedure will give us only an approximation, not an exact determination, of the reliability coefficient. There are three basic methods that are used for estimating the reliability coefficient of tests. They are (a) test-retest: estimation from the correlation coefficient between scores on repetitions of the same test, (b) parallel forms: estimation from the correlation coefficient between scores on parallel forms of a test, and (c) internal consistency: estimation from the correlation coefficient among comparable parts of the test.
In the discussion below only group testing, not individual, is considered.

D16

The first method for estimating the reliability coefficient is called the test-retest method. A certain test is administered two or more times to the same group of individuals, and the intercorrelations among the scores on the various administrations are taken as the reliability coefficient. With tests of aptitude, personality, and achievement the test ordinarily is administered only twice, so that only one estimate of the reliability coefficient is obtained. If the test is administered several times, the usual practice is to take the average of the intercorrelations among the scores on the various occasions as the estimate of the reliability coefficient.

D17

There are two main advantages with the test-retest method. Nothing in addition to the test itself is required. The particular sample of items or stimulus situation is held constant, thereby testing the individuals with precisely the same instrument.

The most serious disadvantages with the test-retest method lie in the variety of carry-over effects from one testing occasion to another. Sometimes there are practice effects so that on subsequent occasions scores increase in a systematic fashion. The individual may learn the specific content of the test or develop improved approaches or attitudes toward the material so that his scores increase. In some instances these practice effects are different for different individuals. Of two people who obtain precisely the same score on the first occasion, one may discover certain general principles that help answer the questions in the test or may even rehash or rehearse the material during the interval between the first and second testing. Therefore, on the second testing occasion the scores of one individual may be improved and that of the other may remain the same.
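The practice of averaging the intercorrelations among several administrations can be sketched as follows. The scores are hypothetical, the function name `average_intercorrelation` is invented for the illustration, and the Pearson formula is assumed as the correlation coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_intercorrelation(administrations):
    """Mean of the correlations between every pair of administrations."""
    pairs = list(combinations(administrations, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores of the same five persons on three administrations.
first = [12, 15, 9, 20, 17]
second = [13, 14, 10, 19, 18]
third = [11, 16, 9, 21, 16]

estimate = average_intercorrelation([first, second, third])
print(f"Test-retest estimate: {estimate:.2f}")
```

With only two administrations the average reduces to the single correlation between them, which corresponds to the ordinary case described in the passage.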
If the correlation between the scores on the two occasions is low, we do not know whether the test is unreliable or whether differential systematic factors have been at work. On the other hand, if the coefficient between scores on the two occasions is high, then it would seem that factors having differential effects are not very important and the correlation we obtain might be considered to be something like a lower limit of the reliability coefficient. This would be true, of course, only if we could rule out on the retest the effects of remembering the response made on the first test.

In other instances there might be a specific carry-over effect in terms of remembering on one testing occasion the response given on an earlier one and merely repeating these responses. In an attitude test on the first testing occasion a person answers "indifferent" to the question "Do you approve of labor unions?" and, remembering this on a second occasion, he again responds in the same fashion. Having assigned his subordinate Joe Smith the rating of "superior" in January, a factory foreman does so again in June when he is called upon to rate him in order to demonstrate that he is consistent in his appraisals. These specific carry-overs from one occasion to another may not be deliberate on the part of the individual; indeed, he may be completely unaware of them. Their presence in the test-retest method may give an overestimate of reliability. They introduce a false consistency in scores.

One troublesome problem with the test-retest method has to do with the time interval between testing occasions. We expect lower and lower estimates of reliability as the time interval between the testing occasions increases, because the longer the time interval between the two testing occasions the greater the likelihood that the individual will change. Yet in order to minimize the effects of memory, it is desirable to maximize the interval between testing occasions.
Therefore, the correlation between scores on two occasions reflects the ability of individuals to remember, as well as the reliability of measurement.

The second method of estimating reliability is that of parallel forms. Parallel forms of a test should not be confused with parallel tests. Parallel forms of a test are tests similar in content and nature designed to measure the same traits. Parallel tests are not necessarily similar in content and nature. As stated before, parallel tests measure the same trait and must meet certain statistical criteria, which we have not specified here. If a series of parallel forms of a test meet these criteria they are also parallel tests. But if they do not, they are only parallel forms. To illustrate the concept of parallel forms of tests as tests which are similar in content or nature, consider these examples. Two objective tests might have the same kind and number of items. An item in one parallel form of an arithmetic test might be "27 + 83 = ___," and an item in another form might be "48 + 72 = ___." An item in one form of an inventory designed to measure emotional stability might be "Do you sleep well at night?" and an item in another form might be "Do you have bad dreams at night?"

Sub-D5

Having available two or more parallel forms of a test, we take as an estimate of the reliability the intercorrelations among the scores on the parallel forms. If there are more than two forms available the common practice is to take the average of the intercorrelations as the estimate of the reliability coefficient.

The intercorrelations among the tests reflect not only the degree of reliability of measurement but also the extent to which the tests measure different traits, since the various forms of a test do not contain precisely the same material. Hence we might say that the method of determining reliability from the intercorrelations among parallel forms of a test gives estimates that are too low.
The carry-over effects from one test to another are minimized because the content of parallel forms is not precisely the same. In many instances there will be no specific carry-over effects at all, because there is no opportunity to memorize specific responses made to an earlier form. However, there is still the possibility of general carry-over in terms of modes of response, attitudes toward the material, and the like. Ordinarily when the method of parallel forms is used to estimate reliability of measurement, the various forms are administered on different occasions, termed parallel forms-delayed, although sometimes they are administered on the same occasion, termed parallel forms-immediate.

D18

The last method of estimating reliability, internal consistency, involves only a single administration of a test. Under such circumstances we can obtain an estimate of the reliability of measurement if we consider the test not as a single test but rather as the sum total of a number of parallel forms of a test. Suppose we have an objective test comprised of 100 items all of which pertain to the same trait. Instead of saying that we have one test of 100 items we might say that we have two tests each of 50 items or four tests each of 25 items or 100 tests each consisting of one item. Having two or more parallel forms available, we can now proceed to estimate reliability coefficients by the method of correlation between scores on parallel forms. Note that we do not have the reliability of a test of 100 items but rather the reliability of a shorter test. If we do not feel that the shorter tests adequately sample the trait we wish to measure, we can find the reliability coefficient of the total test by various statistical methods.

Usually the test is divided into two parts. A problem arises about splitting the test.
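Treating one test as two shorter parallel forms can be sketched in a few lines. The sketch makes several assumptions that are not in the passage: the item responses are hypothetical (1 for a correct answer, 0 otherwise), the halves are formed by taking odd-numbered and even-numbered items, and the step up from the half-length correlation to the full-length test uses the Spearman-Brown formula, a standard correction chosen here as one example of the "various statistical methods" the passage leaves unnamed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_scores(responses):
    """Score each person separately on odd-numbered and even-numbered items."""
    odd_scores = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even_scores = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    return odd_scores, even_scores


def spearman_brown(r_half):
    """Estimated reliability of the full test from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical responses of six persons to an eight-item test.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

odd_scores, even_scores = split_half_scores(responses)
r_half = pearson_r(odd_scores, even_scores)
r_full = spearman_brown(r_half)

print(f"Half-test correlation: {r_half:.2f}")
print(f"Estimated full-test reliability: {r_full:.2f}")
```

The half-test correlation describes a test of half the length, so the corrected value for the whole test is somewhat higher whenever the half-test correlation is positive.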
With a 100-item test we could take the first 50 items as one half and the last 50 as the other half, or we could take the odd-numbered items as one half and the even-numbered ones as the other half. This last procedure, the odd-even method, is the one generally used since it controls for any systematic factors operating during the testing period that change the performance from early in the testing session to later periods; an example of such a factor is fatigue. In order to maximize the probability that the two halves measure the same trait, sometimes the division is made on the basis of an analysis of the content of the items, making sure that both halves contain items of the same sort.

The prime advantage of determining reliability coefficients by this method is its simplicity. A test need be given only once to a group of individuals; repetition of the test or parallel forms are not required. The method is not applicable to certain types of tests which are an integrated whole and cannot be divided into separate and equivalent parts, as is the case with speed tests.

D19

Sub-D6

We have developed the concept of reliability both theoretically and practically. We have seen that reliability plays an important role in the practical application of test results. Yet it serves only as a necessary, not a sufficient, condition for quality in a test. That is, we could be measuring something with high reliability but which is trivial. On the other hand, if a satisfactory reliability is not achieved nothing has been measured very precisely.

Review Diagram

APPENDIX F

DIAGRAMS AND CORRESPONDING VERBAL STATEMENTS

D1

    Degree of Unsystematic Variation in Scores    High    Low
    Reliability                                   Low     High

The degree of unsystematic variation in scores and the degree of reliability may range from low to high; a high degree of unsystematic variation yielding low reliability and low degree of unsystematic variation yielding high reliability.
D2

Reliability                                        Low ---------- High
Differences among Individuals on Same Test    Unstable ---------- Stable

Reliability may range from low to high and differences among individuals on the same test range from unstable to stable; low reliability yielding unstable differences while high reliability yields stable differences.

D3

Reliability                                        Low ---------- High
Assignment of Individuals to Groups          Uncertain ---------- Certain

Reliability may range from low to high and assignment of individuals to groups range from uncertain to certain; low reliability yielding uncertain assignment while high reliability yields certain assignment.

D4

Reliability                                        Low ---------- High
Prediction                                  Inaccurate ---------- Accurate

Reliability may range from low to high and prediction range from inaccurate to accurate; low reliability yielding inaccurate prediction while high reliability yields accurate prediction.

D5

Reliability                                        Low ---------- High
Differences among Traits of an Individual     Unstable ---------- Stable

Reliability may range from low to high and differences among traits of an individual range from unstable to stable; low reliability yielding unstable differences while high reliability yields stable differences.

Sub-D1

In summary then, we have the following:

Degree of Unsystematic Variation in Scores        High ---------- Low
Reliability                                        Low ---------- High
Differences among Individuals on Same Test    Unstable ---------- Stable
Assignment of Individuals to Groups          Uncertain ---------- Certain
Prediction                                  Inaccurate ---------- Accurate
Differences among Traits of an Individual     Unstable ---------- Stable

The degree of unsystematic variation in scores and reliability can range from low to high. A high degree of unsystematic variation yields low reliability, unstable differences among individuals on the same test, uncertain assignment of individuals to groups, inaccurate prediction, and unstable differences among traits of an individual.
A low degree of unsystematic variation yields high reliability, stable differences among individuals, certain assignment, accurate prediction, and stable differences among traits.

D6

Types of Variation in Scores
    Systematic        Unsystematic

There are two types of variation in scores, systematic and unsystematic.

D7

                                  Systematic     Unsystematic
                                  Variation      Variation
Type of Score Arrangement
    Orderly Pattern                   X
    Complete Lack of Order                            X

Systematic variation is characterized by an orderly pattern of scores and unsystematic variation is characterized by a complete lack of order in score arrangement.

D8

Systematic Factors    ---------->  Systematic Variation in Scores
Unsystematic Factors  ---------->  Unsystematic Variation in Scores

Systematic factors cause systematic variation in scores and unsystematic factors cause unsystematic variation in scores.

D9

Parallel Tests
    Measure Same Trait
    Meet Statistical Criteria

                                                      Reliability of
                                                      Measurement
Extent of Unsystematic Variation
    in Individual's Scores                                 X
Parallel Tests                                             X

Reliability of measurement is the extent of unsystematic variation in an individual's scores over parallel tests. Parallel tests measure the same trait and meet statistical criteria.

D10

                                        Reliability     Correlation
                                        Coefficient     Coefficient
Quantitative Index                           X               X
Degree of Unsystematic Variation
    in Individual's Scores                   X               X
Tests
    Parallel                                 X               X
    Not-parallel                                             X

The reliability coefficient is a quantitative index of the degree of unsystematic variation in an individual's scores over parallel tests. The correlation coefficient is a quantitative index of the degree of unsystematic variation in an individual's scores over parallel and not-parallel tests.
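The relationship stated in D10 can be made concrete with a small numerical sketch. It assumes the ordinary Pearson product-moment correlation as the correlation coefficient and uses hypothetical scores; the function name pearson_r and the three score lists are illustrative only and do not come from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each list")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of six examinees on parallel forms of one test.
form_a = [12, 15, 19, 22, 25, 30]
form_b_stable = [13, 14, 20, 21, 26, 29]  # little unsystematic variation
form_b_noisy = [25, 12, 30, 14, 20, 22]   # much unsystematic variation

# Computed on parallel forms, the correlation serves as the reliability coefficient.
r_stable = pearson_r(form_a, form_b_stable)
r_noisy = pearson_r(form_a, form_b_noisy)
print(f"stable forms: r = {r_stable:.2f}; noisy forms: r = {r_noisy:.2f}")
```

With these illustrative numbers, the pair of forms in which individuals keep roughly the same relative positions gives a coefficient near 1, while the pair with heavily scrambled positions gives a coefficient near 0, in line with D1: more unsystematic variation, lower reliability.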
Sub-D2

In summary, we can then describe reliability, reliability coefficient, and correlation coefficient in terms of the following characteristics:

                                                     Reliability    Correlation
                                     Reliability     Coefficient    Coefficient
Quantitative Index                                        X              X
Degree of Unsystematic Variation
    in Individual's Scores                X               X              X
Tests
    Parallel                              X               X              X
    Not-parallel                                                         X

Reliability is the extent of unsystematic variation in an individual's scores over parallel tests. The reliability coefficient is a quantitative index of this degree of unsystematic variation; the correlation coefficient is a quantitative index of the degree of unsystematic variation in an individual's scores over parallel and not-parallel tests.

D11

Correlation Coefficient on Parallel Tests
    = Reliability Coefficient                      Low ---------- High

The correlation coefficient on parallel tests is the same as the reliability coefficient, both ranging similarly from low to high.

D12

Types of Unsystematic Factors
    Constant        Varying

There are two types of unsystematic factors, constant and varying.

Sub-D3

At this point we can now briefly re-examine the types of factors and their respective outcomes (with the diagram given below).

                                 Systematic      Unsystematic Factors
                                 Factors         Constant        Varying
Variation caused                 Systematic      Constant        Varying
                                 Variation       Unsystematic    Unsystematic
                                                 Variation       Variation
Type of Score Arrangement
    Orderly Pattern                  X
    Complete Lack of Order                           X               X

Systematic factors cause systematic variation in scores, characterized by an orderly pattern of scores; and unsystematic factors cause unsystematic variation characterized by complete lack of order. Constant and varying are the two types of unsystematic factors, yielding constant and varying unsystematic variation respectively.

Now let us examine the two types of unsystematic factors further.

D13

                                  Effects      Individual(s)    Occasion
Varying Unsystematic Factors      Different    Different        Same
                                  Different    Same             Different

(Read across each row.)

Varying unsystematic factors have different effects on different individuals on the same occasion and have different effects on the same individual on different occasions.

D14

                                  Effects      Individual(s)    Occasion
Constant Unsystematic Factors     Same         All              Same
                                  Different    Same             Different

Constant unsystematic factors have different effects on the same individual on different occasions and have the same effect on all individuals on the same occasion.

D15

                                  Effects      Individual(s)    Occasion
Systematic Factors                Same         All              Same
                                  Same         Same             Different

Systematic factors have the same effect on all individuals on the same occasion and have the same effect on the same individual on different occasions.

Sub-D4

                                  Effects      Individual(s)    Occasion
Systematic Factors                Same         All              Same
                                  Same         Same             Different
Constant Unsystematic Factors     Same         All              Same
                                  Different    Same             Different
Varying Unsystematic Factors      Different    Different        Same
                                  Different    Same             Different

Systematic factors have the same effect on all individuals on the same occasion and the same effect on the same individual over different occasions. Constant unsystematic factors also have the same effect on all individuals on the same occasion but have different effects on the same individual on different occasions. Finally, varying unsystematic factors have different effects on different individuals on the same occasion and different effects on the same individual on different occasions.

D16

Methods of Estimating the Reliability Coefficient
    Test-Retest        Parallel Forms        Internal Consistency

There are three main ways of estimating the reliability coefficient: test-retest, parallel forms, and internal consistency.

D17

                  Type of Test           Time of Testing          Number of Times Test Administered
                  Identical   Similar    Same       Different     Once      More than
                                         Occasion   Occasion                Once
Test-Retest           X                                 X                       X

The test-retest method involves administering the identical test on different occasions.

Sub-D5

                                  Parallel     Parallel Forms
                                  Tests        of a Test
Always Similar in Content                           X
Measure Same Trait                    X             X
Meet Statistical Criteria             X

Parallel tests measure the same trait and meet statistical criteria. Parallel forms of a test are always similar in content and measure the same trait.

D18

                  Type of Test           Time of Testing          Number of Times Test Administered
                  Identical   Similar    Same       Different     Once      More than
                                         Occasion   Occasion                Once
Parallel Forms
    Delayed                      X                      X                       X
    Immediate                    X           X                                  X

The parallel forms-delayed method involves administering similar tests on different occasions. The parallel forms-immediate method involves administering similar tests on the same occasion.

D19

                  Type of Test           Time of Testing          Number of Times Test Administered
                  Identical   Similar    Same       Different     Once      More than
                                         Occasion   Occasion                Once
Internal
Consistency           X                      X                     X

The internal consistency method involves administering the identical test once.

Sub-D6

All of these methods of estimating reliability coefficients can be characterized as follows:

                                  Parallel     Parallel
                                  Forms        Forms        Test-      Internal
                                  Immediate    Delayed      Retest     Consistency
Type of Test
    Identical                                                  X           X
    Similar                           X            X
Time of Testing
    Same Occasion                     X                                    X
    Different Occasion                             X           X
Number of Times Test Administered
    Once                                                                   X
    More than Once                    X            X           X

Test-retest involves administering the same test on different occasions; the internal consistency method involves giving a test once; parallel forms-delayed involves administering similar tests on different occasions; parallel forms-immediate involves administering similar tests on the same occasion.

Instructions preceding the review diagram

On the next page is a large diagram which reviews and integrates the diagrams presented in the text into six sub-diagrams. Interconnections between concepts in these sub-diagrams are indicated. The sub-diagrams are numbered and a suggested order of progressing through the entire diagram is given. While interpreting the chart diagrams, read down the columns rather than across the rows.
[Review diagram integrating Sub-D1 through Sub-D6]

(Verbal Review)

The degree of unsystematic variation in scores, correlation coefficient on parallel tests, reliability coefficient, and reliability can range from low to high. The correlation coefficient on parallel tests is the reliability coefficient. A high degree of unsystematic variation yields low correlation coefficients, low reliability coefficients, and low reliability, while a low degree of unsystematic variation yields high correlation coefficients, high reliability coefficients, and high reliability.
Low reliability yields unstable differences among individuals on the same test, uncertain assignment of individuals to groups, inaccurate prediction, and unstable differences among traits of an individual. High reliability yields stable differences among individuals, certain assignment, accurate prediction, and stable differences among traits.

Systematic factors cause systematic variation in scores, characterized by an orderly pattern of scores; and unsystematic factors cause unsystematic variation, characterized by complete lack of order in scores. Constant and varying are the two types of unsystematic factors, yielding constant and varying unsystematic variation respectively.

Varying unsystematic factors have different effects on different individuals on the same occasion and have different effects on the same individuals on different occasions. Constant unsystematic factors also have different effects on the same individual on different occasions but have the same effect on all individuals on the same occasion. Systematic factors have the same effect on all individuals on the same occasion and have the same effect on the same individual on different occasions.

Reliability of measurement is the extent of unsystematic variation in an individual's scores over parallel tests. The reliability coefficient is a quantitative index of this degree of unsystematic variation in an individual's scores over parallel tests. The correlation coefficient is a quantitative index of the degree of unsystematic variation in an individual's scores over parallel and not-parallel tests. Parallel tests measure the same trait and meet statistical criteria. Parallel forms of a test measure the same trait and are always similar in content.

The main ways of estimating reliability coefficients are test-retest, parallel forms (delayed and immediate), and internal consistency. The test-retest method involves administering the identical test on different occasions.
The parallel forms-delayed method involves administering similar tests on different occasions; the parallel forms-immediate method involves administering similar tests on the same occasion. The internal consistency method involves administering the same test once.

APPENDIX G

TEST AND TEST ANALYSIS

[Diagram: structure of systematic, constant unsystematic, and varying unsystematic factors and the variation in scores they cause]

General Instructions for the Multiple True-False Questions

In the following questions circle each alternative "true" or "false." This means that you will mark all alternatives, not just one. Any number of alternatives could be true and any number of them could be false. This, of course, includes the possibility that all could be true or all could be false.

STRUCTURE AND TRANSFER QUESTIONS

1. (STRUCTURE)

If we observed random fluctuation in scores over several occasions we could say that this was caused by

T F   a. constant unsystematic factors
T F   b. systematic factors
T F   c. systematic variation
T F   d. varying unsystematic factors
T F   e. constant systematic factors
T F   f. unsystematic variation
T F   g. constant systematic variation

Analysis

If alternative a marked true and the rest marked false, then all parts of a must be true: constant unsystematic factors. Total of 4 points.
    (1) causal relationship (CUF-CUV)
    (2) subset - factor and variation
    (1) random or lack of order description

If alternative a marked true and f also true (rest false), i.e., had both variation and factors as cause, then eliminate constant unsystematic variation descriptive relationship. (3 points)

If alternative d marked true, and the rest marked false, then all parts must be true: varying unsystematic factors. Total of 4 points.
(1) causal relationship (VUF-VUV) (2) subset - factor and variation (1) lack of order description If alternative 91 marked true and _f_ also true (rest false), then eliminate varying unsystematic variation descriptive relationship. (3 points) "both 2 andg marked true and rest false, then an additional 4 points (total of 12) (1) causal relationship - unsystematic factors (1) descriptive - random fluctuation - unsystematic variation (2) can also infer correct relationships for systematic factors and variation (causal and descriptive) If both gand dare true and_f also true (rest false), then eliminate the 3 descriptive relationships for unsystematic variation . (Total of 9 points.) If _a and d are both false, the following possibilities then exist for respond— ing as false. 263 l. variation instead of factors as cause 2. systematic instead of unsystematic 3. constant and varying describe systematic not unsystematic If the reason is variation instead of factors, alternatives 3 and _l_" check this. d. systematic variation f. unsystematic variation No points for either of them marked true (and rest false) because cause in wrong direction. Cannot infer correct descriptive relationship either, i.e. , might have thought was factor. If the reason is systematic, alternatives b and _c_ check this. b. systematic factors c. systematic variation If I: marked true (2 true or false, rest false) than had causal relation (factors cause variation) but wrong description. Therefore assume that the subiect knew systematic factors cause systematic variation and unsystematic factors cause unsystematic variation. (2 points) If 2 marked true (2 true or false, rest false) then causal relationship incorrect and also descriptive relation- ship incorrect. (no points) If the reason is that constant and varying describe (are subsets of) systematic rather than unsystematic, alternatives 3 and _g are pertinent. e. Constant systematic factors 9. 
constant systematic variation If 3 marked true (2 true or false, rest false) assumed subiect knew that factors cause variation (S and U). 2 points If g marked true (3 true or false, rest false) causal and descriptive relationships incorrect. (no points) Any pattern having both variation and factors as cause and not involving 264 an inconsistent combination of systematic and unsystematic was scored 1 point for having factors causing variation, in general. Consistent Patterns (Remaining patterns scored -1 point.) (A blank indicates "false. ") _ 1. , r 12449330'0002‘2111.111111 2 . (STRUCTURE) Suppose we had repeatedly tested Bill and Jack on the ”Student Happiness Invenflo'ry ." The scores resulting from these testings, in order, were as follows: Bill-12 67 73 29 44 5412 73 97 48 Jack- 25 30 35 40 45 30 35 40 45 50 Given these two sets of scores we could say that I F a . Jack's scores were caused by systematic factors .4 I-n b. Jack's series of scores can be described as unsystematic variation Bill and Jack underwent different experiences during this |—I m 0 testing period T F d . Bill's scores were caused by systematic factors Analysis This item was to some extent a check on item one. The analysis was not systematic . 265 If g marked true (2 points) (1) cause (1) description lfb false (1 point) (1) definition of systematic variation Alternative 2 (Transfer) - on effects of factors . 1 point if marked true . If 2 false (2 points) (I) knew Bill's scores caused by unsystematic factors (1) description of unsystematic variation 3. (T RANSPER) If bothosystematic and unsystematic variation occurred simultaneously within a given set of scores the result would be i T F a. a decrease then an increase in the scores T f b. random fluctuation only in scores I F c . an overall trend in the scores with random fluctuations about it. T f d . lack of order followed by a regular order in the scores Analysis If correct combination, then 3 points: 2 for effects and l for integration . 
If 2 true, then 2 effects, but wrong combination ( 2 points) |f_b true, then only one effect (1 point) If it true, then 2 effects, but wrong combination (2 points) 266 (Remaining patterns, - 1 point) 4. (STRUCTURE) John and Bill were both administered a general' science test two times . The second administration of the test followed the first by a period of four weeks . On the firstfadministration John had a cold, while on the second he was healthy . Bill was in good health both times. On the first administration of the test the time limit was cut short by an emergency fire drill but on the second administration the time limit was exactly in correspondence with the instruc- tions. Before the first test and also between the two test periods both John and Bill were students in the same general science class. On the first admin- istration Bill had iust failed on English test and was quite unhappy . On the second administration John had won a track race and was in good spirits. In this situation the following factors - health, test time limits, exposure to general science, and mood - affected the test situation and presumably the test results . Classify each of the four situational factors as S - Systematic factor or VU --'. Varying unsystematic factor or CU - Constant unsystematic factor 267 o x . o x actco> . 9.23 X . X X m 00 I «00 o o 35anth X X n X Eotcou .o X X X Eaton. o X X r X 62083me n o I o- 223:0 qum :< ESE—:0 eEom 220:5 060m 38.880 33:222. .83 Amamsmcotfluc n ma £3300 Eaton—v £52. 0 mo .33 m. .. 3 30.5.3. .o .3232 268 using the above symbols . If the described classroom situation is such that you cannot classify a factor uniquely, list each of the possibilities . Test Exposure to Health Time Limits General Science Mood vu, ‘ y ' cu s‘ vu 5 . (STRUCTURE) A group intelligence test was given to the sixth grade class . The super- vising teacher allowed 20 additional minutes for the test. 
_‘Mary was sitting next to the window and Jane in the dark corner of 'the‘room'.’ Different forms Q of the test were given to Mary and Jane . Q Classify these situational factors - time limits, light»,_.med_ mo .c0Ecm_3< D l“ .30.. 0Eom co £03239.— mcoE< “02.0.0..me O—anmca l 0.05000:— £2325 03233.! 32.0.3.3. Bo.— 284 Omission of 2, I point Reliability If marked f, I point for continuum . Omission of d, I point for definition of reliability . 20. (STRUCTURE) Suppose individuals in one space agency were using psychological tests to screen astronauts for claustrophobia . The tests were reliable . Would you recommend the tests for future use? Answer ya or no . Analyé is _ g If marked as, 6 points. (Assignment of individuals to groups and differences among individuals on same test.) (2) continuums (4) reliability connection 2I . (STRUCTURE) I F Unsystematic variation in scores from different tests on different , 'i traits would be desirable if we were constructing 9 53' °f tests for the kit entitled'Gufess who is like you ." (The purpose is to make a difficult game.) Analysis If marked true', 3'points (Differences among traits) (I) continuum (2) reliability connection 22. (First part - STRUCTURE, Second part - TRANSFER) If we wanted to predict how a student taking test X would do on 285 a biology test, we would prefer that F a. test X be reliable I-II I F b. the biology test be reliable Analysis If gtrue, 3 points for continuum and reliability connection . If 2 true (Transfer), 3 points for continuum and reliability connection . ACHIEVEMENT QUESTIONS Structure Relationship Questions I . ' A test is said to be reliable when it is published by a reputable company V provides a basis for diagnosing pupil weaknesses can be scored quite easily measures what it was designed to measure -, gives an accurate estimate of whatever it measures* (DO-00") 2 . 
Even unreliable scores can be useful to us under the following circumstances:
    a. comparing different traits of an individual
    b. comparing individuals on the same test
    c. predicting behavior
    d. none of the above*
    e. all of the above

3. Systematic variation in scores refers to
    a. a "systematic" distribution of scores in a class (e.g., a normal distribution)
    b. unbiasedness (e.g., as in a fair dice)
    c. an orderly sequence of scores*
    d. none of the above

4. Reliability is a function of
    a. controlled variation
    b. systematic variation
    c. unsystematic variation*
    d. randomness

5. Tests K and L are parallel tests. In a certain group they correlated .95 and in another the correlation was .20. Such a situation is
    a. possible, though not common*
    b. possible, and reasonably common
    c. mathematically impossible
    d. impossible by the definition of parallel tests
    e. impossible, but not for the above reasons

6. The term, varying unsystematic factors, refers to those elements which cause
    a. variation between individuals in the same situation
    b. variation within individuals over time, differentially affecting each person
    c. variation over time, affecting everyone in the group the same
    d. both a and b*

7. An example of a source of varying unsystematic variation would be
    a. the test items
    b. the testee*
    c. the authors of the test
    d. the subject matter of the test

8. Contained in constant unsystematic variation would be
    a. variation between individuals in the same situation
    b. variation within individuals over time, differentially affecting each person*
    c. variation within individuals over time, affecting each person the same each time
    d. both a and b

9. To distinguish between systematic and unsystematic factors we would need to administer a test to
    a. only one person on only one occasion
    b. only one person on several occasions
    c. several people on only one occasion
    d. several people on several occasions*
    e. in actual practice, it is impossible to distinguish them

10. In categorizing factors, the constant factors are
    a. always placed with systematic factors
    b. always placed with unsystematic factors*
    c. placed with systematic or unsystematic, depending on the nature of the factors
    d. sometimes not placed in either systematic or unsystematic factors

11. A reliability coefficient is obtained by correlating scores on the same form of a test twice administered to the same pupils a number of days apart. Such a reliability coefficient has been termed a
    a. split-half coefficient
    b. coefficient of equivalence
    c. internal consistency coefficient
    d. validity coefficient
    e. test-retest coefficient

12. How are parallel forms related to parallel tests?
    a. parallel tests are a special type of parallel forms*
    b. parallel forms are a special type of parallel tests
    c. they are both the same
    d. they are actually two rather unrelated terms
    e. their relationship is more complicated than indicated by any of the above alternatives

13. The computation of internal consistency coefficients requires the administration of
    a. comparable tests to the same group
    b. one test to two groups
    c. comparable tests to different groups
    d. one test to the same group on two occasions
    e. one test to one group*

Related Structure Questions (e.g., may test for presence of elements.)

14. Which of the following is not one of the major types of variances in scores:
    a. constant*
    b. systematic
    c. unsystematic
    d. all of the above
    e. none of the above

15. Systematic variation and systematic factors are terms used by different authors, but refer essentially to the same thing. T or F

16. Which of the following is an essential concept in reliability theory?
    a. relevance
    b. parallel tests*
    c. parallel forms
    d. criterion measure
    e. all of the above

17. The odd-even method is a special case of the
    a. parallel forms method
    b. internal consistency method*
    c. systematic variation method
    d. test-retest method
    e. none of these

Additional Questions

18. In assigning persons to groups, an unreliable test will likely have the effect of
    a. creating a large middle group
    b. depleting the middle groups
    c. increasing the errors of classification*
    d. none of the above

19. In general, the number of systematic and unsystematic factors which influence a score are approximately:
    a. one
    b. two
    c. three
    d. ten
    e. none of the above*

20. In which of the following instances would we be most confident of the operating factors?
    a. test scores from groups in two different schools
    b. test scores from morning and afternoon sessions
    c. test scores before and after a computer programming course*
    d. test scores before and after a summer recess

21. The reliability of a reading test of fourth grade pupils is reported to be .78. From this information we can best judge:
    a. how many points pupils are likely to change on the average, if an equivalent test is given
    b. how many fourth graders are above the norm
    c. the extent to which each pupil will maintain his position in the group if an equivalent test is given*
    d. how many fourth graders are below the norm
    e. the extent to which the test is related to other significant factors in the individual

22. In order to compute a correlation coefficient between traits A and B, it is necessary to have
    a. one group of subjects, some of whom possess characteristics of trait A, the remainder possess those of trait B
    b. measures of traits A and B on each subject in one group*
    c. one group of subjects, some who have both A and B, some with neither, and some with one but not the other
    d. measures of trait A on the group of subjects, and of trait B on another
    e. two groups of subjects, one which could be classified as A or not A, the other as B or not B

23. An individual reported a reliability coefficient of an intelligence test as 1.15.
It was obtained by correlating the results of a given group on Form A with their results on Form B. This coefficient indicates that
    a. the test has low reliability
    b. the test is moderately reliable
    c. the test is highly reliable
    d. no interpretation can be made without some further crucial information
    e. a mistake has been made in computing the correlation coefficient*

24. A perfect correspondence or correlation between two variables is represented by a coefficient of
    a. -1.00*
    b. .00
    c. .90
    d. 2.00
    e. 100.00

25. Which one of these r's has the least predictive value?
    a. .91
    b. .50
    c. .17*
    d. -.23
    e. -1.00

26. Under a scatter diagram there is a notation that the coefficient of correlation is .06. This means that
    a. most of the cases are plotted within a range of 6% above or below a sloping line in the diagram
    b. there is a bit more than moderate correlation
    c. plus and minus 6% from the means includes about 68% of the cases
    d. there is a negligible correlation between the two variables*
    e. the data mostly (plotted) falls into a narrow band 6% wide

27. Carry-over effects are most serious with
    a. split half method
    b. parallel forms method
    c. test-retest method*
    d. not very serious with any of the above methods

28. When reliability coefficients can be estimated by several correlation coefficients, one should use the
    a. first one calculated
    b. median
    c. arithmetic mean*
    d. geometric mean
    e. none of these

29. Internal consistency coefficients are often used because they
    a. are the easiest to compute
    b. can be calculated from a single administration of a test*
    c. are easier to interpret
    d. are the most accurate
    e. actually, they are seldom used

30. In determining the quality of a test, reliability is a
    a. desirable but neither a necessary nor sufficient condition
    b. necessary but not sufficient condition*
    c. necessary and a sufficient condition
    d. sufficient but not necessary condition
    e. none of the above

APPENDIX H

ORDER OF TEST ITEMS

 1. A12             28. A19
 2. S1              29. A3
 3. filler item     30. A22
 4. S2              31. S11
 5. A1              32. S22
 6. A27             33. A21
 7. A4              34. A26
 8. S18             35. S12
 9. S4              36. S3
10. A2              37. A13
11. A5              38. S6
12. A11             39. A30
13. A29             40. S10
14. S5              41. A28
15. A17             42. A15
16. A20             43. S13
17. A23             44. S14
18. S7              45. S15
19. A18             46. A10
20. A24             47. A7
21. S16             48. S20
22. S8              49. A14
23. S21             50. S19
24. A16             51. A6
25. A25             52. A9
26. S9              53. A8
27. S17

Maximum number of points:
    Achievement - 30
    Structure - 100
    Transfer - 59

APPENDIX I

GUTTMAN DEPENDENCIES

A. Questions 4 and 5 jointly dependent upon ability to answer question 6.

All of these questions related to the substructure covering the effects of the three types of factors. Item 6 was a simple recall type item and 4 and 5 were applications of this knowledge. Item 4 pertained to different testing occasions and item 5 pertained to the same testing occasion.

B. Question 7 (second part) dependent upon questions 6 and 7 (first).

Seven-first covered identification of different reliability estimation methods. Seven-second involved listing the type of score variation which could be distinguished in each situation. This required knowledge of the effects of factors on different and same occasions as well as appropriate identification of the type of situation described. The reliability passage did not include the answer to the second part of 7 (transfer). It was expected that both 6 and 7-first would be answered by a great majority of the Ss because these two questions were not very difficult.

C. Question 8 dependent upon 6 and 7-first (7-second) and 15a and/or 19d.

Question 8 (transfer) tested which types of factors affected reliability coefficients estimated by the various methods. Therefore it required knowledge of the definition of reliability coefficients in terms of degree of unsystematic variation (15a and/or 19d), as well as the interrelationships between factors and estimation methods.

D. Question 9 dependent upon 6 and 7-first (7-second) and 15a and/or 19d.
Question 9 (transfer) was perhaps more difficult than question 8 and could have been interpreted as an inference from 8 itself. It asked which parts of estimation methods gave the highest and lowest reliability indices. The same reasoning as given in "C" applied here.

E. Question 11 dependent upon 6 and 7-first (7-second) and 15a and/or 19d.

Question 11 (transfer) set forth a dilemma posed by the differences between the theoretical definition of reliability and methods of estimating reliability coefficients. The same dependency argument used in "C" applied here. Ss were apt to get this item correct by chance partly because of its phrasing.

APPENDIX J

ANALYSIS OF VARIANCE FOR TIME AND ERRORS

Table 28

Analysis of Variance for Time and Errors

Source                          SS        df        MS         F

Time
  Group O
    Treatment               3373.312       2     1686.66    26.60**
    Error                   7101.187     110       63.40
    Total                  10474.499     112
  Group R
    Treatment                225.738       2      112.87     1.29
    Error                   3499.891      40       87.50
    Total                   3725.629      42
  Groups O and R
    Treatment               3801.812       5      760.36    10.90***
    Error                  10601.125     150       69.74
    Total                  14402.937     155

Errors
  Groups O and R
    Treatment                  5.502       5        1.10     1.41
    Error                    117.338     150         .78
    Total                    122.840     155

*** p < .001   ** p < .01

Table 29

Scheffé Multiple Comparisons on Time

Group O
            D           V           NR
(M)       46.53       41.38       33.22
(SD)       8.57        8.98        5.41
NR        13.310***    8.159***
V          5.151

Groups O and R
            O-D        O-V        R-D        R-NR       R-V        O-NR
(M)       46.53      41.38      40.88      36.36      35.92      33.22
(SD)       8.57       8.98      10.99       8.81       6.07       5.41
O-NR      13.31***    8.16***    7.66*      3.14       2.71
R-V       10.60***    5.45       4.95        .43
R-NR      10.17***    5.02       4.52
R-D        5.65        .50

APPENDIX K

ANALYSIS OF VARIANCE AND COVARIANCE FOR ACHIEVEMENT

Table 30

Analysis of Variance for Achievement

Source                          SS        df        MS         F

A1  Group O
    Treatment                  1.652       2         .83      .06
    Error                   1498.922     110       13.33
    Total                   1500.574     112
A1  Group R
    Treatment                  4.563       2        2.28      .17
    Error                    533.207      40       13.33
    Total                    537.770      42
A1  Groups O, R and C
    Treatment               1200.05        6      203.01    15.98**
    Error                   2840.91      227       12.52
    Total                   4040.96      233
A2  Group O
    Treatment                  5.301       2        2.65      .20
    Error                   1332.914     100       13.33
    Total                   1338.215     102
A2  Group R
    Treatment                 27.339       2       13.67      .88
    Error                    560.250      36       15.56
    Total                    587.589      38
A2  Groups O, R and C
    Treatment                914.246       6      152.37    11.80***
    Error                   2749.843     213       12.91
    Total                   3664.089     219

*** p < .001   ** p < .01

Table 31

Analysis of Covariance for Achievement

Source                               Adj. MS      df       F

A1, Tm Covariate, Group O
    Treatment                          2.235       2      .16
    Error                             13.415     109
A1, Tm Covariate, Group R
    Treatment                          1.290       2      .09
    Error                             13.577      39
A2, A1 Covariate, Group O
    Treatment                          2.677       2      .36
    Error                              7.412      99
A2, A1 Covariate, Group R
    Treatment                          5.457       2      .67
    Error                              8.103      35

Table 32

Scheffé Multiple Comparisons on Achievement for Six Treatments and Control

A1        R-D      R-NR     R-V      O-V      O-NR     O-D      C
(M)      17.06    16.50    16.31    15.53    15.38    15.37    11.06
(SD)      3.36     3.88     3.29     3.67     3.23     3.81     3.31
C         5.99**   5.44**   5.24**   4.46**   4.31**   4.30**
O-D       1.69     1.13      .94      .16      .01
O-NR      1.68     1.12      .93      .15
O-V       1.54      .97      .78
R-V        .76      .19
R-NR       .56

A2        R-D      R-V      O-D      R-NR     O-NR     O-V      C
(M)      16.86    15.50    15.23    15.00    14.77    14.74    11.06
(SD)      3.08     4.67     3.27     3.82     3.89     3.61     3.31
C         5.81**   4.44**   4.16*    3.94*    3.70*    3.67*
O-V       2.14      .76      .49      .26      .03
O-NR      2.11      .73      .46      .23
R-NR      1.88      .50      .23
O-D       1.65      .27
R-V       1.37

** p < .01   * p < .05

APPENDIX L

ANALYSIS OF VARIANCE AND COVARIANCE FOR STRUCTURE

Table 33

Analysis of Variance for Structure

Source                          SS        df        MS         F

S1  Group O
    Treatment                 37.875       2       18.95      .15
    Error                  13871.125     110      123.85
    Total                  13909.000     112
S1  Group R
    Treatment                136.125       2       68.06      .65
    Error                   4220.625      40      105.52
    Total                   4356.750      42
S1  Groups O, R and C
    Treatment               6244.06        6     1040.68     8.09***
    Error                  29187.81      227      128.58
    Total                  35431.87      233
S2  Group O
    Treatment                180.500       2       90.25     1.02
    Error                   8864.937     100       88.65
    Total                   9045.437     102
S2  Group R
    Treatment                140.438       2       70.22      .95
    Error                   2664.375      36       74.34
    Total                   2804.813      38
S2  Groups O, R and C
    Treatment               5927.500       6      987.92     9.00***
    Error                  23387.437     213      109.80
    Total                  29314.937     219

*** p < .001

Table 34

Analysis of Covariance for Structure

Source                               Adj. MS      df

S1, Tm Covariate, Group O
    Treatment                         11.732
    Error                            122.416     109
S1, Tm Covariate, Group R
    Treatment                         63.089
    Error                            107.959      39
S2, S1 Covariate, Group O
    Treatment                         72.669
    Error                             57.754      99
S2, S1 Covariate, Group R
    Treatment                         67.398
    Error                             65.839      35

F                                       .09      .13      .10

Table 35

Scheffé Multiple Comparisons on Structure for Six Treatments and Control

S1        R-NR     R-D      O-D      O-V      O-NR     R-V      C
(M)      64.93    64.50    62.74    62.47    61.32    60.85    51.81
(SD)      7.99    10.73    11.64     8.95    11.52    11.35    12.31
C        13.12**  12.69*   10.93*   10.67*    9.52*    9.04
R-V       4.08     3.65     1.89     1.63      .48
O-NR      3.60     3.18     1.41     1.15
O-V       2.46     2.03      .26
O-D       2.19     1.76
R-D        .43

S2        R-NR     O-V      R-D      R-V      O-D      O-NR     C
(M)      66.46    63.38    63.25    61.70    61.09    60.24    51.81
(SD)      7.24     8.15     7.83    10.01     9.78     9.79    12.33
C        14.64**  11.58**  11.44**   9.89     9.28**   8.43*
O-NR      6.23     3.15     3.02     1.47      .85
O-D       5.33     2.30     2.16      .61
R-V       4.76     1.68     1.55
R-D       3.21      .13
O-V       3.08

*** p < .001   ** p < .01   * p < .05

APPENDIX M

ANALYSIS OF VARIANCE AND COVARIANCE FOR TRANSFER

Table 36

Analysis of Variance for Transfer

Source                          SS        df        MS         F

T1  Group O
    Treatment                128.000       2       64.00      .86
    Error                   8325.000     110       74.34
    Total                   8454.000     112
T1  Group R
    Treatment                 42.996       2       21.50      .35
    Error                   2425.984      40       60.65
    Total                   2468.980      42
T1  Groups O, R and C
    Treatment                522.812       6       87.14     1.34
    Error                  14735.062     227       64.91
    Total                  15257.874     233
T2  Group O
    Treatment                 64.160       2       32.08      .60
    Error                   5313.687     100       53.14
    Total                   5377.847     102
T2  Group R
    Treatment                 30.805       2       15.40      .23
    Error                   2419.965      36       67.22
    Total                   2450.770      38
T2  Groups O, R and C
    Treatment                759.563       6      126.59     2.40*
    Error                  11216.562     213       52.66
    Total                  11976.125     219

* p < .05

Table 37

Analysis of Covariance for Transfer

Source                               Adj. MS      df       F

T1, Tm Covariate, Group O
    Treatment                         39.519       2      .53
    Error                             74.746     109
T1, Tm Covariate, Group R
    Treatment                         20.546       2      .38
    Error                             53.907      39
T2, T1 Covariate, Group O
    Treatment                         16.553       2      .38
    Error                             43.993      99
T2, T1 Covariate, Group R
    Treatment                         23.541       2      .57
    Error                             41.011      35

APPENDIX N

ANALYSIS OF VARIANCE FOR SUBSTRUCTURES

Table 38

Analysis of Variance for Substructures on Acquisition - Six Treatments and Control

Source                          SS        df        MS         F

Sb1
    Treatment                443.101       6       73.85     3.37**
    Error                   4978.062     227       21.93
    Total                   5421.163     233
Sb2
    Treatment                328.551       6       54.76     3.70**
    Error                   3356.414     227       14.79
    Total                   3684.965     233
Sb3
    Treatment                350.660       6       58.44     4.47***
    Error                   2969.957     227       13.08
    Total                   3320.617     233
Sb4
    Treatment                607.438       6      101.24     4.57***
    Error                   5025.664     227       22.14
    Total                   5633.102     233
Sb5
    Treatment                111.941       6       18.66     9.48***
    Error                    446.674     227        1.97
    Total                    558.615     233
Sb6
    Treatment                 10.875       6        1.81      .34
    Error                   1214.511     227        5.35
    Total                   1225.386     233

*** p < .001   ** p < .01

Table 39

Analysis of Variance for Substructures on Retention - Six Treatments

Source                          SS        df        MS         F

Sb1
    Treatment                117.543       5       23.51     1.21
    Error                   2632.887     136       19.36
    Total                   2750.430     141
Sb2
    Treatment                 25.391       5        5.08      .40
    Error                   1727.598     136       12.70
    Total                   1852.989     141
Sb3
    Treatment                 92.195       5       18.44     1.72
    Error                   1461.382     136       10.75
    Total                   1553.577     141
Sb4
    Treatment                 79.973       5       15.99      .90
    Error                   2423.324     136       17.82
    Total                   2503.297     141
Sb5
    Treatment                  6.443       5        1.29      .68
    Error                    256.775     136        1.89
    Total                    263.218     141
Sb6
    Treatment                 22.379       5        4.48     1.12
    Error                    545.320     136        4.01
    Total                    567.699     141

Table 40

Scheffé Multiple Comparisons on Substructure - Acquisition for Six Treatments and Control

Sb1       O-D      O-NR     R-D      R-V      O-V      R-NR     C
(M)       9.05     8.38     7.75     7.31     7.11     6.07     5.44
(SD)      4.49     4.93     5.55     4.93     4.83     4.74     4.81
C         3.62*    2.94*    2.31     1.87     1.67      .64
R-NR      2.98     2.31     1.68     1.24     1.03
O-V       1.95     1.27      .65      .20
R-V       1.75     1.07      .44
R-D       1.30      .63
O-NR       .67

Sb2       R-NR     R-D      O-NR     O-V      R-V      O-D      C
(M)      17.79    17.06    17.03    16.42    15.54    14.92    14.41
(SD)      1.82     3.25     3.23     3.18     4.34     4.02     4.40
C         3.38     2.65     2.62     2.01     1.13      .41
O-D       2.87     2.14     2.11     1.50      .62
R-V       2.25     1.52     1.49      .88
O-V       1.37      .64      .61
O-NR       .76      .04
R-D        .72

Sb3       R-NR     O-V      R-D      O-D      R-V      O-NR     C
(M)      13.29    12.95    12.56    12.24    11.54    10.46    10.12
(SD)      3.15     3.09     2.18     2.87     3.88     4.10     4.02
C         3.17     2.83     2.45     2.12     1.42      .34
O-NR      2.83     2.49     2.10     1.78     1.08
R-V       1.75     1.41     1.02      .70
O-D       1.05      .71      .33
R-D        .72      .39
O-V        .34

Table 40 (Continued)

Sb4       R-NR     R-D      R-V      O-NR     O-V      O-D      C
(M)      17.21    16.44    16.39    16.34    16.16    16.00    12.94
(SD)      2.78     3.86     2.62     4.62     3.31     4.68     5.72
C         4.28     3.50     3.45     3.41     3.22     3.06
O-D       1.21      .44      .39      .16      .16
O-V       1.06      .28      .23      .18
O-NR       .87      .09      .04
R-V        .83      .05
R-D        .78

Sb5       O-D      R-D      R-NR     R-V      O-V      O-NR     C
(M)       4.03     3.94     3.93     3.85     3.61     3.29     2.33
(SD)      1.49     1.56      .56     1.29     1.44     1.16     1.46
C         1.69**   1.60*    1.59     1.51     1.27      .97
O-NR       .73      .64      .63      .55      .31
O-V        .42      .33      .32      .24
R-V        .18      .09      .08
R-NR       .09      .01
R-D        .08

Sb6       R-D      R-NR     C        O-V      R-V      O-NR     O-D
(M)       6.75     6.64     6.58     6.14     6.23     6.16     6.16
(SD)      2.25     2.02     2.28     2.42     2.75     2.31     1.94

** p < .01   * p < .05

Table 41

Means and Standard Deviations on Substructure - Retention for Six Treatments

            Sb1            Sb2            Sb3            Sb4            Sb5            Sb6
          (M)   (SD)     (M)   (SD)     (M)   (SD)     (M)   (SD)     (M)   (SD)     (M)   (SD)
O-NR     6.71   3.91   16.24   3.39   11.38   3.81   16.15   4.51    3.24   1.06    6.53   2.12
O-V      9.00   4.39   16.12   2.91   13.24   2.73   15.09   4.21    3.68   1.32    6.27   1.97
O-D      7.51   4.42   15.63   4.27   12.00   3.44   15.40   4.74    3.34   1.53    7.29   1.79
R-NR     8.92   4.37   16.54   4.38   13.62   3.81   17.39   2.65    3.54   1.08    6.46   1.59
R-V      7.60   4.48   15.30   2.57   12.60   2.62   16.60   3.44
R-D      8.63   4.47   16.81   2.45   11.63   3.81   16.81   2.89    3.00   1.50    6.38   1.79

APPENDIX O

QUESTIONNAIRE ITEMS UNIQUE TO DIAGRAM AND VERBAL TREATMENTS

Table 42

Responses to the Diagram and Verbal Questionnaire Items

Question                                      Response
                                              Yes     No     No Response

Diagram Treatment

1. Examine small diagrams                      49      2        1
   Trouble with interpretation                 11     37        2
2. Examine large diagrams                      51
   Trouble with interpretation                 24     26        1
   Examine inter-connections                   47      3        1
     Randomly                                  30
     Systematically                            15
3. (Use of diagrams while reading)
     Repeat                                    19
     Integrate                                 18
     Check on learning                         19
     Organize                                  18
     Remember spatially                        17
     Other                                      4
4. (Use of diagrams during test)
     Visualized diagram                        10
     Recognized connection                     13
     Vague remembrance                         26
     No recall                                  6

Verbal Treatment

1. Examine small reviews                       37      2
2. Read large review                           36      3
3. (Use of review passage while reading)
     Repeat                                    21
     Integrate                                  8
     Check on learning                         17
     Organize                                   8
     Remember verbally                         13
     Other                                      1
4. (Use of review during test)
     Instant recognition                        9
     Vague remembrance                         25
     No recall                                  7

APPENDIX P

APTITUDE CORRELATIONS

Table 43

Correlations among Aptitude Scores and Main Dependent Variables

           O-V, ACE, n = 11                O-V, CAAT, n = 8
           Q        V        T             Q        V        T
A1        .109     .490     .300          .318     .560     .545
A2        .093     .464     .289          .168     .434     .370
S1        .329     .673*    .611*         .353     .565     .572
S2        .496     .620*    .644*         .315    -.050     .192
T1        .165     .347     .314         -.453    -.244    -.460
T2        .186     .522     .445         -.305     .344    -.414
Tm        .128    -.120     .138         -.724*    .439    -.261
E         .119     .118     .133         -.470     .012    -.326

           O-D, ACE, n = 17                O-D, CAAT, n = 9
           Q        V        T             Q        V        T
A1        .072     .157     .148          .834**   .212     .620
A2        .531     .527     .434          .614     .407     .499
S1        .044     .057     .030          .323     .185     .298
S2        .055     .507     .384          .779**   .373     .677*
T1        .432     .394     .502*         .797**   .027     .493
T2        .037     .170     .102          .207     .455     .380
Tm       -.389    -.076     .250         -.025    -.374    -.225
E        -.130     .134     .027         -.084    -.078    -.094

           R-NR, ACE, n = 7
           Q        V        T
A1        .681     .374     .099
A2        .823     .044    -.224
S1        .799    -.232    -.449
S2        .748     .079    -.171
T1        .556     .664     .737
T2        .525     .112    -.072
Tm        .593    -.375    -.505
E         .792*   -.623    -.778*

** p < .01   * p < .05
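
The asterisks in Table 43 flag correlations that reached the stated significance levels. The source does not say how those levels were determined, so what follows is only a minimal sketch of one common approach, assuming a two-tailed t test of a Pearson r against zero with n - 2 degrees of freedom. The function name `t_for_correlation` is hypothetical and introduced here for illustration; the example values (r = .673, n = 11) are taken from the O-V, ACE block of Table 43.

```python
import math


def t_for_correlation(r, n):
    """Return (t, df) for testing a Pearson r from n paired scores against zero.

    Assumption (not stated in the source): t = r * sqrt(n - 2) / sqrt(1 - r^2),
    with df = n - 2. The resulting t is compared with a tabled critical value.
    """
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Example: S1 with ACE Verbal in the O-V group (r = .673, n = 11).
t, df = t_for_correlation(0.673, 11)
print(f"t = {t:.2f}, df = {df}")
```

The obtained t would then be compared with a tabled critical value for the chosen significance level and df. Because the statistic depends on both r and n, the same r can be significant in one group and not in another when the group sizes differ, as they do across the blocks of Table 43.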