A TECHNIQUE FOR DETERMINING THE EVALUATIVE DISCRIMINATION CAPACITY AND POLARITY OF SEMANTIC DIFFERENTIAL SCALES FOR SPECIFIC CONCEPTS

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
Donald Keith Darnell
1964

This is to certify that the thesis entitled "A Technique for Determining the Evaluative Discrimination Capacity and Polarity of Semantic Differential Scales for Specific Concepts," presented by Donald Keith Darnell, has been accepted toward fulfillment of the requirements for the Ph.D. degree in Communication.

A TECHNIQUE FOR DETERMINING THE EVALUATIVE DISCRIMINATION CAPACITY AND POLARITY OF SEMANTIC DIFFERENTIAL SCALES FOR SPECIFIC CONCEPTS

By DONALD KEITH DARNELL

AN ABSTRACT OF A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Communication, 1964

ABSTRACT

The purpose of this study was to develop a technique of measurement which can be used to investigate the objective criteria that people use in making evaluative judgments about events or objects in their environment. The technique developed employs a scaling procedure similar to that of the semantic differential (SD) but uses a different system of analysis.

The first chapter is devoted to a reassessment of the semantic differential, based on ten years of research with that instrument. Empirical and theoretical arguments are presented that the results of SD research do not support the general conclusion that evaluations of events are "independent" of objective judgments for particular events or categories of events.

The major hypothesis of this study was: Bipolar adjectival scales such as those used in the semantic differential, and including those identified in factor analysis as non-evaluative, can be shown to have an evaluative discrimination capacity for some concepts.

Subjects were asked to respond to the "best imaginable" and the "worst imaginable" examples of the categories of events named by concepts. Twenty concepts and 75 scales, borrowed from earlier research with the SD, were used. The sign test was used to determine if there was significant agreement among subjects on the polarity of each scale for each concept. It was assumed that the "best-worst" stimulus would permit each subject to indicate a preferred direction for each scale-concept item and that significant agreement among subjects on the relation between "best" and "worst" responses would indicate an evaluative discrimination capacity of the scale for the concept. Affirmative results were obtained. Of the 46 scales identified in earlier factor analyses as non-evaluative, 44 showed a significant evaluative discrimination capacity. In all, 72 of 75 scales demonstrated this capacity.
A second hypothesis was also tested: There is a positive relation between the discrimination capacity of a scale for a concept and the importance of that scale as an evaluative criterion for that concept.

Subjects ranked the 75 scales in order of importance to an evaluative decision about each of six concepts. A rank order correlation between importance and discrimination capacity provided support for the second hypothesis.

The conclusions of this study were:

1. The evaluative judgments that people make about events are related to their "objective" judgments of those events.

2. The objective criteria on which people base their evaluations of particular events are discoverable, using the best-worst technique.

3. The greater the statistical confidence in the evaluative discrimination capacity of a given scale, the more likely it is to be an important criterion of evaluation.

4. The fact that a particular scale discriminates evaluatively (or does not) for a particular concept is not generalizable to other, unrelated, concepts.

The implications of this study for the semantic differential as a measurement technique, for meaning, and for new directions in research are discussed.

A TECHNIQUE FOR DETERMINING THE EVALUATIVE DISCRIMINATION CAPACITY AND POLARITY OF SEMANTIC DIFFERENTIAL SCALES FOR SPECIFIC CONCEPTS

By DONALD KEITH DARNELL

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Communication, 1964

Acknowledgments

I wish to express my appreciation to my principal adviser, Dr. David K. Berlo, and to Dr. Erwin P. Bettinghaus, Dr. Hideya Kumata, and Dr. Malcolm MacLean, Jr., who have provided counsel and encouragement throughout this program of research. A special word of appreciation goes to Dr. Charles Osgood and his associates, who contributed greatly to this dissertation and to my general intellectual development. I must also thank Dr. Norma Bunton of Kansas State University, who tolerated my preoccupation. To Dan Costley and Mike Miller, for acting as interpreters with the computers, goes my sincere gratitude. The greatest debt is owed to the family and friends who never doubted my sanity.

Table of Contents

Acknowledgments
List of Tables
List of Illustrations
Introduction
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Bibliography
Appendix A
Appendix B
Appendix C

List of Tables

1. Evaluative Discrimination Capacity and Polarity of 75 Scales for 20 Concepts
2. A Principal Axis Factor Analysis, Varimax Rotation (75 Scales)
3. A Principal Axis Factor Analysis, Varimax Rotation (50 Scales)
4. Three Indices of Evaluative Discrimination Capacity
5. Relations Between Importance and Scale Evaluative Discrimination Capacity

List of Illustrations

1. (Shows a hypothetical distribution of responses on the scales good-bad and large-small for the concept PROFIT)
2. (Shows a hypothetical distribution of responses on the scales good-bad and large-small for the concept LOSS)
3. (Shows the effect on correlation of summing across concepts when the assumption of constant polarity does not hold)
4. (Compares linear and curvilinear correlation for one hypothetical distribution of scores)
5. (Compares linear and curvilinear correlation for one hypothetical distribution of scores)
6. (Compares linear and curvilinear correlation for one hypothetical distribution of scores)

Introduction

The semantic differential (SD) was first introduced in 1952 by Osgood and Suci, and it stimulated a flood of research which was summarized in 1957 by Osgood, Suci, and Tannenbaum. Barely ten years after the debut of the SD, the writer is able to report more than seventy-five publications that make some mention of the technique. The name has gained sufficient importance to researchers of human behavior that it warrants a place in the index of Psychological Abstracts. The SD is currently a standard measurement technique at many universities and other research organizations. The variety of ways in which it has been applied is evident from a glance at the appended bibliography.

All of this argues the importance of the technique to a field short on measuring instruments. It does not, of course, imply that the SD is a technique without imperfections. Instead, it implies urgency in discovering the flaws that may exist in the technique and repairing them.

A detailed description of the technique of semantic differentiation is available in several places (e.g., Osgood & Suci, 1952; Osgood, Suci, & Tannenbaum, 1957), and there seems to be no need to reproduce that detail here. Perhaps it will add continuity, however, to give a brief review of the technique.

The Semantic Differential

"The semantic differential is essentially a combination of controlled association and scaling procedures" (Osgood et al., 1957, p. 20). It is a means of eliciting subjects' responses that indicate which member of a pair of adjectives is more closely associated with a particular concept, and the intensity of that association. In its most common form, the SD looks like this:

TREE
good   : : : : : : :   bad
happy  : : : : : : :   sad
large  : : : : : : :   small

The subject (S) is instructed to mark in the middle of the scale if the adjectives at either end are equally associated with the concept at the top of the page. If one is more closely associated than the other, S can indicate "extremely" (by marking the box next to the stronger associate), "quite" (by marking the second box from the stronger associate), or "slightly" (by marking the third box from the stronger associate, next to the center).

It is assumed that an adequate sample of such scales would provide a fairly specific profile of S's meaning for a concept. Given that each scale represents a choice among seven alternatives, and that with k independent scales S could differentiate the universe into 7^k categories (five scales would then allow for 16,807 categories of response), this does not seem an unreasonable assumption.

There is evidence that Ss can make these responses reliably (Osgood et al., 1957, Chapter 4; Norman, 1959). Norman tested reliability on the instrument for individuals in his sample and reports a median test-retest (four weeks intervening) reliability coefficient of .66. Osgood reports an over-all test-retest (immediate) reliability coefficient of .85. Both of these are reasonable when one considers that the indicated error includes mistakes in marking, changes in meaning, and differences in the situational context of the marking behavior.

The scaling procedure also has high face validity.
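The arithmetic behind the earlier claim that five seven-step scales allow 16,807 distinct response profiles is easy to check. The snippet below is only an illustration and is not part of the original text; the scale names and markings are hypothetical.

```python
# Number of distinct response profiles that k independent 7-step scales allow.
def categories(k):
    return 7 ** k

print(categories(5))   # 16807, the figure cited for five scales

# One subject's hypothetical markings for TREE, coded 1..7 from left to right;
# each such profile is one of the 7**k possible response categories.
profile = {"good-bad": 2, "happy-sad": 3, "large-small": 1}
```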
In fact, the S's marks may be said to constitute a complex, graphic definition of a concept for an individual. Osgood et al. (1957, p. 142) also report significant relations between SD markings and independent measures of attitude, as well as significant predictions of voting behavior by "undecided" voters, which indicates a kind of predictive validity for the instrument. But, given that the meaning of a concept to an individual is his response to it, the validity of the SD rests on the assumption that Ss follow instructions and do, in fact, respond to the concepts by means of the scales.

One of the strong advantages that the SD has over other comparable instruments is the speed with which it generates data. It can be administered to groups of subjects limited in number only by convenience. It has been estimated (Osgood et al., 1957, p. 80) that an S can respond to 10 to 20 items per minute and can sustain a rate of 5 to 10 items per minute for periods up to an hour. (An item is defined as a scale-concept pairing.) The writer's experience in administering the SD indicates that this is probably a conservative estimate, or a fairly good estimate of the rate maintained by the slowest member in the average experimental group of college students. By comparison, S can respond to four SD scales in the time required for one typical multiple choice question.

The graphic scale system also presents an advantage to the experimenter if he wishes to compare two subjects or two concepts. Given an S's responses to concepts A and B on scales a, b, c, d, and e, a direct profile comparison can be made by preparing the following kind of chart:

a   :   : A :    : B :   :   : -a
b   : A :   :    :   : B :   : -b
c   A   :   :    :   :   : B : -c
d   :   : A :    : B :   :   : -d
e   :   :   : AB :   :   :   : -e

It is easy to see at a glance that the two profiles have certain similarities and certain differences, and it does not seem unreasonable to assume that there are corresponding similarities and differences in the meanings of the two concepts. However, it is difficult to express what the eye can see, even in this simple example. When groups of subjects or concepts are compared, it is necessary to translate to some other system, such as the language of mathematics, to express the complex relationships.

To make mathematical description possible, Osgood and his associates assume that the intervals of a scale are equal intervals and assign successive integers to the scale categories. To further simplify the problem of communicating the comparative data collected from Ss about concepts, they set up a mathematical model described in The Measurement of Meaning (p. 25) as follows:

We begin by postulating a semantic space, a region of some unknown dimensionality and Euclidian in character. Each semantic scale, defined by a pair of polar (opposite in meaning) adjectives, is assumed to represent a straight line function that passes through the origin of this space, and a sample of such scales then represents a multidimensional space. The larger or more representative the sample, the better defined is the space as a whole. Now, . . . , many of the "directions" established by particular scales are essentially the same (. . .) and hence their replication adds little to the definition of the space. To define the semantic
space with maximum efficiency, we would need to determine that minimum number of orthogonal dimensions or axes (again assuming the space to be Euclidian) which exhausts the dimensionality of the space--in practice, we shall be satisfied with as many such independent dimensions as we can identify and measure reliably. The logical tool to uncover these dimensions is factor analysis, . . . .

Semantic differentiation is explained in terms of this model (p. 26) as

the successive allocation of a concept to a point in the multidimensional semantic space by selection from among a set of given scaled alternatives. Difference in the meaning of two concepts is then merely a function of the differences in their respective allocations within the same space.

Osgood's development of the SD stems, at least in part, from his interest in a theory of meaning. In Method and Theory in Experimental Psychology (Osgood, 1953) and again in The Measurement of Meaning (Osgood et al., 1957) he discusses his learning theory conceptualization of meaning. He defines meaning there as "representational mediating processes." These processes are said to be both responses and sources of self stimulation. These processes are learned. "They may well be purely neural events rather than actual muscular contractions or glandular secretions" (1957, p. 7). And, it follows from the statement that they may be "purely neural events" that these meaning processes are, at present, not directly observable in an intact organism. On the other hand, with the SD there is a definition of meaning, a point in semantic space, that can be mathematically traced to the responses of subjects. Osgood says (1957, pp. 26-27):

We now have two definitions of meaning. In learning-theory terms, the meaning of a sign in a particular context and to a particular person has been defined as the representational mediation process which it elicits; in terms of our measurement operations the meaning of a sign has been defined as that point in semantic space specified by a series of differentiating judgments. We can draw a rough correspondence between these two levels as follows: The point in space which serves us as an operational definition of meaning has two essential properties--direction from the origin, and distance from the origin. We may identify these properties with the quality and intensity of meaning respectively. The direction from the origin depends on the alternative polar terms selected, and the distance depends on the extremeness of the scale positions checked. What properties of learned associations--here, associations of signs with mediating reactions--correspond to these two attributes of direction and intensity? At this point we must make a rather tenuous assumption, but a necessary one. Let us assume that there is some finite number of representational mediation reactions available to the organism and let us further assume that the number of these alternative reactions (excitatory or inhibitory) corresponds to the number of dimensions or factors in the semantic space. Direction of a point in the semantic space will then correspond to what reactions are elicited by the sign, and distance from the origin will correspond to the intensity of the reactions.

After several paragraphs designed to "clarify this assumed isomorphism somewhat," the writers (Osgood et al., 1957, p. 30) conclude with three statements that are significant for the present study.

1. "This, then, is one rationale by which the semantic differential, as a technique of measurement, can be considered as an index of meaning."

2. "It is true that many of the practical uses of the semantic differential, indeed its own empirical validity, depend little, if at all, on such a tie-in with learning theory."

3. "If we are to use the semantic differential as an hypothesis testing instrument, and if the hypotheses are to be drawn from learning-theory analysis, some such rationale as has been developed here is highly desirable."

It seems to be clearly implied by these three statements that one need not concern himself with the mediation hypothesis to be concerned with the semantic differential, and since it is the purpose of this study to reevaluate the measuring technique, it does not seem desirable to employ any unnecessary theoretical assumptions.

Purpose of This Dissertation

This dissertation is intended to focus attention on the problems that have arisen in the application and interpretation of the semantic differential. Most of the problems that will be discussed have been pointed out by the authors of The Measurement of Meaning; however, these problems have not been attacked systematically to determine what revisions in the technique they might suggest.

In the course of the discussion, data will be presented to show that (a) certain assumptions made in the standard SD analysis are untenable, (b) there is a more parsimonious explanation than the one given for the results obtained so far, (c) the factor structures obtained, though they may describe the universe of concepts within sampling limitations, may not describe any particular member of that universe, and (d) there is a need not met by the SD that could be met by a slight modification of that technique.

An alternate technique will be described which (a) shares the advantages of ease and speed of data collection with the SD, (b) makes fewer assumptions than does the SD, (c) permits a new kind of semantic differentiation, and (d) has direct and immediate implications for efficiency in persuasion. The two techniques will be compared--in terms of their relative appropriateness to various kinds of problems--through analysis of two sets of data collected from the same subjects and using the same scales and concepts.

Chapter 1

This chapter is a discussion of methodological problems that are associated with the use of the semantic differential and a suggestion for the solution of those problems.

Methodological Problems with the Semantic Differential

In his review of The Measurement of Meaning, Gulliksen (1958, p. 116) summarized a number of the problems that arise in the interpretation of SD data.

From the point of view of the general stability of the data, it is encouraging to find that several different factor studies give similar results. With regard to factor analysis, however, the authors mention (Chap. 4) a number of disturbing characteristics of the data, such as "concept-scale interaction" (p. 187), variation in scales contributing to a given factor (p. 180), variation in inter-scale correlation for different concepts (p. 177), . . . . In the same vein, Osgood states that "the vast majority of scales show significant variation in their correlations with other scales across concepts" (p. 177), that there is variation in the "relevance" of particular scales to particular concepts (p. 78), and that some scales shift in meaning with the concept being judged (p. 179).
The foregoing comments may be summarized by saying that there is a marked "concept-scale interaction." For data which exhibit this characteristic, a general factor analysis of a number of concepts may give quite misleading results. Such interaction means that the emphasis throughout the book on correlational analysis is to be regretted. Other methods of analysis should be considered.

There are at least nine references in The Measurement of Meaning to concept-scale interaction (pp. 39, 93, 108, 176, 177, 178, 187, 200, and 326). The import of this interaction to the interpretation of SD data is suggested by the following examples from these references.

To the extent that there are differences in factor structure as between concepts, and to the extent that our sampling of only 20 concepts was nonrepresentative, the factorial results of the first analysis could be biased.

To the extent that the relations among scales (and factors) vary with the classes of concepts being judged (see section in Chapter 4 on comparability across concepts), some error in the interpretation of D is being introduced for certain concepts.

For purposes of generalized semantic measurement we would like to have a set of scales which consistently load heavily on a certain factor and are independent of other factors, despite variations in the concepts being judged. We have had difficulty trying to isolate a set of scales having these properties.

What do these findings have to say about the practical problems of semantic measurement? For one thing, it now seems less likely that we will be able to discover a single set of scales which represent an adequate set of factors and which are stable across whatever concepts may be judged. On the other hand, it may be possible to identify classes of concepts for which general instruments may be used, and perhaps, in course, the principles which operate in determining a common frame of reference can be discovered.

These statements indicate that Osgood and his associates were not entirely satisfied with the results that had been obtained by 1957. But, enthusiasm is apparently easier to communicate than caution. More than half of the empirical studies reported in the journals since that time have borrowed scales on the basis of the general factor loadings and applied them without qualification to many different kinds of concepts. However, three research projects, carried out since 1957, show clearly the need for caution (Osgood, Ware, & Morris, 1961; Smith, 1959, 1961, 1962; Triandis, 1959, 1960).

Smith's three studies employed scales chosen from Osgood's lists on the basis of factor loadings on the evaluation, potency, and activity factors and "literal application to speech concepts." His concepts were "speech related," "theater," and "speech correction" concepts. With each new set of concepts and each new factor analysis, differences in the factor structures were noted--in spite of the fact that he started with a select set of scales, and all of the concepts were drawn from the "same" academic discipline. Smith (1961) remarked,

. . . the dimensions of any special subject matter area must be individually determined even with areas as closely related as those of general speech and the theater arts since there are both factor and scale variations in significant amounts.
This necessitates, for any special area of investigation in which the semantic differential is to be used, a specific factor analysis to determine the important factors and the scales which measure them.

Smith (1959) also noted two other problems with the SD. From the fact that his subjects seemed to treat "worthless" and "meaningless" as positive values, he inferred, "It is impossible to determine an absolute scale polarity apart from the conceptual structure within which it is to operate." This bit of evidence, added to earlier findings (Osgood et al., 1957, p. 68) that scales may reverse polarity with a change of concepts, raises a serious question about the meaning of correlations between scales across concepts using a constant polarity assigned by the experimenter.

The third problem noted by Smith (1961) was that the hggrgglg scale seemed to be the best available measure of Factor I, although subjects could neither apply nor interpret it. This seems to be related to the fact noted by Osgood et al. (1957, p. 323) that they were unable to work back from the profile or the point in semantic space to identify the concept. At any rate, it emphasizes the fact that scales vary in relevance from concept to concept (Osgood et al., 1957, pp. 78-79).

Triandis (1959, 1960) used restricted samples of job-related concepts and a quite different set of scales. He found factors that differed from those found by Osgood, and "there were also certain differences between the factors obtained from managers and those obtained from the workers." He describes (1960, p. 300) the changes in factor structure as follows:

Instead of evaluation we have objective and subjective job evaluation factors. Instead of potency and activity we have a fusion of the two in a relatively insignificant dynamism factor. New factors, such as the white collar, variety, and job level factors, that are specific to the job domain of meaning, have taken the place of the potency and activity factors and account for a portion of the variance that was previously accounted for by these factors.

It is difficult to tell how much of this change in factor structure must be attributed to the changes in scales, how much to the changes in concepts, and how much to the author's interpretation. It probably makes little difference, for it is clear that the reproducibility of the original factor structure is less than satisfactory--for one reason or another.

Osgood, Ware, and Morris (1961), in their study of values, used a comparable set of subjects and some of the same scales as used in an earlier study, but they found an entirely different factor structure in this restricted sample of concepts. The following quotations from their report round out the empirical case for concept-scale interaction:

The scales selected for this study provide ample opportunity for at least the three general factors usually obtained, "evaluation," "potency," and "activity," to appear (p. 67).

The semantic space of connotative meanings generated when these value statements (the Ways) are judged is clearly not the same as that obtained when more varied samples of concepts are used (p. 68).

In the case of value statements (the Ways) being judged as concepts, by American students, "evaluation," "potency," "activity," and "receptivity" fuse together as a single "successfulness" factor (p. 69).

Comparing these results with those of other factor analyses (cf. Osgood, Suci, & Tannenbaum, 1957, Ch.
2), then, we have clear evidence for concept-scale interaction (p. 69).

The results . . . make it clear that factors derived from more "representative" samples of concepts are not necessarily independent, and hence visible, when some specific subset of concepts is judged (p. 72).

What does all this mean? How is it that factors derived from representative samples of concepts are not visible when some specific concept, or set of concepts, is judged? What difference does it make? Perhaps all three questions can be answered with an example. Take the three scales that name the three general factors mentioned above, good-bad, strong-weak, and active-passive. To simplify the example, assume that these are dichotomous like heads-tails. On one "flip," then, it would be possible to obtain any one of eight combinations. And, if it is possible to think of a word in English appropriate to each combination, it can be said that the three dichotomies are logically independent--all combinations can occur. The eight combinations are enumerated below with some likely candidates for the set of concepts that would prove logical independence.

G S A - ATHLETE
G S P - ARMY RESERVE
G W A - KITTEN
G W P - ANTIQUE CHAIR
B W P - DERELICT
B W A - HOUSEFLY
B S P - QUICKSAND
B S A - ROGUE ELEPHANT

If it is agreed that these concepts are not only possibly described by the three adjectives indicated but that subjects would be highly likely to pick just those combinations to go with those concepts, then it is also highly likely, in standard SD procedure, that the three scales would correlate zero with each other if the data were summed over these concepts. However, it is also evident that the relation of independence (factor structure) indicated by summing over concepts does not hold for any particular concept in the sample--that the concepts fit better with some combinations of adjectives than with others. Thus, if one were to apply these scales to a set of concepts which name subclasses of one of these concepts (e.g., FOOTBALL PLAYER, BASEBALL PLAYER, MILER, and BOXER), he would likely obtain significant correlations between the scales (or indeterminate ones if the variance among subjects were very low). Adding a set of DERELICTS to the set of ATHLETES would assure sufficient variability in the data to permit significant correlations between the three scales in question (see Osgood et al., 1957, p. 35), and, in either case, a quite different picture of semantic space is obtained than that produced by the broader set of concepts.

Osgood et al. (1957, p. 180) and Bettinghaus (1961a) have made comments to the effect that one would not expect the evaluation of some concepts (ATHLETE, POLITICIAN, SECRETARY) to be independent of activity, potency, or stability. Experience tells us that there is a relation between strength and stability for rigid structures and between strength and activity for living organisms. Yet, these seem to be independent factors in diverse samples of concepts. All of this supports the proposition that the preceding example is not an isolated one.

There is an explanation for this seeming inconsistency. Factor structures depend on correlations, and correlations depend on variance--they are indices of covariation. In the normal SD analysis, there are two kinds of variance contributing to the outcome. Variance 1 (V1) is the dispersion of scores around the concept means. Variance 2 (V2) is the dispersion of concept means around the grand mean.
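The two variance components can be made concrete with a small numerical sketch. Nothing below comes from the study itself: the concept labels are borrowed from the study's list, but the scores are hypothetical, and the computation simply verifies that the total dispersion of markings on one scale splits into a within-concept part (variance 1) and a between-concept part (variance 2).

```python
from statistics import mean

# Hypothetical good-bad markings (coded 1..7) for three concepts.
scores = {
    "LADY":    [6, 7, 6, 5, 7],
    "FRAUD":   [1, 2, 1, 2, 1],
    "BOULDER": [4, 3, 4, 5, 4],
}

all_scores = [x for xs in scores.values() for x in xs]
grand_mean = mean(all_scores)
n = len(all_scores)

# Variance 1: dispersion of scores around their own concept means.
v1 = sum((x - mean(xs)) ** 2 for xs in scores.values() for x in xs) / n

# Variance 2: dispersion of concept means around the grand mean.
v2 = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in scores.values()) / n

total = sum((x - grand_mean) ** 2 for x in all_scores) / n
print(v1, v2, total)   # v1 + v2 equals the total variance
```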
Variance 1 and variance 2 are independent in that neither is predictable from the other on a priori grounds. Thus, there is no reason to expect that a factor analysis based on either kind of variance alone would be the same as one based on both. It is possible to construct plausible cases in which the addition of two concepts--the inclusion of variance 2--would increase, decrease, or have no effect on the correlation between two scales based on the covariation within either concept.

Osgood, Ware, and Morris (1961) found that when V2 was severely reduced, by the use of closely related concepts, "the factor structure was clearly not the same as that obtained when more varied samples of concepts are used." They also found that V1 could be eliminated (by using the concept means in the correlation instead of individual scores) with no effect on the pattern of loadings, although the proportion of the total variance accounted for was greatly increased. These findings strongly suggest that the factor structure is heavily dependent on V2--the among-concept variance.

The most familiar SD factor structure was based on concepts selected on the criterion "that they be as diversified in meaning as possible so as to augment the total variability in judgments" (Osgood et al., 1957, p. 34). Again these authors say (p. 85), "Ordinarily in making up a sample of concepts for a differential we try to balance off good concepts with bad, strong with weak, and so forth. . . ." Given these pieces of information, it seems reasonable to expect that when concepts are selected on other criteria (such as a specific problem of meaning), different results would be obtained.

Given that scales may reverse polarity with a change of concepts (Smith, 1959; Osgood et al., 1957, p. 68), there is still another way that adding concepts together can affect the correlations between scales. Figures 1, 2, and 3 illustrate a possible relation between the scales good-bad and large-small for the concepts PROFIT and LOSS. If 23 subjects were to respond to the concept PROFIT as in Fig. 1 and to the concept LOSS as in Fig. 2, the results would be completely obscured by adding the two concepts together, as indicated in Fig. 3. (Figures 1, 2, and 3 are frequency matrices for the two scales, with r_XY = 1.00 for PROFIT, r_XY = -1.00 for LOSS, and r_XY = 0.00 for the combined data.) The numbers in the figures are frequencies, and their positions in the matrices indicate covariation on the two scale axes. Although no data have been reported on the frequency with which this polarity reversal might occur, it has been observed in SD data, and the possibility must be taken into account in the interpretation of SD results.

If the line of reasoning in the preceding discussion is valid, and concept-scale interaction is inevitable in the SD technique, then it follows that there are as many factor structures or "semantic spaces" as there are samples of concepts. If that is true, then the SD must be treated as a purely descriptive technique and a relatively inefficient one at that. That is, for anyone not highly sophisticated in mathematics, it probably takes more effort--more words--to describe the meanings of a concept in terms of a factor pattern than would be required without the factor pattern. And, without the inferential power that derives from an assumption of a fundamental pattern, there is little to be gained for the effort.

"On the other hand, it may be possible to identify classes of concepts for which general instruments may be used, and perhaps in course, the principles which operate in determining a common semantic frame of reference can be discovered" (Osgood et al., 1957, pp. 326-327). There are, however, reasons why the suggested reorientation may not be entirely effective without some changes in the technique.

The first of these reasons to be considered involves the assumption of equal intervals in the semantic differential scales. This assumption is essential to the usual method of analysis, but the evidence that it is tenable may be described as inconclusive. Osgood et al. (1957, pp. 146-152) report evidence that the intervals of the nine most frequently used scales are approximately equal. The argument, of course, is not whether the intervals are actually equal but whether a significant distortion is introduced by assuming equal intervals. Osgood argued that little distortion would be introduced for these nine scales. However, no evidence is available on this question with regard to less frequently used scales, the ones that have behaved less predictably in analysis. Since the cost of such evidence would be quite high, it seems advisable, for the moment at least, to employ some means of analysis with SD data that does not require this assumption.

Gulliksen (1958) suggests another problem in the SD technique. His argument concerns the advisability of using the linear correlation coefficient instead of the curvilinear coefficient. He points out that the linear correlation may lead one to draw unjustified conclusions about the presence or absence of functional relations, because a high linear correlation does not imply that the curvilinear relation is negligible. Of course, it is also true that a curvilinear relation may exist when the linear coefficient is zero (see Ferguson, 1959, p. 109). The greater sensitivity of the curvilinear coefficient, eta (E), relative to the linear coefficient, r, can be made clear by comparing basic assumptions. When one computes a correlation coefficient between two variables (X and Y), he predicts that there is a line that can be fitted to the data matrix which will enable him to predict X from Y or Y from X with greater success than he could from the mean of the predicted variable. E puts no restriction on the nature of the line, but r restricts the possibilities to straight lines. In other words, r assumes that the relation is linear or nonexistent.

When this assumption is tenable, r has its merits: A single coefficient can describe the reciprocal relation between X and Y, while two E coefficients are required. An r may be calculated from raw data on a desk calculator, but E, which is essentially an analysis of variance, requires that the data be grouped and plotted, or more complex equipment. Finally, r shows direction of the relationship (ranges from -1.00 to +1.00), which E does not do. Thus, if its straight line assumption is met, r is more efficient than E, but, according to Senders (1958, p. 271),

When there is any doubt about whether or not the relationship between two variables is linear, both E and r should be computed. E will always be equal to or greater than r in absolute value, but if the relationship is linear the difference will be small. A large difference indicates a non-linear relationship, in which case E rather than r should be used.

The following examples should make the import of Senders' statement clear.
The numbers in the matrices are hypothetical frequencies. Computation is after Walker and Lev (1953, pp. 238 & 279). Figure 4 illustrates the case in which r and E are equally good predictors, because the relation is linear. Figure 5 illustrates the case in which r is zero, and the predictability of X, given Y, is zero, but the predictability of Y, given X, is nearly perfect. Figure 6 illustrates a case in which either variable is perfectly predictable from the other, but r shows only about 13% of the variance accounted for.

(Figures 4, 5, and 6 are frequency matrices for hypothetical distributions of scores: Fig. 4, r_XY = .97, E_XY = .97, E_YX = .97; Fig. 5, r_XY = .00, E_XY = .00, E_YX = .97; Fig. 6, r_XY = .36, E_XY = 1.00, E_YX = 1.00.)

The only question that remains is whether there is any doubt about the relations between SD scales being linear. McNelly (1961), in searching for an index of "interest" in news stories about "countries," found a strong positive relation between intensity of evaluation and absolute strength and activity. He argued that it made sense to think that news about a country would become more interesting as the perceived strength and activity of that country increased. It also seemed reasonable that extremely good or bad countries should be more interesting than neutral countries, and his findings support these contentions. McNelly's findings also suggest the possibility of curvilinear relations between the evaluative scale and the other two for this particular set of concepts. The relation could be that in Fig. 5 with passive-active or weak-strong as the ordinate axis and good-bad as the abscissa. In the normal SD analysis, making the linear-or-nothing assumption, this relation would be overlooked. The reverse curvilinear relation with good-bad might reasonably be predicted for strong-weak on the concept COFFEE, for fast-slow on the concept CLOCK, or hot-cold in regard to the weather. Although proof that E would be more appropriate than r in SD analysis is limited, given concept-scale interaction, there seems to be a reasonable doubt that the linear assumption is universally tenable.

Osgood et al. (1957, p. 91), in their justification of the use of D, point out that "the product-moment correlation not only distorts the information, but may be completely inapplicable in some cases." They refer to the fact that correlation disregards differences in the means of the two variables correlated, but the truth of the statement seems to be generalizable to some other instances in which the technique has been employed.

Given that the scales do not behave the same way from concept to concept--that there is concept-scale interaction--the SD must be restricted to the task of describing the relations among a specific set of concepts. Further, it has been argued, there are problems inherent in the technique that make its value as a descriptive instrument something less than certain. On the other hand, it must be noted that the weaknesses in the SD seem to be in the technique of analysis, while the strengths (speed and ease of data collection, reliability, and validity) rest on the scaling technique itself. That alternative methods of analysis should be explored seems to be the most reasonable conclusion.

An Alternative Approach

The original problem that prompted this survey of the SD seems a reasonable starting point for laying the foundation for the alternate approach.
It has been observed that people rather consistently make evaluative judgments about events in their environment. They show preferences for one event over another. They can quite often give reasons (or rationalizations) for their preferences, such as: it's sweeter, more dependable, more durable, smells better, cheaper, larger, smaller, faster, slower, hotter, colder, or it matches my shoes. Observations like these suggest the hypothesis that there are discriminable characteristics of events that are linked by some psycho-logical value system to evaluative judgments about those events.

A question was formulated: Is there some efficient way of finding out what these criteria of evaluation are, for particular events? The semantic differential was immediately suggested, but, upon examination, it seemed that the SD was not only unsuitable for the task, but the results seemed incompatible with the hypothesis. That is, the SD seemed to say that evaluative judgments are independent of all kinds of sense related discriminations, across concepts.

This conclusion, based on repeated SD results, is easily countered by the technique of argument called reductio ad absurdum: If it is true that evaluative judgments are independent of sense related discriminations, then it must be true that evaluative judgments are independent of the events themselves. And, if it is true that evaluative judgments are independent of the events being judged, then it is as likely that an individual would feel favorable toward a rattlesnake bite as that he would feel favorable toward a dish of ice cream.

The conclusion to this line of reasoning is obviously false, but it does not explain the contrary result of the SD research. It is believed, however, that a sufficient explanation for the apparent independence of evaluative and objective judgments has been given in the first part of this chapter, so that it is now reasonable to attempt to support the counter proposition--that evaluative judgments are related to objective characteristics of events.

Any kind of reliable discrimination between two events, evaluative or otherwise, would seem to require that the events be objectively different in at least one respect. In discrimination learning experiments, for example, if an experimental subject develops a reliable preference for one of a pair of stimuli, under conditions of controlled reinforcement, it is taken for granted that S has discovered the relevant distinctive cue, which in various experiments may be weight, brightness, size, configuration, etc. (Osgood, 1953, pp. 446-453). It does not follow from this proposition, however, that the heavier, brighter, or larger of the pair of stimuli will be universally preferred. The "direction" of the preference depends, instead, on the "direction" of the reinforcement. In the reference cited immediately above, experiments are mentioned in which the "significance of these cues" was intentionally reversed by the manipulation of the reinforcing conditions, but in all cases it was assumed that the formation of a reliable preference depended on the association of the reinforcement differential with discriminable differences in the stimuli. From this assumption it follows that there are at least two necessary conditions for the development of a reliable preference: discriminable differences in the stimulus events, and discriminable differences in the reinforcement conditions associated with each of the stimuli.
If this is true, then, a reliable preference implies both of these conditions but is not, in turn, implied by either.

In the laboratory situation, the relation between reinforcement and the observable qualities of the stimulus is usually, by design, quite arbitrary and in the control of an experimenter. The arbitrariness of the relation permits the experimenter to exercise his control in the manipulation of the subject's preferences. In the natural (non-laboratory) situation, however, this level of control does not obtain, and the predictability of preferential responses is nil under the assumption that the reinforcement differential and the observable qualities of the stimulus are independent. On the other hand, if the relation is not one of independence, an observer with a knowledge of the relation and of the stimulus should enjoy approximately the same predictive power as the experimenter who controls both the stimulus and the reinforcement. And, it seems likely that the reinforcement associated with a given stimulus is frequently, in the real world, directly (causally) related to the objective properties of the stimulus, limited of course by the wants of the individual and his ability to employ the stimulus in the satisfaction of those wants.

If this is true, one might ask, why have these relations not been discovered, in SD research for instance? The answer has already been given. It has been assumed that the relations are linear and constant across categories or nonexistent. But, the attribute of an event that is associated with positive reinforcement in one category of events, or in one situation, may be associated with negative or non-reinforcement in other categories or in other situations. For example, heaviness is usually quite desirable in football players and quite undesirable in jockeys. Again, certain attributes of coffee which make it very desirable to some people at 7 a.m. make it very undesirable to the same people at 10 p.m. In other words, there is reason to believe that the "significance of cues" quite naturally changes from one category of events to another, from one combination of cues to another, so that a given objective cue can be strongly related to evaluations of particular categories of events, and yet appear to be independent of those evaluations when examined across categories under the linear-or-nothing assumption. Examples were given earlier showing how this change of the relation might affect the computation of a correlation.

Using the SD factor analysis as a base, evaluative judgments appear to be independent of objective criteria insofar as evaluative and objective variables are measured by the semantic differential. Yet, logically, evaluative judgments must be related to objective variables if they are related to events at all. Or, if events and evaluations are independent, no predictive propositions about evaluative behavior can be made from knowledge of physical events. The assumption that there is a "natural" change of cue significance from category to category provides a very simple and parsimonious explanation for this apparent inconsistency. The evidence of concept-scale interaction, reported earlier, supports this assumption. It seems reasonable, therefore, to employ this parsimonious assumption and to hypothesize that evaluative judgments are related to objective variables for particular concepts or categories.

In retrospect, Osgood et al. (1957, pp. 62, 180, 188, & 78) offer some support for the position taken here.
The evaluative factor is itself further analyzable into a set of secondary factors--various "modes" of evaluation which are appropriate to different frames of reference or objects of judgment.

What is good depends heavily upon the concept being judged--strong may be good in judging athletes and politicians, but not in judging paintings and symphonies; harmonious may be good in judging organized processes like family life, symphony, and hospital, but not so much so in judging people or objects.

Evaluation thus appears as a highly generalizable attribute which may align itself with almost any other dimension of meaning, depending on the concept being judged--and it is most often the dominant attribute of judgment.

Another criterion in scale selection is relevance to the concept being judged. For example, in judging a concept like ADLAI STEVENSON, one evaluative scale like beautiful-ugly may be comparatively irrelevant while another like fair-unfair may be highly relevant; on the other hand, just the reverse would be true for judging paintings.

These are only a few of the more than thirty instances in The Measurement of Meaning that this writer interprets as supporting this proposal. But, the most important one to this argument is, "What is good depends heavily upon the concept being judged . . . ," for from this assertion it follows that evaluative judgments are not independent of the concepts being judged and are not independent of the objective attributes of events named by those concepts. If this is true, and the nature of the relationship for particular events can be ascertained, then it should be possible to predict evaluations and changes in evaluation with greater success than we now enjoy and to change or stabilize an evaluation more effectively by the efficient use of influence. None of these things is likely to be accomplished under the assumption that evaluations are independent of the objective observations that a person makes of events.

There are several ways that the validity of this argument could be tested empirically. But, the most direct way also happens to be the one that offers the greatest promise of generality, because it has a methodological emphasis. In the following chapter, a design will be presented for a study to test the hypothesis that bipolar adjectival scales such as those used in the semantic differential, and including those previously identified as non-evaluative, have an evaluative discrimination capacity for some concepts. Since extensive correlation analysis is not available to many people who might be interested in the question posed here, and since there are serious doubts about the appropriateness of that type of analysis to data produced by the SD scaling technique, a simpler alternative method will be introduced to test this hypothesis.

Chapter 2

This chapter includes the designs for two experiments. The second experiment is contingent upon the outcome of the first one, so its design will appear as a separate section following the complete design of the major experiment.

Experiment 1

Hypothesis 1

Bipolar adjectival scales, such as those used in the semantic differential, and including those identified by factor analysis as "non-evaluative," have an evaluative discrimination capacity for some concepts.
Rationale

The elements of a rationale for this theoretic hypothesis have been given in Chapter 1, but they may be summarized as follows: It is assumed that a necessary condition for a reliable discrimination between two events is that the discriminator reliably perceives an objective difference between the two events. It is further assumed that an individual cannot reliably perceive differences between events when there are, in fact, no differences between the events. These assumptions imply that if an individual makes an evaluative judgment about an event, it either does not differentiate that event from any other, or it is related to some objective characteristic of that event, and that characteristic is a variable among events. Therefore, evaluative judgments must be related to objective variables or they are independent of events.

If the latter is assumed, then there is no ready explanation for the variability and apparent reliability of evaluative judgments and no basis for predicting the behavior of an individual from a knowledge of his environment. On the other hand, acceptance of the proposition that evaluative judgments are related to objective variables presumes an explanation for previous research findings obtained with the semantic differential. In the previous chapter, two kinds of explanations were suggested for the apparent independence of evaluative and objective judgments: (a) It may be partly accounted for by the fact that the linear correlation is insensitive to certain kinds of relations. Several examples were given to illustrate the importance of the fact that the curvilinear relation is always equal to or greater than the linear relation. (b) It may be the direct result of the practice of summing across concepts, which assumes that any relations that do obtain between evaluative and non-evaluative judgments are themselves independent of the categories of events (concepts) being judged.

This study, then, assumes that evaluative judgments are related to objective variables; that bipolar adjectival scales, such as those used in the semantic differential, do measure both evaluative and objective variables; that events within a category may be evaluated differently if and only if they are objectively different; and that concepts name categories of events. It does not assume that the evaluative or the objective variables or the relations between evaluative and objective variables are independent of the events or categories of events being judged. If this position is a tenable one, then it should be possible to support the theoretic hypothesis given above.

Definitions

Given that a set of semantic differential responses is factor analyzed and two or more orthogonal factors are obtained, that factor with which the good-bad scale is most highly correlated is the evaluative factor. Any other factor is non-evaluative. The correlation between any scale and the evaluative factor is the evaluative factor loading (e) for that scale, and this correlation squared (e^2) is an indication of the variance in the markings on that scale accounted for by the evaluative factor. Given the variance accounted for by the evaluative factor (e^2) and the total variance accounted for by all the factors (h^2), if the quantity 2e^2 - h^2 for a given scale is greater than zero, that scale is predominantly evaluative. If that quantity is less than zero, that scale is predominantly non-evaluative.
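The classification rule just defined can be stated as a short computation. The sketch below is not part of the original design; the scale loadings are hypothetical, and the factor labels simply follow the definition above (the factor most highly correlated with good-bad is labeled "evaluative").

```python
def classify_scale(loadings):
    """Apply the 2e^2 - h^2 rule to one scale's orthogonal factor loadings."""
    e2 = loadings["evaluative"] ** 2               # variance due to the evaluative factor
    h2 = sum(v ** 2 for v in loadings.values())    # variance due to all factors
    return "predominantly evaluative" if 2 * e2 - h2 > 0 else "predominantly non-evaluative"

# Hypothetical loadings for two scales on a three-factor solution.
print(classify_scale({"evaluative": 0.80, "potency": 0.20, "activity": 0.10}))
print(classify_scale({"evaluative": 0.30, "potency": 0.75, "activity": 0.40}))
```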
Given sets of subjects' responses on a given scale to the "best imaginable" and the "worst imaginable" events in a category named by a concept, if a significant proportion of the sample of subjects agree on the polarity of the scale, the scale has an evaluative discrimination capacity for that concept. Since this definition is the key to this Whole experiment, it is elaborated elsewhere in the design, but it is given here in summary form to help the reader understand those elaborations when they occur. Design Subjects. The sample consisted of all those people enrolled in the seven sections of Oral Communication Ia, Spring, 1963, at Kansas State University, Who attended class 39 on a given day. Of the 159 enrollees in this class (73% male and 91% freshmen), 139 actually participated in the experiment. Concepts. The concepts used are the same twenty used in the first analysis by Osgood et a1. (1957, p. 34). The original rationale for the selection of these concepts can be found in the reference cited. They were selected for this study so the results would be directly comparable to the earlier work. The concepts are: LADY, BOULDER, SIN, FATHER, LAKE, SYMPHONY, RUSSIAN, FEATHER, ME, FIRE, BABY, FRAUD, GOD, PATRIOT, TORNADO, SWORD, MOTHER, STATUE, COP, and AMERICA. Scales. The 75 scales used in this study include the 50 from the first analysis by Osgood et al. (1957, p. 37) and an additional 25 selected from their thesaurus study (pp. 53-61). The extra twenty-five scales were selected on the basis of their non—evaluative factor loadings in the thesaurus study to compensate for the fact that the first set was predominantly evaluative. The complete set of 75 scales may be obtained from Table 1. Instructions. The instructions for this study were considerably different than the ones used in the earlier research (Osgood et al., 1957, pp. 82-84). The complete 4O instructions may be found in Appendix A, but the essential difference is this: Subjects were asked to mark, on each scale of a set, a "B" to indicate their feeling toward the "best imaginable" and a "W” to indicate their feeling toward the "worst imaginable” example of the class of things named by the concept at the top of the page. Administration. Forty sets of scales were prepared for each concept, the scales appearing in a random, but con- stant, order for all concepts. Test booklets were compiled containing five concept sets and one instruction sheet. In preparing the test booklets, the concept sets of scales were arranged in an arbitrary order, and starting with the ”first" set, five concepts were stapled together with an instruction sheet. The second booklet started with the "second" con- cept, the third booklet started with the "third" concept, and so on. Thus, any two contiguous booklets had four con- cepts in common, and all concepts occurred an equal number of times in the five possible positions. The test booklets were distributed, in the order of preparation, to subjects as they seated themselves in the classroom. Given the manner of distribution, the fact that not all the booklets were used, and the fact that a few gs failed to complete all the scales in a booklet, the number of 41 subjects actually responding to a given concept-scale pair ranges from 28 to 39. After the subjects were seated, the experimenter entered the classroom with the regular instructor and was introduced as a fellow member of the speech faculty. The instructor encouraged the students to cooperate in the experiment and then, in most instances, left the room. 
The experimenter distributed the test booklets and allowed three minutes for the reading of the instructions. After this period, the experimenter answered any questions about the instructions and remained in the room to remind the subjects, periodically, of the passage of time.

Subjects were encouraged, but not required, to put their names on the test booklets. This procedure resulted in about 30% anonymous questionnaires. However, since a fairly adequate description of the population, of which the sample was 87%, was available in class records, it did not seem advisable to insist on identification after a member of the first group "identified" the questionnaire as a personality inventory.

Analysis. Given the data obtained by the method described above, the decision of whether or not a particular scale has an evaluative discrimination capacity was based on the sign test, a non-parametric statistic. With this test, a null hypothesis was tested for each concept-scale item--that the number of subjects who placed their response to the "best example of the concept" to the left of their response to the "worst example of the concept" is equal to the number who indicated the opposite direction of preference. Given the kind of flip-flop tendency that had been anticipated, the two-tailed test seemed most appropriate. It was decided to test this hypothesis at the 95% level of confidence.

The sign test is described in Siegel (1956, pp. 68-75). According to Siegel, "The only assumption underlying this test is that the variable under consideration has a continuous distribution." For this reason it was chosen for use in this study in preference to other tests which assume interval data or independence of samples.

Application of the sign test is a scoring task rather than a computational procedure. Looking at a set of responses to a scale-concept item, each subject who places "B" on the left of "W," but not in the same scale interval, scores a plus. Each subject who places both marks in the same cell scores zero and drops out of the sample. A subject who places "W" on the left of "B" scores a minus. Starting with 20 Ss one might obtain

+ + + + + + + + + + + + + + + 0 0 - - -

The sign test is related to the binomial expansion, and the test of significance is a binomial test. The effect of a tie is to reduce N. The probability of obtaining an event as rare as some particular outcome can be determined by reference to a table of binomial probabilities. In the example just given, the probability of obtaining 3 or less of either sign, with an N of 18, is .008, and the null hypothesis would be rejected.

If N is greater than 25, the normal approximation to the binomial distribution can be used. In this study, z (corrected for continuity) was computed for estimation purposes and for use in Experiment 2, although the reduced N was expected to be less than 25 in some cases. The actual decision to reject the null hypothesis, however, was based on the exact binomial probability.

If, for a given concept-scale item in this study, the sign test yields a significant value, the inferences made are (1) that the subjects can make evaluative distinctions among events in the concept category and (2) that these evaluative distinctions are related to discriminations that they make within the concept category on the dimension named by the test scale.
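The per-item test just described can be reproduced with a short sketch. It assumes the "B" and "W" marks have been coded as integer scale positions 1 (left end) through 7 (right end); the function name and the coding are illustrative, not part of the original procedure.

```python
# Minimal sketch of the per-item sign test described above.
from scipy.stats import binomtest

def sign_test(b_positions, w_positions, alpha=0.05):
    """Two-tailed exact sign test on paired B/W placements.
    Ties (B and W in the same interval) drop out of the sample."""
    plus = sum(b < w for b, w in zip(b_positions, w_positions))   # B left of W
    minus = sum(b > w for b, w in zip(b_positions, w_positions))  # W left of B
    n = plus + minus                                              # ties reduce N
    p_exact = binomtest(min(plus, minus), n, 0.5,
                        alternative="two-sided").pvalue
    # Normal approximation with continuity correction, as carried into
    # Experiment 2 as an estimate of confidence.
    z = (abs(plus - minus) - 1) / (n ** 0.5) if n > 0 else 0.0
    return {"plus": plus, "minus": minus, "n": n,
            "p_exact": p_exact, "z": z, "reject": p_exact < alpha}

# The worked example in the text: 15 pluses, 2 ties, 3 minuses.
b = [1] * 15 + [4] * 2 + [6] * 3
w = [6] * 15 + [4] * 2 + [2] * 3
print(sign_test(b, w))   # p_exact is about .008; the null hypothesis is rejected
```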
A scale, then, will be said to have an evaluative discrimination capacity for a concept if, and only if, the null hypothesis of the sign test is rejected at the 95% level of confidence.

This study set out to obtain, along with an estimate of the evaluative discrimination capacity, an objective criterion for determining scale polarity for each scale and concept. The sign test data provide just such an objective criterion. If the scale shows a significant evaluative discrimination capacity, then it may be said, also, that a significant proportion of the sample of subjects indicates a directional preference on the scale. Given a directional preference, then, the left end of the scale is preferred if the number of plus signs is greater, and the right end of the scale is preferred if the number of minus signs is greater.

Since, in this case, the sign test is applied to individual scale-concept items, in order to make a general statement about the discrimination capacity of a given scale, it is also necessary to show that a scale discriminates for more than 5% of the concepts tested (the alpha level on the individual tests being .05). Reference to a table of binomial probabilities shows that the probability of obtaining four or more significant values out of twenty, when the probability of each one is .05, is less than .05 (.016). Thus if a scale discriminates for four or more of the twenty concepts, the null hypothesis that the scale does not discriminate for any concept can be rejected at the 95% level of confidence.

Controls. As discussed earlier under the heading of "administration," the normal controls--standard questionnaires, standard instructions, familiar surroundings, and a single experimenter--were observed in this study. For reasons of convenience and available subject time, the subjects were tested in seven separate sessions, and each subject contributed data to only five of the twenty concepts. To control for subject and group differences that might result from these arrangements, the test booklets were systematically distributed so that no significant finding could be accounted for by any experience unique to a particular class section or peculiar to a single administration of the test.

Discussed earlier, under the heading of "definitions," was the proposition that a scale is labeled "evaluative" or "non-evaluative" on the basis of a factor analysis, by the formula 2e² - h². Although values for e (the evaluative factor loading) and h² (the proportion of the variance on a given scale accounted for by all the factors) were available in The Measurement of Meaning for the 75 scales used in this study, not all of the scales occurred in any single factor analysis, and the factor loadings in the two studies in which they did occur are not exactly comparable. The subjects for this study are not exactly comparable to those used earlier, and the administrative procedures were somewhat different from those followed by Osgood, Suci, and Tannenbaum (1957). For these reasons, a standard form of the semantic differential, with single mark instructions, was administered in the same population of subjects, under similar conditions, approximately three weeks after the "best-worst" data were collected, and a new factor analysis was performed.

Since, for this analysis, an equal N for all concepts was an important consideration, only 140 data forms were prepared and each data form was checked for missed pages of scales as it was turned in.
It was, however, necessary to have six test booklets filled out by volunteers from the same population and to code an occasional "missed" scale "4" to obtain an N of 35 for every concept-scale item.

This control factor analysis was performed, due to the availability of computer programs, by the method of principal axes with varimax rotation. Although these methods are not exactly the same as those employed by earlier researchers (which makes comparison between factor analyses difficult), according to Harman (1960), the principal axes solution is the rigorous mathematical solution to which the centroid method is a crude approximation, and the varimax rotation tends to provide mathematical precision for the intuitive notion of simple structure.

Although the task of making a comprehensive comparison between this factor analysis and the previous ones would be extremely difficult if not impossible, there is a very simple test of similarity that satisfies the needs of this study. That is a rank order correlation between the two sets of "evaluative dominance scores." It was reasoned that this correlation should be significantly large, and that such a finding would support the generality of this total study. But, in any case, the new factor analysis serves as a control, making it possible to compare the item by item analysis and the general factor analysis with data gathered from the same subjects under comparable conditions.

Experiment 2

Since it was felt that the results of the previous experiment would be more meaningful if it could be shown that the evaluative discrimination capacity of a scale for a concept is related to the importance of the scale variable as an effector in evaluative decisions, it was decided to test the following hypothesis:

Hypothesis 2

There is a positive correlation between the ranks assigned to scales on the basis of the confidence in the scale's discrimination capacity (the absolute size of the sign test z) and ranks assigned by subjects instructed to rank the scales in order of importance to an evaluative decision about a given concept.

Rationale

Given that the sign test is, as applied in this study, a measure of agreement among subjects on the direction of the relation between a particular scale and the "best-worst" evaluative variable, and given that subjects are more likely to call important those variables on which there is high agreement, the hypothesis seems to follow.

Design

Concepts. Hypothesis 2 was tested for six concepts--the six common to the two studies from which the scales for this study were selected (Osgood et al., 1957, pp. 34 & 39). The concepts are ME, SYMPHONY, AMERICA, MOTHER, BOULDER, and SIN. These concepts appear to be typical of the total set by the criteria of earlier experimenters. They also represent the "personal-impersonal" and the "well discriminated-poorly discriminated" dimensions of variability in this set of concepts which will be noted in the results of Experiment 1.

Subjects. Forty-eight subjects were selected (on the basis of availability) from a population similar to that sampled in the first experiment. The subjects were students at Kansas State University, Summer, 1963. Of these subjects, 19% were males and 70% were sophomores. None had participated in the discrimination experiment. Eight subjects were randomly assigned to each of the six concepts.

Administration. This experiment was administered in two class sessions.
Each subject was given a mimeographed set of instructions (a sample appears in Appendix B), a concept card, and a set of 75 cards each with a pair of scale adjectives typed on it. The instructions directed the subject to sort the cards, ranking them in order of importance to an evaluative decision about the event named by his concept.

Analysis. For each of the six sets of eight rankings, a Kendall Coefficient of Concordance was computed to determine if there was a significant relation among the rankings (Walker & Lev, 1953, pp. 283-386). Then, given a significant relation among rankings, the ranks of the sums of ranks (computed for the Kendall Coefficient) were assumed to be the best estimate of the true rank for each scale-concept (Walker & Lev, 1953, p. 286). Finally, a Spearman rank correlation was computed between these "importance ranks" and the discrimination capacity ranks to test, for each concept, the hypothesis that rho is equal to or less than zero.

Since the number of scales that had shown a significant discrimination capacity differed for each of these concepts, and since there was no assurance that non-significant scales were in any sense rankable, it was decided in advance to perform the same analysis, as described above, considering only those scales which had shown a significant discrimination capacity for each of the concepts. The same data were used as in the complete analysis simply by disregarding the insignificant scales and assigning the new ranks accordingly.

Chapter 3

This chapter is a report of the results of two experiments. In the preceding chapter, two experimental designs were presented. The first, a major experiment, tested the hypothesis that evaluative judgments are not independent of "non-evaluative" or objective judgments for particular events or categories of events. The second, a minor exploratory experiment, was intended merely to make the findings of the other more meaningful. It tested the hypothesis that the importance of a scale (dimension) to an evaluative decision about an event is not independent of the scale's evaluative discrimination capacity for that event.

Results of Experiment 1

The sign test was used to test, for each scale-concept item, the null hypothesis that the number of subjects who place "B" (the response to the best imaginable example of the concept) on the left side of "W" (the response to the worst imaginable example of the concept) is equal to the number who made the opposite choice. It was decided to reject this hypothesis at the 95% level of confidence, two-tailed test.

Table 1 shows the results of the 1500 sign tests that resulted from the combinations of 75 scales and 20 concepts. In Table 1 the scales are ordered from first to last by the number of concepts for which each discriminated "best" from "worst." The concepts are ordered from left to right by the number of scales that discriminated for each concept. Only significant values are indicated--those showing a preference for the adjective on the left by a plus and those showing a preference for the adjective on the right by a minus. Several of the scales have been reversed to simplify the reading of the table.

Since 20 sign tests were performed for each scale, each at the 95% level of confidence, reference was made to a table of binomial probabilities to determine the probability that N values out of 20 might be "significant" by chance.
It was found that the probability of obtaining four or more significant values, when the probability of each one is .05, is less than .05. Thus, according to Table 1, all but three scales can be said to discriminate for some concept with greater than 95% confidence. Due to the arrangement of scales in Table 1, it is the last three scales that are of questionable value as evaluative discriminators for these concepts.

Table 1

Evaluative Discrimination Capacity and Polarity of 75 Scales for 20 Concepts

[Table body not legible in the source copy.]

Table 2

A Principal Axis Factor Analysis with Varimax Rotation (75 Scales)

[Table body not legible in the source copy.]
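The scale-level criterion used in reading Table 1 can be checked directly. The sketch below is an illustration only; it simply evaluates the binomial probability cited in the text, assuming twenty independent tests each at alpha = .05.

```python
# Sketch of the scale-level criterion: how many of twenty per-concept sign
# tests at alpha = .05 must be significant before chance can be ruled out
# for the scale as a whole?
from scipy.stats import binom

n_concepts, alpha = 20, 0.05
# Probability of 4 or more "significant" results out of 20 by chance alone.
p_four_or_more = binom.sf(3, n_concepts, alpha)
print(round(p_four_or_more, 3))   # 0.016, the value cited in the text
```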
A separate factor analysis of the 50 scales previously employed with these same concepts was also performed, and indices of factor similarity were computed between the rotated factors of this analysis and the corresponding factors reported in The Measurement of Meaning. The index of similarity between the two "evaluative" factors is .97 and between the two "potency" factors .91. The coefficient between the two "activity" factors is .27 and between the two "fourth" factors .31.

Table 3 shows the rotated factor loadings for the four factors obtained in this study. The scales and factors are arranged for easy comparison with the corresponding set of loadings in The Measurement of Meaning (p. 37). There is no test of significance of this measure of factor similarity, but it is evident that the agreement between the two analyses on the first two factors is quite high, and that the agreement on the other two factors is somewhat less satisfactory. This seems, however, to be adequate support for the similarity of the two sets of subjects.

There are three indices of scale evaluation capacity reported in Table 4, which summarize the results of this experiment. Column I is simply the proportion of the 20 concepts for which each scale shows an evaluative discrimination capacity in the "best-worst" analysis. Column II shows the evaluative dominance scores (twice the variance accounted for by the evaluative factor minus the total variance accounted for) computed from the factor analyses by Osgood, Suci, and Tannenbaum. Column III shows the same evaluative dominance scores based on the new factor analysis.

Table 3

A Principal Axis Factor Analysis with Varimax Rotation

Scales                      1     2     3     4     h²
1.  good-bad              -.81  -.05  -.19  -.02   .70
2.  large-small            .09   .71   .24  -.17   .60
3.  beautiful-ugly        -.69  -.12   .15  -.18   .54
4.  yellow-blue            .32  -.16  -.22   .35   .30
5.  hard-soft              .21   .70  -.01   .05   .54
6.  sweet-sour            -.68  -.14   .22  -.08   .54
7.  strong-weak           -.26   .66   .15   .21   .57
8.  clean-dirty           -.75  -.06   .21  -.03   .61
9.  high-low              -.35   .24  -.10  -.13   .21
10. calm-agitated         -.59  -.11   .01  -.34   .47
11. tasty-distasteful     -.62   .10   .26  -.19   .50
12. valuable-worthless    -.74   .03   .02  -.02   .55
13. red-green              .24  -.08   .09   .50   .32
14. old-young             -.28  -.32   .01   .29   .27
15. kind-cruel            -.77  -.13  -.39   .03   .76
16. loud-soft              .37   .41  -.10   .23   .37
17. deep-shallow          -.15   .39   .00  -.16   .20
18. pleasant-unpleasant   -.76   .00   .27  -.18   .69
19. black-white            .56  -.04  -.22   .02   .36
20. bitter-sweet           .62   .06  -.25   .18   .48
21. happy-sad             -.76  -.08  -.09   .06   .60
22. sharp-dull            -.33   .20   .19   .38   .33
23. empty-full             .43  -.22   .08   .02   .24
24. ferocious-peaceful     .64   .16  -.15   .37   .59
25. heavy-light            .19   .40  -.73   .00   .72
26. wet-dry               -.05   .03  -.03  -.28   .08
27. sacred-profane        -.65   .01   .30  -.05   .52
28. relaxed-tense         -.53  -.23  -.17   .00   .36
29. brave-cowardly        -.55   .41   .04   .24   .53
30. long-short            -.05   .37   .48  -.10   .38
31. rich-poor             -.48   .11   .21   .01   .29
32. clear-hazy            -.56   .03   .01  -.17   .34
33. hot-cold              -.02   .00  -.13   .64   .43
34. thick-thin            -.10   .35  -.52  -.09   .41
35. nice-awful            -.83  -.10   .09  -.15   .73
36. bright-dark           -.59  -.07   .25   .15   .43
37. bass-treble            .01   .49   .09  -.11   .26
38. angular-rounded        .07  -.07   .03   .44   .21
39. fragrant-foul         -.62  -.21  -.10  -.04   .44
40. honest-dishonest      -.71   .17   .27   .05   .61
41. active-passive        -.07   .23   .51   .38   .46
42. rough-smooth           .36   .58  -.14   .17   .51
43. fresh-stale           -.57   .05   .58  -.12   .67
44. fast-slow             -.11   .35   .15   .47   .38
45. fair-unfair           -.81   .06  -.12   .08   .67
46. rugged-delicate        .12   .77  -.08   .04   .62
47. far-near               .23  -.16  -.64   .01   .49
48. pungent-bland          .07  -.08  -.15   .24   .09
49. healthy-sick          -.65   .22   .29   .03   .55
50. wide-narrow           -.08   .40  -.08  -.29   .26

Per cent total variance  24.69  8.96  6.68  5.19
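The dominance scores reported in columns II and III of Table 4 follow directly from rotated loadings like those in Table 3: twice the squared loading on the evaluative factor minus the communality, 2e² - h². The sketch below is a minimal illustration rather than a reconstruction of the original computation; it applies a standard Kaiser varimax rotation (not necessarily the routine used in 1963) to a small hypothetical loading matrix and assumes that the first rotated factor is the evaluative one.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Standard Kaiser varimax rotation of a p x k loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        B = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt
        d_new = s.sum()
        if d_new < d_old * (1 + tol):
            break
        d_old = d_new
    return L @ R

def evaluative_dominance(rotated, evaluative_factor=0):
    """2e**2 - h**2: twice the variance a scale shares with the evaluative
    factor minus its total communality."""
    h2 = np.sum(rotated ** 2, axis=1)            # communality of each scale
    e2 = rotated[:, evaluative_factor] ** 2      # variance on the evaluative factor
    return 2 * e2 - h2

# Hypothetical unrotated loadings for three scales on two factors;
# which factor is "evaluative" must be judged from the scales that load on it.
unrotated = np.array([[0.70, 0.30],
                      [0.20, 0.65],
                      [0.55, 0.50]])
rotated = varimax(unrotated)
print(np.round(evaluative_dominance(rotated), 2))
```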
Table 4

Three Indices of Scale Evaluation Capacity

Scales                         I      II     III
1.  kind-cruel               .90    +.61   +.52
2.  colorful-colorless       .85    -.10   -.25
3.  pleasant-unpleasant      .80    +.57   +.32
4.  clean-dirty              .80    +.66   +.30
5.  calm-agitated            .80    +.24   -.12
6.  honest-dishonest         .80    +.70   +.43
7.  bright-dark              .80    +.39   -.18
8.  fragrant-foul            .80    +.69   -.15
9.  good-bad                 .75    +.78   +.59
10. valuable-worthless       .75    +.61   +.37
11. beautiful-ugly           .75    +.66   +.25
12. nice-awful               .75    +.69   +.58
13. strong-weak              .70    -.39   -.46
14. large-small              .70    -.50   -.54
15. brave-cowardly           .70    +.23   -.10
16. sacred-profane           .70    +.54   -.33
17. peaceful-ferocious       .65    +.28   +.18
18. happy-sad                .65    +.57   -.18
19. healthy-sick             .65    +.37   -.16
20. free-constrained         .65    -.05   -.16
21. relaxed-tense            .65    +.14   -.08
22. fresh-stale              .65    +.40   -.25
23. fair-unfair              .65    +.67   +.59
24. active-passive           .60    -.33   -.50
25. hard-soft                .60    -.33   -.48
26. alive-dead               .60    -.94   -.24
27. clear-hazy               .60    +.32   -.48
28. sober-drunk              .60    -.68   +.10
29. fast-slow                .60    -.50   -.40
30. mature-youthful          .60    -.11   -.10
31. full-empty               .60    +.23   -.35
32. sweet-sour               .60    +.66   +.09
33. tasty-distasteful        .60    +.38   -.11
34. deep-shallow             .60    -.22   -.34
35. rugged-delicate          .55    -.33   -.59
36. tough-fragile            .55    -.88   -.57
37. sharp-dull               .55    -.23   -.35
38. smooth-rough             .55    -.02   -.41
39. wide-narrow              .55    -.11   -.41
40. objective-subjective     .55    -.02   -.42
41. near-far                 .55    +.14   -.17
42. proud-humble             .50    -.04   -.37
43. masculine-feminine       .50    -.23   -.38
44. sharp-blunt              .50    -.16   -.40
45. rich-poor                .50    +.32   -.27
46. white-black              .45    +.31   -.38
47. sweet-bitter             .45    +.59   +.10
48. long-short               .45    -.15   -.40
49. savory-tasteless         .45    -.89   -.26
50. new-old                  .40    -.92   -.37
51. spacious-constricted     .40    -.07   -.43
52. humorous-serious         .40    -.07   -.16
53. soft-loud                .40    -.15   -.16
54. light-heavy              .40    -.27   -.19
55. curved-straight          .40    -.11   -.36
56. complex-simple           .35    -.06   -.44
57. ornate-plain             .35    -.08   -.37
58. high-low                 .35    +.30   -.20
59. opaque-transparent       .35    -.06   -.15
60. stable-changeable        .30    -.05   -.31
61. aggressive-defensive     .30   -1.00   -.25
62. rounded-angular          .30    -.17   -.38
63. young-old                .30    -.10   -.43
64. thin-thick               .30    -.20   -.48
65. bass-treble              .30    -.11   -.36
66. calm-excitable           .25    -.08   -.22
67. dry-wet                  .25    -.02   -.27
68. lenient-severe           .20    -.15   +.19
69. unusual-usual            .20    -.08   -.27
70. red-green                .20    -.06   -.15
71. hot-cold                 .20    -.22   -.37
72. tenacious-yielding       .20    -.13   -.25
73. rational-intuitive       .15    -.04   -.08
74. blue-yellow              .15    +.05   -.20
75. pungent-bland            .00    -.33   -.38

Rank correlation between I and II:   .46 (p < .05)
Rank correlation between I and III:  .33 (p < .05)
Rank correlation between II and III: .63 (p < .05)

Rank correlations were computed between all pairs of these three indices of scale evaluation capacity. As indicated before, the correlation of .63 between indices II and III is an indication of the similarity between two (actually three) factor analyses on this criterion. This result tends to support the idea that college students at Kansas State University in 1963 are not too different from college students at The University of Illinois in the mid 1950's. In other words it supports the comparability of the SD technique.

The correlations between I and II (.46) and between I and III (.33) indicate a tendency for scales that are evaluatively dominant (+ signs in columns II and III) to discriminate between "best" and "worst" for a larger proportion of the concepts (column I of Table 4).
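The rank correlations reported beneath Table 4 are ordinary Spearman coefficients over the 75 scales. The sketch below illustrates the computation only; the six rows shown are taken from Table 4 merely to give the call something to work on, so the printed coefficients will not reproduce the values obtained from the full table.

```python
# Illustration of the rank correlations among the three indices of Table 4,
# using six of its rows rather than all 75.
from scipy.stats import spearmanr

index_I   = [0.90, 0.85, 0.75, 0.60, 0.40, 0.20]          # proportion of concepts
index_II  = [0.61, -0.10, 0.78, -0.33, -0.92, -0.06]      # dominance, earlier analyses
index_III = [0.52, -0.25, 0.59, -0.50, -0.37, -0.15]      # dominance, new analysis

for name, other in (("I vs II", index_II), ("I vs III", index_III)):
    rho, p = spearmanr(index_I, other)
    print(name, round(rho, 2), round(p, 3))
rho, p = spearmanr(index_II, index_III)
print("II vs III", round(rho, 2), round(p, 3))
```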
However, the contrast in the two techniques is also noticeable. Of the 46 scales identified in column II of Table 4 as predominantly non-evaluative, 44 show an evaluative discrimination capacity for a significant proportion of the concepts. This ratio is 59/62 in the new factor analysis (column III). It is also true that a significant evaluative discrimination capacity has not been demonstrated for any scale for all concepts. Even the good-bad scale does not appear to differentiate among TORNADOS, RUSSIANS, FIRES, SINS, or FRAUDS.

The results of Experiment 1, then, indicate that factor analysis of these scales across a set of concepts seriously underestimates the predictive power of "non-evaluative" scales for particular concepts and probably overestimates the predictive power of "evaluative" scales for certain concepts. The nature of the concept-scale interaction noted in earlier research with the semantic differential is now reasonably explicit, and it is clear that ignoring this interaction may lead to serious misinterpretation of SD results.

There is a "postscript" result to this study. On seeing the results of the "best-worst" analysis (Table 1), the writer noticed that the most frequently used scales, representing all the major factors, seemed to discriminate for more concepts than the less frequently used ones. This suggested the "hypothesis" that all of the regularly obtained factors are "evaluative." That is, the factors obtained in an SD analysis might be viewed as "modes of evaluation" (see Osgood et al., 1957, p. 62) rather than "evaluative" and "non-evaluative" dimensions of "meaning." When the new factor analysis was completed, a rank order correlation was computed between the proportion of the 20 concepts discriminated for by each scale and the communality (the proportion of the total variance accounted for by all the factors) for each scale. The coefficient was .70, higher than any of the other interrelations in this set of data. If this result is not coincidence, it provides a basis for reinterpreting certain factor analytic results already obtained.

Results of Experiment 2

In the second experiment, eight subjects (for each of six concepts) ranked the 75 scales in order of importance to an evaluative decision about a concept. The six concepts which had occurred in both of the earlier studies from which the scales were drawn (Osgood et al., 1957, pp. 34 & 49) were chosen to represent the total set for this ranking task.

The Kendall coefficient of concordance (W) was computed for each set of eight rankings. As can be seen in Table 5, there was a significant agreement among the eight rankings in each set, since the six Ws are all significant beyond the .01 level.

Given significant Ws, the ranks of the sums of ranks were taken as the best estimate of the true importance ranks for each concept. A second set of rankings was obtained from the results of Experiment 1 by ranking the scales on the estimated confidence in their discrimination capacity (sign test z values) for each concept. Table 5 (column Rt) shows that there is a significant correlation (p < .01) between these two measures for all six concepts.

Table 5

Relations Between Importance and Discrimination Capacity

Concepts      W     Rt     Rd
MOTHER       .59    .65    .82
BOULDER      .34    .50    .52
ME           .64    .61    .56
AMERICA      .66    .59    .71
SYMPHONY     .59    .43    .44
SIN          .45    .36    .35*

*Probability greater than .05; all others less than .01.
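The concordance-and-correlation procedure summarized in Table 5 can be sketched briefly. The sketch assumes untied ranks supplied as a judges-by-scales array and uses a small invented data set; the function name, the four-judge example, and the z values are illustrative only.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

def kendalls_w(ranks):
    """Kendall coefficient of concordance for an (m judges x n objects)
    array of untied ranks: W = 12*S / (m**2 * (n**3 - n))."""
    ranks = np.asarray(ranks, dtype=float)
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Illustrative data: 4 judges ranking 6 scales by importance (1 = most
# important), plus hypothetical sign-test z values for the same scales.
importance_ranks = np.array([[1, 2, 3, 4, 5, 6],
                             [2, 1, 3, 5, 4, 6],
                             [1, 3, 2, 4, 6, 5],
                             [2, 1, 4, 3, 5, 6]])
z_values = np.array([4.8, 4.1, 3.6, 2.9, 2.2, 1.0])

w = kendalls_w(importance_ranks)
# Ranks of the sums of ranks serve as the estimate of the true importance
# order; a larger |z| means more confidence, hence rank 1.
consensus = rankdata(importance_ranks.sum(axis=0))
discrimination = rankdata(-np.abs(z_values))
rho, p = spearmanr(consensus, discrimination)
print(round(w, 2), round(rho, 2))
```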
The same correlations were calculated for just those scales which had shown a significant discrimination capacity for each of the concepts. Column Rd of Table 5 shows that five of the six correlations for discriminating scales are significant (p < .01). Two are higher and four do not differ from those calculated for the total set of scales (column Rt). The only insignificant one (SIN) was based on an N of sixteen.

It was decided in advance to perform this second analysis on the discriminating scales only, because there was no rationale available to predict that subjects' dispositions of non-discriminating scales would not be random. The data show, however, that subjects do agree on the not-important scales.

The size of these correlations, which probably approach a limit set by the reliability of the two measures, indicates that both the "best-worst" technique and the importance ranking technique are measuring, to a large extent, the same variable. This means that one can select scales for a particular measuring task by using either the group technique (best-worst) or the other, which is essentially an individual technique.

Summary

In the two experiments reported here, strong support has been obtained for both theoretic hypotheses. In Experiment 1, 72 of 75 scales were shown to have an evaluative discrimination capacity for a significant proportion of the 20 concepts. Among those scales showing an evaluative discrimination capacity were strong-weak, large-small, active-passive, and hot-cold, all of which have been consistently identified with factors said to be independent of evaluation. These results are believed to clarify the concept-scale interaction that has been noted in research with the semantic differential.

In obtaining these results, a technique has been demonstrated which has practical value as a means of selecting scales for semantic differential analysis of specific concepts and, in many applications, can be substituted for the more complex factor analytic technique. Perhaps more important, this technique does not require the same assumptions as the SD technique and, therefore, provides a means of testing those assumptions in specific content areas.

In Experiment 2, it was found that evaluative discrimination, by a scale for a concept, is related to the importance of that scale as a criterion of evaluation for that concept. This result suggests that an individual ranking technique may be used for the selection of evaluative discriminators and indirectly supports the validity of the "best-worst" method.

Chapter 4

In this chapter, the background of this research, two experiments, and their results are reviewed; the writer's conclusions drawn from those results are summarized; then, further implications are discussed for (a) the semantic differential, (b) meaning, and (c) new directions in research.

Background

This research developed out of the semantic differential, a technique called a measure of meaning. The SD has been frequently used as a research tool since its introduction in 1952, and has been applied in a wide variety of research studies. However, the SD was not adaptable to answering the question, what are the objective criteria on which people base their evaluations of events? On the contrary, results with the instrument seemed to indicate that evaluations are independent of the objective attributes of events. Unable to accept this conclusion, the writer undertook a reassessment of the SD technique, hoping to find another way of interpreting the empirical results that would not conflict with what seemed a reasonable question.
The results of that reassessment are reported in Chapter 1 of this thesis. In brief, it was found that the apparent independence of evaluative judgments and such objective attributes as "activity" and "potency" can be reasonably explained as an artifact of the statistical method that produced this result. To put it another way, the SD technique assumes that any relation which does obtain between these "dimensions" of judgment is linear and constant across concepts. Applications of the SD under conditions in which either of these assumptions is untenable would, then, quite predictably result in a failure to reject the null hypothesis of independence among the dimensions of judgment.

To state the conclusion still another way, the practice of summing across concepts obscures any differences that may obtain among the concepts. And, to the extent that there are differences, the final result is not necessarily descriptive of any concept in the set. Conclusions based on these results have, however, been assumed to apply to every concept in the set, on the assumption that no such differences obtain. Going even further, with an assumption of representative sampling, the conclusions have been generalized to concepts outside the sample. Prior to this research there was extensive evidence of concept-scale interaction available--evidence that the basic assumption is not tenable.

In spite of the fact that the interpretation of SD results seemed to be in error, the instrument itself--seven-interval scales bounded by polar adjectives--appeared to be a highly reliable and efficient means of obtaining judgments from subjects about events or categories of events. So it was decided to take this as a starting point for developing a method of finding the objective criteria on which evaluative judgments are based.

This study, then, was designed to provide a new basis for interpreting existing SD data and to provide a foundation for further research that could take advantage of the SD scaling technique. In line with this purpose, the scales and concepts used in this study were selected from earlier research so that it would be possible to make direct comparisons between the two methods of analysis.

Experiment 1

On the assumption that bipolar adjectival scales like those used in the semantic differential do measure evaluative and objective variables, and that evaluations are to some extent influenced by the objective attributes of events, the following hypothesis was tested.

Hypothesis 1. Bipolar adjectival scales, such as those used in the semantic differential and including those identified in factor analysis as "non-evaluative," have an evaluative discrimination capacity for some concepts.

This hypothesis was tested for each of 75 scales by testing the null hypothesis, for each concept, that the number of Ss who indicated one polarity for the scale equals the number of Ss who indicated the opposite polarity (alpha = .05) and, then, testing a second null hypothesis that the number of concepts for which the first hypothesis was rejected does not exceed chance expectation over the 20 concepts (alpha = .05).

Experiment 2

On the assumptions that the sign test, as applied here to subjects' responses to the "best" and "worst" examples of a concept, measures the level of agreement among subjects on the polarity of a scale for a concept and that the probability of agreement is greater on important criteria, a second hypothesis was tested.

Hypothesis 2.
There is a positive relation between the confidence in a scale's evaluative discrimination capacity and its importance as an evaluative criterion.

The test of this hypothesis was based on the rank correlation between the absolute size of the sign test z values and importance ranks assigned by subjects.

Results

The results of the first experiment strongly support Hypothesis 1. Of the 75 scales, 72 show an evaluative discrimination capacity for a significant proportion of the concepts. Of the 46 scales that were identified by earlier factor analysis as predominantly non-evaluative, 44 showed an evaluative discrimination capacity for a significant proportion of the 20 concepts. Only one scale (pungent-bland) did not show a single significant discrimination for these concepts. No scale (including good-bad) showed a significant discrimination capacity for all of the concepts. The number of scales discriminating for a particular concept ranged from 59 to 6, and the number of concepts differentiated by a particular scale ranged from 0 to 18.

Factor analyses performed as a check on the subjects and methods of data collection employed in this study showed the expected amount of agreement with earlier work. In the 75 scale analysis, evaluative, potency, and activity factors appeared in the right order of variance accounted for, and the correlation between evaluative scores computed from the new data and earlier factor studies was .63 (p < .01). A separate analysis of the 50 scales previously employed with these same concepts was performed, and computed indices of factor similarity showed high agreement between this and previous data on the first two factors (.97 and .91 respectively). The less satisfactory results with the third and fourth factors (.27 and .31) are attributed to differences in the techniques of analysis.

It was noted that the frequently used scales, which consistently load high on some factor but not necessarily the evaluative factor, seemed to discriminate for more concepts than the less frequently used scales. It was deduced that the proportion of the set of concepts for which a scale discriminates should, then, be related to the communality of the scale (the proportion of total variance accounted for on that scale in factor analysis). A rank order correlation was computed between these two variables for the 75 scales, and the coefficient obtained was .70, significant well beyond chance expectations. It is appreciably larger than the correlation of .33 obtained between the first variable and the evaluative dominance of the scales calculated from the same data. Viewed as a descriptive statistic, it summarizes the relation between the two techniques of analysis.

In the second experiment, strong support was obtained for Hypothesis 2. Over 75 scales, for six concepts, the correlations between discrimination capacity and importance were all .36 or greater and significant beyond the 99% level of confidence.

Conclusions

1. The evaluative judgments that people make about events are related to their "objective" judgments of those events.

2. The objective criteria on which people base their evaluations of particular events are discoverable, using the "best-worst" technique.

3. The greater the evaluative discrimination capacity of a given scale the more likely it is to be an important criterion of evaluation.

4. The fact that a particular scale discriminates evaluatively (or does not) for a particular concept cannot be generalized to other, unrelated concepts.
5. Using the same scales and concepts, the replication of Osgood's factor analysis produced a similar factor structure.

Implications for the Semantic Differential

Prior data, plus the evidence of concept-scale interaction obtained in this study, are sufficient to reject the assumption that such interaction does not occur and to invalidate inferences which employ that assumption. This statement is chiefly concerned with the acceptance of the independence of factors, but it applies as well to all aspects of a factor analysis based on scores that have been summed over a set of unrelated concepts.

As pointed out earlier (Chapter 1), given evidence of concept-scale interaction, the factor structure obtained from a set of concepts does not necessarily describe any concept in the set. By the same token, results obtained from a set of concepts individually do not necessarily describe the set as a whole. That is, the inclusion or exclusion of the variance among concepts has an unknown effect on the obtained factor structure which makes it impossible to generalize from either the specific or the general factor analysis to the other.

If, on the other hand, factor analysis of SD data were performed on single concepts, the problem of concept-scale interaction would be eliminated, but there would then be the problem of low variance and the indeterminate correlations that result from it. It would also still be necessary to consider the fact that SD data may not meet the interval data assumption of the product moment correlation, and the evidence that the linear assumption is not universally tenable. The problems of polarity and relevance would not interfere with the single concept factor analysis, but the information provided by such an analysis, over and above that provided by the simpler "best-worst" technique, is probably not worth the additional effort required to obtain it.

In some cases, factor analysis might reasonably be employed with the "best-worst" method to assist in the development of categories of evaluative criteria for particular concepts or categories. Even in this restricted application, however, the results must be interpreted with extreme caution.

Implications for Meaning

The most important implication of this study for meaning is that there may not be any non-evaluative dimensions of meaning. This study has provided support for the contention that a scale is either evaluative or irrelevant to a particular concept. Nearly all the scales have shown an evaluative discrimination capacity for some concept, and with a larger sample of concepts, this result would probably have been unanimous.
For to infer,from,the observation that all the "dimensions" appear to be evaluative,that the SD is an "attitude" measure and, therefore, not a measure of meaning, is to commit the same fallacy of two—valued logic that led to "non-evaluative" factors in the first place. To argue over whether the SD technique and the "best- worst" technique measures meaning or attitude does not seem to be a useful expenditure of energy, for whether or not the only "dimension of meaning" is evaluative, it certainly is reasonable to say that the ability to discriminate eval- uatively among members of a category and the ability to discriminate evaluatively among categories may both be 83 considered evidence that the categories are meaningful to the discriminator. It does seem desirable to be able to separate these two abilities and to explore them separately in that they appear to index two quite different levels of meaningfulness. The "best-worst" technique seems admirably adapted to this task. Implications for New Directions in Research It has already been suggested that the technique de- veloped here has application in consumer market research. Image research has been done with the semantic differential by comparing the "ideal" of a particular product with some specific brand of the product. This approach did not, how- ever, include an objective method for determining scale polarity (which this study has shown is a variable) nor show the evaluative significance of variables that could be manipulated ;g_the product. The "bestsworst" technique, on the other hand, is a method for discovering the dimen- sions that have an evaluative discrimination capacity for a specific kind of product (whether it's automobiles or toothpaste), it establishes the polarity of the scales for that product, and by anchoring both ends of the scale, gives an idea of how much difference makes a difference. 84 Most importantly, it shows the evaluative significance of objective variables. A profile of a product on scales selected by this technique should be directly interpret- able as suggestions for modification in the product and the advertising of the product. The image research idea, though, is only a specific example of the kind of research to which the B-W technique is appropriate. As mentioned earlier, this instrument was designed for (and seems to be suited to) asking why one event or object is preferred to another of its kind, in regard to any category of events whatsoever. Broad applica- tion of this instrument--perhaps in conjunction with the standard SD—-might lead, eventually, to a description of the system of values which controls the majority of human behavior. At this point, it seems reasonable to assume that there are both general and specific values involved in the de- cisions that people make, and knowledge gained about any of those values would increase the overall predictability of human behavior. In other words, the behaviors of people are probably controlled by their evaluative judgments, evaluative judgments are probably related to the objective attributes of events, and the BsW technique is a crude but 85 usable technique for extending our knowledge of that intricate network of relationships. Perhaps the most exciting area of research in which the B-W technique may be applied is the general area of persuasion. Is a speaker perceived as more knowledgeable when he restricts his arguments to dimensions that are "relevant?" 
Is a persuader more effective when he presents "factual information" that fits the audience's predeter- mined value system or when he presents and interprets "facts" they didn't know were supposed to have an effect on their judgment of value? Is an advertisement less effective if it bases a claim of value on a dimension which does not have an evaluative discrimination capacity for a given audience and a given product? Why does an argument that works with one audience fail with another--is it because they have different criteria of evaluation? All of these questions have been asked before, and their answers seem obvious, but the technique presented here offers innumerable possibilities for refining both the questions and the answers. The advertiser or any other persuader certainly has reason to want to accomplish his purpose with as little 86 effort and expense are possible. It seems very likely that, by determining in advance exactly what variables the mem- bers of his audience perceive as relevant to the decision he requires of them, he can not only decrease his cost but increase his effect as well. The results of this study and the viewpoint evolved in this discussion have some very important implications for the measurement of attitudes pg£_§g, Kerrick and McMillan (1961) found that subjects responded differently to news stories on SD scales when they were informed that they were engaged in attitude research. The following quotation is from the summary of their report. The informed group showed much less tendency to change their attitudes in response to the news stories. In addition, when members of the informed group did show change in response to the stories, they were more likely to change in the direction Opposite to that advocated in the stories than were members of the naive group. The naive group's attitude change was predictable from the principle of pressure toward congruity. The informed group's was not. Instructions inhibited only evaluative change; non-evaluative change in response to the communica- tion was no different for informed and naive groups. This suggests that if subjects are told or guess that the study in which they are participating is an "attitude study," a more accurate picture of the effect of an independent variable can be obtained by discarding the 87 "evaluative" scales and basing one's conclusions on the ”non-evaluative" scales that are used for "masking” purposes. Summary In short, there are two important outcomes of this re- search project: (a) The results of research that has been done with the semantic differential must be re-evaluated, because at least one basic assumption in that technique is untenable, and results have been obtained here that are in- compatible with inferences based on that assumption. (b) An alternate method of analysis for the SD has been demonstrated which does not require as many tenuous assumptions as the factor analytic method, can be more parsimoniously inter- preted, can be applied in a wide range of research designs, and is specifically suited to the task of finding out why people prefer one example of a category to another example of the same category. Bibliography Baxter, J. C. Mediated generalization as a function Of semantic differential performance. Dissertation Abstr., 1959 (Nov.), 20, 1957. (Abstract) Berlo, D. K. The process p§_communication. New York: Holt, Rinehart, & Winston, 1960. Berlo, D. K., & Gulley, H. E. Some determinants of the effect of oral communication in producing attitude change and learning. 
Speech Monogr., 1957, 24, 10-20.

Bettinghaus, E. P. The application of the principle of congruity to certain aspects of language behavior. East Lansing: Communication Research Center, Michigan State University, 1961. (a)

Bettinghaus, E. P. The operation of congruity in an oral communication situation. Speech Monogr., 1961, 28, 131-142. (b)

Block, J. An unprofitable application of the semantic differential. J. consult. Psychol., 1958, 22, 235-236.

Brown, R. W. Review of: "Osgood, Suci, & Tannenbaum, The measurement of meaning." Contemp. Psychol., 1958, 3, 113-115.

Brown, R. W. Words and things. Glencoe, Ill.: The Free Press, 1959.

Carroll, J. B. Review of: "Osgood, Suci, & Tannenbaum, The measurement of meaning." Language, 1959, 35, 58-77.

Church, J. Language and the discovery of reality. New York: Random House, 1961.

Dicken, C. F. Connotative meaning as a determinant of stimulus generalization. Psychol. Monogr., 1961, 75, No. 1 (Whole No. 505).

Donahoe, J. W. Changes in meaning as a function of age. J. genet. Psychol., 1961, 99, 23-28.

Eisdorfer, C., & Altrocchi, J. A comparison of attitudes toward old age and mental illness. J. Gerontol., 1961, 16, 340-343.

Endler, N. S. Changes in meaning during psychotherapy as measured by the semantic differential. J. counsel. Psychol., 1961, 8, 105-111.

Ferguson, G. A. Statistical analysis in psychology and education. New York: McGraw-Hill, 1959.

Flavell, J. H. Meaning and meaning similarity: I. A theoretical reassessment. J. gen. Psychol., 1961, 307-319. (a)

Flavell, J. H. Meaning and meaning similarity: II. The semantic differential and co-occurrence as predictors of judged similarity in meaning. J. gen. Psychol., 1961, 64, 321-335. (b)

Greenburg, B. S., & Tannenbaum, P. H. The effect of bylines on attitude change. Journ. Quart., 1961, 38, 535-537.

Grigg, A. E. A validity study of semantic differential technique. J. clin. Psychol., 1959, 15, 179-181. (a)

Grigg, A. E. A validity test of self-ideal discrepancy. J. clin. Psychol., 1959, 15, 311-313. (b)

Gulliksen, H. Review of: "Osgood, Suci, & Tannenbaum, The measurement of meaning." Contemp. Psychol., 1958, 3.

Harman, H. H. Modern factor analysis. Chicago: University of Chicago Press, 1960.

Jenkins, J. J., Russell, W. A., & Suci, G. J. An atlas of semantic profiles for 360 words. Amer. J. Psychol., 1958, 71, 688-699.

Jenkins, J. J., Russell, W. A., & Suci, G. J. A table of distances for the semantic atlas. Amer. J. Psychol., 1959, 72, 623-625.

Kelly, Jane A., & Levy, L. H. The discriminability of concepts differentiated by means of the semantic differential. Educ. psychol. Measmt., 1961, 21, 53-58.

Kerrick, Jean S., & McMillan, D. A., III. The effects of instructional set on the measurement of attitude change through communications. J. soc. Psychol., 1961, 53, 113-120.

Kjeldergaard, P. M. Attitudes toward newscasters as measured by the semantic differential. J. appl. Psychol., 1961, 45, 35-40.

Kraus, S. Modifying prejudice: Attitude change as a function of race of the communicator. AV Comm. Rev., 1962, 10, 14-22.

Kumata, H. A factor analytic investigation of the generality of semantic structure across two selected cultures. Unpublished doctoral dissertation, Univer. of Illinois, 1957.

Lambert, W. E., & Jakobovits, L. A. Verbal satiation and changes in the intensity of meaning. J. exp. Psychol., 1960, 60, 376-383.

Lyle, J. Semantic differential scales for newspaper research. Journ. Quart., 1960, 37, 559-562, 646.

McNelly, J. T.
Meaning intensity as related to readership of foreign news. Unpublished doctoral dissertation, Michigan State Univer., 1961.

Manis, M. Assessing communication with the semantic differential. Amer. J. Psychol., 1959, 72, 111-113.

Messick, S. J. Metric properties of the semantic differential. Educ. psychol. Measmt., 1957, 17, 200-206.

Messick, S. J., & Solley, C. M. Word-association and semantic differentiation. Amer. J. Psychol., 1957, 70, 586-593.

Michon, J. A. An application of Osgood's "semantic differential" technique. Acta psychol. Amst., 1960, 17, 377-391.

Mindak, W. A. Fitting the semantic differential to the marketing problem. J. Market., 1961, 25 (April), 28-33.

Mitsos, S. B. Personal constructs and the semantic differential. J. abnorm. soc. Psychol., 1961, 12, 433-434.

Moss, C. S. Current and projected status of semantic differential research. Psychol. Rec., 1960, 10, 47-54.

Moss, C. S. Experimental paradigms for the hypnotic investigation of dream symbolism. Int. J. clin. exp. Hypnosis, 1961, 9, 105-117.

Moss, C. S., & Waters, T. J. Intensive longitudinal investigation of anxiety in hospitalized juvenile patients. Psychol. Rep., 1960, 60, 262-265.

Mogar, R. E. Three versions of the F scale and performance on the semantic differential. J. abnorm. soc. Psychol., 1960, 60, 262-265.

Nebergall, R. E. An experimental investigation of rhetorical clarity. Speech Monogr.

Norman, W. T. Stability-characteristics of the semantic differential. Amer. J. Psychol., 1959, 72, 581-584.

Osgood, C. E. Method and theory in experimental psychology. New York: Oxford Univer. Press, 1953.

Osgood, C. E., and Luria, Zella. A blind analysis of a case of multiple personality using the semantic differential. J. abnorm. soc. Psychol., 1954, 49, 579-591.

Osgood, C. E., & Suci, G. J. A measure of relation determined by both mean difference and profile information. Psychol. Bull., 1952, 49, 251-262.

Osgood, C. E., & Suci, G. J. Factor analysis of meaning. J. exp. Psychol., 1955, 50, 325-338.

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. The measurement of meaning. Urbana, Ill.: Univer. of Illinois Press, 1957.

Osgood, C. E., & Tannenbaum, P. H. The principle of congruity in the prediction of attitude change. Psychol. Rev., 1955, 62, 42-55.

Osgood, C. E., Ware, E. E., & Morris, C. Analysis of the connotative meanings of a variety of human values as expressed by American college students. J. abnorm. soc. Psychol., 1961, 62, 62-73.

Rabin, A. I. A contribution to the "meaning" of Rorschach's inkblots via the semantic differential. J. consult. Psychol., 1959, 23, 368-372.

Resnick, J., & Landfield, A. W. The oppositional nature of dichotomous constructs. Psychol. Rec., 1961, 11, 47-55.

Rosen, E. A cross cultural study of semantic profiles and attitude differences: Italy. J. soc. Psychol., 1959, 49, 137-144.

Seman, Catharine B. Use of the semantic differential with lobotomized psychotics. J. consult. Psychol., 1957, 21, 264.

Senders, Virginia L. Measurement and statistics. New York: Oxford Univer. Press, 1958.

Siegel, S. Nonparametric statistics. New York: McGraw-Hill, 1956.

Smith, R. G. Development of a semantic differential for use with speech related concepts. Speech Monogr., 1959, 26, 263-272.

Smith, R. G. A semantic differential for theater concepts. Speech Monogr., 1961, 28, 1-8.

Smith, R. G. A semantic differential for speech correction concepts. Speech Monogr., 1962, 24, 32-37.

Solley, C. M., & Messick, S. J.
Appendix A

A Survey of Semantic Relations

The purpose of this study is to develop a method for finding out what criteria people use for making judgments about things--what kinds of questions they would want to ask about a particular thing before they could decide whether it was a "better" or "worse" thing of its kind. For example, you probably don't care whether your friends are large or small, but that's the first question you would ask about a pay check. You may not care whether your automobile is red or green, but it makes a difference in apples. If we had a hundred years to spare, we might be able to answer this question by discussion, but this study is an attempt to get an answer more quickly than that.

On the following pages are scales with adjectives at each end that look like this:

left : : : : : : right

The intervals on these scales may be interpreted as extremely left, quite left, slightly left, neither or both, slightly right, quite right, and extremely right. Of course you are to substitute whatever words occur at the left and right ends of the scales.

At the top of each page is a concept, such as DOG. What you are to do is to think of the best imaginable and the worst imaginable examples of the class of things named by that concept (in this case, the best imaginable DOG and the worst imaginable DOG) and indicate where you think the best and the worst examples fall on each of the scales on that page. For example, if you happen to like large DOGS and you don't care much for small DOGS, you might indicate that the best imaginable DOG is extremely large and the worst imaginable DOG is quite small. Of course, your best DOG may be "gentle" and your worst DOG "mean," but you will have an opportunity to indicate that on another scale.

Indicate your feeling for the "best" example by marking a "B" on the scale in the appropriate place. Indicate "worst" by marking a "W" in the appropriate place. Your responses might look like this:

DOG

large B : : : : : W : small
mean W : : : : : : B gentle
green : : : BW : : : red

The last mark indicates that you don't really care whether a DOG is green or red, or that this scale just doesn't apply to DOGS. With concepts such as ELEPHANT, MONSTER, or RUBY you might feel that one of the extreme positions on the scale describes all the members of the class, in which case you should mark "BW" in the extreme position. Just make sure that you have two marks on every scale.

(Concept)

hot : : : : : : cold
ornate : : : : : : plain
small : : : : : : large
honest : : : : : : dishonest
poor : : : : : : rich
complex : : : : : : simple
usual : : : : : : unusual
healthy : : : : : : sick
fast : : : : : : slow
dead : : : : : : alive
masculine : : : : : : feminine

(Concept)

humorous : : : : : : serious
empty : : : : : : full
thick : : : : : : thin
bass : : : : : : treble
yellow : : : : : : blue
pleasant : : : : : : unpleasant
high : : : : : : low
opaque : : : : : : transparent
sacred : : : : : : profane
mature : : : : : : youthful
weak : : : : : : strong
constricted : : : : : : spacious
pungent : : : : : : bland
rugged : : : : : : delicate
aggressive : : : : : : defensive
soft : : : : : : loud
proud : : : : : : humble
nice : : : : : : awful
objective : : : : : : subjective
unfair : : : : : : fair
tasty : : : : : : distasteful
sad : : : : : : happy

(Concept)

relaxed : : : : : : tense
rough : : : : : : smooth
new : : : : : : old
sweet : : : : : : sour
old : : : : : : young
calm : : : : : : agitated
bitter : : : : : : sweet
long : : : : : : short
black : : : : : : white
clear : : : : : : hazy
constrained : : : : : : free
sober : : : : : : drunk
bright : : : : : : dark
stale : : : : : : fresh
kind : : : : : : cruel
heavy : : : : : : light
far : : : : : : near
angular : : : : : : rounded
red : : : : : : green
passive : : : : : : active
colorful : : : : : : colorless
worthless : : : : : : valuable

(Concept)

ferocious : : : : : : peaceful
severe : : : : : : lenient
beautiful : : : : : : ugly
tough : : : : : : fragile
savory : : : : : : tasteless
changeable : : : : : : stable
wide : : : : : : narrow
sharp : : : : : : blunt
cowardly : : : : : : brave
hard : : : : : : soft
dull : : : : : : sharp
good : : : : : : bad
dirty : : : : : : clean
rational : : : : : : intuitive
tenacious : : : : : : yielding
excitable : : : : : : calm
fragrant : : : : : : foul
curved : : : : : : straight
deep : : : : : : shallow
wet : : : : : : dry
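As an illustration of how responses to the best-worst form above might be scored, the following sketch (in Python, with purely hypothetical function and variable names; the coding scheme is an assumption, not part of the instrument) codes the seven scale positions 1 through 7 from the left-hand adjective to the right-hand one, takes the sign of the difference between a subject's "W" and "B" marks as that subject's polarity for a scale-concept item, and tallies those signs across subjects--the raw material for a test of agreement such as the sign test. A scale whose tallies pile up on one sign for a given concept is one on which subjects agree about which pole belongs to the "best" example.

    # Minimal sketch (assumed coding, not part of the instrument): scoring
    # one scale-concept item from the best-worst form. Scale positions are
    # numbered 1-7 from the left-hand adjective to the right-hand one.

    from collections import Counter

    def item_polarity(best_position, worst_position):
        """+1 if the 'best' mark lies toward the left adjective, -1 if it lies
        toward the right adjective, 0 if 'B' and 'W' share one position."""
        diff = worst_position - best_position
        return (diff > 0) - (diff < 0)

    def scale_agreement(responses):
        """responses: one (best_position, worst_position) pair per subject for
        a single scale-concept item. Returns counts of +1, -1, and 0
        polarities, the raw material for a sign test of agreement."""
        counts = Counter(item_polarity(b, w) for b, w in responses)
        return counts[+1], counts[-1], counts[0]

    # Ten hypothetical subjects rating DOG on large-small: most put the best
    # DOG toward "large" and the worst toward "small"; one marks "BW".
    dog_large_small = [(1, 6), (2, 7), (1, 7), (2, 6), (1, 5),
                       (4, 4), (2, 7), (1, 6), (3, 6), (1, 7)]
    print(scale_agreement(dog_large_small))   # -> (9, 0, 1)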
Appendix B

A Study in Semantic Relationships

You are about to participate in an experiment. This experiment is part of a larger study that is attempting to find out the WHY behind statements like "This one's better than that one," or "This one's not as bad as that one."

You will be given a set of 76 cards. On the first one is a concept (MOTHER, ME, SIN, AMERICA, SYMPHONY, or BOULDER). It is assumed that this concept names a class of objects or events, some of which you like better than others or dislike less than others. It is also assumed that you have some reasons for your likes and dislikes. For example, you probably prefer a capitalistic AMERICA to a communistic AMERICA, because it is capitalistic rather than communistic.

The next 75 cards in your set (numbered sequentially in the lower right hand corner) each has a pair of adjectives typed on it, like communistic-capitalistic. Each pair of adjectives is assumed to name a dimension on which events named by your concept might differ. You must decide which of these dimensions are most important to you--which ones carry the most weight--when you are trying to decide that "This one is better," or "That one is not as bad."

RANK THE 75 ADJECTIVE CARDS IN ORDER OF IMPORTANCE TO YOUR FINAL EVALUATION OF A PARTICULAR EXAMPLE OF THE CONCEPT.

You might begin by dividing the set of cards into three stacks--important, maybe important, not important--and then divide them again and again until you have the set in rank order with the most important at the top. Put the concept card last.
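The card-sorting task yields one rank order of the 75 adjective pairs per subject for a given concept. The fragment below is only a hypothetical illustration of how such rankings might be summarized (the thesis materials prescribe no code, and the names are invented): it tabulates each pair's median rank across subjects, so that pairs most subjects placed near the top of their stacks come out with the smallest medians.

    # Hypothetical illustration only: summarizing subjects' importance
    # rankings of adjective pairs for one concept. Each subject supplies a
    # list of pair labels in rank order, most important first (rank 1).

    from statistics import median

    def median_ranks(rankings):
        """rankings: one rank-ordered list of adjective-pair labels per
        subject. Returns (median rank, pair) tuples sorted so that the pairs
        judged most important come first."""
        ranks = {}
        for order in rankings:
            for position, pair in enumerate(order, start=1):
                ranks.setdefault(pair, []).append(position)
        return sorted((median(r), pair) for pair, r in ranks.items())

    # Three hypothetical subjects ranking four of the pairs for AMERICA.
    subjects = [
        ["honest-dishonest", "weak-strong", "small-large", "hot-cold"],
        ["weak-strong", "honest-dishonest", "small-large", "hot-cold"],
        ["honest-dishonest", "small-large", "weak-strong", "hot-cold"],
    ]
    for med, pair in median_ranks(subjects):
        print(pair, med)   # honest-dishonest first, hot-cold last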
Appendix C

A Survey of Semantic Relations

The purpose of this study is to measure the meanings of certain things to various people by having them judge them against a series of descriptive scales. In taking this test, please make your judgments on the basis of what these things mean to you. On each page of this booklet you will find a CONCEPT to be judged and beneath it a set of scales. You are to rate the concept on each of the scales in order.

Here is how you are to use these scales:

If you feel that the concept at the top of the page is very closely related to one end of the scale, you should place your check-mark as follows:

fair X : : : : : : unfair
OR
fair : : : : : : X unfair

If you feel that the concept is quite closely related to one or the other end of the scale (but not extremely), you should place your check-mark as follows:

fair : X : : : : : unfair
OR
fair : : : : : X : unfair

If the concept seems only slightly related to one side as opposed to the other side (but not really neutral), then you should mark as follows:

fair : : X : : : : unfair
OR
fair : : : : X : : unfair

The direction toward which you check, of course, depends upon which of the two ends of the scale seems most characteristic of the thing you're judging.

If you consider the concept to be neutral on the scale, both sides of the scale equally associated with the concept, or if the scale is completely irrelevant, unrelated to the concept, then you should place your check-mark in the middle space:

fair : : : X : : : unfair

IMPORTANT: (1) Be sure you check every scale for every concept--do not omit any. (2) Make one and only one check mark on each scale. (3) Make each item a separate and independent judgment.

Work at a fairly high speed through this test. Do not worry or puzzle over individual items. It is your first impressions, the immediate "feelings" about the items, that we want. On the other hand, please do not be careless, because we want your true impressions.
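Check-marks on the booklet above map naturally onto a seven-step numeric scale. The short sketch below is again only an illustration under assumed conventions, not part of the original materials: positions are coded 1 at the left-hand adjective through 7 at the right-hand one, with 4 as the neutral middle space, and may be recoded so that the neutral space scores zero and positive scores lie toward the left-hand adjective.

    # Illustrative sketch (assumed coding, not from the original materials):
    # converting a check mark's position on a seven-step scale, counted 1 to
    # 7 from the left-hand adjective, into a numeric rating.

    def rating(position, centered=True):
        """Score for a check mark at the given position (1-7). If centered,
        recode to +3..-3 so the middle space is 0 and positive scores lie
        toward the left-hand adjective."""
        if not 1 <= position <= 7:
            raise ValueError("check-mark position must be between 1 and 7")
        return 4 - position if centered else position

    # A hypothetical subject judging one concept on two scales:
    # "quite fair" (position 2) and "extremely good" (position 1).
    marks = {"fair-unfair": 2, "good-bad": 1}
    scores = {scale: rating(pos) for scale, pos in marks.items()}
    print(scores)   # {'fair-unfair': 2, 'good-bad': 3}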