This is to certify that the dissertation entitled

PSYCHOLOGICAL MEASUREMENT AND STATISTICAL INFERENCE: IMPLICATIONS OF SCALE MISSPECIFICATION FOR MODERATED MULTIPLE REGRESSION

presented by William Michael Rogers has been accepted towards fulfillment of the requirements for the Ph.D. degree in Psychology.

PSYCHOLOGICAL MEASUREMENT AND STATISTICAL INFERENCE: IMPLICATIONS OF SCALE MISSPECIFICATION FOR MODERATED MULTIPLE REGRESSION

By

William Michael Rogers

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

1998

ABSTRACT

The purpose of this thesis was to reexamine the critical relationship between scales of measurement, moderated multiple regression, and theoretical inference. The first major section critically reviews the modern measurement paradigm in psychology and argues that psychologists have placed too much faith in both their measures and their methodologies. The second section narrows the issue, focusing on how these uncertainties in measurement scales can affect inferences and tests using moderated multiple regression methods.
It is shown that weaker scales prevent the researcher from making conclusive statements about the presence and strength of moderating effects. A study is conducted that clarifies the effects of measurement scale on the interpretation of moderated multiple regression in a variety of situations. It is shown that the interpretability of obtained effect sizes for interaction effects depends, in part, on the precision of both predictor and criterion measures. In addition, the overall predictability of criterion measures appears to be a factor in the complex relationship between measurement precision and interaction effects.

Copyright by WILLIAM MICHAEL ROGERS 1998

ACKNOWLEDGMENTS

There are several individuals whose contributions to this dissertation are noteworthy. First, I would like to thank the faculty at Michigan State University, especially the members of my dissertation committee. Neal Schmitt, my committee chair, kept the dissertation process streamlined, correcting my tendencies to vainly allocate time toward the mathematically intractable. Rick DeShon provided helpful advice in developing and programming the mathematical simulations used in the thesis. Alexander von Eye helped clarify the theoretical issues in defining and understanding scales of measurement. Ann Marie Ryan was instrumental in framing the practical implications of measurement theory for applied psychology.

I would also like to thank the several scholars whose work has greatly influenced my own philosophy of measurement and motivated me toward further study: R. Duncan Luce, Louis Narens, Patrick Suppes, David Krantz, Amos Tversky, Joel Michell, and Jean-Claude Falmagne.

Finally, I would like to thank my parents for believing in me at times when it was difficult for me to believe in myself. Without their constant encouragement, completing the dissertation would have been impossible.
TABLE OF CONTENTS

LIST OF TABLES
INTRODUCTION
    Measurement Theory in Psychology: Campbell’s Problem and Stevens’ Solution
    Psychological Data: Ordinal, Interval, or Unimportant?
    Interval Scales, Ratio Scales, and Moderated Multiple Regression
    Moderated Multiple Regression and Ordinal Scales: Criterion Issues
    Moderated Multiple Regression and Ordinal Scales: Predictor Issues
    Moderated Multiple Regression and Level of Measurement: Summary and Implications
    Rationale and Overview for the Study
    Simultaneous Conjoint Measurement
    The MORALS Algorithm
    Research Design
        Study Independent Variables
        Study Dependent Variables
    Hypotheses
METHOD
    Structure of Predictor Variables and Error Variance
    Dataset Generation
RESULTS
DISCUSSION
    Effects of Baseline R²
    Effects of Incremental R²
    Effects of Predictor Intercorrelation
    Effects of Measurement Properties of Variables
    Crossing vs. Non-crossing Interactions
    Design Interaction Effects
    Measurement, Interaction Effects, and Psychology
    Practical Implications of the Study
APPENDIX A: MORALS Algorithm
APPENDIX B1: f² Values of Crossing Interactions with Two Continuous Predictors
APPENDIX B2: f² Values of Crossing Interactions with One Continuous and One Binary Predictor
APPENDIX B3: f² Values for Crossing Interactions with Two Binary Predictors
APPENDIX B4: Pre-Post Correlation Coefficients for Crossing Interactions with Two Continuous Predictors
APPENDIX B5: Pre-Post Correlation Coefficients for Crossing Interactions with One Continuous and One Binary Predictor
APPENDIX B6: Pre-Post Correlation Coefficients for Crossing Interactions with Two Binary Predictors
LIST OF REFERENCES

LIST OF TABLES

Table 1. Datasets for Slope Bias Example: GPA by Race and SAT Score
Table 2. Datasets for Social Behavior Example: Social Behaviors by Work Experience and Time
Table 3. Dataset for Predictor Rescaling Example: Y by X and Z
Table 4. Dataset in Table 1a presented as SAT by Levels of Race and GPA
Table 5. Performance by Levels of Motivation and Ability
Table 6. Mean Δf² by R² of Additive Model and Measurement Level of Variables
Table 7. Mean Correlation by R² of Additive Model and Measurement Level of Variables
Table 8. Mean Correlation by Incremental R² of Interaction Effect and Measurement Level of Variables
Table 9. Mean Δf² by Incremental R² of Interaction Effect and Measurement Level of Variables
Table 10. Mean Δf² by Pre-Transformation Predictor Intercorrelation and Measurement Level of Variables
Table 11. Mean Pre-Post Transformation Correlation by Pre-Transformation Predictor Intercorrelation and Measurement Level of Variables
Table 12. Incremental Attenuation (ΔΔf²) by Variable and Variables Already Transformed
Table 13. Mean Δf² by Non-Crossing / Crossing Interaction and Measurement Level of Variables
Table 14. Mean Δf² Values for Study Design Factors

INTRODUCTION

Many theories and procedures in applied psychology predict interactive or moderating relationships between independent variables in determining their effects on a dependent variable. Moderating effects are usually defined (e.g., Zedeck, 1971) as situations in which the bivariate relationship between two variables (X, Y) is influenced by a third variable (Z). Personnel psychologists typically use the moderator concept in assessing test bias. Evidence for moderating effects of categorical variables such as gender or ethnicity is considered differential prediction, and the test is deemed biased against a subgroup defined by the moderator (Cleary, 1968). Use of moderators is also prevalent in other applied domains, such as organizational behavior (e.g., Pierce, Gardner, Dunham, & Cummings, 1993) and training and skill acquisition (e.g., Kanfer & Ackerman, 1989). As the theoretical models generated to explain or predict human behavior in organizational settings become more complex, the development of theories specifying moderator variables will grow in importance.

The primary purpose of early studies using moderators was not to test theories or detect test bias, but to assess differential validity across subgroups defined by a third variable (e.g., Ghiselli, 1956; Saunders, 1956). This usage was predicated on the notion that moderator variables defined homogeneous subgroups, within which criterion-related validity was generally thought to be more accurately assessed and, in some cases, of greater magnitude.
As such, this initial use of moderator variables was primarily atheoretical and focused on validity maximization rather than on substantive relationships between grouping variables, predictors, and criteria (Lubinski & Humphreys, 1990).

Because the early use of moderator variables was in differential validity assessment, the primary method of examining moderator effects was the comparison of subgroup correlations. This method was restricted to categorically defined subgroups, but this was a less severe restriction given the nature of most subgrouping variables (e.g., gender, race). However, it posed problems for the treatment of continuous moderator variables. Since it is generally not desirable to collapse continuous scores into categories, the preferred modern method for assessing interactive or moderating relationships is moderated multiple regression (MMR) (Saunders, 1956; Zedeck, 1971). This method has been shown to provide more information than subgroup correlational analysis, in the form of subgroup slopes (Stone-Romero & Anderson, 1994), and can be applied to situations with either dichotomous or continuous moderator variables.

Using MMR, the test for interactive or moderating effects is a test on the regression weight of a multiplicative term composed of both predictor components. Expressed as a linear model:

$$Y = b_0 + b_1 X + b_2 Z + b_3 XZ + e \qquad (1)$$

where Y is a continuous dependent variable, X is a predictor variable, and Z is a predictor variable thought to have a moderating effect on the X-Y relationship. A test of the significance of b₃, in this case, is a test of the relevant moderating effect. This test is mathematically equivalent to a hierarchical F-test of the incremental R² for the above model over a reduced model without the XZ product term.

Despite the theoretical importance of interactions in applied psychology, confirming evidence has often been difficult to gather.
Cronbach (1987) notes the difficulty of finding statistically significant interaction effects. Zedeck (1971) has termed moderator effects “as elusive as suppressor variables.” Moderator effects have also been characterized as more difficult to detect in non-experimental field settings than in experimental settings (McClelland & Judd, 1993; Morris, Sherman, & Mansfield, 1986). Although the failure to discover significant moderating effects using MMR led some researchers to advocate alternative methodologies, such methods were eventually shown to be invalid (cf. Wise, Peters, & O’Connor, 1984).

In response to these difficulties, many researchers have investigated statistical artifacts which may contribute to Type II errors using MMR. Type I errors (detecting an interaction when one is not present in the population) are assumed to be controlled by the significance level of either the t-test for the product term in the MMR equation or the F-test for incremental R² after inclusion of the product term. Several factors, such as small sample size (Alexander & DeShon, 1994), measurement error (Busemeyer & Jones, 1983), small population effect sizes (Stone-Romero & Anderson, 1994), and range restriction (Aguinis & Stone-Romero, 1997), have all been shown to increase Type II errors and reduce the power of the MMR method. These factors reduce the probability of concluding that interaction effects are present when they are, in fact, present in the population.

While the above findings are of considerable practical utility, they have perhaps overshadowed more fundamental issues related to the erroneous interpretation of moderated multiple regression analysis. These issues are rooted in the measurement properties of the variables used by the investigator. These measurement properties, and the associated scales of measurement, are typically defined based on the prevalent theory of scale types (Stevens, 1946).
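The hierarchical test behind Equation (1) can be sketched in a few lines of plain Python, so that it runs without third-party packages. This is a minimal illustration under stated assumptions, not the dissertation’s own code: it fits the full and reduced models by least squares, reports the incremental R², the F statistic for the product term, the t statistic for b₃, and Cohen’s f² for the increment (ΔR² / (1 − R²) of the full model), and lets one check numerically that F = t² when a single product term is tested. The function names (`solve`, `ols`, `mmr`), the simulated coefficients, the sample size, and the seed are all hypothetical choices.

```python
import math
import random


def solve(a, rhs):
    """Solve the square system a v = rhs by Gaussian elimination with partial pivoting."""
    p = len(a)
    m = [list(row) + [r] for row, r in zip(a, rhs)]  # augmented copy; inputs untouched
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * p
    for i in reversed(range(p)):
        v[i] = (m[i][p] - sum(m[i][j] * v[j] for j in range(i + 1, p))) / m[i][i]
    return v


def ols(rows, y):
    """Least squares via the normal equations; returns (coefficients, RSS, X'X)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))
    return b, rss, xtx


def mmr(x, z, y):
    """Hierarchical test of the XZ term in Y = b0 + b1*X + b2*Z + b3*XZ + e."""
    n = len(y)
    full = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]
    reduced = [row[:3] for row in full]                  # same model without XZ
    b, rss_full, xtx = ols(full, y)
    _, rss_red, _ = ols(reduced, y)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    df_resid = n - 4                                      # residual df of the full model
    delta_r2 = (rss_red - rss_full) / tss                 # incremental R^2 of XZ
    f_stat = (rss_red - rss_full) / (rss_full / df_resid)  # F(1, n - 4)
    c33 = solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]             # element of (X'X)^-1 for b3
    t_b3 = b[3] / math.sqrt(rss_full / df_resid * c33)
    f2 = delta_r2 / (rss_full / tss)                      # dR^2 / (1 - R^2_full)
    return {"b": b, "delta_r2": delta_r2, "F": f_stat, "t_b3": t_b3, "f2": f2}


if __name__ == "__main__":
    rng = random.Random(42)
    n = 500
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical population: true b3 = 0.4, unit error variance
    y = [0.5 * xi + 0.3 * zi + 0.4 * xi * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    res = mmr(x, z, y)
    print(f"b3 = {res['b'][3]:.3f}, dR2 = {res['delta_r2']:.3f}, f2 = {res['f2']:.3f}")
    print(f"F = {res['F']:.4f}, t^2 = {res['t_b3'] ** 2:.4f}")  # equal up to rounding
```

The hand-rolled solver is only there to keep the sketch dependency-free; in practice a statistics package would be used, and the same comparison of full and reduced models applies.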
Measurement Theory in Psychology: Campbell’s Problem and Stevens’ Solution

In the majority of psychological circles, the name S. S. Stevens is synonymous with scales of measurement. His nominal, ordinal, interval, and ratio categorizations (Stevens, 1946) have been almost uniformly accepted in the psychological literature, and methodological or statistical textbooks rarely go beyond these concepts when discussing measurement scales. The unanimity of this acceptance cannot, however, be explained solely by the utility of Stevens’ model. Rather, Stevens’ impetus for developing such a measurement taxonomy, and the taxonomy’s subsequent widespread acceptance, stem in part from a reactive stance by early 20th century psychologists against influential measurement theories, most notably the ideas of Campbell (1920, 1928).

Campbell, a physicist, attempted to formalize extensive measurement within physics. Extensive measurement is the numerical representation of physically additive properties of objects; length, mass, and distance are common examples. Mathematicians such as Helmholtz (1887) and Hölder (1901) had developed complex theorems and proofs for such measurement systems, and physical concepts were readily applied to the theory. Measurement of these properties is based on the empirical concatenation, or physical addition, of identified subunits (e.g., meters, grams) which correspond to the object being assessed. Campbell, and others (e.g., Bridgman, 1922), proposed that extensive measurement is the only basis for measurement, and that any scale or measurement system must, at some level, be based on extensively measured entities. Accordingly, Campbell called extensive measurement fundamental measurement. The measurement of empirical properties which could not be extensively measured, but were instead composed of fundamental measures, was called derived measurement.
Concepts such as density and acceleration are derived measures, as they are determined by simple mathematical relationships between fundamental measures (e.g., density, derived from mass and volume) or powers thereof (e.g., acceleration, derived from length and the square of time). Campbell’s theory essentially classified measurement as fundamental or derived, and any scale or representation which was neither was rejected as measurement.

By constraining measurement to extensive attributes, Campbell’s theory had thrown down a gauntlet to psychology. The vast majority of psychological variables were not amenable to empirical concatenation operations, and next to none consisted of additive physical units. This was true even of psychophysics, which was, in the early-to-mid 20th century, considered the most rigorously quantitative field in psychology. Many psychophysicists at the time (e.g., McGregor, 1935; Johnson, 1936; Smith, 1938) attempted to integrate psychological measurement within Campbell’s theory, but to little avail. In 1940, a committee of the British Association for the Advancement of Science, on which Campbell was an influential member, dealt another damaging blow, formally declaring fundamental measurement in psychology an impossibility (Ferguson et al., 1940):

“Why do not psychologists accept the natural and obvious conclusion that subjective measurements of loudness in numerical terms (like those of length or weight or brightness) are mutually inconsistent and cannot be the basis of measurement?”

As Campbell’s theory gained support from the committee’s pronouncement, the prospects of acceptable measurement in psychology grew dimmer. Given the vital role ascribed to measurement by the scientific community as a whole, the mood of many was that psychology was on the defensive in a philosophical battle for its existence as a science.
Stevens, a psychophysicist, was especially influenced by the edict of the British committee, as his own loudness sensation scale, the sone scale (Stevens & Davis, 1938), was among those that the association chose to examine in detail. Stevens proposed an alternative to Campbell’s theory by relaxing the requirement of extensively measurable entities. Stevens suggested that any numerical coding which somehow represents an empirical reality should be considered measurement, regardless of the presence of additivity in either the numbers used or the empirical objects in question. Stevens’ complete theory argues that empirical non-additivity does not preclude measurement itself, but only restricts the ways in which the measurements can be used.

Stevens identified four primary scale types: nominal, ordinal, interval, and ratio. His nominal scale is the numerical coding of attributes based on equality or inequality. Thus, using a ‘1’ to represent a male and a ‘2’ to represent a female tells us only that “males are not equal to females.” Stevens’ ordinal scale uses numbers to denote order properties of an attribute. A simple example is order of finish in a marathon: 1st place finishes ahead of 2nd, which finishes ahead of 3rd, and so on. Ordinal scales also contain equality and inequality information, in the form of “ties” at any given rank. In the marathon example, if after 1st and 2nd place three people all crossed the finish line at exactly the same moment, they could all be given rank “3”, as they finished after “2” and before “4”. Stevens’ interval scale possesses the properties of the aforementioned scales (i.e., equality and order) and has the additional property of equality of differences. A classic example of an interval scale is temperature measurement using a thermometer. The liquid in the thermometer is known to increase in volume linearly with an increase in temperature.
The marked gradations on the outside of the thermometer are set at equal intervals of the volume within the thermometer. Thus, the change in volume of the liquid from the 10° mark to the 20° mark is equal to the volume change between the 20° and 30° marks. Given the linear relationship between volume and external temperature, one can conclude that the physical temperature differences are also equivalent.

Ratio scales possess the properties of equality, order, and difference, and, in addition, reflect a physically additive structure by the presence of a true zero point. Ratio scales are used when the object in question has a meaningful point of absence or non-existence. Mass and length are common examples.

Each scale type defines a set of permissible transformations, under which the information contained in the scale remains invariant. Nominal scales permit only one-to-one transformations, in which any value in the transformed scale $s_t$ has only one corresponding value in the original scale $s_o$, and vice versa. Ordinal scales permit monotonic increasing transformations, as these preserve the order of the original scale. Interval scales permit positive linear, or affine, transformations of the form $s_t = a s_o + b$ with $a > 0$. Ratio scales permit positive similarity transformations, which preserve the ratio of two scale values, of the form $s_t = a s_o$, where $a$ is a positive real number. Non-permissible transformations of any scale result in loss of information, and the resulting scale can only be treated at the level of measurement which permits the transformation. For example, a non-linear monotonic transformation of an interval or ratio scale results in an ordinal scale, and a linear transformation of a ratio scale that adds a nonzero constant results in an interval scale. In the case of linear transformations and monotonic transformations with functional formulas (e.g., $X^n$ or $\log(X)$), a non-permissible transformation can be reversed to recover original scale information.
This is not true of all monotonic transformations, however.

The notion of permissible statistics was a natural extension of permissible transformations. Permissible statistics were defined by Stevens to be functions whose meaning and statistical inference remain invariant across permissible transformations of a given scale type. Non-parametric statistics, such as frequency-based and rank-order concordance indices, were the only statistics applicable to nominal and ordinal scales, respectively. Advanced parametric methods, such as t-tests, F-tests (analysis of variance), and Pearson correlation coefficients, were restricted to interval and ratio scales.

Accepting Stevens’ theory of scale types, even with the restrictions it placed on transformations and statistics, would have represented considerable gains in measurement theory for psychology. Despite these potential gains, however, many psychologists still saw a problem. Stevens’ theory essentially told them that the variables they studied were indeed measurable, but that, because of scale properties, only certain statistics were allowable, and hence only certain hypotheses could be tested. Acceptance of this, combined with the lack of evidence for the interval nature of most psychological variables, would amount to an admission that the large class of meaningful statements limited to interval and ratio scales could rarely be made in psychology. To psychologists attempting to expand measurement practice and psychological science to the boundaries of physical science, these restrictions were unacceptable. The nature of psychological data, and the methods used to analyze them, became the major point of contention.

Psychological Data: Ordinal, Interval, or Unimportant?

The earliest attacks on Stevens’ theory were based on the notion of statistical methods being closed systems.
Typified by Burke (1953), Lord (1953), and Anderson (1961), these criticisms essentially stated that numerical calculations are independent of measurement scales and empirical phenomena, and thus any calculation can be conducted on any numbers. After all, states Lord (1953), “the numbers don’t know where they came from.” Though the logic of this statement is unclear to some¹, Lord’s conceptual separation of numerical computation and statistical meaning is evident. He uses examples such as calculating the arithmetic mean of jersey numbers for freshmen on a football team. Gaito (1980) supports such reasoning, suggesting that statistical theory and measurement theory are independent and unrelated considerations.

¹ Townsend & Ashby (1984): “Just exactly what this curious statement has to do with statistics or measurement eludes us.”

This type of argument, sardonically termed “computational libertarianism” by Michell (1990), boils down to both a difference in semantics and a lack of consideration of empirical meaning. It would have been ridiculous for Stevens to suggest that calculations cannot be done using scale values. No researchers, at least to date, have been legally or otherwise restricted from performing mathematical calculations. These calculations should instead be judged by their eventual meaning or use in hypothesis testing. Unless researchers following Lord’s logic can propose meaningful hypotheses about the mean of a nominal scale such as jersey number, the calculation of the mean remains theoretically and empirically meaningless.

Other researchers have questioned the relationship between statistical methodology and measurement theory based on statistical assumptions. Gaito (1960) presents the argument that statistical tests are mathematically based only on distributional assumptions. According to Gaito, verification of these distributional assumptions, rather than scale type, validates the use of a particular statistical method. The problem with this argument is similar to that of the early criticisms of Lord (1953) and Burke (1953), in that no explicit linkage is made with the meaning of the statistical test or hypothesis. A variety of transformations of a given variable could be conducted in order to produce a normally distributed result. Gaito is technically correct, as this would indeed validate the use of statistical methods assuming normality, but the hypotheses tested under the transformation may become meaningless (e.g., a difference in logarithms of attitudes toward an object) or, at the very least, difficult to interpret. Again, the key is not valid statistical methodology, but valid empirical inference and theoretical meaning. Stine (1989) sums up the critical relationship between substantive theory, measurement, and statistical methodology, and the flaws in arguments such as Gaito’s:

“In short, for the statistician or mathematician, statistical methods are closed systems. For the scientist, statistical methods are but one component of a larger, more complex system. The full potential of a statistical technique is realized only when its proper role as a component of the scientific endeavor is realized. A failure to recognize this role can lead to scientific decision making on the basis of nonsense.”

Another set of responses to Stevens’ theory was motivated by the desire to use parametric methodology with data that were not shown to be interval or ratio scaled. Recall that the only difference between ordinal and interval data is an equivalent distance between scale points. This meaningful distance, according to Stevens’ (1946) theory, permitted the use of advanced parametric statistical methods of testing mean differences. The difficulties in verifying equal intervals in measurement instruments, combined with the desire for parametric methodology, led many psychologists to seek proxy indicators for an interval scale.

Perhaps because of its role in many parametric methods, the most popular proxy indicator of an interval scale has been the normal distribution. Gaito (1959) reasoned that the normal distribution is evidence for an interval scale because one can divide a normal distribution into equal units based on the standard deviation. Achenbach (1978) writes: “In effect, then, the assumption of a normal curve also implies an assumption about the type of measurement scale employed.” Jensen (1974) states: “if normality of the population distribution of the trait is correct, we have a true interval scale of measurement”. In 1980, he writes: “Ipso facto, any test of intelligence that yields a normal distribution of scores must be an interval scale.”

Despite all claims to the contrary, there is no evidence that normality of distribution is a valid indicator of an interval scale (Stine, 1989). The argument of Gaito (1959) is flawed, in that standard deviations only allow a normal curve to be divided into equal areas based on probabilities. The empirical distance between points on the scale is an entirely different consideration. For instance, knowing that a data set has a mean of 10, a standard deviation of 1, and a perfectly normal distribution only tells us that the probability of a data point falling between 8 and 9 is equal to the probability of its falling between 11 and 12. It in no way informs us of the empirical equality of these distances.

Thomas (1982) illustrates a situation which further refutes claims that normal distributions imply a scale of measurement. The study he used as an example, Yuan (1933), suggested that weight be considered a lognormal variable due to the lack of negative values.
Yuan graphed weight and log weight from a sample of 1000 girls and illustrated that log weight conformed to a normal distribution much better than untransformed weight. Since weight is a ratio scale (and consequently an interval scale), a log transform is not permissible. Thus, log weight is not an interval or ratio scale, yet it displays a normal distribution. Thomas (1982) also proves that, for any ordinal scale measuring an underlying continuous distribution, a transformation to a normal distribution exists². He points out a startling implication of the latter proof: if we incorrectly assume that normal distributions are the result of measurements using interval scales, and know that any ordinal scale can be transformed to normality, we would erroneously conclude that any ordinal scale can be transformed into an interval scale!

² Proof based on Roussas (1973), pp. 185-186.

Scholars in measurement theory have suggested there is little evidence to conclude that performance measures are linearly related to the underlying construct of interest (Krantz & Tversky, 1971). We are usually only able to show these measures to be of ordinal level, and thus only monotonically related to the construct. Despite these findings, most psychologists still believe that normality of distribution somehow implies an interval scale.

Even among those accepting the non-interval status of measurement in psychology, there is continued use of advanced statistical methodology requiring interval scales. This use is based primarily on simulations demonstrating the robustness of parametric methods to transformations not permitted of interval scales.
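A minimal sketch of the kind of robustness simulation meant here, written in plain Python. It compares rejection rates of a two-sample (Welch) t-test on normally distributed “interval” scores with rejection rates on the same scores passed through a random, strictly increasing piecewise-linear map. It is not a reproduction of any study cited in this section: the disturbance scheme (`random_monotone_map`, a hypothetical helper), the group size, effect size, number of replications, and the normal approximation to the two-tailed .05 critical value are all illustrative assumptions. Note that a fresh disturbance is drawn on every replication, which is the design feature whose validity Stine (1989) questions.

```python
import bisect
import math
import random
import statistics


def mean_var(v):
    """Sample mean and unbiased sample variance."""
    m = sum(v) / len(v)
    return m, sum((x - m) ** 2 for x in v) / (len(v) - 1)


def welch_t(a, b):
    """Welch (unequal-variance) two-sample t statistic for mean(b) - mean(a)."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (mb - ma) / math.sqrt(va / len(a) + vb / len(b))


def random_monotone_map(rng, lo=-5.0, hi=6.0, knots=12):
    """Hypothetical 'ordinal disturbance': a strictly increasing piecewise-linear
    function with random positive slopes between equally spaced knots; values
    outside [lo, hi] are extrapolated along the first or last segment."""
    xs = [lo + i * (hi - lo) / (knots - 1) for i in range(knots)]
    ys = [0.0]
    for _ in range(knots - 1):
        ys.append(ys[-1] + rng.uniform(0.1, 3.0))  # positive rise => increasing

    def f(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), knots - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (x - xs[i])

    return f


def rejection_rates(delta, n=30, reps=1000, seed=1):
    """Share of replications with |t| beyond the approximate two-tailed .05
    critical value, for raw scores and for scores under a freshly drawn
    monotone disturbance (the same map applied to both groups)."""
    rng = random.Random(seed)
    crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.96; large-sample approximation
    raw_hits = disturbed_hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(delta, 1.0) for _ in range(n)]
        f = random_monotone_map(rng)  # new disturbance every replication
        raw_hits += abs(welch_t(a, b)) > crit
        disturbed_hits += abs(welch_t([f(v) for v in a], [f(v) for v in b])) > crit
    return raw_hits / reps, disturbed_hits / reps


if __name__ == "__main__":
    print("no true difference (Type I error):", rejection_rates(0.0))
    print("true difference d = 0.8 (power):  ", rejection_rates(0.8))
```

Drawing `f` once, outside the loop, would instead mimic a single fixed distortion applied across all replications, the situation Stine considers more realistic.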
In one of the first, and most often cited, of these robustness studies, Baker, Hardyck, & Petrinovich (1966) calculated t-tests on interval-level data disturbed by random ordinal transformations, finding that the sampling distributions of the resulting t statistic were very similar to the sampling distribution of the statistic computed on the undisturbed interval data. They therefore concluded that the t-test is robust to violations of interval assumptions and is adequate for use with ordinal data. Results of a Monte Carlo simulation by Gregoire & Driver (1987) suggested no clear power advantage of either parametric or non-parametric tests when testing for two-group (t-test vs. Mann-Whitney U) or multi-group (F-test vs. Kruskal-Wallis H) mean differences of Likert scales under various ordinal transformations. However, reinterpretations of their results have implied a power superiority of parametric tests (see Rasmussen, 1989). Zumbo & Zimmerman (1993) demonstrated minimal power differences when t-tests are applied to ranked data, or to ranked scores with added error, and concluded that it is not necessary to use non-parametric tests on ordinal level data. The message emerging from these investigations is that it is generally permissible to use parametric tests with ordinal data, or, at the very worst, that the choice is an arbitrary consideration. This conclusion has been challenged by Stine (1989), who notes that simulations based on random disturbances to interval data, such as those of Baker et al. (1966) and Zumbo & Zimmerman (1993), may not be valid. According to Stine, these Monte Carlo methods are valid only if different ordinal scales are randomly selected for each use of a given statistic, i.e., if different ordinal representations of the data are used across replications. Stine argues that it is more likely for a single ordinal representation (or disturbance) to be in effect across many situations.
For example, because of anchoring or other problems, a Likert scale may have “compressed” values near one endpoint, such that the empirical distance between ratings of ‘6’ and ‘7’ is less than that between ‘1’ and ‘2’. The behavior of a series of t-tests using this scale may be very different from that of a series conducted on a Likert scale which exhibits random “compression”. Stine concludes that if a permissible transformation exists (i.e., an empirically equivalent scale) such that the inferences made (or their decision error probabilities) using the statistical method in question are altered, then the method is not robust to violations of interval level assumptions. By advocating proxy indicators of interval scales, such as normality of distribution, and using parametric methodologies with ordinal data, psychologists have perhaps, to use an athletic analogy, both inappropriately lowered the bar and strengthened the high-jumper, with respect to measurement and statistical methodology. The next section will describe how these uncertainties in level of measurement can lead to misinterpretations of moderated multiple regression analyses.

Interval Scales, Ratio Scales, and Moderated Multiple Regression

Although the Monte Carlo studies discussed above have quelled most concerns about the robustness of parametric methods, Stine’s criticisms notwithstanding, the effects of measurement level have been further investigated in the context of moderated multiple regression. Since, for reasons discussed earlier, most psychological data are assumed to be of interval scale, much attention has been given to the effects of linear transformations on parameter values and interpretation of MMR.
In addition to the standard linear transformation ($s_t = a \cdot s_o + c$, where $s_t$ is the transformed scale, $s_o$ is the original scale, $a$ is a positive real number, and $c$ is any real number), the special case of additive transformations ($s_t = s_o + c$, where $c$ is any real number) has been addressed. Additive transformations of the form $s_t = s_o - \bar{s}_o$ are often used to “center” the data prior to estimating the regression equation. Such centering simplifies interpretation of simple slopes and often reduces multicollinearity between the product term and its component terms (Aiken & West, 1991). Perhaps the most comprehensive assessment of the effect of additive and linear transformations on regression equations with product terms is that of Cohen (1978). In an effort to demonstrate the invariance of results across linear transformations, Cohen illustrated the effects algebraically. Given arbitrary linear transformations of the predictors, X’ and Z’, where X’ = aX + c and Z’ = dZ + f, the simplified MMR equation in terms of the transformed scales becomes:

$$Y = \left(\beta_0 - \frac{\beta_1 c}{a} - \frac{\beta_2 f}{d} + \frac{\beta_3 c f}{a d}\right) + \left(\frac{\beta_1}{a} - \frac{\beta_3 f}{a d}\right) X' + \left(\frac{\beta_2}{d} - \frac{\beta_3 c}{a d}\right) Z' + \frac{\beta_3}{a d} X'Z' \qquad (2)$$

Regarding the new regression coefficients for variables X’, Z’, and X’Z’, several conclusions are apparent. Coefficients for X’ and Z’ are altered by both the additive (c, f) and the multiplicative (a, d) components of the transformation equations. This can easily be seen in the common situation where one wishes to standardize the predictors prior to analysis (i.e., $a = SD_X^{-1}$, $d = SD_Z^{-1}$, $c = -\bar{X}/SD_X$, $f = -\bar{Z}/SD_Z$), resulting in standardized regression coefficients. If there is no interaction present using the original scales ($\beta_3 = 0$), then transformations which are solely multiplicative ($c = f = 0$) will simply divide $\beta_1$ and $\beta_2$ by their respective multiplicative constants, a and d. The regression weight for X’Z’ is shown to be affected by the multiplicative constants a and d, but unaffected by the additive components of the transformations.
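Equation (2) can be checked numerically by evaluating the original equation and its re-expressed form at the same cases. In the sketch below, the coefficient values, transformation constants, and evaluation points are arbitrary illustrations, and `transformed_coefficients` is a hypothetical helper that simply encodes Equation (2). The last lines apply centering (a = d = 1, with c and f set to minus illustrative predictor means) and confirm that the product-term weight is unchanged.

```python
import random

def transformed_coefficients(b, a, c, d, f):
    """Coefficients of Equation (2) after X' = aX + c and Z' = dZ + f.

    b = (beta0, beta1, beta2, beta3) on the original scales."""
    b0, b1, b2, b3 = b
    return (
        b0 - b1 * c / a - b2 * f / d + b3 * c * f / (a * d),  # intercept
        b1 / a - b3 * f / (a * d),                             # weight on X'
        b2 / d - b3 * c / (a * d),                             # weight on Z'
        b3 / (a * d),                                          # weight on X'Z'
    )

def predict(coefs, x, z):
    b0, b1, b2, b3 = coefs
    return b0 + b1 * x + b2 * z + b3 * x * z

original = (2.0, 0.5, -1.0, 0.25)   # illustrative beta0..beta3
a, c, d, f = 3.0, 10.0, 0.5, -4.0   # illustrative transformation constants
new = transformed_coefficients(original, a, c, d, f)

rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    x, z = rng.uniform(0, 10), rng.uniform(0, 10)
    # The same case expressed on the original and on the transformed scales
    gap = abs(predict(original, x, z) - predict(new, a * x + c, d * z + f))
    max_gap = max(max_gap, gap)

# Centering: a = d = 1, c = -mean(X), f = -mean(Z) (means of 5 assumed here)
centered = transformed_coefficients(original, 1.0, -5.0, 1.0, -5.0)
print(f"largest discrepancy: {max_gap:.2e}")
print(f"beta3 original = {original[3]}, after centering = {centered[3]}")
```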
Thus, the “centering” operation described by Aiken & West (1991) has no effect on the value of $\beta_3$. Cohen (1978) also showed that under a linear transformation, the new regression weight counteracts the shift in the standard deviation of the product term created by the transformation. Because of this, significance tests on the transformed regression weight remain unchanged. Important as they are, these effects on regression weights were not Cohen’s primary focus. Cohen demonstrated that $R^2_{Y \cdot X, Z, XZ} = R^2_{Y \cdot X', Z', X'Z'}$ and that $R^2_{Y \cdot X, Z} = R^2_{Y \cdot X', Z'}$, leaving the F-test for incremental $R^2$ also unchanged. Cohen’s overall conclusion is that, despite the various effects on regression weights, the essential tests of interactive effects are invariant to linear transformations. According to Cohen, this demonstration renders concerns such as multicollinearity of product terms and their components (Althauser, 1971) and correlated random predictors (Sockloff, 1976) of trivial importance. There have also been some discussions of ratio level data in MMR analyses. If one examines Equation 2 above, it can be seen that additive transformations exist ($a = d = 1$; $c = \beta_2/\beta_3$, $f = \beta_1/\beta_3$) which set the regression weights on X’ and Z’ to zero. In this situation, all predictable variance from the original equation is carried by the X’Z’ product term ($R^2_{Y \cdot X, Z, XZ} = R^2_{Y \cdot X'Z'}$). If one were to use $R^2$ as an index of fit for the model, the result would be an arbitrary decision between a strictly multiplicative model (Y = X’Z’) and an additive-multiplicative model (Y = X + Z + XZ), depending on which interval scales are used. Schmidt (1973) illustrates this point using Vroom’s (1964) Expectancy-Valence theory of motivation. Since the additive transformations which create the ambiguity between the multiplicative and additive-multiplicative models are not permissible of ratio scales, Schmidt concludes that ratio scales are necessary to make the distinction.
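Schmidt’s point can be reproduced with simulated data. The sketch below generates hypothetical data from an additive-multiplicative model, fits the full MMR equation with a small normal-equations solver written for the purpose (a self-contained stand-in for a statistics package), and then shifts the predictors by c = β2/β3 and f = β1/β3 as described above. After the shift, the fitted weights on X’ and Z’ vanish and the product term alone reproduces the full model’s $R^2$.

```python
import random

def ols(rows, y):
    """Least-squares weights via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [u - factor * v for u, v in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_squared(rows, y):
    """Proportion of variance in y accounted for by the least-squares fit."""
    w = ols(rows, y)
    mean = sum(y) / len(y)
    sse = sum((t - sum(u * v for u, v in zip(w, r))) ** 2 for r, t in zip(rows, y))
    return 1 - sse / sum((t - mean) ** 2 for t in y)

# Hypothetical data from Y = 1 + 0.8X + 0.6Z + 0.4XZ + noise
rng = random.Random(7)
xs = [rng.uniform(1, 5) for _ in range(200)]
zs = [rng.uniform(1, 5) for _ in range(200)]
ys = [1.0 + 0.8 * x + 0.6 * z + 0.4 * x * z + rng.gauss(0, 1) for x, z in zip(xs, zs)]

full = [[1.0, x, z, x * z] for x, z in zip(xs, zs)]
b0, b1, b2, b3 = ols(full, ys)

# Additive shifts (a = d = 1) that zero the main-effect weights in Equation (2)
c, f = b2 / b3, b1 / b3
xp = [x + c for x in xs]
zp = [z + f for z in zs]

shifted = ols([[1.0, u, v, u * v] for u, v in zip(xp, zp)], ys)
r2_full = r_squared(full, ys)
r2_product_only = r_squared([[1.0, u * v] for u, v in zip(xp, zp)], ys)
print("weights on X', Z':", round(shifted[1], 6), round(shifted[2], 6))
print(f"R2 full = {r2_full:.4f}, R2 product term only = {r2_product_only:.4f}")
```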
Using ratio level measures of valence and expectancy, for instance, a researcher could only perform multiplicative transformations, which cannot convert an additive-multiplicative model to a multiplicative one, or vice versa. Arnold & Evans (1979), based primarily on Cohen’s (1978) previously discussed work, take issue with Schmidt (1973), and suggest that the MMR F-test of incremental $R^2$ is the proper test of a multiplicative model. According to Arnold & Evans, the proportion of variance “carried” by the product term or its components is not relevant to testing a multiplicative relationship. In support of their point, Arnold & Evans (1979) present an example of two physicists attempting to verify the ideal gas law, which is given by:

$$V = \frac{RT}{P} \qquad (3)$$

where V = volume, P = pressure, T = temperature, and R = a constant. It is at this point that Arnold & Evans make an error. Perhaps due to a misunderstanding of physical laws, they claim the primary difference between physical laws and relationships in psychology is that physical laws must specify units of measurement (e.g., centimeters, degrees Kelvin, etc.) in order to be valid. The ideal gas law is only valid, according to Arnold & Evans, if temperature (T) is measured in units of Kelvin. This statement is incorrect, as the measure of temperature only has to be a ratio scale, like the other scales in the formula. As can be seen in Equation (3), the value of the constant R can simply be adjusted to reflect a permissible ratio rescaling of T, and the law will maintain its fit to data and, more importantly, its multiplicative form. This confusion between measurement level and measurement unit is compounded as Arnold & Evans (1979) describe their example. Physicist A uses degrees Kelvin as a measure of temperature and Physicist B uses degrees Celsius.
To test their theories, these physicists set up a moderated multiple regression equation as follows:

$$V = b_0 + b_1 T + b_2 \frac{1}{P} + b_3 \left(T \cdot \frac{1}{P}\right) \qquad (4)$$

Both physicists expect to verify the ideal gas law by finding: 1) a significant increase in $R^2$ when the $(T \cdot 1/P)$ term is added; 2) $b_0$, $b_1$, and $b_2$ all to be zero; and 3) $b_3$ to equal the constant R. Arnold & Evans (1979) state: “This rather strong prediction is based upon their confidence that their measures of T, and P, and V are on ratio scales.” We will soon see that this statement cannot be true. Both physicists run their analyses and predict 100% of the variance after the product term is added, concluding they have indeed verified a law. Physicist A finds that the increase in $R^2$ was highly significant, $b_0 = b_1 = b_2 = 0$, and $b_3$ = the constant R. Thus, she concludes the underlying physical law is:

$$V = R\left(T \cdot \frac{1}{P}\right) = \frac{RT}{P},$$

or, the correct law as originally described in Equation 3. Physicist B, who also finds an identical significant increase in $R^2$, finds his equation to be somewhat different. As with Physicist A, $b_0$ and $b_1$ are both zero, and $b_3$ = the constant R. However, $b_2$ is now equal to a new constant K. Physicist B concludes the correct form of the ideal gas law is:

$$V = K \cdot \frac{1}{P} + R\left(T \cdot \frac{1}{P}\right),$$

which reduces to:

$$V = \frac{K + RT}{P}.$$

Recall that the only difference between Physicists A and B is the scale chosen for temperature, Kelvin or Celsius. Arnold & Evans use this fact to argue that neither physicist’s formula for the law is really correct. A simple change in the unit of measurement has effectively changed a law from a multiplicative form to an additive-multiplicative form. This ostensibly supports their argument that laws are not constant without specification of measurement units. As noted earlier, the flaw in such an argument is based on a confusion between measurement unit and measurement level.
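The two physicists’ analyses can be reproduced with noiseless synthetic data. The sketch below generates volumes from Equation (3) with R of about 0.0821 L·atm/(mol·K) for one mole of gas (an illustrative choice; any positive constant behaves the same way) on a small grid of temperatures and pressures, then fits Equation (4) with temperature in kelvins and in degrees Celsius. Exact rational arithmetic (`fractions.Fraction`) is used so that the zero coefficients come out exactly zero rather than as rounding residue.

```python
from fractions import Fraction

def ols(rows, y):
    """Least-squares weights via the normal equations (exact with Fractions)."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [u - factor * v for u, v in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

R = Fraction(821, 10000)  # about 0.0821 L·atm/(mol·K), one mole (illustrative)
temps_k = [Fraction(t) for t in (250, 300, 350, 400)]
pressures = [Fraction(p) for p in ("1/2", "1", "2", "4")]
cases = [(t, p, R * t / p) for t in temps_k for p in pressures]  # Equation (3)

def fit_equation_4(offset):
    """Fit V = b0 + b1*T + b2*(1/P) + b3*(T/P) with T = kelvins - offset."""
    rows = [[Fraction(1), t - offset, 1 / p, (t - offset) / p] for t, p, _ in cases]
    return ols(rows, [v for _, _, v in cases])

kelvin = fit_equation_4(Fraction(0))            # Physicist A
celsius = fit_equation_4(Fraction("273.15"))    # Physicist B: degrees C = K - 273.15
print("Kelvin  b0..b3:", [float(b) for b in kelvin])
print("Celsius b0..b3:", [float(b) for b in celsius])
```

The Celsius fit recovers $b_2 = K = 273.15R$ alongside $b_3 = R$, which is exactly Physicist B’s additive-multiplicative equation.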
Arnold & Evans commit this error when they allow Physicist B to use a Celsius scale for temperature, and simultaneously assert that both physicists are confident their scales are of ratio level. Kelvin is a measure of temperature based on molecular activity, and thus has a true zero point. Celsius is a measure of temperature constructed by means of an additive transformation of the Kelvin measure (°C = K - 273.15), a transformation that is not permissible for a ratio scale. Thus, the Celsius scale loses the true zero point (e.g., 40 °C is not twice as much warmth or molecular activity as 20 °C), and is merely an interval scale. Physicist A can be confident her formula is the correct form of the ideal gas law, as permissible transformations of all its ratio measures can only change the value of the constant R, and will leave the essential multiplicative form of the law unchanged. However, Physicist B’s additive-multiplicative model can be changed by permissible linear transformations of the temperature measure. This discussion reinforces Schmidt’s (1973) earlier arguments. Verification of a multiplicative theoretical model cannot be accomplished by means of moderated multiple regression when the variables are measured at the interval level. Although Cohen’s (1978) work showed that the significance test for a product term is invariant to linear transformation, this test is not equivalent to a test of a purely multiplicative theory. The test of this interaction term with interval scales does, however, allow one to reject a purely multiplicative model based on a zero-weighted interaction term, since an additive model (Y = X + Z) cannot be made multiplicative by a linear transformation. If one obtains a significant interaction using MMR, a further examination of the variables in question is required to verify a multiplicative model. Arnold & Evans (1979) make this point, and this author is in complete agreement.
However, a clear concept of the level of measurement is required, not, as Arnold & Evans suggest, the unit of measurement. While theory and measurement in applied psychology may never reach the point of specifying standardized units, the goal of establishing constructs with theoretically meaningful zero points is more realistic, and the only necessary consideration.

Moderated Multiple Regression and Ordinal Scales: Criterion Issues

The use of ordinal scales with moderated multiple regression poses a more severe set of problems. It has been repeatedly noted that permissible monotonic transformations of dependent variables can completely remove non-crossing interaction effects (Cliff, 1992; Loftus, 1978; Busemeyer & Jones, 1983; Krantz & Tversky, 1971), and can often attenuate a crossing interaction to the point of potential non-significance (Busemeyer, 1980). Note that these findings also imply that data suggesting no interaction can be subjected to a monotonic transformation which creates a significant interaction term in an MMR analysis (see Loftus, 1978). An interaction is said to be “non-crossing” when the rank orderings of Y across X are the same for all Z values, and the rankings of Y across Z are the same for all X values. Any rank order changes indicate a “crossing” interaction, as the plotted regression lines of Y on X (or Z) would cross at the point on X (or Z) where the order change occurred. Unfortunately, the effects described above are not easily demonstrated algebraically, as Cohen (1978) had done with linear transformations. A few popular monotonic transformations, such as power ($X^n$), root ($\sqrt{X}$), and logarithm, have been discussed (see Birnbaum, 1973; Busemeyer & Jones, 1983), but these only represent a subset of monotonic transformations which have functional forms. Since monotonic transformations, as a class, cannot be expressed with a functional formula, the examples here will use small data sets and simple transformations.
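The definition of a non-crossing interaction above can be written as a small check on a table of cell values. In the sketch below, each row holds the Y values for one level of Z across the levels of X; ties are broken by position, an assumption that does not matter for the tables in this section, which contain no tied values within a row or column. Applied to the data in Tables 1a and 2a, it classifies the first interaction as non-crossing and the second as crossing.

```python
def rank_pattern(values):
    """Positions of the values when sorted ascending (their rank ordering)."""
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))

def is_non_crossing(table):
    """table maps each Z level to its list of Y values across the X levels.

    Non-crossing: Y has the same rank ordering across X at every Z level,
    and the same rank ordering across Z at every X level."""
    rows = list(table.values())
    columns = list(zip(*rows))
    same_across_x = len({rank_pattern(r) for r in rows}) == 1
    same_across_z = len({rank_pattern(c) for c in columns}) == 1
    return same_across_x and same_across_z

# Table 1a (GPA by Race across SAT 200-600) and Table 2a (social behavior)
table_1a = {"White": [1.1, 2.0, 2.5, 3.25, 4.0], "Black": [0.8, 1.3, 1.7, 2.2, 2.6]}
table_2a = {"No": [5.0, 6.0, 7.1, 9.2, 10.0], "Yes": [1.3, 4.2, 7.5, 12.3, 14.3]}
print("Table 1a non-crossing:", is_non_crossing(table_1a))
print("Table 2a non-crossing:", is_non_crossing(table_2a))
```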
First, let us examine a non-crossing interaction. Consider a situation in which we are determining whether or not slope bias exists for Black and White subgroups when predicting college achievement, indexed by GPA, from the score on a standardized SAT test. The dataset for this example, based in part on Figure 7.7 in Gregory (1996), p. 268, is shown in Table 1a. It can be seen that GPA maintains the same rank ordering within Race across levels of SAT score, and within SAT score across levels of Race, verifying a non-crossing interaction. If one were to examine these data with moderated multiple regression, the following additive and additive-multiplicative equations would be generated:

$$\mathrm{GPA}_{A} = (.006 \times \mathrm{SAT}) + (.85 \times \mathrm{Race}) - 1.44 \qquad (5)$$

$$\mathrm{GPA}_{AM} = (.002 \times \mathrm{SAT}) - (.17 \times \mathrm{Race}) + (.0026 \times \mathrm{SAT} \times \mathrm{Race}) + .09 \qquad (6)$$

The $R^2$ values for Equations (5) and (6) are .960 and .998, respectively. The F-test for $\Delta R^2$, as well as the t-test of the regression weight of the (SAT × Race) interaction term, are both significant at the .001 level. Based on these results, a researcher would conclude that slope bias exists when using a standardized SAT score to predict college achievement. This conclusion would likely result in a more detailed examination of the test, and perhaps its discontinued use.

Table 1. Datasets for Slope Bias Example: GPA by Race and SAT Score

a. Original Data
                  SAT Score
Race      200    300    400    500    600
White     1.1    2.0    2.5    3.25   4.0
Black     0.8    1.3    1.7    2.2    2.6

b. Transformed Criterion Data
                  SAT Score
Race      200    300    400    500    600
White     1.1    2.0    2.4    2.9    3.3
Black     0.4    1.3    1.7    2.2    2.6

Rather than conclude the problem lies with the predictor in this case, an examination of the criterion may be in order. There is no reason to believe that GPA is an interval scale of college achievement. For example, the difference in achievement between GPAs of 4.0 and 3.5 may be much greater than between 3.5 and 3.0, due to a particular grading policy which requires students to put forth “extra effort” in order to attain very high grades.
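Equations (5) and (6) can be recomputed from Table 1a with ordinary least squares. The text does not state how Race was coded; coding Black = 1 and White = 2 is an assumption that reproduces the reported coefficients to rounding, and any other two-value coding would change the coefficients but not $R^2$ or $\Delta R^2$ (Cohen, 1978). The small solver is a self-contained stand-in for a statistics package.

```python
def ols(rows, y):
    """Least-squares weights via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [u - factor * v for u, v in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_squared(rows, y):
    """Proportion of variance in y accounted for by the least-squares fit."""
    w = ols(rows, y)
    mean = sum(y) / len(y)
    sse = sum((t - sum(u * v for u, v in zip(w, r))) ** 2 for r, t in zip(rows, y))
    return 1 - sse / sum((t - mean) ** 2 for t in y)

sat = [200, 300, 400, 500, 600]
gpa = {"White": [1.1, 2.0, 2.5, 3.25, 4.0],   # Table 1a
       "Black": [0.8, 1.3, 1.7, 2.2, 2.6]}
race_code = {"Black": 1, "White": 2}          # assumed coding

rows_add, rows_am, y = [], [], []
for race, values in gpa.items():
    code = race_code[race]
    for s, g in zip(sat, values):
        rows_add.append([1.0, s, code])
        rows_am.append([1.0, s, code, s * code])
        y.append(g)

additive = ols(rows_add, y)    # [intercept, SAT, Race], Equation (5)
moderated = ols(rows_am, y)    # [intercept, SAT, Race, SAT x Race], Equation (6)
r2_add, r2_am = r_squared(rows_add, y), r_squared(rows_am, y)
print("Eq (5):", [round(b, 4) for b in additive])
print("Eq (6):", [round(b, 5) for b in moderated])
print(f"R2_A = {r2_add:.3f}, R2_AM = {r2_am:.3f}, delta R2 = {r2_am - r2_add:.3f}")
```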
The potential variety of such grading policies casts any interpretation of GPA as an interval scale in doubt. In light of this, consider the dataset presented in Table 1b. These GPA values have a rank ordering identical to that of the data in Table 1a, and thus represent a permissible monotonic rescaling of the original GPA variable. Comparing Tables 1a and 1b illustrates that the rescaling consists simply of slight, order-preserving changes to four GPA values. Conducting moderated multiple regression on the rescaled numbers would result in the following equations:

$$\mathrm{GPA}_{A} = (.0053 \times \mathrm{SAT}) + (.7 \times \mathrm{Race}) - 1.18 \qquad (7)$$

$$\mathrm{GPA}_{AM} = (.0053 \times \mathrm{SAT}) + (.7 \times \mathrm{Race}) + (.000 \times \mathrm{SAT} \times \mathrm{Race}) - 1.18 \qquad (8)$$

The $R^2$ values for Equations (7) and (8) are necessarily the same, .976. The $\Delta R^2$ is zero, so there is no additional variance accounted for by adding a (Race × SAT) product term. A researcher using this dataset would conclude that no evidence of slope bias exists for this standardized test. While there is evidence of an overall mean difference in GPA across Race, the regression lines for each subgroup are otherwise identical. The above example illustrates how a non-crossing interaction can be completely removed by applying a monotonic transformation to the dependent variable. In the specific case of assessing slope bias, it is notable that the level of measurement of the criterion may have serious implications for the future use of the predictor, even when the measurement properties of the predictor are not assessed at all. Using two datasets which are empirically equivalent non-interval scales of achievement, one can reach two different conclusions regarding the existence of slope bias. Neither conclusion is, in fact, necessarily correct or incorrect. Using non-interval scales prevents the researcher from making any statement about slope bias, as the test for slope bias is not invariant to transformations permissible of the variables involved. Now consider an example of a crossing interaction.
A researcher is studying the relationship between social integration behaviors, previous work experience, and the length of time working for a particular organization. The dependent variable, amount of social integration behavior, is measured using summated Likert response items, resulting in a 20-point instrument. The data for this example are shown in Table 2a. The additive and additive-multiplicative equations for the dataset in Table 2a are as follows:

$$\mathrm{SocBeh}_{A} = (1.1815 \times \mathrm{Time}) + (.46 \times \mathrm{WorkExp}) - .095 \qquad (9)$$

$$\mathrm{SocBeh}_{AM} = (-.385 \times \mathrm{Time}) - (5.81 \times \mathrm{WorkExp}) + (1.045 \times \mathrm{Time} \times \mathrm{WorkExp}) + 9.31 \qquad (10)$$

The $R^2$ values for Equations (9) and (10) are .827 and .987, respectively. The $\Delta R^2$ of .16 is significant at α = .0001, which the researcher concludes to be evidence of a strong interaction effect in the determination of social integration behavior. Individuals with no previous work experience tend to exhibit more social integration behavior than individuals with previous work experience when first arriving in a new organization.

Table 2. Datasets for Social Behavior Example: Social Behaviors by Work Experience and Time

a. Original Dataset
                              Time from Employment
Previous Work Experience   2 mo   3 mo   4 mo   5 mo   6 mo
No                          5.0    6.0    7.1    9.2   10.0
Yes                         1.3    4.2    7.5   12.3   14.3

b. Transformed Dataset
                              Time from Employment
Previous Work Experience   2 mo   3 mo   4 mo   5 mo   6 mo
No                          4.4    5.3    6.1    7.9    8.9
Yes                         2.9    3.7    7.0    9.9   12.1

c. Transformed Dataset
                              Time from Employment
Previous Work Experience   2 mo   3 mo   4 mo   5 mo   6 mo
No                          4.7    6.0    7.1    9.7   10.3
Yes                         3.2    4.4    7.5   10.5   12.2

However, as time passes, the individuals with previous work experience increase their social integration behavior more rapidly than those without such experience. The researcher concludes the initial difference in behaviors is due to the newcomers without work experience attempting to “fit in”, perhaps using social behaviors to compensate for lack of knowledge of workplace etiquette.
Those with previous experience, and such knowledge, have no need to compensate, and eventually their previous workplace experiences allow them involvement in more social integration behaviors. Suppose, however, that the 20-item scale used by this researcher is a non-interval level measure of social integration behavior. For example, this could be due to problematic anchors for Likert-type items or inclusion of items reflecting very different levels of behavior. Problems such as these might create situations where distances between any two scale points would not be constant across the entire scale. If the scale as a whole is considered ordinal, transformations preserving rank order can result in the data presented in Table 2b. Submitting these data to a moderated multiple regression analysis results in the following equations:

$$\mathrm{SocBeh}_{A} = (.905 \times \mathrm{Time}) + (.60 \times \mathrm{WorkExp}) + .49 \qquad (11)$$

$$\mathrm{SocBeh}_{AM} = (-.07 \times \mathrm{Time}) - (3.3 \times \mathrm{WorkExp}) + (.65 \times \mathrm{Time} \times \mathrm{WorkExp}) + 6.34 \qquad (12)$$

The $R^2$ values for Equations (11) and (12) are .866 and .976, respectively, resulting in a $\Delta R^2$ of .11. This is significant at the α = .01 level. However, this $\Delta R^2$ is smaller than the .16 obtained using the original data. While the interaction is still present, its strength has somewhat diminished. Now consider a second transformation of the original data, presented in Table 2c. The associated regression equations for these data are:

$$\mathrm{SocBeh}_{A} = (.975 \times \mathrm{Time}) + (0 \times \mathrm{WorkExp}) + 1.71 \qquad (13)$$

$$\mathrm{SocBeh}_{AM} = (.285 \times \mathrm{Time}) - (2.76 \times \mathrm{WorkExp}) + (.46 \times \mathrm{Time} \times \mathrm{WorkExp}) + 5.85 \qquad (14)$$

The $R^2$ values for Equations (13) and (14) are .926 and .978, respectively, with $\Delta R^2$ equal to .052. This is significant at the α = .05 level. The effect size associated with the interaction is smaller than that in the previous dataset, and a great deal smaller than that in the original dataset. When treating the scale as ordinal, two researchers using empirically equivalent scales can thus reach two very different conclusions about the strength of the interaction effect.
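The $R^2$ values for the three versions of Table 2 can be recomputed directly. The sketch below codes Time in months (2 to 6) and WorkExp as 0 = No, 1 = Yes; these codings are assumptions, so the raw coefficients need not match Equations (9) to (14), but $R^2$ and $\Delta R^2$ are unaffected by such linear recodings (Cohen, 1978). The shrinking $\Delta R^2$ across the three rank-equivalent datasets is the attenuation described above.

```python
def ols(rows, y):
    """Least-squares weights via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [u - factor * v for u, v in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_squared(rows, y):
    """Proportion of variance in y accounted for by the least-squares fit."""
    w = ols(rows, y)
    mean = sum(y) / len(y)
    sse = sum((t - sum(u * v for u, v in zip(w, r))) ** 2 for r, t in zip(rows, y))
    return 1 - sse / sum((t - mean) ** 2 for t in y)

months = [2, 3, 4, 5, 6]
tables = {  # inner keys: WorkExp coded 0 = No, 1 = Yes
    "2a": {0: [5.0, 6.0, 7.1, 9.2, 10.0], 1: [1.3, 4.2, 7.5, 12.3, 14.3]},
    "2b": {0: [4.4, 5.3, 6.1, 7.9, 8.9], 1: [2.9, 3.7, 7.0, 9.9, 12.1]},
    "2c": {0: [4.7, 6.0, 7.1, 9.7, 10.3], 1: [3.2, 4.4, 7.5, 10.5, 12.2]},
}

results = {}
for name, table in tables.items():
    rows_add, rows_am, y = [], [], []
    for w, values in table.items():
        for t, v in zip(months, values):
            rows_add.append([1.0, t, w])
            rows_am.append([1.0, t, w, t * w])
            y.append(v)
    r2_add, r2_am = r_squared(rows_add, y), r_squared(rows_am, y)
    results[name] = (r2_add, r2_am, r2_am - r2_add)
    print(f"Table {name}: R2_A = {r2_add:.3f}, R2_AM = {r2_am:.3f}, "
          f"delta R2 = {r2_am - r2_add:.3f}")
```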
As opposed to the non-crossing type of interaction, however, the regression equation can never be rendered completely additive. It is impossible to do so without affecting the rank order of the criterion variable. This can easily be understood if one thinks of the crossing interaction graphically, in terms of intersecting regression lines. An additive equation is graphically represented by parallel regression lines. Thus, transforming a crossing interaction model into an additive model requires “uncrossing” the lines to make them parallel. Such a manipulation requires that some of the rank orders near the high or low end of the criterion be inverted. It is nevertheless possible to reduce the effect size of the interaction by minimizing the scale distance between ranks at extreme values of the criterion. Graphically, this has the effect of “compressing” the ‘X’ formed by the interaction. This improves the fit of a linear equation through the ‘X’, and consequently reduces the amount of variance accounted for by an additional product term. While the two examples shown above still result in a significant interaction, more severe scale transformations would cause the interaction to be statistically non-significant. However, the regression weight of (and amount of variance accounted for by) the product term can never be reduced to zero. The above sections have highlighted the problems with interpreting interaction effects when the criterion cannot claim an interval level of measurement. When the interaction is of the non-crossing variety, transformations permissible of ordinal scales can completely remove the effect, essentially converting an additive-multiplicative equation into an additive equation. When the interaction is crossing, such transformations cannot completely remove an interaction, but can potentially attenuate the effect size to the point of statistical non-significance.
Although crossing interactions can never be completely removed, the reduction to non-significance would result in a researcher concluding there is no interaction effect, and advocating the default additive model.

Moderated Multiple Regression and Ordinal Scales: Predictor Issues

Problems using moderated multiple regression techniques with ordinal level data are not restricted to the criterion variable. Although the measurement level of a predictor is obviously only an issue with continuous predictors, as dichotomous moderating variables are only of nominal level,³ monotonic transformations of a continuous predictor can affect the interpretation of regression results (Busemeyer, 1980; Busemeyer & Jones, 1983). Busemeyer & Jones (1983) examine the specific case involving quadratic transformations of a predictor variable. If there is reason to suspect a quadratic component in the relationship between a predictor and criterion, Busemeyer & Jones (1983) suggest the inclusion of higher order terms, such as $X^2$, in a hierarchical regression analysis. As with interactions, tests of these trend components are interpretable if they are entered into the regression equation after lower order components (Cohen & Cohen, 1983; Cohen, 1978). Testing of cubic and higher order terms proceeds in a similar manner, with the regression equations growing rapidly as an increasing number of power and product terms are required. Rarely, however, do psychological theories require an assessment of trends beyond the quadratic form (Cohen, 1983). The methods outlined by Cohen (1978) allow us to examine nonlinear trend components and their interactions within the context of moderated multiple regression, but only in the case where the functional form of the nonlinear transformation (or nonlinear relationship) is suspected or known.

³ Ordinal, interval, and ratio properties cannot be assessed with only two scale points.
In a situation where we have no reason to suspect a predictor is related to a criterion via a logarithmic or polynomial function, yet also have no reason to believe the predictor is of interval level, inclusion of specific functions of predictors in a regression equation offers us little more than a “hit and miss” method of finding a critical functional transformation, if one even exists. This method is useless when faced with a non-interval predictor with unknown distances between scale points. The general case of monotonic predictor transformations can, however, be examined in a fashion similar to that of criterion transformations, i.e., by finding the transformation of X which renders the X-Y regression lines parallel across levels of Z. Determining whether a transformation of X, say X’, exists such that $Y = b_0 + b_1 X' + b_2 Z$ is equivalent to determining whether a criterion transformation exists for $X' = [Y + b_2(-Z) - b_0]/b_1$. Since linear transformations are a subset of monotonic transformations, we can dispense with the $b_0$, $b_1$, and $b_2$ terms in the equation, leaving $X' = Y + (-Z)$. In other words, if Y is shown to be an additive function of X and Z, then X is also an additive function of Y and Z, albeit with an inverted ordering on the Z variable. Any transformation of X which achieves $X' = Y + (-Z)$ also achieves $Y = X' + Z$. We can see from the previous data examples that, when Y values are tabled by X and Z, the transformation of a Y value is an operation on a cell, or set of equally valued cells, in the table. The transformation is order-preserving as long as the new number is less than the next highest original number and greater than the next lowest original number. For instance, in the slope bias example data in Tables 1a and 1b, four GPA values were changed, which preserved rank ordering and completely removed the moderating effect. However, consider what a monotonic change in a predictor represents in the data tables presented to this point.
No longer is one altering a single cell, but “shifting” an entire column or row. The regression equation at any level of Z is affected by a change in an X value, provided a Y value exists at that level of X and Z. For example, in Table 1a, changing any SAT value would result in changes to both the White and Black regression lines, and their slopes could never be equated. Thus, it appears that a monotonic transformation of a predictor cannot remove a moderating effect. A predictor transformation examined in this manner can, however, attenuate the interaction effect. Consider the very simple dataset in Table 3. $R^2_A$ and $R^2_{AM}$ for these data are .9375 and 1.00, respectively, resulting in a $\Delta R^2$ of .0625. After one performs the simple monotone predictor transformation of recoding X = 2 as 2.5, $R^2_A$ and $R^2_{AM}$ become .894 and .952, respectively. The $\Delta R^2$ in this case is .058, slightly lower than the original value. If one performs a further transformation, recoding X = 1 as 1.5, $R^2_A$ and $R^2_{AM}$ become .917 and .978, respectively. The $\Delta R^2$ increases from the last situation to .060. While this dataset represents a non-crossing interaction, similar effects are likely to be observed with a crossing interaction. Monotonic transformation of a predictor appears to have the potential to attenuate an interaction effect but not remove one.

Table 3. Dataset for Predictor Rescaling Example: Y by X and Z

          X
Z     1    2    3
1     1    2    3
2     2    4    6

However, this still is not the entire story. Because the labelings of “predictor” and “criterion” in these tables have already been shown to be arbitrary if a transformation to an additive model exists,⁴ X could be considered the criterion, with Y and Z as predictors. We have already shown that Y can be transformed so that $Y = b_1 X + b_2 Z$, which is equivalent to saying $b_1 X = Y - b_2 Z$. If X is considered the criterion, it looks as if a monotonic transformation of the “predictor” Y exists in the data from Table 1a to create this additive model.

⁴ $Y = b_1 X + b_2 Z + b_0 \iff b_1 X = Y - b_2 Z - b_0$
This is contrary to what was found when X was considered a predictor. The reason for this can be seen by examining the same data in Table 1a, but with SAT values in the cells at different levels of GPA and Race. This arrangement of the data can be found in Table 4.

Table 4. Dataset in Table 1a Presented as SAT by Levels of Race and GPA

                                          GPA
Race     0.8   1.1   1.3   1.7   2.0   2.2   2.5   2.6   3.25   4.0
White     -    200    -     -    300    -    400    -    500    600
Black    200    -    300   400    -    500    -    600    -      -

The difference between Table 4 and Table 1a is clear. Monotonic transformations of GPA in Table 4 still involve the shifting of entire columns, but since only one SAT value is in each column, we can effectively alter the regression line for one group without affecting the other. Thus, the regression slopes can be equated by an order-preserving rescaling of the GPA “predictor”. The cause of this phenomenon is primarily the design of the dataset. While SAT score likely has a continuous distribution in the larger sample, the table represents a dataset in which a pair of observations was selected from five levels of SAT score, one observation for each Race. As such, the table represents a completely crossed design of Race × SAT. Monotonic predictor changes in a fully crossed design will, by definition, alter regression equations across all levels of the other design factor. The data neither are, nor were designed to be, fully crossed in GPA × Race, and the empty “predictor” cells created by this crossing allow a monotonic transformation to have an effect. The issue of predictor transformations affecting moderating effects thus reduces to the question of what situations involve empty cells, rows, or columns in the data matrix. An obvious situation is one in which X and Z are correlated. If X and Z are completely uncorrelated, as in a completely crossed design, no predictor transformation can remove the interaction, as it necessarily affects all regressions across Z.
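The contrast between the fully crossed design of Table 3 and the uncrossed arrangement of Table 4 can be checked numerically. The sketch below recodes X in Table 3 as described above, and separately treats SAT as the criterion and GPA as the “predictor”, applying the order-preserving GPA rescaling from Table 1b. Race is coded Black = 1, White = 2, an assumption (any two-value coding gives the same $\Delta R^2$). In the crossed design the product term survives every recoding; in the uncrossed arrangement the rescaling removes it.

```python
def ols(rows, y):
    """Least-squares weights via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(m[i][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [u - factor * v for u, v in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_squared(rows, y):
    """Proportion of variance in y accounted for by the least-squares fit."""
    w = ols(rows, y)
    mean = sum(y) / len(y)
    sse = sum((t - sum(u * v for u, v in zip(w, r))) ** 2 for r, t in zip(rows, y))
    return 1 - sse / sum((t - mean) ** 2 for t in y)

def delta_r2(cases):
    """Delta R2 for adding the product term; cases are (predictor, moderator, criterion)."""
    y = [c[2] for c in cases]
    additive = [[1.0, x, z] for x, z, _ in cases]
    moderated = [[1.0, x, z, x * z] for x, z, _ in cases]
    return r_squared(moderated, y) - r_squared(additive, y)

# Table 3: fully crossed X (1, 2, 3) by Z (1, 2), with Y = X * Z; only X is relabeled.
def table3(recode):
    return [(recode.get(x, x), z, x * z) for z in (1, 2) for x in (1, 2, 3)]

crossed = [delta_r2(table3({})),
           delta_r2(table3({2: 2.5})),
           delta_r2(table3({2: 2.5, 1: 1.5}))]

# Table 4: SAT as criterion, GPA as "predictor", Race coded Black = 1, White = 2.
sat = [200, 300, 400, 500, 600]
gpa = {2: [1.1, 2.0, 2.5, 3.25, 4.0], 1: [0.8, 1.3, 1.7, 2.2, 2.6]}
to_1b = {0.8: 0.4, 2.5: 2.4, 3.25: 2.9, 4.0: 3.3}  # order-preserving changes from Table 1b

def table4(recode):
    return [(recode.get(g, g), race, float(s))
            for race, values in gpa.items() for g, s in zip(values, sat)]

uncrossed = [delta_r2(table4({})), delta_r2(table4(to_1b))]
print("Table 3 delta R2 (original, X=2->2.5, then X=1->1.5):", [round(d, 4) for d in crossed])
print("Table 4 delta R2 (original GPA, rescaled GPA):", [round(d, 4) for d in uncrossed])
```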
Conversely, if X and Z are perfectly correlated, Z is a linear function of X, Z = kX + m, and the moderated multiple regression equation reduces to:

$$Y = b_0 + b_1X + b_2(kX + m) + b_3X(kX + m) \qquad (15)$$

$$Y = (b_0 + b_2m) + (b_1 + b_2k + b_3m)X + b_3kX^2 \qquad (16)$$

In this case, Y becomes an additive function of X and X². Provided this quadratic is monotonic over the observed range of X (as when X does not change sign and the coefficients of X and X² share a sign), a monotonic rescaling of X can easily remove any variance accounted for by X², leaving only a main effect for X.

We have now seen that when predictors X and Z are perfectly orthogonal (uncorrelated), monotonic transformations cannot remove moderating relationships, but can attenuate them. In the trivial case of perfect correlation, predictor transformations can completely remove nonlinear effects. This suggests that predictor intercorrelation may play an important role in determining the “robustness” of moderated multiple regression when predictor variables are not of interval level. Dunlap & Kemery (1988) have noted that increases in the intercorrelation between X and Z increase the probability of detecting an interaction effect and, when X and Z are measured with error, yield a higher reliability for the XZ product term. However, the above discussion suggests that, when predictors are of non-interval level, the increased detection of interactions due to predictor intercorrelation noted by Dunlap & Kemery (1988) may paradoxically be accompanied by an increasing lack of precision when interpreting them.

The problems associated with a single predictor measured at the ordinal level are compounded when both X and Z are ordinal scales. Typically, in these cases, the researcher is not interested in examining differences in regression slopes across a third variable, but in evaluating a theory that predicts a multiplicative combination of the two predictors.
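The algebra of Equations 15 and 16 can be checked numerically. The sketch below assumes NumPy and arbitrary, hypothetical coefficient values (b0 to b3, k, m are illustrative, not taken from any study); X is kept positive so that the quadratic is monotonic over its range. It confirms that Equation 15 equals Equation 16 and that regressing Y on the order-preserving rescaling X' = (b1 + b2k + b3m)X + b3kX² leaves no nonlinear variance.

```python
import numpy as np

# Hypothetical coefficients for Equation 15, with Z = kX + m.
b0, b1, b2, b3 = 1.0, 0.5, 0.8, 0.3
k, m = 2.0, 1.0

x = np.linspace(0.1, 3.0, 50)           # X kept positive, so it never changes sign
z = k * x + m                           # perfectly correlated second predictor
y = b0 + b1 * x + b2 * z + b3 * x * z   # Equation 15

# Equation 16: the same Y written as a quadratic in X alone.
c1, c2 = b1 + b2 * k + b3 * m, b3 * k
assert np.allclose(y, (b0 + b2 * m) + c1 * x + c2 * x**2)

# Order-preserving rescaling of X that absorbs the X² term.
x_rescaled = c1 * x + c2 * x**2
assert np.all(np.diff(x_rescaled) > 0)  # strictly increasing over the observed range

design = np.column_stack([np.ones_like(x), x_rescaled])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
r2_rescaled = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(f"R² of Y on the rescaled X alone: {r2_rescaled:.6f}")
```

Because Y equals the constant b0 + b2m plus X' exactly, the fitted slope is 1 and the R² is 1 up to floating-point error: the "interaction" has become a main effect of the rescaled predictor.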
In addition to considering the issues related to interval and ratio scales raised by Schmidt (1973), the researcher is advised to be wary of ordinal-level data. The issues raised above now apply to both X and Z, and one must consider the effects of monotone transformations of both simultaneously. When faced with two predictors and a criterion of ordinal level, the researcher can make very few confident statements about the form of the relationships among the variables. Birnbaum (1973, 1974) notes that, in this situation, the multiplicative equation $Y = aX^bZ^c$ can be rendered additive by permissible logarithmic transformations (for positive a, X, and Z), $\log Y = \log a + b\log X + c\log Z$. Thus, once we lose confidence that any of our variables are measured at an interval level, we have come full circle to the point of being unable to distinguish a multiplicative model from an additive model.

Moderated Multiple Regression and Level of Measurement: Summary and Implications

The previous sections have described the behavior of moderated multiple regression when the predictor and criterion variables are defined by scales at various levels of measurement. Three types of models emerging from moderated regression were discussed: the additive model, the additive-multiplicative model, and the multiplicative model. It was shown that when our predictor and criterion data are ratio scaled, we can accurately select one of these models as providing the best fit to the data. When all variables are measured at the interval level, we can confidently reject the multiplicative model, but, if we fail to reject it, we cannot confirm or reject an additive-multiplicative model. When the criterion is measured at the ordinal level, monotonic rescalings can attenuate or even eliminate interaction effects, making the choice between additive and additive-multiplicative models arbitrary, or leading to the choice of an additive model by default because the additive-multiplicative model lacks statistical significance.
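Birnbaum's observation about the all-ordinal case can be illustrated numerically: a product term that carries real variance on the raw scale carries none once the permissible logarithmic transformations are applied. This is a minimal sketch assuming NumPy, strictly positive predictors (needed for the logarithms), and hypothetical values a = 2, b = 1.5, c = 0.8.

```python
import numpy as np

def r2(y, *cols):
    """R² of an OLS fit of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical positive predictors and Birnbaum's multiplicative model.
rng = np.random.default_rng(2)
x = rng.uniform(1.0, 5.0, 400)
z = rng.uniform(1.0, 5.0, 400)
a, b, c = 2.0, 1.5, 0.8
y = a * x**b * z**c

# Raw scale: the product term adds variance beyond the additive model.
delta_raw = r2(y, x, z, x * z) - r2(y, x, z)

# Log scale: the additive model is exact, so log X * log Z adds nothing.
lx, lz, ly = np.log(x), np.log(z), np.log(y)
r2_log_additive = r2(ly, lx, lz)
delta_log = r2(ly, lx, lz, lx * lz) - r2_log_additive
print(f"raw dR2 = {delta_raw:.4f}, log additive R2 = {r2_log_additive:.6f}, "
      f"log dR2 = {delta_log:.2e}")
```

The same data thus support a "moderated" reading on one permissible scale and a purely additive reading on another, which is the ambiguity described in the text.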
With an ordinal criterion, the extent of potential attenuation is primarily a function of whether the interaction is of the crossing or non-crossing variety. Similar effects are potentially observed when a predictor is measured at the ordinal level, though these effects may themselves be moderated by the degree of intercorrelation between the predictor variables. Further ambiguity between the additive and additive-multiplicative models can arise when both predictors are measured at the ordinal level. Finally, when all variables are ordinal, we cannot distinguish among additive, multiplicative, and additive-multiplicative models, since transformations exist that can convert any of the three models into either of the others.

The relevance of these problems is borne out by the earlier discussion of measurement in psychology. As our confidence in the interval nature of psychological measurement decreases, the interpretation of moderated multiple regression results becomes more difficult. A lack of scale precision may be an important factor reducing the “power” of moderated regression tests. While this is not related to statistical power per se, the probability of detecting an interaction effect, when one exists, may be reduced when scales are not of interval level. Consider a situation in which one thousand researchers are testing a moderating effect: five hundred use interval scales and five hundred use non-interval scales.⁵ Since all interval scales are related by linear transformation, and we know linear transformations cannot remove moderating effects, if one of the five hundred researchers using interval scales finds a moderating effect, all of them will find it. The situation is bleaker for the researchers using non-interval scales. These five hundred scales, related only by monotonic (perhaps slight) transformations, would not necessarily show the same moderator effect size, and some might not show the moderating effect at all.
On examining the studies that used interval scales, the scientific field as a whole would likely conclude that a robust and important moderating effect had been found. On examining the studies that used non-interval scales, the field might argue that the interaction is difficult to detect, statistically unreliable, or perhaps nonexistent. These arguments should sound familiar, as they are the ones currently made regarding moderator effects in applied psychology.

⁵ For the sake of argument, this assumes all other research factors are the same.

Lack of scale precision may also help explain the common observation that moderator effects are more often detected in controlled, experimental settings than in applied field settings (McClelland & Judd, 1993). Several lines of reasoning point to measurement level playing an important role. First, experimental studies are more likely to use categorical predictors and to test interactions via cell mean comparisons. Such tests are affected only by the scaling of the dependent measure, as the predictors are merely of nominal level. Applied studies often use continuous scales for both predictor and criterion variables; in these situations, permissible transformations have greater latitude to affect tests of moderation. Second, the independent variables in experimental studies are typically under sufficient control to allow the complete crossing of factors. Even when one predictor is polychotomous, complete crossing prevents predictor rescalings from removing interactions, for the reasons discussed earlier. Conversely, applied studies usually sample both predictor variables, with very little control over their intercorrelation. This intercorrelation potentially makes it more likely that predictor rescalings will affect the test of moderation. Third, McClelland & Judd (1993) note that experimentalists typically predict crossing interactions, whereas field researchers usually predict only non-crossing interactions.
We have seen that monotonic rescalings can attenuate a crossing interaction but not remove it, and can completely remove a non-crossing interaction. For these reasons, it is possible that the use of non-interval data poses a much greater threat to the empirical meaningfulness of results in applied field research than in experimental settings.

While the above issues relate to the potential effects of scale misspecification on the detection of interaction effects, there are also important implications for interpreting the interactions that are found. Applied psychologists currently lament that many of the interactions found account for a very small portion of overall variance. Field researchers have indeed noted that observed interactions usually account for between 1% and 3% of total variance (Champoux & Peters, 1987). The frustrating search for moderating effects has even led some authors to claim that interactions accounting for 1% of the variance should be deemed important (Evans, 1985). In light of the demonstrations earlier in this thesis showing that minor changes to data can create large changes in moderator effects, it is possible that meaningful interpretation of interactions accounting for 1% of variance would require measurement precision beyond that of most psychological scales.

Rationale and Overview for the Study

Thus far, this thesis has demonstrated that using non-interval data with moderated multiple regression procedures can impair a researcher’s ability to interpret results in a variety of ways. These harmful effects have important implications for theory verification in applied psychology, and may be especially relevant to the differences between experimental and field detection of moderators. However, two important issues remain. First, under what conditions will these harmful effects manifest themselves?
The simple demonstrations presented earlier in this thesis are not representative of the wide variety of moderator effects found in research settings. To address this issue, the study presented in this thesis examined interaction effects in situations defined by a variety of factors, including the baseline R² prior to adding a product term, the incremental percentage of variance accounted for by the product term (ΔR²), the intercorrelation of predictors, and the measurement properties of all variables in the moderated regression equation. Results from this study can assist researchers by identifying which situations are most susceptible to interpretation problems when the precision of measurement is uncertain.

A second important issue is reconsidering what exactly constitutes a monotonic transformation. Some researchers might defend their scales of measurement, which cannot be proven to be interval level, by arguing that a scale not verified to be interval level has not thereby been shown to be merely a rank ordering of the attribute. In this sense, scales commonly used in psychology may be thought to lie somewhere on a continuum between ordinal and interval level. Advocates of this position might argue that violent monotonic rescalings, though technically permissible for purely ordinal data, are not reasonable with most psychological scales. Although this author would agree that the majority of psychological scales likely represent more than ordinal information and lie somewhere on that continuum, it may also be true that the “reasonableness” of the transformation is inversely related to the strength of the observed moderating effect. Interactions with large effect sizes may require drastic rescaling to remove the interaction or attenuate it to non-significance, but moderators with small effect sizes may require only slight alterations of the scales used.
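To make “slight” versus “drastic” concrete, the severity of an order-preserving rescaling can be indexed by the Pearson correlation between a variable and its rescaled version, which is the index this study adopts. This is a minimal sketch assuming NumPy; the exponential family exp(a·x) is a hypothetical choice of monotone rescaling used only for illustration, not one of the study's transformations.

```python
import numpy as np

def rescaling_severity(x, transform):
    """Pearson r between a variable and its transformed version.

    r equals 1.0 for any increasing linear transformation; lower values
    indicate a more drastic, though still order-preserving, rescaling.
    """
    return np.corrcoef(x, transform(x))[0, 1]

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)

# Hypothetical monotone rescalings: exp(a * x) is strictly increasing for
# any a > 0 and becomes more strongly nonlinear as a grows.
r_linear = rescaling_severity(x, lambda v: 3.0 * v + 10.0)
r_mild = rescaling_severity(x, lambda v: np.exp(0.3 * v))
r_severe = rescaling_severity(x, lambda v: np.exp(1.5 * v))

print(f"linear r = {r_linear:.4f}, mild r = {r_mild:.3f}, severe r = {r_severe:.3f}")
```

Both exponential rescalings preserve the rank order of x exactly, yet under these assumptions the mild one stays near .98 while the severe one falls well below .9; the correlation therefore separates transformations that ordinal theory treats as equally permissible.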
Given the earlier discussion of interpreting moderators that account for very small percentages of variance (Evans, 1985), it is important to clarify how the severity of a rescaling relates to the size of the effect it removes. If a monotonic transformation results in a scale whose measurement properties are very similar to those of the original data, most of the information in the scale has been preserved, and an argument that we have somehow destroyed the scale is less tenable. If such transformations remove or attenuate moderating effects with very small effect sizes, interpreting such effect sizes is likely a fruitless endeavor without very precise interval scales. This study will examine this issue by placing specific monotonic transformations on the continuum between pure rank-order-preserving transformations and the linear transformations permissible for interval scales. This will be done by calculating the Pearson correlation coefficient between pre-transformation and post-transformation variables. Since a value of 1.00 denotes a linear transformation, I argue that very high correlations, in the .8 to .9 range, are “reasonable” and similar to the original scale. In these cases, the transformation is not drastic, and any change of interpretation based on the transformation should be of serious concern.

Answering the two general research questions presented above requires both a means of determining whether a moderating effect can be removed or attenuated by a monotonic transformation and a means of generating a transformation that accomplishes this. Two approaches have generally been used. The first, simultaneous conjoint measurement, examines the extent to which conditions are met in the dataset such that an additive, non-interactive representation is possible; it does not require generating such a representation.
The second method, Multiple Optimal Regression by Alternating Least Squares (MORALS; Young, de Leeuw, & Takane, 1976), is an iterative algorithm for generating numerical transformations that maximize the R² between sets of independent and dependent variables. Each of these methods is discussed in more detail below.

Simultaneous Conjoint Measurement

Recall that one of the criticisms of Stevens’ (1946) measurement paradigm was the arbitrary nature of scale assignment. Measurement level was determined not by consistent empirical relationships but by the judgment of the investigator. The operations described by Campbell were not possible in psychology, so demonstrating the ratio or even interval nature of data within Stevens’ framework was extremely difficult. As noted earlier, much of the desire to use parametric methods with ordinal data may have been due to the imposing conditions necessary to verify an interval scale. Some psychologists, however, chose to develop alternatives to both Stevens’ and Campbell’s theories, attempting to loosen the restraints imposed by the latter without accepting the investigator-centered aspects of the former. This more recent avenue was spearheaded by the work of Luce & Tukey (1964). Rather than relax Campbell’s requirement of empirical additivity, as Stevens did, Luce & Tukey relaxed only the requirement that the basis of additivity be physical concatenation. Luce & Tukey demonstrated that empirical addition can be based on non-physical operations and need not involve subunits placed side by side. In other words, psychologists could develop what amounted to interval-level scales, in Stevens’ framework, without formal extensive measurement. Luce & Tukey called this new type of measurement simultaneous conjoint measurement. As its name suggests, simultaneous conjoint measurement considers the combined effects of variables rather than treating them independently.
Simultaneous conjoint measurement can potentially be applied in any instance where two or more variables are thought to determine an empirical outcome variable. Only categorical (nominal) properties need to be present in the determining variables, and only rank-order (ordinal) properties in the outcome variable. If the relationships among the variables conform to a number of axioms (a series of if-then rules), necessary and sufficient conditions are met to define interval scales (termed standard sequences) on the set of determining variables, and subsequently on the determined variable. Essentially, the theory says that, given variables X and Z that determine Y, if the necessary axioms are met, then monotonic transformations a, b, and c exist such that a(X) + b(Z) = c(Y). These scales are defined by transforming the original component variables, producing an additive representation for all components. The work of Luce & Tukey (1964) was expanded upon in a three-volume series entitled Foundations of Measurement (Krantz, Luce, Suppes, & Tversky, 1971; Suppes, Krantz, Luce, & Tversky, 1989; Luce, Krantz, Suppes, & Tversky, 1990), in which types of conjoint measurement other than additive (e.g., polynomial, difference, geometric) are discussed in detail.

The development of simultaneous conjoint measurement provided exactly what early 20th-century psychologists had been looking for in response to Campbell’s theory. If the axioms were successfully applied to psychological measures, they could provide standard sequences, i.e., interval scales, for psychological variables. The construction of these scales should also have interested adherents of Stevens’ theory, as it allowed more advanced statistical techniques to be used while avoiding debates about permissible statistics. More importantly, the central concept in Campbell’s original theory, additivity, had been preserved and shown to be possible with non-physical variables.
Thus, one of the problems motivating the development of Stevens’ theory had, in effect, been solved.

Despite the dramatic ability of conjoint measurement to potentially produce interval-level scales and additive relationships, these are not its most important implications. Taken as a whole, the essence of the theory is that scale definitions are produced by examining empirical relationships among variables. Scales are not constructed or arbitrarily determined in isolation. Measurement, according to conjoint measurement theory, is the assignment of numbers to empirical components such that the relationships among the numerical assignments adequately represent the relationships among the empirical components. In this respect, it stands in contrast with Stevens’ theory, which merely prescribed the permissible transformations for each scale type, while the scale type itself could be selected in isolation by the researcher.

Partly because of the abstract nature of the theory’s presentation, demonstrations of its utility have been few in number. Early uses of the theory include areas such as animal behavior (Campbell & Masterson, 1969) and psychophysics (Levelt, Riemersma, & Bunt, 1971). Recent recognition of the relationship between conjoint measurement and the Rasch model in item response theory (Perline, Wright, & Wainer, 1979) has generated some linkages between the two areas, specifically in the assessment of interactions with classical vs. IRT ability estimates (Embretson, 1996).

The axioms of simultaneous conjoint measurement are perhaps best understood with an example. Consider a researcher investigating the combined effects of ability and motivation on performance. In order for the conjoint measurement method to be applied, it is assumed that the dependent measure (in this case, performance) is of ordinal level.
The determining factors (ability and motivation) are probably assumed ordinal by the researcher, but need only be of nominal level for conjoint measurement to be used. It is helpful to view this situation in terms of a matrix, similar to the data tables presented earlier in the thesis. If we let $a_1, \ldots, a_k$ denote different classifications (values) of Ability, and $m_1, \ldots, m_n$ denote different values of Motivation, then Performance at any combination of Motivation and Ability can be denoted by a cell of the form $m_i a_j$. Thus, performance (P) when motivation (M) is at level i and ability (A) is at level j can be represented by $m_i a_j$, which simply means that these levels of the variables combine in some way to produce a certain level of P. Recall that we assume only ordinal properties for P, with both A and M treated as nominal variables. This is shown graphically in Table 5.

Table 5. Performance by Levels of Motivation and Ability

                         Ability
                     a1      a2      a3
   Motivation  m1   m1a1    m1a2    m1a3
               m2   m2a1    m2a2    m2a3
               m3   m3a1    m3a2    m3a3

Once the data are arranged in this manner, with each cell containing the mean or value of performance at the appropriate levels, the researcher can begin testing the axioms of conjoint measurement. The most important axiom of conjoint measurement is double cancellation. Essentially, the double cancellation axiom tests whether the order of certain P values implies the ordering of other P values. The axiom is stated as follows (Krantz, Luce, Suppes, & Tversky, 1971): for any three values of M, $m_a, m_b, m_c$, and any three values of A, $a_d, a_e, a_f$,

$$\text{if } m_a a_e \geq m_b a_d \text{ and } m_b a_f \geq m_c a_e, \text{ then } m_a a_f \geq m_c a_d.$$

The double cancellation axiom is essential to determining whether M and A have an additive relationship with P, because it must hold whenever such a relationship exists. To illustrate this, replace each $m_i a_j$ term above with $m_i + a_j$. This is equivalent to stating that any given level of performance is an additive function of ability and motivation.
Note that we do not invoke any concept of weights on M and A, as they are still assumed to be of nominal level, and we make no assumptions regarding their ordering. Replacing gives:

$$\text{if } (m_a + a_e \geq m_b + a_d) \text{ and } (m_b + a_f \geq m_c + a_e), \text{ then } (m_a + a_f \geq m_c + a_d).$$

Summing the two antecedent inequalities and cancelling the terms common to both sides ($m_b$ and $a_e$):

$$\text{if } m_a + a_e + m_b + a_f \geq m_b + a_d + m_c + a_e, \text{ then } m_a + a_f \geq m_c + a_d.$$

Thus, if there is an additive relationship between A and M in determining P, the double cancellation axiom will hold true for all values of A, M, and P (i.e., $m_i a_j$) in the data set. Typically, the extent to which a data set satisfies the double cancellation axiom is indicated by the percentage of independent axiom tests that support it (Nickerson & McClelland, 1984). As the number of such tests grows rapidly with the number of categories of A and M, these tests are usually done using computer algorithms.

A second axiom of conjoint measurement is the solvability axiom: given any three of $m_a, m_b, a_d, a_e$, the fourth must exist such that $m_a a_d = m_b a_e$. This means that values for M and A must exist such that all feasible values of P can be generated. In terms of the data matrix illustrated above, this simply means that for any combination of ability and motivation there must be a level of performance, i.e., there are no structurally empty cells in the data matrix. For this example (and for nearly all psychological data), this axiom is trivial and usually assumed true. This axiom might seem to suggest that ability and motivation components need to be uncorrelated to use the conjoint measurement methodology, but this is not necessarily the case. A sample correlation between two predictors tells us nothing about which cells are impossible, but only which combinations of predictors are more likely to occur.

There is another axiom of conjoint measurement, the Archimedean axiom.
Although methods have been devised to test it indirectly (Scott, 1964), it is generally considered technical in nature and is usually not tested in finite data sets (Luce et al., 1990). Michell (1990) notes that if both the solvability and double cancellation axioms are established, two additional important properties of M and A are also verified: order and independence. Independent ordering implies the following statements:

Given $m_1, m_2$: $m_2 \geq m_1$ if $m_2a \geq m_1a$ for any $a$ in A.
Given $a_1, a_2$: $a_2 \geq a_1$ if $ma_2 \geq ma_1$ for any $m$ in M.

In words, Level 2 of motivation ($m_2$) is greater than Level 1 of motivation ($m_1$) if, for any ability level ($a_i$), individuals with motivation level 2 have higher performance ($m_2a_i$) than individuals with motivation level 1 ($m_1a_i$). This observation is identical to the notion of equal ordering of Y on X across levels of Z, and of Y on Z across levels of X, described in the previous example data sets. Since successful tests of the aforementioned axioms imply an ordering of P, M, and A that creates an additive equation P = M + A, they are sufficient evidence that monotonic transformations exist that can eliminate any multiplicative component present in the original data. The researcher can successfully construct scales of performance, motivation, and ability, and, consistent with the additive representation, ability and motivation will be compensatory in determining performance, i.e., a given change in Motivation will result in a specific change in Performance, which can be offset by a specific change in Ability. All these changes will be constant across all scale points, thus defining interval scales for all three constructs.

Due to the nature of this study, the aforementioned axioms of simultaneous conjoint measurement proved untestable in the generated data, for several reasons that will be described in more detail later in the thesis.

The MORALS Algorithm

The goal of the MORALS algorithm is similar to that of conjoint measurement, in that an additive representation is sought.
However, rather than verify conditions that permit an additive representation, the MORALS algorithm attempts to generate actual scales conforming to such a representation. The algorithm uses a convergent least squares procedure in which least squares estimates are alternately computed for a matrix of transformation parameters and a matrix of regression parameters. The least squares estimates for one matrix are used in the next iteration for the other matrix, until a convergent solution is reached. Further mathematical details on the procedure can be found in Appendix A. de Leeuw, Young, & Takane (1976) and Young, de Leeuw, & Takane (1976) contain detailed conceptual and procedural discussions of the MORALS algorithm. Having now stated the rationale and reasoning behind the study, a formal research design using statistical simulation methods will now be described.

Research Design

In order to assess the effect of measurement precision on estimation of interaction effects, the statistical simulation will examine six factors. These factors are: 1) incremental R² of the XZ product term; 2) baseline R² of the additive model prior to adding an XZ term; 3) intercorrelation between predictors X and Z; 4) measurement level of predictors and criterion; 5) quantitative/qualitative nature of X and Z; and 6) crossing or non-crossing nature of the interaction. Each is now specified in more detail.

Study Independent Variables

Incremental/Baseline R² and Predictor Intercorrelation. The first two factors in the design involve the strength of the interaction effect and the predictability of the additive model prior to adding an interaction term. The strength of an interaction effect is often indexed by the ΔR² after addition of a product term. This study used incremental R² values at three levels: .05, .15, and .25. The baseline R² of the additive model was also varied at three levels: .2, .4, and .6.
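The effect sizes implied by these levels follow from Cohen's f² for the product term, $f^2 = (R^2_{Y.MI} - R^2_{Y.M})/(1 - R^2_{Y.MI})$, the formula this study uses for its dependent variables, with $R^2_{Y.MI}$ equal to the baseline plus the incremental R². This minimal sketch in plain Python assumes nothing beyond the levels stated in the text; the dictionary layout is just one convenient way to tabulate the nine cells.

```python
from itertools import product

DELTA_R2 = (0.05, 0.15, 0.25)   # incremental R² of the XZ product term
BASELINE_R2 = (0.2, 0.4, 0.6)   # R² of the additive model (X and Z only)

def interaction_f2(r2_additive, r2_moderated):
    """Cohen's f² for the product term."""
    return (r2_moderated - r2_additive) / (1.0 - r2_moderated)

# Map each (baseline, increment) cell to (total R², f²).
design = {}
for base, delta in product(BASELINE_R2, DELTA_R2):
    total = base + delta
    design[(base, delta)] = (total, interaction_f2(base, total))

for (base, delta), (total, f2) in design.items():
    print(f"baseline {base:.1f}  dR2 {delta:.2f}  total R2 {total:.2f}  f2 {f2:.3f}")
```

Under these levels, total R² spans .25 to .85, and the nine f² values are all distinct, ranging from about .067 (baseline .2, ΔR² .05) to about 1.667 (baseline .6, ΔR² .25). A fixed ΔR² corresponds to a larger f² at a higher baseline R².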
Crossing the incremental and baseline R² factors resulted in a 3 × 3 design, with maximal and minimal total R² of .85 and .25, respectively, for the interactive model. The correlation between predictors X and Z in the simulated datasets was varied at three levels: .1, .3, and .5. The levels for these three factors were selected to provide wide coverage of potential R² values (.25 to .85), a range of additive R² representative of those found in psychological research, and a wide enough spread in intercorrelation levels to detect small differences across levels. Complete orthogonality ($r_{XZ} = 0$) was omitted because of its potential qualitative difference from situations where the intercorrelation is non-zero.

Measurement Level of Predictors and Criterion. Measurement level for predictors and criterion was fixed at five possible levels, representing the distinct combinations of ordinal or interval continuous variables (treating X and Z symmetrically and excluding the case in which all three variables are interval), and assuming the criterion is always continuous: 1) Non-Interval Y, Interval X, Interval Z; 2) Non-Interval Y, Non-Interval X, Interval Z; 3) Non-Interval Y, Non-Interval X, Non-Interval Z; 4) Interval Y, Non-Interval X, Interval Z; and 5) Interval Y, Non-Interval X, Non-Interval Z. The assignment of this independent variable determines which of the variables are permitted to undergo monotone transformations. If a variable is Non-Interval, monotone transformations are permitted. If a variable is Interval, no transformations are permitted. It is important to note that the level of this independent variable does not change anything about the variables themselves, but only which transformations are permitted. The numerical values generated in the simulation have no inherent interval or non-interval status; this is determined only when they are used in the scaling algorithm. Note that this design factor could not be completely crossed with the qualitative/quantitative nature of the predictor variables, as monotone rescalings of binary X or Z variables are impossible.
Thus, when Z is binary, only levels 1, 2, and 4 of this factor are possible. When both X and Z are binary, only level 1 is possible. This will affect the “sample” sizes in the cells associated with these combinations.

Quantitative and Qualitative Nature of Variables. The type of variables involved was varied according to the three possible combinations of predictor variables: Continuous X and Continuous Z, Continuous X and Binary Z, and Binary X and Binary Z. Binary variables were defined with 50% of cases in each qualitative category. Continuous variables were “true” continuous numerical values with a precision of eight decimals.

Crossing vs. Non-crossing Interaction. The form of the interactive effects was manipulated by fixing the crossing point for X-Y regression lines across levels of Z, and the crossing point for Z-Y regression lines across levels of X. The formulas for these crossing points are $X_c = -b_Z/b_{XZ}$ and $Z_c = -b_X/b_{XZ}$, respectively (Aiken & West, 1991). Values for $X_c$ and $Z_c$ were set at −2.00 for non-crossing interactions in the case when both variables were continuous distributions with variances of 1.0 and means of zero.⁶ In the case of binary variables scored (−1, 1), crossing points for non-crossing interactions were set at −1.1 for both X and Z. Crossing interactions were set to cross at the mean of X (0.0).

Study Dependent Variables

The dependent measures in the study were selected to assess the effects of optimal monotonic rescalings of the simulated data. The nature of these transformations depends primarily on the factors outlined above. Two variables were examined: 1) the difference between the effect sizes of interaction effects assessed in the pre-transformation data and those assessed in the post-transformation data, after transformation by a MORALS algorithm; and 2) the Pearson correlation between pre-transformation and post-transformation variables.

⁶ Since continuous variables were random standard normal distributions, the determination of a crossing point outside a variable’s range was probabilistic. For this and other reasons, non-crossing interactions were excluded from later analyses.

The effectiveness in reducing interaction effects in the above conditions will be indexed by the difference between the effect size of the post-transformation interaction and the original effect size in the pre-transformed data ($\Delta f^2$). In both cases, the appropriate calculation of effect size is:

$$f^2 = \frac{R^2_{Y.MI} - R^2_{Y.M}}{1 - R^2_{Y.MI}}$$

In this formula, $R^2_{Y.MI}$ refers to the squared multiple correlation of a model including both the additive effects of X and/or Z and the product term XZ, and $R^2_{Y.M}$ is the squared multiple correlation of a model including only the additive effects of X and/or Z. As can be seen, the overall effect size of the interaction depends on both components. As already described, $R^2_{Y.M}$ and $R^2_{Y.MI}$ will be manipulated as experimental factors in this study. Since each factor has 3 levels, 9 distinct f² values will be present in the pre-transformation data. The two values for the post-transformation data, $R^2_{Y.M'}$ and $R^2_{Y.M'I}$, were evaluated by conducting a moderated multiple regression analysis on the transformed data. These values were then used to calculate the transformed effect size, $f_t^2$. Since the effect size of the moderation was expected to be larger in the original data, the index $(f_t^2 - f^2)$ was used as a standardized indicator of attenuation.

The double cancellation axiom of conjoint measurement was to be examined using a computerized testing procedure. All possible double cancellation tests were to be conducted, and the proportion that were true was to be used as an index of additive representability.
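The attenuation index $(f_t^2 - f^2)$ can be illustrated end to end with a simplified, MORALS-style alternating least squares loop. This is a sketch under stated assumptions, not the Appendix A procedure: only the criterion is rescaled (predictors are held fixed), the optimal-scaling step is a monotone regression computed with the pool-adjacent-violators algorithm, and the data are hypothetical, generated to be additive on a log scale. Each iteration (1) fits OLS weights for the current criterion scale and (2) replaces the criterion with the order-preserving scale closest to the fitted values, renormalized to mean 0 and unit variance.

```python
import numpy as np

def pava(values):
    """Least-squares non-decreasing fit (pool-adjacent-violators algorithm)."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in values:
        blocks.append([float(v), 1])
        # Pool backwards while adjacent block means are out of order.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def ols(y, design):
    """Return (R², fitted values) from an OLS fit of y on `design`."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, fitted

def standardize(v):
    return (v - v.mean()) / v.std()

def rescale_criterion(y, design, n_iter=100):
    """Alternate OLS weights and an order-preserving rescaling of y."""
    order = np.argsort(y, kind="stable")
    t = standardize(y)
    for _ in range(n_iter):
        _, fitted = ols(t, design)             # step 1: regression weights
        t_new = np.empty_like(t)
        t_new[order] = pava(fitted[order])     # step 2: monotone rescale of y
        t = standardize(t_new)
    return t

def interaction_f2(y, x, z):
    """Cohen's f² for the XZ term: (R²_full - R²_add) / (1 - R²_full)."""
    ones = np.ones_like(x)
    r2_add, _ = ols(y, np.column_stack([ones, x, z]))
    r2_full, _ = ols(y, np.column_stack([ones, x, z, x * z]))
    return (r2_full - r2_add) / (1.0 - r2_full)

# Hypothetical data: additive on a log scale, so the raw-scale
# interaction is removable by a monotone rescaling of Y.
rng = np.random.default_rng(1)
n = 500
x, z = rng.standard_normal(n), rng.standard_normal(n)
y = np.exp(0.6 * (x + z) + 0.2 * rng.standard_normal(n))

additive = np.column_stack([np.ones(n), x, z])
y_t = rescale_criterion(y, additive)

f2_before = interaction_f2(y, x, z)
f2_after = interaction_f2(y_t, x, z)
print(f"f2 before: {f2_before:.3f}  after: {f2_after:.3f}  "
      f"attenuation (f2_t - f2): {f2_after - f2_before:.3f}")
```

For these data the rescaled criterion is close to additive, so $f_t^2$ falls well below the raw-scale f² and the attenuation index is negative. Because step 2 is the least squares optimum over order-preserving scales of fixed variance, no iteration lowers the additive R², so the loop needs no step-size control.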
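The computerized double cancellation test described above can be sketched as a brute-force count. This minimal sketch assumes NumPy and a matrix P whose rows are levels of M and whose columns are levels of A, as in Table 5. It counts every ordered triple of distinct rows and columns, so many of its tests are redundant, unlike the count of independent tests described by Nickerson & McClelland (1984). At least three levels of each factor are needed for any test to exist.

```python
from itertools import permutations

import numpy as np

def double_cancellation(P):
    """Brute-force double cancellation check on a conjoint matrix.

    P[i, j] is performance at motivation level i and ability level j.
    For distinct rows (ma, mb, mc) and columns (ad, ae, af):
    if P[ma, ae] >= P[mb, ad] and P[mb, af] >= P[mc, ae],
    the test passes when P[ma, af] >= P[mc, ad].
    Returns (number of tests whose antecedent holds, number passed).
    """
    P = np.asarray(P, dtype=float)
    tested = passed = 0
    for ma, mb, mc in permutations(range(P.shape[0]), 3):
        for ad, ae, af in permutations(range(P.shape[1]), 3):
            if P[ma, ae] >= P[mb, ad] and P[mb, af] >= P[mc, ae]:
                tested += 1
                if P[ma, af] >= P[mc, ad]:
                    passed += 1
    return tested, passed

# An additive matrix P = m_i + a_j satisfies every applicable test.
additive = np.add.outer([0.0, 1.0, 3.0], [0.0, 2.0, 2.5])
print(double_cancellation(additive))

# A hypothetical matrix built to violate the test for rows 0, 1, 2 and columns 0, 1, 2.
violating = [[0, 5, 1],
             [4, 0, 5],
             [9, 4, 0]]
print(double_cancellation(violating))
```

The additive matrix passes every test whose antecedent holds, as the derivation above requires; in the second matrix both antecedents hold for the identity triple but P[0, 2] = 1 < P[2, 0] = 9, so at least one test fails.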
However, due to both the lack of predictors with three levels (for qualitative predictors) and the sampled nature of predictors (for quantitative predictors), the assumptions underlying the double cancellation tests of simultaneous conjoint measurement were not met in any of the design cells. The lack of testability of these axioms does not necessarily make them unimportant. The axioms do hold for additive relationships between quantitative variables, and thus would be relevant to the examination of such relationships in psychology. Although formal examination is not possible in this study, it is probable that the manipulated independent variables would have effects in situations that did not violate the axiomatic assumptions.

Hypotheses

Main Effects of Baseline R². Since 1 − R² represents error in predicting the additive model Y = X + Z, it was expected that decreases in this R² would have effects on the double cancellation axiom tests of conjoint measurement. Recall that conjoint measurement assumes all component variables are measured without error. However, as noted previously, this dependent variable was not assessed. In contrast with the axiom tests, the MORALS algorithm has been shown by Young, de Leeuw, and Takane (1976) to perform well even when error is present. Also, when one examines the f² effect size equation for moderating effects,

$$f^2 = \frac{R^2_{Y.MI} - R^2_{Y.M}}{1 - R^2_{Y.MI}}$$

it is apparent that effect size calculations are more sensitive to $\Delta R^2 = R^2_{Y.MI} - R^2_{Y.M}$ as the R² of the additive model ($R^2_{Y.M}$) increases, i.e., as the total R² approaches 1.00. A given ΔR² is a stronger effect at a high baseline R² than at a low baseline R². Thus, it is expected that the MORALS algorithm will be more successful at attenuating interaction effect sizes observed at low baseline R², and that such transformations will be less severe.

H1: As the R² of the additive model decreases, the proportion of true double cancellation axioms will also decrease.
H2: As the R² of the additive model decreases, the attenuation of the moderator effect size will increase.

H3: As the R² of the additive model decreases, the transformed data will exhibit less deviation from the original variables, and thus higher pre-post transformation correlation coefficients.

Main Effects of ΔR². ΔR² indexes the amount of additional variance predicted by the XZ product term when it is added to the regression equation. The only main effect predicted for this factor involves the correlation between pre- and post-transformation variables. Regardless of whether the transformation attenuates or completely removes an interaction, more severe transformations are expected to be required in order to affect stronger multiplicative components.

H4: As ΔR² increases, the transformed data will exhibit greater deviation from the original variables, and thus lower pre-post transformation correlation coefficients.

Main Effects of Predictor Intercorrelation. Earlier in the thesis, it was suggested that the rescaling of predictors may attenuate an interaction effect, depending on the intercorrelation of the predictors. Recall the earlier discussion of Equations [15] and [16], where it was shown that a perfect ($r_{XZ} = 1.0$) correlation between predictors X and Z resulted in a model identical to a quadratic model involving either predictor. In this situation, monotone transformation can render the model completely additive. It is predicted that similar effects will occur with intercorrelations less than 1.0. Specifically, it is predicted that, as the correlation between predictors increases, the MORALS algorithm will be more effective at attenuating or removing the interaction. In addition, it is expected that as predictor intercorrelations increase, the severity of the transformation needed to attenuate an interaction effect will be reduced.

H5: As predictor intercorrelation increases, the attenuation of the moderator effect size will increase.
H6: As predictor intercorrelation increases, the size of the transformation required to attenuate interaction effects will be smaller, and thus higher pre-post transformation correlation coefficients will be observed.

Main Effects of Measurement Level. This factor reflects different levels of uncertainty about predictor and criterion measures. This uncertainty gives the MORALS scaling algorithm more potential for rescaling and, it is expected, a greater ability to attenuate or remove moderating effects. It is expected that treating more variables as non-interval (i.e., between ordinal and interval level) will reduce the severity of transformation (necessary to attenuate the interaction) in any single variable, as the additive model can be generated by changing more variables. In this sense, the transformations necessary to attenuate a moderating effect are “spread” across multiple variables, with each individual variable carrying less of the necessary transformation. This factor is not relevant to the axiom tests of conjoint measurement, since the tests assume a criterion measured at the ordinal level and predictors measured at the nominal level.

H7: As the number of variables submitted to monotone transformation increases, the effect size of the interaction will be attenuated to a greater extent.

H8: As the number of variables submitted to monotone transformation increases, the transformed data will exhibit less deviation from the original variables, and thus higher pre-post transformation correlation coefficients.

Main Effects for Form of Interaction. As discussed and demonstrated in this thesis, non-crossing interactions can be completely removed by criterion rescalings, and crossing interactions can often be attenuated. It was also implied that predictor rescalings can potentially attenuate any form of interaction. Given this, it is expected that the attenuation of moderator effects will be greater for all non-crossing interactions.
For crossing interactions, the attenuation can never be a complete removal, so the attenuation will necessarily be lower.

H9: Non-crossing interactions will exhibit greater attenuation than crossing interactions.

Interaction Effects. In addition to the previously listed main effects, two interaction effects are expected: Predictor Intercorrelation × Measurement Level, and Measurement Level × Form of Interaction. Predictor intercorrelation may have a greater effect when two predictors are measured at the ordinal level rather than only one, as the potential effects of intercorrelation on rescalings now apply to rescalings of both X and Z. We also know that any non-crossing interaction can be removed by a criterion rescaling, so it can be said with certainty that attenuation in ordinal criterion / non-crossing interaction conditions will necessarily result in complete removal of interaction effects. However, the same is not true for crossing interactions. These interactions can be attenuated, but never removed. We have also demonstrated that predictor rescalings can eliminate an interaction effect, but only in the trivial case of perfect intercorrelation between predictors. In the range of intercorrelation used in this study, and present in most data, this will never happen. Predictor rescalings may, however, attenuate an interaction effect at many levels of predictor intercorrelation. Thus, it is expected that the main effect for form of interaction will be stronger for conditions with criterion rescalings than for conditions with predictor rescalings, since criterion rescalings can completely remove an interaction, whereas predictor rescalings can, in most cases, only attenuate it.

H10: As predictor intercorrelation increases, the attenuation of the moderator effect size will increase to a greater extent with two predictors submitted to monotone transformation than with one predictor submitted to monotone transformation.
H11: Differences in attenuation of non-crossing and crossing interactions will be greater when the criterion is submitted to monotone transformation than when predictors are submitted to monotone transformation.

METHOD

The baseline and incremental R² values, predictor intercorrelation, and qualitative / quantitative nature of variables were all manipulated during dataset generation. The basic moderated multiple regression equation can be expressed as:

$$Y = b_0 + b_1 X + b_2 Z + b_3 XZ + b_e e \qquad (17)$$

where Y is a continuous variable and X and Z are continuous or binary variables. The $b_e e$ term represents error variance uncorrelated with X, Z, or the XZ product term. $b_e$ controls the total R² of the model, assuming that the error distribution e has a mean of zero and a variance of one and is uncorrelated with X, Z, and the product term XZ. The R² values for an additive model ($R^2_A$) and an additive-multiplicative model ($R^2_{AM}$) can be calculated from the complete correlation matrix of {Y, X, Z, XZ}, based on the matrix determinant formulations of McNemar (1969):

$$R^2_A = 1 - \frac{\begin{vmatrix} 1 & r_{YX} & r_{YZ} \\ r_{YX} & 1 & r_{XZ} \\ r_{YZ} & r_{XZ} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{XZ} \\ r_{XZ} & 1 \end{vmatrix}} \qquad (18)$$

$$R^2_{AM} = 1 - \frac{\begin{vmatrix} 1 & r_{YX} & r_{YZ} & r_{Y,XZ} \\ r_{YX} & 1 & r_{XZ} & r_{X,XZ} \\ r_{YZ} & r_{XZ} & 1 & r_{Z,XZ} \\ r_{Y,XZ} & r_{X,XZ} & r_{Z,XZ} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{XZ} & r_{X,XZ} \\ r_{XZ} & 1 & r_{Z,XZ} \\ r_{X,XZ} & r_{Z,XZ} & 1 \end{vmatrix}} \qquad (19)$$

The correlations used in the above calculations are themselves functions of the variance-covariance matrix of {Y, X, Z, XZ}.
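The determinant form of Equations 18 and 19 can be checked numerically. The Python sketch below (using NumPy, with simulated data and hypothetical coefficients rather than the study's design values) computes R² as one minus the ratio of the full correlation-matrix determinant to the predictor-only determinant, and cross-checks the result against an ordinary least-squares fit with an intercept. The two agree because the determinant ratio is the Schur-complement form of the squared multiple correlation.

```python
import numpy as np


def r2_from_correlations(corr):
    """Squared multiple correlation of the first variable on all others,
    via the determinant ratio 1 - |R_full| / |R_predictors| (Eqs. 18-19)."""
    corr = np.asarray(corr, dtype=float)
    return 1.0 - np.linalg.det(corr) / np.linalg.det(corr[1:, 1:])


def ols_r2(y, predictors):
    """OLS R^2 with an intercept, used only to cross-check the determinant form."""
    design = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Illustrative data with hypothetical coefficients (not the study's design values).
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
z = 0.3 * x + rng.standard_normal(n)
y = 0.5 * x + 0.2 * z + 0.3 * x * z + rng.standard_normal(n)

R = np.corrcoef(np.column_stack([y, x, z, x * z]), rowvar=False)
r2_a = r2_from_correlations(R[:3, :3])   # {Y, X, Z}: Eq. (18)
r2_am = r2_from_correlations(R)          # {Y, X, Z, XZ}: Eq. (19)
print(f"R2_A = {r2_a:.4f}, R2_AM = {r2_am:.4f}")
```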
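Under the assumption of bivariate normality⁷ of X and Z, and given knowledge of E(x), E(z), var(x), var(z), cov(x, z), b1, b2, b3, and be, the remaining two variances and five covariances are derived as follows:

$$\begin{aligned} \operatorname{var}(y) ={}& b_1^2 \operatorname{var}(x) + b_2^2 \operatorname{var}(z) + 2 b_1 b_2 \operatorname{cov}(x,z) \\ &+ b_3^2 \left[\operatorname{var}(z)E(x)^2 + \operatorname{var}(x)E(z)^2 + 2\operatorname{cov}(x,z)E(x)E(z) + \operatorname{var}(x)\operatorname{var}(z) + \operatorname{cov}(x,z)^2\right] \\ &+ 2\left[b_1 b_3\left(\operatorname{var}(x)E(z) + E(x)\operatorname{cov}(x,z)\right) + b_2 b_3\left(\operatorname{var}(z)E(x) + E(z)\operatorname{cov}(x,z)\right)\right] + b_e^2 \end{aligned} \qquad (20)$$

$$\operatorname{var}(xz) = \operatorname{var}(z)E(x)^2 + \operatorname{var}(x)E(z)^2 + 2\operatorname{cov}(x,z)E(x)E(z) + \operatorname{var}(x)\operatorname{var}(z) + \operatorname{cov}(x,z)^2 \qquad (21)$$

$$\operatorname{cov}(x,y) = b_1 \operatorname{var}(x) + b_2 \operatorname{cov}(x,z) + b_3\left(E(z)\operatorname{var}(x) + \operatorname{cov}(x,z)E(x)\right) \qquad (22)$$

$$\operatorname{cov}(z,y) = b_2 \operatorname{var}(z) + b_1 \operatorname{cov}(x,z) + b_3\left(E(x)\operatorname{var}(z) + \operatorname{cov}(x,z)E(z)\right) \qquad (23)$$

$$\begin{aligned} \operatorname{cov}(xz,y) ={}& b_1\left[E(z)\operatorname{var}(x) + \operatorname{cov}(x,z)E(x)\right] + b_2\left[E(x)\operatorname{var}(z) + \operatorname{cov}(x,z)E(z)\right] \\ &+ b_3\left[E(x)^2\operatorname{var}(z) + 2E(x)E(z)\operatorname{cov}(x,z) + E(z)^2\operatorname{var}(x) + \operatorname{var}(x)\operatorname{var}(z) + \operatorname{cov}(x,z)^2\right] \end{aligned} \qquad (24)$$

$$\operatorname{cov}(x,xz) = \operatorname{var}(x)E(z) + \operatorname{cov}(x,z)E(x) \qquad (25)$$

$$\operatorname{cov}(z,xz) = \operatorname{var}(z)E(x) + \operatorname{cov}(x,z)E(z) \qquad (26)$$

Calculation of correlations can proceed from these variances and covariances. These correlations determine $R^2_A$ and $R^2_{AM}$ as described above. Substitution of these correlation formulas into the R² determinant formulas presented earlier produces large and unwieldy expressions. In order to facilitate use of these formulas in later analyses and discussion, they were programmed in a Microsoft Excel spreadsheet. This spreadsheet was created to generate solutions for b1, b2, b3, and be, given a desired $R^2_A$, $R^2_{AM}$, E(x), E(z), var(x), var(z), and $r_{XZ}$. A crossing or non-crossing interaction could be generated by further constraining the crossing points on the X and Z axes with minimums and/or maximums. Solving for b1, b2, b3, and be completely specified $R^2_A$, $R^2_{AM}$, and the crossing points on X and Z. $R^2_A$ was set to .2, .4, or .6. $R^2_{AM}$ was set to $R^2_A$ plus .05, .15, or .25. The qualitative vs. quantitative nature of predictors X and Z, and their intercorrelation, are the remaining factors constrained in data generation, and are now described in detail.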
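Equations 20 through 26 can be assembled into the population covariance matrix of {Y, X, Z, XZ}, converted to correlations, and passed to the determinant formulas, which is the forward direction of the computation the spreadsheet performed. The following is a minimal Python/NumPy sketch under the same assumptions as the text (bivariate normal X and Z, error variance of one); the parameter values are hypothetical, and unlike the thesis's spreadsheet this sketch does not solve the system in reverse for b1, b2, b3, and be.

```python
import numpy as np


def implied_covariance(ex, ez, vx, vz, cxz, b1, b2, b3, be):
    """Population covariance matrix of (Y, X, Z, XZ) implied by Eqs. (20)-(26),
    assuming bivariate normal X, Z and an error e with mean 0, variance 1."""
    cov_x_xz = vx * ez + cxz * ex                                        # (25)
    cov_z_xz = vz * ex + cxz * ez                                        # (26)
    var_xz = (vz * ex**2 + vx * ez**2 + 2 * cxz * ex * ez
              + vx * vz + cxz**2)                                        # (21)
    cov_x_y = b1 * vx + b2 * cxz + b3 * cov_x_xz                         # (22)
    cov_z_y = b2 * vz + b1 * cxz + b3 * cov_z_xz                         # (23)
    cov_xz_y = b1 * cov_x_xz + b2 * cov_z_xz + b3 * var_xz               # (24)
    var_y = (b1**2 * vx + b2**2 * vz + 2 * b1 * b2 * cxz + b3**2 * var_xz
             + 2 * (b1 * b3 * cov_x_xz + b2 * b3 * cov_z_xz) + be**2)    # (20)
    return np.array([
        [var_y,    cov_x_y,  cov_z_y,  cov_xz_y],
        [cov_x_y,  vx,       cxz,      cov_x_xz],
        [cov_z_y,  cxz,      vz,       cov_z_xz],
        [cov_xz_y, cov_x_xz, cov_z_xz, var_xz],
    ])


def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def r2_first_on_rest(corr):
    """Determinant form of R^2 for the first variable (Eqs. 18-19)."""
    return 1.0 - np.linalg.det(corr) / np.linalg.det(corr[1:, 1:])


def implied_r2(**params):
    """Return (R2_A, R2_AM) implied by the population parameters."""
    corr = to_correlation(implied_covariance(**params))
    return r2_first_on_rest(corr[:3, :3]), r2_first_on_rest(corr)


# Hypothetical parameter values for standardized predictors.
params = dict(ex=0.0, ez=0.0, vx=1.0, vz=1.0, cxz=0.3,
              b1=0.4, b2=0.3, b3=0.35, be=0.8)
r2_a, r2_am = implied_r2(**params)
print(f"R2_A = {r2_a:.4f}, R2_AM = {r2_am:.4f}")
```

Because the full model reproduces Y except for the error term, $R^2_{AM}$ should always equal $1 - b_e^2/\operatorname{var}(y)$, which is a convenient internal check on the formulas.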
⁷ Calculations will also hold, with slight modification of squared covariance terms, when X or Z is a binary variable coded −1/1 with E(x) = E(z) = 0 and var(x) = var(z) = 1.

Structure of Predictor Variables and Error Variance

The study required the generation of X and Z variables that had a given intercorrelation ($r_{XZ}$) and a given qualitative or quantitative nature (binary or continuous). The intercorrelation of X and Z was set to one of three levels: .1, .3, or .5. The nature of variables X and Z was set to one of three combinations: Continuous X / Continuous Z, Continuous X / Binary Z, or Binary X / Binary Z. An error vector (E) was also generated to have zero correlation with either predictor vector. The specific procedures for generating observations in each of the above conditions are detailed below.

In the case of two continuous predictors, three vectors of standard normal deviates (n = 10,000) were generated and submitted to a principal components analysis (SAS procedure PRINCOMP). The resulting orthogonal components (P1, P2) were used to construct scores for predictors X and Z using the following equations:

$$Z = P_1, \qquad X = P_1\, r_{XZ} + P_2\sqrt{1 - r_{XZ}^2} \qquad (27)$$

where $r_{XZ}$ is the desired Pearson correlation between continuous predictors X and Z. This resulted in an exact $r_{XZ}$ correlation, and an uncorrelated error vector based on the third principal component (P3). All variables (X, Z, E) were standardized to means of 0 and variances of 1. Product term XZ was constructed from the standardized scores. Note that the correlation between product term XZ and X, Z, or E is theoretically zero, as X, Z, and E are constructed to be bivariate normal, with E(X) and E(Z) set to zero:

$$\operatorname{cov}(XZ, X) = \operatorname{var}(X)E(Z) + \operatorname{cov}(X,Z)E(X) = 0 \qquad (28)$$

However, since the vectors are only sampled from a bivariate normal distribution, the assumption will never perfectly hold in the observed vectors.
Without the assumption:

$$\operatorname{cov}(XZ, X) = E\left[(\Delta X)^2\, \Delta Z\right] + \operatorname{var}(X)E(Z) + \operatorname{cov}(X,Z)E(X) = E\left[(\Delta X)^2\, \Delta Z\right] \qquad (29)$$

where ΔX = X − E(X) and ΔZ = Z − E(Z). Similar equations hold for cov(XZ, Z) and cov(XZ, E). Although this issue is somewhat of a limitation in generating datasets with exact study parameters, the deviations from bivariate normality are likely slight enough to have little effect on the final outcome of analyses.

In the case of one continuous predictor and one binary predictor, the desired intercorrelation parameter is a point-biserial correlation between continuous vector X and binary vector Z. The binary predictor (Z) was constructed to represent a qualitative binary variable, and not simply a split of an underlying quantitative variable. Each qualitative category of Z had an equal sample size, and X was assumed to be normally distributed within each of the qualitative categories. Note that this creates a bimodality in the total distribution of X across both categories of Z, the degree of which depends on the magnitude of $r_{XZ}$.

Two vectors of standard normal deviates (n = 10,000) were generated and submitted to a principal components analysis (SAS procedure PRINCOMP). Each set of resulting orthogonal components (P1, P2) was used as the uncorrelated vectors X and E for one Z category. Error vector E was standardized to a mean of zero and a variance of 1 within each Z category, which produced a zero correlation between E and Z. The standardization of X was based on the desired intercorrelation between X and Z. Given the equation for the point-biserial correlation,

$$r_{X,Z} = \frac{\bar{X}_2 - \bar{X}_1}{S_X}\sqrt{p_1 p_2} \qquad (30)$$

which, in the case of $p_1 = p_2 = .5$, and for overall $S_X = 1$ and $\bar{X} = 0$, reduces to $-\bar{X}_1 = \bar{X}_2 = r_{X,Z}$. The variances of X within each category of Z are determined using the formula for the variance of a mixture of two distributions ($X_{tot}$), and solving for the variances of components $X_1$ and $X_2$.
$$\begin{aligned} &\operatorname{var}(X_{tot}) = \operatorname{var}(X_1)p_1 + \operatorname{var}(X_2)p_2 + (\bar{X}_1 - \bar{X})^2 p_1 + (\bar{X}_2 - \bar{X})^2 p_2 \\ &p_1 = p_2 = .5;\quad \operatorname{var}(X_{tot}) = 1;\quad \bar{X}_1 = -r_{X,Z};\quad \bar{X}_2 = r_{X,Z} \\ &\operatorname{var}(X_1) = \operatorname{var}(X_2) = 1 - r_{X,Z}^2 \end{aligned} \qquad (31)$$

Categories of variable Z were set to −1 and 1 to force a mean of zero and a variance of one. The X subset in subgroup Z = −1 was standardized to a mean of $-r_{X,Z}$ and a variance of $1 - r_{X,Z}^2$. The X subset in subgroup Z = 1 was standardized to a mean of $r_{X,Z}$ and a variance of $1 - r_{X,Z}^2$. An XZ product term was calculated from X and Z. Because Z is binary, the bivariate normality issues of the previous case with two continuous predictors do not arise. The correlations of product term XZ with components X and Z, and of error vector E with all other variables, are exactly zero.

In the case of two binary predictors, an exact phi coefficient was desired. Marginal proportions for both predictors X and Z were .5, and the following cell proportions generate the desired correlation:

$$p_{-1,-1} = .25\, r_{XZ} + .25, \qquad p_{1,1} = .25\, r_{XZ} + .25, \qquad p_{-1,1} = .5 - p_{1,1}, \qquad p_{1,-1} = .5 - p_{-1,-1} \qquad (32)$$

X and Z values were assigned to each cell in proportion to p × N, where N = 10,000. Within each cell, a normal deviate was generated for the error distribution. It was standardized to a mean of zero and a variance of one within each cell, forcing a zero correlation with X and Z. The product term XZ was created from the X and Z values. As in the previous case, there are no bivariate normality issues to consider. The correlations of product term XZ with predictors X and Z, and of error vector E with all other variables, are exactly zero.

As noted earlier, the formulas for the X-axis and Z-axis crossing points are $X_c = -b_2/b_3$ and $Z_c = -b_1/b_3$, respectively (Aiken & West, 1991). Values for $X_c$ and $Z_c$ were set at −2.00 in the case when both variables were continuous distributions with variances of 1.0 and means of zero. In the case of binary variables scored (−1, 1), crossing points were set at −1.1. Crossing interactions were set to cross at the mean of X (0.0). $X_c$ and $Z_c$ were set as constraints in solving for b1, b2, b3, and be in the Excel spreadsheet.
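The three generation schemes described above can be sketched in Python with NumPy. As an assumption, the SAS PRINCOMP step is replaced by a singular value decomposition of the centered normal deviates, which likewise yields exactly uncorrelated component scores; the function names and seeds are hypothetical. Each function returns X, Z, and the error vector E, and the final loop prints the achieved correlation along with corr(XZ, X), which is zero up to rounding when a predictor is binary but only approximately zero for two continuous predictors, as Equation 29 anticipates.

```python
import numpy as np


def standardized_components(n, k, rng):
    """k exactly uncorrelated columns with mean 0 and variance 1 (ddof=0),
    taken as principal-component scores of k vectors of normal deviates."""
    raw = rng.standard_normal((n, k))
    u, _, _ = np.linalg.svd(raw - raw.mean(axis=0), full_matrices=False)
    return u * np.sqrt(n)


def continuous_continuous(r_xz, n=10_000, seed=0):
    """Eq. (27): Z = P1, X = P1*r + P2*sqrt(1 - r^2); error E = P3."""
    rng = np.random.default_rng(seed)
    p = standardized_components(n, 3, rng)
    x = p[:, 0] * r_xz + p[:, 1] * np.sqrt(1.0 - r_xz**2)
    return x, p[:, 0], p[:, 2]


def continuous_binary(r_xz, n=10_000, seed=0):
    """Z coded -1/1 in equal groups; within each group X has mean -r or +r and
    variance 1 - r^2 (Eqs. 30-31), and E has mean 0, variance 1, corr(X, E) = 0."""
    rng = np.random.default_rng(seed)
    half = n // 2
    xs, zs, es = [], [], []
    for z_val in (-1.0, 1.0):
        p = standardized_components(half, 2, rng)
        xs.append(z_val * r_xz + np.sqrt(1.0 - r_xz**2) * p[:, 0])
        zs.append(np.full(half, z_val))
        es.append(p[:, 1])
    return np.concatenate(xs), np.concatenate(zs), np.concatenate(es)


def binary_binary(phi, n=10_000, seed=0):
    """X, Z coded -1/1 with marginals .5 and cell proportions from Eq. (32);
    E is standardized within each cell."""
    rng = np.random.default_rng(seed)
    p_same = 0.25 * phi + 0.25                  # p(-1,-1) = p(1,1)
    p_diff = 0.5 - p_same                       # p(-1,1) = p(1,-1)
    cells = {(-1, -1): p_same, (1, 1): p_same, (-1, 1): p_diff, (1, -1): p_diff}
    xs, zs, es = [], [], []
    for (xv, zv), p in cells.items():
        m = int(round(p * n))
        e = rng.standard_normal(m)
        es.append((e - e.mean()) / e.std())     # mean 0, variance 1 within the cell
        xs.append(np.full(m, float(xv)))
        zs.append(np.full(m, float(zv)))
    return np.concatenate(xs), np.concatenate(zs), np.concatenate(es)


for make in (continuous_continuous, continuous_binary, binary_binary):
    x, z, e = make(0.3)
    print(make.__name__, round(np.corrcoef(x, z)[0, 1], 6),
          "corr(XZ, X) =", round(np.corrcoef(x * z, x)[0, 1], 6))
```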
Because the Pearson, point-biserial, and phi correlations described above are all product-moment correlations, they can be used in the previously presented formulas to obtain multiple regression parameters by means of determinant analysis. Using the b1, b2, b3, and be values obtained from the Excel spreadsheet, criterion scores (Y) were calculated for the generated X and Z distributions based on Equation [17]. These datasets were then submitted to the MORALS algorithm for rescaling.

The manipulation of measurement level for predictor and criterion variables was accomplished in the MORALS algorithm itself. If a variable was ordinal (non-interval) in a given condition, the MORALS algorithm was allowed to perform monotone transformations of the variable. The algorithm was only allowed to perform identity transformations (i.e., no transformation) of interval-level variables. In terms of allowable transformations within the MORALS algorithm, the five possible measurement level combinations become: 1) Monotone Y; 2) Monotone Y, X; 3) Monotone Y, X, Z; 4) Monotone X; 5) Monotone X, Z.

Dataset Generation

The above design factors required the generation of 162 datasets: 3 ($R^2_A$ = .2, .4, .6) × 3 (ΔR² = .05, .15, .25) × 3 ($r_{XZ}$ = .1, .3, .5) × 3 (2 continuous predictors, 2 binary predictors, 1 of each) × 2 (crossing vs. non-crossing interaction). It was discovered during data generation that some of the factor combinations in specific design cells were mathematically impossible to create. It was impossible to create a non-crossing interaction between two continuous predictors at additive R² levels of .2 and .4; even at an additive R² of .6, datasets could only be generated for an incremental R² of .05. Similar problems were found in the case of two binary predictors and the case with one of each type, with slightly more conditions being possible in these situations. The end result is that 63 of the 162 cells were impossible, leaving 99 cells for analysis.
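One way to see why some non-crossing cells were impossible is a closed-form special case. Assume standardized continuous predictors (means 0, variances 1, correlation r), var(Y) = 1, and a non-crossing constraint that places both crossing points at or below −2.00. Then $b_2 = -b_3 X_c$ and $b_1 = -b_3 Z_c$, the additive variance component is $b_3^2(X_c^2 + Z_c^2 + 2rX_cZ_c)$, and the product-term variance is $b_3^2(1 + r^2)$, so their ratio must equal $R^2_A / \Delta R^2$. The hypothetical solver below fixes $X_c$ at −2.00 and solves the resulting quadratic for $Z_c$. It is an illustration under these assumptions, not a reproduction of the Excel spreadsheet used in the study.

```python
import math


def solve_noncrossing(r2_a, delta_r2, r_xz, xc=-2.0):
    """Hypothetical closed-form solver for a non-crossing interaction between
    standardized continuous X and Z (means 0, variances 1, correlation r_xz),
    with var(Y) = 1 and bivariate normality, so var(XZ) = 1 + r_xz**2.
    Fixes X_c = xc and looks for Z_c <= xc. Returns (b1, b2, b3, be), or None
    if no such Z_c exists."""
    k = r2_a / delta_r2            # required ratio: additive / product-term variance
    # (X_c^2 + Z_c^2 + 2 r X_c Z_c) / (1 + r^2) = k, i.e. a quadratic in Z_c:
    # Z_c^2 + 2 r X_c Z_c + X_c^2 - k (1 + r^2) = 0.
    disc = (r_xz * xc) ** 2 - xc**2 + k * (1 + r_xz**2)
    if disc < 0:
        return None
    zc = -r_xz * xc - math.sqrt(disc)      # the more negative root
    if zc > xc:                            # Z_c fails the Z_c <= X_c bound
        return None
    b3 = math.sqrt(delta_r2 / (1 + r_xz**2))
    b2 = -b3 * xc                          # X_c = -b2 / b3
    b1 = -b3 * zc                          # Z_c = -b1 / b3
    be = math.sqrt(1 - r2_a - delta_r2)
    return b1, b2, b3, be


for r2_a in (0.2, 0.4, 0.6):
    for delta in (0.05, 0.15, 0.25):
        ok = [r for r in (0.1, 0.3, 0.5) if solve_noncrossing(r2_a, delta, r)]
        print(f"R2_A = {r2_a}, dR2 = {delta}: feasible r_xz = {ok}")
```

Under these assumptions, only the cells with an additive R² of .6 and an incremental R² of .05 are feasible for two continuous predictors, at all three levels of predictor intercorrelation, which agrees with the pattern reported for the continuous-predictor design cells.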
Within each of these design cells, up to five levels of the measurement level design factor can be fixed, limited by the potential qualitative nature of predictors described earlier. For each of these cells, a single 10,000-observation dataset of variables Y, X, and Z was generated per the above descriptions. This sample size was chosen due to the behavior of the MORALS algorithm at smaller sample sizes. Pilot tests conducted by the author showed that, across several samples of size 300, the MORALS algorithm converged to additive R² values varying over a range of roughly .07 within the same design cell. When the sample size was increased to 10,000, the fluctuations only had a range of .01. Larger sample sizes, such as 50,000, resulted in minimal gains in convergence stability at the expense of exponential increases in processing time, primarily as the number of non-interval variables increased.

RESULTS

Hypothesis 1

Hypothesis 1 predicted that increasing values of additive R² would result in larger proportions of true double cancellation axiom tests. This hypothesis could not be formally tested due to the nature of the generated data and the assumptions of the double cancellation axiom. Double cancellation tests assume levels of predictors are fixed. This assumption is violated in design conditions with continuous random predictors. Double cancellation tests also assume at least three levels of a fixed predictor. The situations examined in this study only involved fixed predictors with two levels. Thus, all conditions violated double cancellation assumptions in some manner.

Hypothesis 2

Hypothesis 2 predicted that moderator effect sizes would be attenuated to a greater degree at lower levels of additive R². Degree of attenuation is indexed by Δf², the f² effect size statistic of the post-transformation dataset minus that of the pre-transformation dataset. Negative values indicate that an interaction effect is being attenuated.
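The Δf² index can be illustrated with a simplified stand-in for MORALS. The Python sketch below handles only the “Monotone Y” condition: it alternates an additive OLS fit with a pool-adjacent-violators (isotonic) rescaling of Y that preserves the original rank order, in the spirit of the alternating least squares approach of Young, de Leeuw, and Takane (1976). It is not their algorithm or the program used in this study, and the multiplicative data-generating model is a hypothetical example chosen because a monotone (logarithmic) rescaling of Y can remove its interaction.

```python
import numpy as np


def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`,
    which must already be sorted by the ordinal variable."""
    sums, counts = [], []
    for v in values:
        sums.append(float(v))
        counts.append(1)
        # Merge blocks while the previous block mean exceeds the current one.
        while len(sums) > 1 and sums[-2] * counts[-1] > sums[-1] * counts[-2]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])


def ols(y, cols):
    """Return (R^2, fitted values) for an OLS fit of y on cols plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    resid = y - fitted
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2), fitted


def f2(y, x, z):
    """Interaction effect size f^2 from the additive and additive-multiplicative fits."""
    r2_a, _ = ols(y, [x, z])
    r2_am, _ = ols(y, [x, z, x * z])
    return (r2_am - r2_a) / (1.0 - r2_am)


def monotone_criterion_als(y, x, z, n_iter=50):
    """'Monotone Y' only: alternate an additive OLS fit with a rank-preserving
    (isotonic) rescaling of Y toward the fitted values, then restandardize."""
    order = np.argsort(y, kind="stable")
    y_star = (y - y.mean()) / y.std()
    for _ in range(n_iter):
        _, fitted = ols(y_star, [x, z])
        new = np.empty_like(y_star)
        new[order] = isotonic_fit(fitted[order])
        if new.std() == 0:          # degenerate: every value pooled into one block
            break
        y_star = (new - new.mean()) / new.std()
    return y_star


# Hypothetical multiplicative data: an interaction on the raw scale of Y.
rng = np.random.default_rng(2)
n = 2000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = np.exp(0.6 * x + 0.6 * z + 0.3 * rng.standard_normal(n))

y_t = monotone_criterion_als(y, x, z)
delta_f2 = f2(y_t, x, z) - f2(y, x, z)
print(f"f2 before = {f2(y, x, z):.3f}, after = {f2(y_t, x, z):.3f}, delta = {delta_f2:.3f}")
```

A negative `delta_f2` corresponds to attenuation in the sense defined for Hypothesis 2.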
Mean Δf², design cell frequencies, and standard deviations for the five measurement level combinations, collapsed across all levels of predictor intercorrelation and types of predictor, are shown in Table 6. The pattern of attenuation is opposite to that predicted by Hypothesis 2. Across all combinations of monotone transformation of variables, optimal transformations resulted in greater attenuation at higher levels of additive R². The greatest mean attenuation (−.743) occurred when monotone transformations were permitted for predictors X and Z, and the additive R² was 0.6. The lowest mean attenuation (.013, actually a slight enhancement of interaction effects) occurred when monotone transformations were only permitted for the criterion Y, at an additive R² of 0.2.

[Table 6. Mean Δf² by Additive R² and Measurement Level of Variables (Monotone Y; Monotone Y, X; Monotone Y, X, Z; Monotone X; Monotone X, Z), with cell frequencies (n) and standard deviations (SD)]

Hypothesis 3

Hypothesis 3 predicted that the attenuations of interaction effects presented for Hypothesis 2 would be achieved with less severe monotone transformations at lower levels of additive R². Mean correlation coefficients, design cell frequencies, and standard deviations for the five measurement level combinations, collapsed across all levels of predictor intercorrelation and types of predictor, are shown in Table 7. The obtained pattern is opposite to that predicted by Hypothesis 3. The severity of monotone transformations attenuating interaction effects generally decreased as additive R² increased.
There also appears to be a pattern related to the number of variables for which monotone transformation was permitted. As the number of non-interval variables increased (from 1 to 3), the average severity of individual transformations was greater. Although the results for Hypotheses 2 and 3 were opposite to those predicted, the observed patterns are internally consistent, i.e., greater attenuation of interaction effects was achieved with less severe monotone transformations.

[Table 7. Mean Correlation between Pre-Transformation and Post-Transformation Variables by Additive R² and Measurement Level of Variables]

Hypothesis 4

Hypothesis 4 predicted that, in order to attenuate interaction effects, variables would undergo transformations of greater severity at higher levels of ΔR². Transformations of greater severity are indicated by lower correlations between pre- and post-transformed variables. Table 8 shows mean pre-post transformation correlations broken down by ΔR² and combinations of measurement level. As can be seen in Table 8, the pattern of mean correlation across levels of incremental R² is different for variables Y, X, and Z, depending on which of Y, X, and Z are subject to monotone transformations. Predictor X demonstrates generally decreasing correlations with larger incremental R² values, regardless of which other variables are transformed. Criterion Y and predictor Z show no such consistent pattern. Although no specific hypotheses were made regarding the degree of attenuation at different levels of incremental R², the pattern of results is worthy of presentation.
Mean Δf², design cell frequencies, and standard deviations for the five measurement level combinations at each level of ΔR², collapsed across all levels of predictor intercorrelation and types of predictor, are shown in Table 9. It can be seen that, overall, larger pre-transformation interaction effects were attenuated to a greater extent. Further discussion of this finding will be presented in a later section.

[Table 8. Mean Pre-Post Transformation Correlation by Incremental R² (ΔR²) and Measurement Level of Variables]

[Table 9. Mean Δf² by Incremental R² (ΔR²) and Measurement Level of Variables]

Hypothesis 5

Hypothesis 5 predicted that higher levels of intercorrelation in the pre-transformation predictors would be associated with greater attenuation of interaction effects. Mean Δf², design cell frequencies, and standard deviations for the five measurement level combinations at each level of predictor intercorrelation, collapsed across all levels of $R^2_A$, ΔR², and types of predictor, are shown in Table 10. Table 10 illustrates a pattern generally consistent with Hypothesis 5. Across all combinations of measurement level, higher correlations between predictors X and Z resulted in greater attenuation of interaction effects.
However, the patterns within each category of measurement level are worthy of further discussion. As they directly relate to Hypothesis 10, these issues will be discussed in more detail later in the thesis.

Hypothesis 6

Hypothesis 6 predicted that the greater attenuations at higher levels of predictor intercorrelation (presented in Table 10) would be achieved with less severe monotone transformations. Table 11 shows mean correlations between pre-transformation and post-transformation variables at each level of pre-transformation predictor intercorrelation and measurement level combination, collapsed across all levels of $R^2_A$ and ΔR². Overall, there is no clear pattern relating the severity of transformation and predictor intercorrelation, although the pattern for any given variable appears to depend on which other variables are also submitted to monotone transformation. For instance, pre-post correlations for criterion Y decrease with increasing predictor intercorrelation when it is the only variable submitted to monotone transformation, and when both predictors X and Z are additionally submitted to monotone transformation. When only predictor X is also submitted, pre-post correlations for criterion Y increase with predictor intercorrelation. Predictor X shows a consistent decrease in pre-post correlations as predictor intercorrelation increases, except in the case where criterion Y is also submitted to transformation, where no trend is apparent. Predictor Z shows a generally increasing pre-post transformation correlation when only predictor X is also transformed, but no consistent pattern when criterion Y is also transformed.

[Table 10. Mean Δf² by Predictor Intercorrelation and Measurement Level of Variables]

[Table 11. Mean Pre-Post Transformation Correlation by Predictor Intercorrelation and Measurement Level of Variables]

Hypothesis 7

Hypothesis 7 predicted that the extent of interaction attenuation would vary as a function of the number of variables submitted to monotone transformation. Specifically, it was predicted that greater degrees of attenuation would occur when more variables underwent transformation. The average Δf² for each category of variable transformation can be found in Table 6. Two results are apparent from examining these means. First, there is not a simple relationship between the number of variables undergoing monotone transformation and the degree of interaction attenuation. The average attenuation for a one-variable-transformed situation (monotone Y or monotone X) is −.1375. The average attenuation for a two-variable-transformed case (monotone Y, X and monotone X, Z) is −.3255. The attenuation for the three-variable-transformed case (only monotone Y, X, Z) is −.307. Second, it can be seen that the incremental attenuation resulting from monotone transformation of any single variable depends on which variable is considered.
Specifically, monotone transformations of predictors X or Z attenuate interaction effects more than transformations of criterion Y. Table 12 lists the incremental attenuation (ΔΔf²) for each variable, defined as the change in Δf² when that variable is added to the set of variables subject to monotone transformation (Δf² with the variable transformed minus Δf² without it). Table 12 shows that the average incremental attenuation of adding predictors X or Z (−.1745 and −.155, respectively) was greater than that of adding criterion Y (+.007).

[Table 12. Incremental Attenuation (ΔΔf²) of Interaction Effects by Variable Added to Monotone Transformation]

Hypothesis 8

Hypothesis 8 predicted that any given variable would be transformed less severely when the number of variables being transformed was higher. The relevant mean pre-post transformation correlations can be found in Table 7. Results suggest a pattern opposite to that predicted. The highest mean correlations occurred when only one variable was transformed (.971 and .961 for Monotone Y and Monotone X, respectively). The lowest mean occurred in the case when all three variables underwent monotone transformation (.666).

Hypothesis 9

Hypothesis 9 predicted that non-crossing interactions would be attenuated to a greater extent than crossing interactions. This hypothesis, however, cannot be adequately evaluated, given the mathematical impossibility of several cells in the study design. This is apparent from examining Table 13, which contains mean attenuation, standard deviations, and cell frequencies for crossing and non-crossing interactions, collapsed across all other design factors.
Crossing interactions appear to have been generally subject to greater attenuation than non-crossing interactions. This is contrary to the prediction of Hypothesis 9 and to much of the literature cited earlier in the thesis. There are several potential explanations for these results, some of which have important theoretical implications for the mathematics underlying interaction effects. These issues are discussed in greater detail in a later section.

[Table 13: Mean Δf² by crossing / non-crossing interaction and measurement level of variables]

Hypothesis 10

Hypothesis 10 predicted an interaction between predictor intercorrelation and the number of variables submitted to monotone transformation in determining the attenuation of the interaction. Specifically, it predicted that the difference in attenuation between using two non-interval predictors (Monotone X,Z) and using one non-interval predictor (Monotone X) would increase as the intercorrelation between pre-transformation X and Z increased. Table 10 gives Δf² for the Monotone X and Monotone X,Z categories at all levels of rX,Z. Subtraction yields ΔΔf² values of -.360, -.212, and -.109 for rX,Z of .1, .3, and .5, respectively. The difference therefore shrinks rather than grows, a pattern opposite to that predicted by Hypothesis 10.

Hypothesis 11

Hypothesis 11 predicted an interaction between the non-crossing / crossing nature of an interaction and the measurement status of predictor and criterion variables in determining the degree of attenuation.
It was predicted that differences in attenuation between non-crossing and crossing interactions would be greater when the criterion is subject to monotone transformation than when a predictor is. The relevant summary information for this hypothesis is found in Table 13, in the columns for Monotone Y and Monotone X. Subtraction yields a ΔΔf² of -.257 for Monotone Y and a ΔΔf² of -.156 for Monotone X. This pattern is consistent with Hypothesis 11. It should not necessarily be considered empirical support for the hypothesis, however, given the previously mentioned problems in generating certain cells for non-crossing interactions.

DISCUSSION

The general goal of the study was to examine the relationships among three things:

- the attenuation of interaction effects;
- the severity of the monotonic transformations required to produce such attenuation;
- a variety of factors commonly associated with moderated multiple regression analysis.

Discussion of these relationships is organized in two sections. The first section summarizes the results for each design factor and offers potential explanations of the observed results. The second section discusses the implications of these results for the future study of interaction effects in psychology.

Effects of Baseline R²

The baseline additive R² was predicted to affect both the extent of attenuation of interaction effects and the severity of the transformations required to produce it. The obtained results were opposite to those predicted: interaction effects were attenuated to a greater extent at higher levels of additive R², and with less severe transformation. The reason for this inverted pattern is unclear, but it may be related to the amount of error variance in the criterion (Y). At higher levels of additive R², the ordering on Y is constrained to a greater degree by predictors X and Z than by random error.
In a situation where additive R² is very low, most of the predicted variance is carried by the XZ product term. The ordering on Y is then only minimally constrained by the orderings of X and Z (apart from their product term), so transformations of X or Z can do little to increase the additive R². Thus, interactions are more difficult to attenuate when additive R² is low (.2; mean Δf² = -.074) than when it is high (.6; mean Δf² = -.487). This also explains the ease with which interactions were removed or attenuated in the simple examples presented earlier in this document: those examples all had total R² values in the high .90s.

In optimizing total R², the MORALS algorithm was, in part, optimizing the fit to error variance in Y. The minimum pre-post transformation correlation will be obtained when a variable is monotonically transformed to maximize its Pearson correlation with uncorrelated random error. At lower levels of additive R², the overall pre-post transformation correlation was therefore suppressed.

Consider the difference between an additive R² of .2 and .6. The average pre-post transformation correlation is .691 at R²A = .2, compared with .814 at R²A = .6. Comparing these values for individual columns in Table 7 shows that the largest difference occurred when Y, X, and Z were all subject to monotonic transformation (Y: .573 → .895; X: .531 → .849; Z: .410 → .651). Further support for this reasoning comes from the highest average pre-post transformation correlation across all levels of additive R² (.971), which was obtained when only Y was subject to monotonic transformation.

Effects of Incremental R²

It was predicted that attenuating interaction effects would require more severe transformation of variables at higher levels of incremental R².
This prediction received general support in the case of predictor X. Its pre-post transformation correlations in Table 8 decreased as incremental R² increased, regardless of which other variables were permitted monotone transformation. The correlations for predictor Z displayed no consistent ordinal relationship with incremental R². There was an order inversion when Z was transformed along with X and Y (.509 → .561 → .540, as ΔR² goes from .05 to .25).

Transformations of criterion Y were generally consistent with the predicted ordering. The exception was the case in which all three variables were permitted monotone transformation (Monotone Y,X,Z), where Y also displayed an order inversion (.811 → .715 → .734, as ΔR² goes from .05 to .25). These results suggest a more complex relationship between the severity of transformation required to attenuate an interaction and the number of variables undergoing such transformation. The exact nature of this relationship could likely be understood through careful examination of the underlying mathematics, which is beyond the scope of the current study. The differing behavior of predictor Z may also reflect its status in the design: Z was a qualitative binary variable in two-thirds of the cells, whereas X was a quantitative continuous variable in two-thirds of the cells.

No effects were predicted relating the degree of attenuation to the original interaction effect size, but greater attenuation was found with larger effect sizes. Table 9 shows an average Δf² of -.035 at a ΔR² of .05, and of -.534 at a ΔR² of .25. On one hand, this finding may be tautological: interactions with small effect sizes cannot be attenuated to a large extent. On the other hand, it may have an explanation similar to that offered for the effects of baseline R².
Across all levels of baseline R², larger values of incremental R² imply a larger total R², which implies less error variance in Y. If error variance in Y works against the attenuation of interaction effects, one would expect greater attenuation at higher levels of both baseline R² and incremental R², which is consistent with the study results.

Effects of Predictor Intercorrelation

The study predicted that predictor intercorrelation would affect both the extent of attenuation of interaction effects and the severity of the transformations required to produce it. Results shown in Table 10 generally supported the prediction that greater attenuation would occur at higher levels of intercorrelation. The only exception occurred when criterion Y alone was subject to monotone transformation. In that case, increases in predictor intercorrelation slightly decreased attenuation of the interaction (-.085 → -.077 → -.063, as rX,Z goes from .1 to .5). The predicted pattern did occur in the other conditions in which Y was transformed (Monotone Y,X,Z and Monotone Y,X). In those conditions, the effects of predictor transformation may have compensated for the slight opposite effect of criterion transformation.

The explanation of these results is straightforward and consistent with the theoretical reasoning presented earlier in the thesis. As rX,Z increases, the product term XZ behaves more like a quadratic function of either predictor. As discussed earlier, in the extreme case of perfect redundancy (rX,Z = 1.0), the moderated regression equation reduces to a quadratic function of X. Any interaction between X and Z is then completely removed by monotone transformation.

The pattern of pre-post transformation correlations presented in Table 11 is not entirely consistent with the patterns of attenuation in Table 10.
In a few cases, increasing mean attenuation in Table 10 occurs together with decreasing average pre-post correlation in Table 11. For example, the Monotone Y,X,Z case shows increasing attenuation as rX,Z increases (-.213 → -.286 → -.421), yet the pre-post correlations for both Y and X decrease (.788 → .702 and .782 → .622). The same pattern occurs, though less dramatically, in the Monotone X case. There, mean attenuation increases (-.053 → -.210 → -.337) as the mean pre-post correlation decreases (.992 → .929). As noted earlier, the pattern for any given variable depends on which other variables also underwent monotone transformation. There is no theoretical reason to expect this pattern, and its cause remains uncertain.

Effects of Measurement Properties of Variables

It was predicted that the extent of interaction attenuation would be higher when a greater number of variables underwent monotonic transformation. This generally proved to be true, as shown in Table 6, with one exception. Monotonic transformations of predictors X and Z attenuated interaction effects (mean Δf² = -.427) to a greater degree than monotonic transformations of Y, X, and Z (mean Δf² = -.307). The reasons for this inversion are not clear, but it may be related to the interaction-enhancing effects of monotonic transformations of criterion Y.

Table 12 illustrates that, across all situations, adding monotonic transformation of Y had minimal effect on interaction attenuation and in fact slightly enhanced the effects (+.007). Adding transformation of predictors X and Z had similar, and much greater, effects on attenuation (-.175 and -.155, respectively). These large differences in average attenuation between predictors and criteria may arise because all interactions evaluated in the study were crossing interactions.
As noted in earlier discussions, crossing interactions cannot be removed by monotonic transformation of the criterion, but they can be attenuated by such transformation of predictors.

Crossing vs. Non-Crossing Interactions

As noted earlier, the selected study parameters were incompatible with the generation of non-crossing interactions, which precluded an adequate examination of this issue. However, the difficulty of generating non-crossing interactions at particular levels of baseline R² and incremental R² is itself an important issue. The non-crossing interactions involving continuous variables that are typically predicted in field settings are notoriously difficult to find (McClelland & Judd, 1993). This difficulty may stem from their mathematical impossibility.

In attempting to generate these interactions for this study, non-crossing interactions were found to be more feasible (i.e., crossing points lay farther from the variable means) when the additive R² was high and the incremental R² was low. It was also possible to generate more non-crossing interactions for binary-continuous and binary-binary predictor pairings than for continuous-continuous pairings. This suggests that even if non-crossing interactions exist for the empirical constructs under study, it may not be mathematically possible to detect them until error variance in predicting the criterion can be reduced. It also suggests that the effect size of an interaction, and the statistical evidence for it, may depend in a complex way on the qualitative vs. quantitative nature of the predictors.

Design Interaction Effects

The results from Table 10 cited earlier suggest that the difference in attenuation between using two non-interval predictors and using one non-interval predictor decreases as the intercorrelation between pre-transformation X and Z increases.
This result may have a simple explanation analogous to the effect of adding predictors in regression equations. Holding the correlations of X and Z with Y constant, adding Z to a regression of Y on X produces a larger increase in R² when rX,Z is lower. As rX,Z increases, X and Z share more variance, and adding one to the other contributes less unique variance to the prediction of Y. The attenuation of interaction effects via monotonic transformation may work in a similar manner. If X and Z are highly correlated, monotone transformation of Z will add little attenuation beyond that produced by transforming X alone. Just as X and Z share predictability of Y through their correlation in the regression situation, they may share the potential for attenuation in the transformation situation.

The interaction effects for non-crossing and crossing interactions are not discussed, because the generated data did not support an adequate comparison.

Measurement, Interaction Effects, and Psychology

This study examined the effects of measurement imprecision on the interpretation of statistical results in one particular methodology: moderated multiple regression. The simple examples presented earlier in the document suggested dire consequences of measurement imprecision for the interpretation of interaction effects in MR. In those examples, interaction effects were severely attenuated or completely removed by innocuous changes to the predictor and criterion variables. The results of the actual study suggest that these effects are not nearly as clear-cut when realistic data are considered at realistic levels of predictability (R²). Nevertheless, measurement imprecision has real and dramatic effects.

Consider the effect of monotone transformation of predictor X, shown in Table 6. Average reductions in interaction effect size of .104, .153, and .344 were obtained with average pre-post transformation correlations of .942, .965, and .977. This, to me, remains a striking finding.
The correlations between the original data and the transformed data (representing a lack of measurement precision) are higher than those seen in virtually any reliability situation, and they represent very slight changes in the original scales. Two researchers studying the same interaction could obtain extremely high correlations between their separate predictor measures of the same construct, and yet arrive at very different estimates of interaction effect size. Extend this to several researchers studying the same interaction, and one arrives at the situation commonly lamented in psychology: interaction effects are difficult to detect and unreliable. The source of this difficulty may lie in the measurement precision of predictors and, to a lesser extent, of criterion variables.

Beyond these findings, several factors may have led this study to underestimate the extent of attenuation. First, the continuous variables used in this study were assumed to be quantitative variables with infinite resolution, i.e., any difference represented an actual empirical difference. Thus, the number of realizable states was extremely large.⁸ In real psychological data, this is rarely the case. What psychologists typically consider "continuous" variables are Likert-scale items or composites of such items. Single items rarely exceed 7 realizable states, and composites almost never exceed 100 states. The implications of this difference for interpreting the results of this study are unclear, and future research needs to examine this issue.

⁸ This fact would have made the axiom tests of conjoint measurement (had they not been excluded on theoretical grounds) an extremely computer-intensive process, as 100,000,000 paired comparisons (for n = 10,000) would be needed for each cell.
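The contrast between the infinitely resolved variables of this study and Likert-type data can be put in numbers. A summed composite of k items, each with c ordered categories, has at most k(c - 1) + 1 distinct totals, so a composite of up to 16 seven-point items stays under 100 states. The Python sketch below computes this count. It also shows that seven-category coding correlates about .96 with the continuous variable it codes, a value within the range of the pre-post transformation correlations reported above for Monotone X. The coding assumes a standard normal attribute cut into seven equal-probability categories, an illustrative choice that is not part of the study design. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def composite_states(items, categories):
    """Number of distinct totals for a sum of `items` items, each scored 1..`categories`."""
    return items * (categories - 1) + 1


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def coarse_coding_correlation(categories=7, n=50_000, seed=0):
    """Correlation between a continuous attribute and its category codes.

    Assumption (illustrative): the attribute is standard normal, and the
    response is the index (1..categories) of the equal-probability bin it falls in.
    """
    cuts = [NormalDist().inv_cdf(k / categories) for k in range(1, categories)]
    rng = random.Random(seed)
    attribute = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Code each value as 1 + the number of cut points lying below it.
    codes = [1 + sum(value > c for c in cuts) for value in attribute]
    return pearson(attribute, codes)


if __name__ == "__main__":
    for k in (1, 10, 16, 17):
        print(f"{k:2d} seven-point items -> {composite_states(k, 7)} possible totals")
    r = coarse_coding_correlation()
    print(f"7-category coding vs. continuous attribute: r = {r:.3f}")
```

The high correlation retained by such coarse coding is one reason a correlation between two measures says little, by itself, about whether their scales agree beyond order.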
At first blush, even if the deleterious effects of transformation on interaction effects prove less severe with Likert-type items, the question of empirical representability can still be raised. That is, does the limited number of realizable states in a Likert scale adequately represent the realizable states of the attribute under study? This question can perhaps best be answered by further progress in both measurement theory and substantive psychological theory on qualitative vs. quantitative judgment.

Second, the primary set of study results came from analyzing only crossing interactions. These interactions are known on theoretical grounds to be less susceptible to attenuation than non-crossing interactions, so the overall effects of transformation across all design cells may be underestimated. Further study of the issue, at levels of the study parameters where non-crossing and crossing interactions can be examined in a fully crossed design, will shed light on this aspect of the problem.

Third, the MORALS algorithm used in this study optimizes only the additive fit to the data, and it does not minimize the severity of transformation in doing so. Thus, many of the pre-post transformation correlations presented in this study may not be maximal, and they may overestimate the severity of transformation necessary to attenuate an interaction. There is no immediate solution to this problem, because such an additional optimization criterion would somehow have to be integrated into the MORALS algorithm.⁹

⁹ Based on discussion with the developer of the MORALS implementation in the SAS package, this additional optimization criterion is currently impossible to implement.

Despite the dramatic findings described above, several of the results were in direct opposition to the proposed hypotheses. As discussed earlier, much of this may involve the unclear role of error variance. In general, the message emerging from this study may be that the less we know about our criterion (in terms of overall R²), the more we are able to interpret observed interaction effects (because of our decreased ability to attenuate them via transformation). To this author, this seems an odd conclusion, despite its consistency with the study findings.

Two camps may in the end be fighting against the same enemy: error variance. The first consists of the advocates of Stevens' theory, who would make interval assumptions about their measurement systems and interpret interaction effects without scaling concerns. The second consists of those in Michell's "purist" camp, who might consider additive models more parsimonious and perform rescalings rather than interpret multiplicative effects.

For the purposes of this study, error variance in Y was simply uncorrelated variance added after the effects of X, Z, and XZ were calculated. In real situations, this error variance may be due to simple unpredictability of Y, to unreliability in Y, or even to unreliability in predictors X or Z. The effects of predictor unreliability on detecting interaction effects are well known (Dunlap & Kemery, 1988). The role of measurement error in the context of measurement theory, however, is only beginning to be examined (Falmagne, 1979).

The role of error variance in Y also implies a larger paradox. As theoretical models in psychology become more complex and better at predicting human behavior, the confidence we can place in the interactive effects present in such models decreases, because monotonic transformation is more able to attenuate interactions at higher R² levels. The ultimate solution lies in the simultaneous development of both psychological theory and psychological measurement, so that advances in understanding the relationships between psychological phenomena are accompanied by the requisite advances in measuring them.
Practical Implications of the Study

The discussion above highlights the potentially complex nature of several obtained results. Even so, an overall examination of the effects of each study design factor can provide cautions for everyday psychological research. Table 14 lists the overall extent of attenuation of crossing interaction effects at all levels of the study design factors.

As the table shows, the greatest problem in estimating interaction effects would occur in a situation with:

- high additive R²;
- a large observed interaction effect;
- high predictor intercorrelation;
- two continuous non-interval predictors.

Conversely, the smallest problem occurs with:

- low additive R²;
- a small observed interaction effect;
- low predictor intercorrelation¹⁰;
- a non-interval criterion¹¹;
- two binary predictors.

This pattern of results points to a clear role for measurement imprecision in explaining why interactions are detected differently in experimental and field settings. The optimal situation of orthogonal qualitative predictors virtually defines the experimental design. There, stimulus control allows the random assignment of observations to factors in crossed designs, forcing orthogonality. The measurement status of experimental stimuli is not at issue, because they are, by definition, controlled stimuli. In the case of a binary predictor, the experimenter is simply controlling a single qualitative difference.

¹⁰ Although orthogonal predictors were not included in the study, we can presume orthogonality to be the optimal situation.
¹¹ Optimal within the context of some measurement imprecision. Ideally, all variables would be interval.

Table 14. Mean Δf² Values for Study Design Factors

DESIGN FACTOR                                   MEAN Δf²
Additive R²
    .2                                          -.074
    .4                                          -.179
    .6                                          -.487
Incremental R²
    .05                                         -.035
    .15                                         -.171
    .25                                         -.534
X-Z Intercorrelation
    .1                                          -.192
    .3                                          -.245
    .5                                          -.303
Measurement Status of Variables
    Non-Interval Y                              -.075
    Non-Interval Y,X                            -.224
    Non-Interval Y,X,Z                          -.307
    Non-Interval X                              -.200
    Non-Interval X,Z                            -.427
Qualitative/Quantitative Nature of Variables
    Continuous X and Z                          -.221
    Continuous X, Binary Z                      -.189
    Binary X, Binary Z                          -.133

In contrast, a field study lacks the luxury of experimental control. In most cases, it can neither force orthogonality of predictors nor control their levels, which are typically observed values that occur naturally in the field setting. Experimentalists need only fear the effects of non-interval measurement in their dependent measures. Field researchers must also concern themselves with scaling issues in random, continuous predictors.

This thesis has examined the issue of measurement imprecision by assessing reductions in interaction effect size. An alternative approach would have been to examine changes in decisions made on statistical grounds. Under that approach, measurement imprecision would be considered problematic not when effect sizes were merely reduced, but only when those reductions also rendered the interaction effect statistically non-significant. Effect size is theoretically independent of any given statistical test, but the decisions resulting from inferential methods are based on finite, often small, samples. This consideration invariably raises the issue of statistical power.

Several factors related to statistical power and moderated multiple regression were discussed earlier in the thesis, but they were kept distinct from the focus on the measurement properties of data. Suppose statistical decisions were the criterion against which measurement imprecision is judged. It would then follow that the measurement properties of an instrument become more important as sample size increases, because larger samples make any given change in R² more statistically significant.
This logic is problematic, because measurement properties are inherent in the instrument, and the identical instrument is used whether a sample is large or small. Whether a given change in R² is statistically significant can be solely a function of sample size, and so it is unimportant in evaluating the precision of the measurement instrument involved. The magnitude of the change in effect size is the only relevant consideration. Regardless of its effect on statistical decision-making, measurement imprecision may still affect the underlying moderator effect size.

This study has examined how measurement imprecision may lead researchers to interpret non-existent interaction effects. It is important to note that the converse is also true: measurement imprecision may contribute to failures to find interaction effects that are empirically present. An effect size observed using non-interval scales may underestimate the actual effect size. If this underestimate is great enough to render a moderator effect statistically non-significant, the researcher has missed a potentially important scientific finding.

In this thesis, the choice was made to focus on the reduction of observed moderating effects, based on the scientific principle of parsimony. Typically, when an additive model and an additive-multiplicative model have equal statistical viability, scientists choose the simpler, additive model. The focus on removal or attenuation of interaction effects was also designed to critically address the interpretation of interactions with extremely small effect sizes. The difficulty of finding interaction effects in applied psychology may contribute to their rarity, but it in no way increases the scientific value of the interactions we do find. Rather, both the observation and the non-observation of moderator effects should be evaluated from a measurement framework.
Once we establish a certain level of precision in our measurement, we may better understand which of our observed interaction effects are "real." Weak measurement may also contribute to applied psychologists' failure to find significant moderator effects. If so, increasing the quality and precision of measurement can only improve the situation.

APPENDICES

APPENDIX A: MORALS Algorithm

The MORALS algorithm maximizes the canonical correlation coefficient between two sets of variables, X and Y, by transforming the variables according to specified constraints. The specification of the model follows. More detail can be found in Young, de Leeuw, & Takane (1976).

1. Let X be a matrix of k observations on n variables. Let Y be a matrix of k observations on m variables.
2. Each x_i and y_j is assumed to be measured at a specified measurement level (nominal, ordinal, interval, ratio).
3. Two parameter vectors, α and β, are defined to have n and m elements, respectively.
4. Two matrices, X* and Y*, are defined to have the same dimensions as X and Y.
5. The columns of X* and Y*, x_i* and y_j*, have two properties: (1) they are defined at the interval level of measurement; (2) they are related to the corresponding columns of X and Y, x_i and y_j, by transformations permissible for the specific variable. So:

$$x_i^* = \theta_i(x_i), \qquad y_j^* = \theta_j(y_j)$$

6. θ_i and θ_j above represent measurement transformations of the observed variables X and Y.

The goal of the algorithm is to find transformations θ_i and θ_j and regression weights α and β such that the canonical correlation between X* and Y* is maximized. This is equivalent to minimizing the sum of squared differences between the composite variables a and b, defined as

$$a = X^*\alpha, \qquad b = Y^*\beta,$$

with the minimization criterion

$$\lambda^2 = (a - b)'(a - b).$$

7. The minimization is constrained by the allowable forms of the θ functions. These depend on the level of measurement of the variable in question and on the processes by which the distributions are generated.
The constraints on θ fall into three types: order (θ°), linear (θ′), and polynomial (θ″):

$$\theta^{\circ}:\ (x_{ai} < x_{bi}) \Rightarrow (x_{ai}^* \le x_{bi}^*)$$

$$\theta':\ x_{ai}^* = \delta_0 + \delta_1 x_{ai}$$

[Appendix B1: table of R² values for crossing interactions]

[Appendix B2: table of R² values for crossing interactions]

[Appendix B3: table of R² values for crossing interactions]

[Appendix B4: table of pre-post transformation correlations for crossing interactions]

[Appendix B5: table of pre-post transformation correlations for crossing interactions]
REFERENCES

Achenbach, T. M. (1978). Research in developmental psychology: Concepts, strategies, methods. New York: Free Press.

Aguinis, H., & Stone-Romero, E. F. (1997). Methodological artifacts in moderated multiple regression and their effects on statistical power. Journal of Applied Psychology, 82, 192-206.

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

Alexander, R. A., & DeShon, R. P. (1994). Effect of error variance heterogeneity on the power of tests for regression slope differences. Psychological Bulletin, 115, 308-314.

Althauser, R. P. (1971). Multicollinearity and non-additive regression models. In H. M. Blalock, Jr. (Ed.), Causal models in the social sciences. Chicago: Aldine-Atherton.

Anderson, N. H. (1961). Scales and statistics: Parametric and nonparametric. Psychological Bulletin, 58, 305-316.

Arnold, H. J., & Evans, M. G. (1979). Testing multiplicative models does not require ratio scales. Organizational Behavior and Human Performance, 24, 41-59.

Baker, B. O., Hardyck, C. D., & Petrinovich, L. F. (1966). Weak measurement vs. strong statistics: An empirical critique of S. S. Stevens' proscriptions on statistics. Educational and Psychological Measurement, 26, 291-309.

Birnbaum, M. H. (1973). The devil rides again: Correlations as an index of fit. Psychological Bulletin, 79, 239-242.

Birnbaum, M. H. (1974). Reply to the devil's advocates: Don't confound model testing with measurement. Psychological Bulletin, 81, 854-859.

Bridgman, P. (1922). Dimensional analysis. New Haven: Yale University Press.

Burke, C. J. (1953). Additive scales and statistics. Psychological Review, 60, 73-75.

Busemeyer, J. R. (1980). Importance of measurement theory, error theory, and experimental design for testing the significance of interactions. Psychological Bulletin, 88, 237-244.

Busemeyer, J. R., & Jones, L. E. (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93, 549-562.

Campbell, N. R. (1920). Physics, the elements. Cambridge: Cambridge University Press.

Campbell, N. R. (1928). An account of the principles of measurement and calculation. London: Longmans, Green.

Campbell, B. A., & Masterson, F. A. (1969). Psychophysics of punishment. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 3-42). New York: Appleton-Century-Crofts.

Champoux, J. E., & Peters, W. S. (1987). Form, effect size, and power in moderated regression analysis. Journal of Occupational Psychology, 60, 243-255.

Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115-124.

Cliff, N. (1992). Abstract measurement theory and the revolution that never happened. Psychological Science, 3, 186-190.

Cohen, J. (1978). Partialled products are interactions; partialled powers are curve components. Psychological Bulletin, 85, 858-866.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Cronbach, L. J. (1987). Statistical tests for moderator variables: Flaws in analyses recently proposed. Psychological Bulletin, 102, 414-417.

de Leeuw, J., Young, F. W., & Takane, Y. (1976). Additive structure in qualitative data: An alternating least squares method with optimal scaling features. Psychometrika, 41, 471-503.

Dunlap, W. P., & Kemery, E. R. (1988). Effects of predictor intercorrelation and reliabilities on moderated multiple regression. Organizational Behavior and Human Decision Processes, 41, 248-258.

Embretson, S. E. (1996). Item response theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Measurement, 20, 201-212.

Evans, M. G. (1985). A Monte Carlo study of the effects of correlated method variance in moderated multiple regression analysis. Organizational Behavior and Human Decision Processes, 36, 305-323.

Falmagne, J. C., Iverson, G., & Marcovici, S. (1979). Binaural “loudness” summation: Probabilistic theory and data. Psychological Review, 86, 25-43.

Ferguson, A., Myers, C. S. (Vice Chairman), Bartlett, R. J. (Secretary), Banister, H., Bartlett, F. C., Brown, W., Campbell, N. R., Craik, K. J. W., Drever, J., Guild, J., Houstoun, R. A., Irwin, J. O., Kaye, G. W. C., Philpott, S. J. F., Richardson, L. F., Shaxby, J. H., Smith, T., Thouless, R. H., & Tucker, W. S. (1940). Quantitative estimates of sensory events. The Advancement of Science: Report of the British Association for the Advancement of Science, 2, 331-349.

Gaito, J. (1959). Nonparametric methods in psychological research. Psychological Reports, 5, 115-125.

Gaito, J. (1960). Scale classification and statistics. Psychological Review, 67, 277-278.

Gaito, J. (1980). Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564-567.

Ghiselli, E. E. (1956). Differentiation of individuals in terms of their predictability. Journal of Applied Psychology, 40, 374-377.

Gregoire, T. G., & Driver, B. L. (1987). Analysis of ordinal data to detect population differences. Psychological Bulletin, 101, 159-165.

Gregory, R. J. (1996). Psychological testing: History, principles, and applications. Allyn & Bacon.

Helmholtz, H. von (1887). Numbering and measuring from an epistemological viewpoint. Reprinted in P. Hertz & M. Schlick (Eds.), Hermann von Helmholtz: Epistemological writings (Boston Studies in the Philosophy of Science, Vol. 37, pp. 72-113). Dordrecht, Holland: Reidel, 1977.

Hölder, O. (1901). Die Axiome der Quantität und die Lehre vom Mass. Berichte der Sächsischen Gesellschaft der Wissenschaften, Mathematisch-Physische Classe, 53, 1-64.

Jensen, A. R. (1974). Cumulative deficit: A testable hypothesis? Developmental Psychology, 10, 996-1019.

Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.

Johnson, H. M. (1936). Pseudo-mathematics in the mental and social sciences. American Journal of Psychology, 48, 342-351.

Kanfer, R., & Ackerman, P. (1989). Motivation and cognitive abilities: An integrative/aptitude-treatment interaction approach to skill acquisition. Journal of Applied Psychology, 74, 657-690.

Krantz, D. H., & Tversky, A. (1971). Conjoint-measurement analysis of composition rules in psychology. Psychological Review, 78, 151-169.

Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement: Vol. I. Additive and polynomial representations. New York: Academic Press.

Levelt, W. J. M., Riemersma, J. B., & Bunt, A. A. (1971). Binaural additivity of loudness (Technical Report NR: HB-71-7OEX). Groningen, Netherlands: R. U. Groningen, Heymans Bulletins Psychologische Instituten.

Loftus, G. R. (1978). On the interpretation of interactions. Memory & Cognition, 6, 312-319.

Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8, 750-751.

Lubinski, D., & Humphreys, L. G. (1990). Assessing spurious “moderator effects”: Illustrated substantively with the hypothesized (“synergistic”) relation between spatial and mathematical ability. Psychological Bulletin, 107, 385-393.

Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1-27.

Luce, R. D., Krantz, D. H., Suppes, P., & Tversky, A. (1990). Foundations of measurement: Vol. III. Representation, axiomatization, and invariance. San Diego: Academic Press.

McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376-390.

McGregor, D. (1935). Scientific measurement and psychology. Psychological Review, 42, 246-266.

Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, NJ: Erlbaum.

Morris, J. H., Sherman, J., & Mansfield, E. R. (1986). Failures to detect moderating effects with ordinary least squares moderated regression: Some reasons and a remedy. Psychological Bulletin, 99, 282-288.

Nickerson, C. A., & McClelland, G. H. (1984). Scaling distortion in numerical conjoint measurement. Applied Psychological Measurement, 8, 183-198.

Perline, R., Wright, B. D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237-255.

Pierce, J. L., Gardner, D. G., Dunham, R. B., & Cummings, L. L. (1993). Moderation by organization-based self-esteem of role condition-employee response relationships. Academy of Management Journal, 36, 271-288.

Rasmussen, J. L. (1989). Analysis of Likert-scale data: A reinterpretation of Gregoire and Driver. Psychological Bulletin, 105, 167-170.

Roussas, G. G. (1973). A first course in mathematical statistics. Reading, MA: Addison-Wesley.

Saunders, D. R. (1956). Moderator variables in prediction. Educational and Psychological Measurement, 16, 209-222.

Schmidt, F. L. (1973). Implications of a measurement problem for expectancy theory research. Organizational Behavior and Human Performance, 10, 243-251.

Scott, D. (1964). Measurement models and linear inequalities. Journal of Mathematical Psychology, 1, 233-247.

Smith, B. O. (1938). Logical aspects of educational measurement. New York: Columbia University Press.

Sockloff, A. L. (1976). The analysis of nonlinearity via linear regression with polynomial and product variables: An examination. Review of Educational Research, 46, 267-291.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Stevens, S. S., & Davis, H. (1938). Hearing: Its psychology and physiology. New York: Wiley.

Stine, W. W. (1989). Meaningful inference: The role of measurement in statistics. Psychological Bulletin, 105, 147-155.

Stone-Romero, E. F., & Anderson, L. E. (1994). Relative power of moderated multiple regression and the comparison of subgroup correlation coefficients for detecting moderating effects. Journal of Applied Psychology, 79, 354-359.

Thomas, H. (1982). IQ, interval scales, and normal distributions. Psychological Bulletin, 91, 198-202.

Townsend, J. T., & Ashby, F. G. (1984). Measurement scales and statistics: The misconception misconceived. Psychological Bulletin, 96, 394-401.

Vroom, V. H. (1964). Work and motivation. New York: Wiley.

Wise, S. L., Peters, L. H., & O'Connor, E. J. (1984). Identifying moderator variables using multiple regression: A reply to Darrow and Kahl. Journal of Management, 10, 227-233.

Young, F. W., de Leeuw, J., & Takane, Y. (1976). Regression with qualitative and quantitative variables: An alternating least squares method with optimal scaling features. Psychometrika, 41, 505-529.

Yuan, P. T. (1933). On the logarithmic frequency distribution and semilogarithmic correlation surface. Annals of Mathematical Statistics, 4, 30-74.

Zedeck, S. (1971). Problems with the use of “moderator” variables. Psychological Bulletin, 76, 295-310.

Zumbo, B. D., & Zimmerman, D. W. (1993). Is the selection of statistical methods governed by level of measurement? Canadian Psychology, 34, 390-400.