MEASUREMENT INVARIANCE OF THE GERIATRIC DEPRESSION SCALE SHORT FORM ACROSS LATIN AMERICA AND THE CARIBBEAN By Ola Stacey Rostant A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILSOPHY Measurement and Quantitative Methods 2011 ABSTRACT MEASUREMENT INVARIANCE OF THE GERIATRIC DEPRESSION SCALE SHORT FORM ACROSS LATIN AMERICA AND THE CARIBBEAN By Ola Stacey Rostant This study examined the measurement invariance of the Geriatric Depression Scale Short Form (GDS-15) in older adults across five countries in Latin America and the Caribbean. Multiple group confirmatory factor analysis (MGCFA) and item response theory likelihood ratio tests (IRTLR-DIF) were used to test the measurement invariance of the GDS-15 by gender and cross-country comparisons. The sample for this study was made up of 7,573 older adults between the ages of 60 and 102. Data for the present study comes from the Survey on Health Well-Being and Aging in Latin America and the Caribbean (SABE)(Pelaez, et al., 2004). The underlying factor structure of the GDS-15 was examined within each country. A one-factor structure was found for the countries of Chile and Cuba and a two-factor was found for the countries of Argentina, Mexico and Uruguay. Results of the multiple group confirmatory factor analysis offered support for full measurement invariance of a one-factor structure by gender within the countries of Chile and Cuba. Partial measurement invariance was found for a two-factor structure by gender within the countries of Argentina, Mexico and Uruguay by gender. The IRTLR-DIF analyses by gender and cross-country comparisons revealed a lack of parameter invariance. Items exhibiting DIF by gender were on average more difficult for men to endorse than women. In summary the IRTLR-DIF procedure identified more non-invariant items than the MGCFA procedure. Implications and recommendations for future research are addressed. DEDICATION To, JC, MS, DR and SR, without your support this would not have been possible. iii ACKNOWLEDGEMENTS Thank you to my committee for your insightful feedback. I extend my heartfelt thanks to my many mentors who have encouraged and supported me through my dissertation, David Lounsbury, Renee Canady, Clare Luz, Karen P. Williams and Clarissa Shavers. I would like to thank the elders who participated in the Survey on Health and Well Being in Latin America and the Caribbean for their time and generosity. Finally, I would like to thank my husband Matthew Swayne, for his patience, love and support. Without you I would not have been able to complete this process. Thank you to my friends and family who provided humor and encouragement along the way. iv TABLE OF CONTENTS LIST OF TABLES……………………………………………………………………….... vii LIST OF FIGURES……………………………………………………………………….. xiv CHAPTER 1: INTRODUCTION……………………………………………………….... 1 INTRODUCTION……………………………………………………………………. 1 STUDY RATIONALE………………………………………………………………. 4 CHAPTER 2: LITERATURE REVIEW…………………………………………………. 5 THE GERIATRIC DEPRESSION SCALE…………………………………………….. 5 CONFIRMATORY FACTOR ANALYSIS……………………………………………….. 10 MEASUREMENT INVARIANCE…………………………………………………….. 15 Configural Invariance………………………………….. 15 Metric Invariance……………………………………………………………… 16 Scalar Invariance……………………………………………………………… 17 Strict Invariance………………………………………… 17 Partial Measurement Invariance…………………………………………….. 18 ITEM RESPONSE THEORY………………………………………………………. 21 DIFFERENTIAL ITEM FUNCTIONING…………………………………………………. 28 NON-ITEM RESPONSE THEORY METHODS……………………………………….. 28 Mantel-Haenszel ……………………………………………………………… 28 Logistic Regression………………………………………………………….. 30 ITEM RESPONSE THEORY METHODS…………………………………………….. 32 DFIT Framework…………………………………………………………….... 32 Item Response Theory Likelihood Ratio Test………………………………. 36 CHAPTER 3: METHODOLOGY………………………………………………………………….. 38 MEASURES……………………………………………………………………… 38 SAMPLE……………………………………………………………………….... 38 ANALYSIS……………………………………………………….. 43 Multiple-group CFA analysis procedure…………………………………… 44 Binary factorial invariance…………………………………………………… 47 IRTLR-DIF analysis procedure……………………………………………… 55 Benjamini-Hochberg procedure…………………………………………….. 57 v CHAPTER 4: RESULTS……………………………………………………………….... 59 INTRODUCTION…………………………………………………. 59 Multiple-group CFA analysis……………………………………………….. 62 Invariance models by gender within country……………………………………………….. 65 Invariance models by cross-country comparisons……………………….. 85 Multiple-group CFA invariance summary………………………………………………. 105 IRTLR-DIF analysis…………………………………………………………… 108 IRTLR-DIF by gender………………………………………………………… 110 IRTLR-DIF by cross-country comparisons………………………………… 117 IRTLR-DIF summary…………………………………………………………. 133 CHAPTER 5: DISCUSSION……………………………………………….. 136 GENERAL OVERVIEW………………………………………… 136 Major Findings Multiple-Group CFA………………………………………… 137 Major Findings IRTLR-DIF………………………………………………………. 144 Research and Clinical Implications……………………………. 148 Limitations and Future Directions……………………………… 150 APPENDICES…………………………………………………………………………….. 295 REFERENCES……………………………………………………………………………. 298 vi LIST OF TABLES Table 1. Factor models 8 Table 2. Sample demographics by gender 42 Table 3. Configural invariance constraints 49 Table 4. Metric invariance constraints 50 Table 5. Scalar invariance constraints 51 Table 6. Residual invariance constraints 52 Table 7. Mean GDS-15 scores 61 Table 8. Demographic characteristics of countries 61 Table 9. One factor EFA Chile 153 Table 10. One factor EFA Cuba 154 Table 11. Two factor EFA Argentina 155 Table 12. Two factor EFA Mexico 156 Table 13. Two factor EFA Uruguay 157 Table 14. One factor CFA by country 158 Table 15. Two factor CFA by country 159 vii Table 16. Model fit for configural and nested models Chile by gender 160 Table 17. Invariance hypothesis tests Chile by gender 161 Table 18. Model fit for configural and nested models Cuba by gender 162 Table 19. Invariance hypothesis tests Cuba by gender 163 Table 20. Model fit for configural and nested models Argentina by gender 164 Table 21. Invariance hypothesis tests Argentina by gender 165 Table 22. Model fit for configural and nested models Mexico by gender 166 Table 23. invariance hypothesis tests Mexico by gender 167 Table 24. Model fit for configural and nested models Uruguay by gender 168 Table 25. Invariance hypothesis tests Uruguay by gender 169 Table 26. Model fit for configural and nested models Chile by Cuba 170 Table 27. Invariance hypothesis tests Chile by Cuba 171 Table 28. Model fit for configural and nested models Mexico by Uruguay 172 Table 29. Invariance hypothesis tests Mexico by Uruguay 173 Table 30. Model fit for configural and nested models Mexico by Argentina 174 viii Table 31. Invariance hypothesis tests Mexico by Argentina 175 Table 32. Model fit for configural and nested models Uruguay by Argentina 176 Table 33. Invariance hypothesis tests Uruguay by Argentina 177 Table 34. Item parameters and standard errors for anchor item Chile by gender 178 Table 35. Item parameters and standard errors for item Exhibiting DIF Chile by gender 179 Table 36. Item parameters and standard errors for anchor item Cuba by gender 180 Table 37. Item parameters and standard errors for item Exhibiting DIF Cuba by gender 181 Table 38. Item parameters and standard errors for anchor item Argentina by gender 182 Table 39. Item parameters and standard errors for item Exhibiting DIF Argentina by gender 183 Table 40. Item parameters and standard errors for anchor item Mexico by gender 184 Table 41. Item parameters and standard errors for item Exhibiting DIF Mexico by gender 185 Table 42. Item parameters and standard errors for anchor item Uruguay by gender 186 Table 43. Item parameters and standard errors for item Exhibiting DIF Uruguay by gender 187 Table 44.Summary of DIF analyses of the GDS-15: gender anchor item 188 Table 45.Summary of DIF analyses of the GDS-15: gender Type of DIF 189 ix Table 46. Summary of DIF analyses of the GDS-15: gender BHAdjustment 190 Table 47. Item parameters and standard errors for anchor item Chile by Cuba 191 Table 48. Item parameters and standard errors for item Exhibiting DIF Chile by Cuba 192 Table 49. Item parameters and standard errors for anchor item Mexico by Uruguay 193 Table 50. Item parameters and standard errors for item Exhibiting DIF Mexico by Uruguay 194 Table 51. Item parameters and standard errors for anchor item Argentina by Mexico 195 Table 52. Item parameters and standard errors for item Exhibiting DIF Argentina by Mexico 196 Table 53. Item parameters and standard errors for anchor item Argentina by Uruguay 197 Table 54. Item parameters and standard errors for item Exhibiting DIF Argentina by Uruguay 198 Table 55. Item parameters and standard errors for anchor item Argentina by Chile 199 Table 56. Item parameters and standard errors for item Exhibiting DIF Argentina by Chile 200 Table 57. Item parameters and standard errors for anchor item Argentina by Cuba 201 Table 58. Item parameters and standard errors for item Exhibiting DIF Argentina by Cuba 202 Table 59. Item parameters and standard errors for anchor item Mexico by Chile 203 Table 60. Item parameters and standard errors for item Exhibiting DIF Mexico by Chile 204 x Table 61. Item parameters and standard errors for anchor item Mexico by Cuba 205 Table 62. Item parameters and standard errors for item Exhibiting DIF Mexico by Cuba 206 Table 63. Item parameters and standard errors for anchor item Uruguay by Chile 207 Table 64. Item parameters and standard errors for item Exhibiting DIF Uruguay by Chile 208 Table 65. Item parameters and standard errors for anchor item Uruguay by Cuba 209 Table 66. Item parameters and standard errors for item Exhibiting DIF Uruguay by Cuba 210 Table 67. Summary of DIF analyses of the GDS-15: country by country anchor item 211 Table 68. Summary of DIF analyses of the GDS-15: country by country anchor item 212 Table 69. Summary of DIF analyses of the GDS-15: country by country Type of DIF 213 Table 70. Summary of DIF analyses of the GDS-15: country by country Type of DIF 214 Table 71: Summary of DIF analyses of the GDS-15: country by country BH-Adjustment 215 Table 72: Summary of DIF analyses of the GDS-15: country by country BH- Adjustment 216 Table 73 Descriptive statistics and correlations for the GDS-15 Argentina 217 Table 74 Descriptive statistics and correlations for the GDS-15 Chile 218 Table 75 Descriptive statistics and correlations for the GDS-15 Mexico 219 xii Table 76 Descriptive statistics and correlations for the GDS-15 Cuba 220 Table 77 Descriptive statistics and correlations for the GDS-15 Uruguay 221 Table 78 GDS-15 cutoff scores by country 222 Table 79 GDS-15 cutoff scores by gender and country 222 Table 80 223 Countries with the most difficulty endorsing items xiii LIST OF FIGURES Figure 1. Chile scree-plot 224 Figure 2. Cuba scree-plot 225 Figure 3. Argentina scree-plot 226 Figure 4. Mexico scree-plot 227 Figure 5. Uruguay scree-plot 228 Figure 6. Chile by gender 229 Figure 7. Test information curve Chile by gender 230 Figure 8. Cuba by gender item 7 231 Figure 9. Cuba by gender item 12 232 Figure 10. test information curve Cuba by gender 233 Figure 11. Argentina by gender item 9 234 Figure 12. Argentina by gender item 11 235 Figure 13. Argentina by gender item 12 236 Figure 14. Argentina by gender item 15 237 Figure 15. Test information curve Argentina by gender 238 xiv Figure 16. Mexico by gender item 6 239 Figure 17. Mexico by gender item 8 240 Figure 18. Test information curve Mexico by gender 241 Figure 19. Uruguay by gender item 5 242 Figure 20. Uruguay by gender item 6 243 Figure 21. Uruguay by gender item 12 244 Figure 22. Uruguay by gender item 14 245 Figure 23. Test information curve Uruguay by gender 246 Figure 24. Chile by Cuba item 4 247 Figure 25. Chile by Cuba item 8 248 Figure 26. Chile by Cuba test information curve 249 Figure 27. Mexico by Uruguay item 6 250 Figure 28. Mexico by Uruguay item 7 251 Figure 29. Mexico by Uruguay item 9 252 Figure 30. Mexico by Uruguay item 12 253 xv Figure 31. Mexico by Uruguay test information curve 254 Figure 32. Mexico by Argentina item 2 255 Figure 33. Mexico by Argentina item 7 256 Figure 34. Mexico by Argentina item 9 257 Figure 35. Mexico by Argentina item 11 258 Figure 36. Mexico by Argentina item 12 259 Figure 37. Mexico by Argentina test information curve 260 Figure 38. Argentina by Uruguay item 2 261 Figure 39. Argentina by Uruguay item 6 262 Figure 40. Argentina by Uruguay item 7 263 Figure 41. Argentina by Uruguay item 11 264 Figure 42. Argentina by Uruguay test information curve 265 Figure 43. Argentina by Chile item 2 266 Figure 44. Argentina by Chile item 7 267 Figure 45. Argentina by Chile test information curve 268 xvi Figure 46. Argentina by Cuba item 2 269 Figure 47. Argentina by Cuba item 5 270 Figure 48. Argentina by Cuba item 7 271 Figure 49. Argentina by Cuba test information curve 272 Figure 50. Mexico by Chile item 7 273 Figure 51. Mexico by Chile item 8 274 Figure 52. Mexico by Chile item 11 275 Figure 53. Mexico by Chile item 12 276 Figure 54. Mexico by Chile item 15 277 Figure 55. Mexico by Chile test information curve 278 Figure 56. Mexico by Cuba item 2 279 Figure 57. Mexico by Cuba item 4 280 Figure 58. Mexico by Cuba item 7 281 Figure 59. Mexico by Cuba item 11 282 Figure 60. Mexico by Cuba item 12 283 xvii Figure 61. Mexico by Cuba item 14 284 Figure 62. Mexico by Cuba test information curve 285 Figure 63. Uruguay by Chile item 2 286 Figure 64. Uruguay by Chile item 3 287 Figure 65. Uruguay by Chile item 10 288 Figure 66. Uruguay by Chile item 14 289 Figure 67. Uruguay by Chile test information curve 290 Figure 68. Uruguay by Cuba item 4 291 Figure 69. Uruguay by Cuba item 6 292 Figure 70. Uruguay by Cuba item 8 293 Figure 71. Uruguay by Cuba test information curve 294 xviii CHAPTER 1: INTRODUCTION INTRODUCTION Health disparities research on the mental health of older adults faces multiple measurement challenges. Most mental health research relies on self-report measures, which represents the subjective reality of the individual. In addition, when self-report measures developed to understand psychological phenomena in one cultural group are applied to other groups, the conceptual and psychometric properties of said measures in these new environments are seldom tested. Since the mechanisms that contribute to mental disorders such as depression, are self-assessed and subjective, it is necessary to know whether the instruments being used to capture the phenomena are conceptually and psychometrically equivalent across different populations (Stalh & Hahn, 2006). Although, research by Hays, Ramirez, Stalh et al. (Hays, Morales, & Reise, 2000; Ramirez, Ford, Stewart, & Teresi, 2005; Stalh & Hahn, 2006) and others have recognized the need for addressing these issues in health measurement, a dearth of research still exists. Health disparities research focuses on significant and persistent differences in disease rates and health outcomes between people of differing, race, ethnicity, socioeconomic position and area of residence (Eberhardt, 2004; Hartley, 2004). Constructs used in health disparities research tend to be abstract and hence not directly observable or measureable (Stewart & Napoles-Springer, 2000), and require statistical techniques that meet this structure, such as structural equation modeling and item response theory. 1 The accuracy or inaccuracy, with which health constructs are measured, can have an impact on study results, by producing biased estimates of symptoms and disorders, which can in turn lead to spurious conclusions, which impact health policy. The first step in addressing health disparities measurement among diverse populations would be to assess how self-reported health measures function. That is, for meaningful comparisons to be made cross-culturally, researchers must first determine whether measures developed among a majority group or in western society, perform the same way when used in nonmajority groups or non-western societies. In addition to the need for assessing how measures function between groups such as older Mexican adults versus older Mexican American adults, researchers need to be mindful of the ―cultural homogeneity‖ pitfall. Cultural homogeneity assumes measurement equivalence based on ethnic populations sharing the same language. Ramirez (2005) states that the assumption of cultural homogeneity can actually ―Exacerbate inaccurate cultural stereotypes and can lead to misleading conclusions in comparing prevalence of disorders, hindering the delivery of quality healthcare to different racial and ethnic groups. In addition, assuming cultural homogeneity based on shared language, is misleading because there are cultural and idiomatic nuances that can potentially exist within populations even though they share the same language” (Ramirez, et al., 2005). In short, failure to address between and within group measurement issues ultimately creates problems for researchers trying to draw research conclusions as well as end-users of said research such as health care providers (pg. 1643).” 2 With a growing and diverse aged population within the United States and globally, there is a need for the validation of existing measures in order to establish crossethnic equivalence of health related assessment tools (Byrne & Watkins, 2003; Myers, Calantone, Page, & Taylor, 2000; Ramirez, et al., 2005). The goal of the current study is to add to the cross-cultural and methodological literature by examining the within and between country differences in the manifestation of depression across Latin America and the Caribbean. 3 Present Study Rationale The aim of this study is to assess measurement equivalence/invariance of the Geriatric Depression Scale Short Form (GDS-15) across five Spanish speaking countries in Latin America and the Caribbean .Two methodological approaches will be used (1) Study 1: multiple group confirmatory factor analysis (MGCFA) and (2) Study 2: multiple group item response theory and item response theory likelihood ratio tests (IRTLR) for the assessment of differential item functioning (DIF). These techniques will be compared to determine each methods consistency in assessing item and scale function across countries and gender. Research Questions Study 1 will be guided by the following questions: 1. What is the factorial structure of the GDS-15 in each of five Spanish speaking countries in Latin America? 2. Is the factor structure of the GDS-15 invariant across countries and gender? Study 2 will be guided by the following questions: 1. How invariant are IRT-based item difficulty estimates across countries and gender? 2. How invariant are IRT-based item discrimination estimates across countries and gender? 4 CHAPTER 2: LITERATURE REVIEW THE GERIATRIC DEPRESSION SCALE The original geriatric depression scale (GDS) was developed 28 years ago by Brink et al. (1982) (Brink, et al., 1982). Prior to the development of the GDS, there were no depression screening instruments developed specifically for older adults. Previous instruments used with older adults contained items which referred to physical manifestations of depressive symptomatology. Research has shown that items referring to physical symptoms are not a good indicator of depression in older populations. For example, (Coleman, et al., 1981) found that sleep disturbances were a common symptom of depression but such disturbances were also common in older adults without depression, while rare in younger adults not suffering from depression (Yesavage, et al., 1983). For these reasons the GDS was developed and validated with older adults in the United States. The original GDS consisted of 100 items which were tested on 46 older adults in San Francisco, CA. Thirty items were selected from the original 100, based on high item-total correlations. These items covered six qualitative domains (1) lowered affect, (2) inactivity, (3) irritability, (4) withdrawal, (5) distressing thoughts and (6) negative judgments of the past and present. Yesavage then repeated the process a year later (1983), selecting 30 items based on highest item-total correlations and identified nine qualitative domains (1) somatic complaints, (2) cognitive complaints, (3) motivation, (4) future/past 5 orientation, (5) self-image, (6) losses, (7) agitation, (8) obsessive traits and (9) mood itself. From the GDS-30, the GDS-15 was developed to ease administration and lessen the time requirement for completing the instrument. In a subsequent validation study Sheikh and Yesavage (1986) (Sheikh & Yesavage, 1986) selected the 15 items from the GDS-30 which had the highest item-total correlations. Of the 15 items, 10 of them reflect the presence of depression when answered positively, while the rest indicate depression when answered negatively. Research on the factorial structure of the GDS-30 and GDS-15 has been inconsistent across the literature (Adams, 2001; Adams, Matto, & Sanders, 2004; L. M. Brown & Schinka, 2005; P. J. Brown, Woods, & Storandt, 2007; Chau, Martin, Thompson, Chang, & Woo, 2006; Cheng & Chan, 2004; Friedman, Heisel, & Delavan, 2005; Ganguli, et al., 1999; D. W. L. Lai, Fung, & Yuen, 2005; Malakouti, Fatollahi, Mirabzadeh, Salavati, & Zandi, 2006; Parmelee & Katz, 1990; Parmelee, Lawton, & Katz, 1989; Salamero & Marcos, 1992; Schreiner, Morimoto, & Asano, 2001; Tang, Wong, Chiu, Ungvari, & Lum, 2005; Wrobel & Farrag, 2006; Yang, Small, & Haley, 2001). For the GDS-30, Parmalee and colleagues found 6 factors (Parmelee, et al., 1989), Salamero and Marcos found 3 factors (Salamero & Marcos, 1992), Adams found 6 factors (Adams, 2001), Adams evaluated the GDS-30 again and found 5 factors (Adams, et al., 2004), the Arabic version of the GDS-30 revealed 7 factors (Chaaya, et al., 2008; 6 Wrobel & Farrag, 2006) while the Portuguese version had 3 factors (Pocinho, Farate, Dias, Yesavage, & Lee, 2009). Studies that evaluated the factor structure of the GDS-15 include Mitchell et al. (1993) in which they found 3 factors (Mitchell, Matthews, & Yesavage, 1993), Brown et al. found 2 factors (P. J. Brown, et al., 2007), the Chinese version had 4 factors (Chau, et al., 2006; D. Lai, Tong, Zeng, & Xu, 2010; D. W. L. Lai, et al., 2005) and finally the Iranian version had 2 factors (Malakouti, et al., 2006). In addition to the inconsistent factorial structures, the definitions of these factors were also wide spread, from general depression and dsyphoria to lack of vigor and agitation. Across studies the factors referenced most often tended to be positive and negative affect, energy loss and life satisfaction. 7 Table 1 Factor models of the GDS-15 Number of Factors Study Factor Definitions Sheikh et al. (1986) 1 Mitchell et al. (1993) 3 General Depressive Affect, Life Satisfaction, Withdrawal Schreiner et al. (2001) 2 Positive Affect, Energy Loss and Depressed Mood Positive Attitude Toward Life, Distressing Thoughts and Negative Judgement, Inactivity and Reduced Self-Esteem Incalzi et al. (2003) 3 Friedman et al. (2005) 2 Lai et al. (2005) 4 Brown et al. (2007) 2 Oinishi et al. (2007) 4 General Depressive Affect General Depressive Affect, Positive Affect Negative Mood, Positive Mood, Inferiority and Disinterest, Uncertainty General Depressive Affect, Life Satisfaction Unhappiness, Apathy and Anxiety, Loss of Hope and Morale, Energy Loss 8 To date the instrument has been translated into 24 languages, with less than 10 of the studies that used the instrument, actually evaluating the factor structure. The primary forms of psychometric evaluation of the GDS-15 have been (1) exploratory factor analyses, (2) test-retest reliability and (3) sensitivity and specificity analyses. Only Brown et al. (2007) conducted multiple group confirmatory factor analysis. The aims of the present study are to (1) assess the factorial structure within each country, (2) conduct multiple group confirmatory factor analyses within each country by gender and (3) conduct multiple group confirmatory factor analyses between countries. 9 CONFIRMATORY FACTOR ANALYSIS Confirmatory factor analysis (CFA) is a form of the factor analytical model which examines the covariation among manifest indicators in order to confirm the hypothesized underlying latent constructs or common-factor. CFA is a theory driven technique in which the researcher specifies (1) the number of factors and their inter-correlation, (2) which items load on which factor and (3) whether errors are correlated. Statistical tests can then be conducted to determine whether the data confirm the theoretical model, thus the model is thought of as confirmatory (Bollen, 1989). With CFA a researcher is able to simultaneously conduct multiple group analyses across time or samples, in order to evaluate measurement invariance/equivalence. The following is a mathematical presentation of linear CFA for testing measurement invariance (Baumgartner & SteenKamp, 2001; Bollen, 1989; Jöreskog, 1971). In the CFA model, the observed response i to an item i i  1,...p  is represented as a linear function of a latent construct and stochastic error term  j  j  1,...m , an intercept , i  i . Thus, x       i i ij j 10 Equation 1  i on j , the slope or factor loading, xi due to a defines the metric of measurement, as it shows the amount of change in Where  ij is the slope of the regression of x unit change in when  j . The intercept  j  0 (Sörbom, 1974). Assuming p items and m latent variables, and specifying the same factor structure for each country g g  1,...G  we get the following measurement model x g   g  g  g   g where x i , in contrast, indicates the expected value of i x g is a Equation 2 p  1 vector of observed variables in country g ,  vector of latent variables, g is a p  1 vector of item intercepts and is a p  1 vector of errors of measurement,  g is a pm  g 0  It is assumed that  and that COV   g    m 1 g is a matrix of factor loadings. g , g  0   . Equation  (2) shows that observed scores on p items are a function of underlying factor scores, but 11 that observed scores may not be comparable across countries because of different intercepts  g     i  g ij . and scale metrics To identify the model, the latent constructs have to be assigned a scale in which they are measured. In multiple group analyses this is done by setting the factor loading of one item per factor to 1. Items for which loadings are fixed at unity are referred to as marker (or reference) items. The same items should be used as marker item(s) in each country. Taking the expectations of equation (2) yields the following relationship between the observed item means and the latent means  g   g  g  g Equation 3 where g is the p  1 vector of item means and  g is the m 1 vector of latent  g ). The parameters  g and  g cannot be identified means (i.e. the means of simultaneously (Sörbom, 1982). In other words, there is no definite origin for the latent variables. To deal with this indeterminacy, constraints are placed on the parameters. There are two approaches to placing constraints. The first is to fix the intercept of each latent variable‘s marker item to zero in each country. This equates the means of the 12 g g m   m , where m latent variables to the means of their marker variables (i.e. indicates that the item is a marker item). A second approach is to fix the vector of latent means at zero in the reference country (i.e.  r  0 , where the superscript r indicates the reference country) and to constrain one intercept per factor to be invariant across countries. The latent means in the other countries are then estimated relative to the latent means in the reference country. These two approaches lead to an exactly identical model with respect to the item intercepts and latent construct means. If further restrictions are imposed on the model (e.g., all intercepts are specified to be invariant across countries), the intercepts and latent means are over-identified, and the fit of the means part of the model can be investigated. In addition to the mean structure given by equation (3), the covariance structure has to be specified. As usual, the variance-covariance matrix of x in country g , g is given by:  g  g  g g   g where g Equation 4  g is the variance-covariance matrix of the latent variables in  is the variance-covariance matrix of g g and which is usually constrained to be a diagonal matrix. The overall fit of the model is based on the discrepancy between the 13 observed variance-covariance matrices matrices Sg and the implied variance-covariance ˆ  g and the discrepancy between the observed vectors of the means mg and g . the implied vectors of ˆ . 14 MEASUREMENT INVARIANCE Measurement invariance is an umbrella term that is really made up of various forms of invariance. For example, measurement invariance can refer to the invariance of factor loadings, intercepts, or errors (Meredith, 1993). These forms or levels of invariance are referred to as configural, metric/weak, scalar/strong and residual error/strict. Each of these levels is a successively more restrictive test of measurement invariance. Configural invariance Configural invariance (J.L. Horn, McArdle, & Mason, 1983), weak invariance (Meredith, 1993) or pattern invariance (R. E. Millsap, 1997) is the lowest or weakest level of measurement invariance that can be obtained. Configural invariance refers to the pattern of salient (non-zero) and non-salient (zero or near zero) loadings which define the structure of a measurement instrument. Configural invariance is supported if the specified model with zero-loadings on non-target factors fits the data well in all groups; all salient factor loadings are significantly below unity. Simply stated, configural invariance holds when the same items load on the same factors for both groups of interest (e.g. men vs. women). The configural model is also used as the baseline model to which the decrements in fit associated with more constrained nested models are compared. 15 Metric invariance Metric (Thurstone, 1947), weak (Meredith, 1993), or factor pattern invariance (R. E. Millsap, 1995) is more restrictive than configural invariance. This level of invariance requires that the loadings in a CFA be constrained to be equivalent in each group while permitting the factor variances and covariances to vary across groups. Equation 5 1  2  ....G If the factor loadings are found to be invariant this means that the factor loadings in one group are proportionally equivalent to corresponding loadings in other groups (Bontempo, et al., 2008). ―Loadings standardized to the common-factor variance would each differ from the corresponding loading in another group by the same proportion—the ratio of the variance in each group. It is essential that the common-factor variances are freely estimated in all but the first group. This condition is what creates a test of proportionality when equality constraints are imposed on the loadings (pg.51)‖ (Bontempo, 2007). If metric invariance holds it allows a researcher to claim that there are similar interpretations of the factors across groups. However, others have suggested that higher levels of invariance provide greater evidence of the equivalent construct interpretation across groups (Meredith, 1993). 16 Scalar invariance Scalar (SteenKamp & Baumgartner, 1998) or Strong (Meredith, 1993) invariance is more restrictive than metric/weak invariance because it constrains factor loadings as well as intercepts to be equal across groups. Equation 6  1   2  .... G  ―This requires the model to account for all mean differences in the items solely through the common-factor mean (pg. 52)‖ (Bontempo, 2007). If scalar invariance is obtained then comparison of factor means across groups is supported. Strict invariance Strict invariance is the most restrictive constraint; it requires that equality constraints for loadings, intercepts and unique-nesses (errors) be held equal across groups. Equation 7 1  2  ....G This level of measurement requires that the specific and random error components of each item be equivalent across groups, such that differences in variance across groups can only take place at the latent variable level. If strict invariance is obtained, a researcher can be confident in making measurement comparisons based on factor mean and factor covariance structures across groups. There is however, a lack of consensus in 17 the literature as it relates to the necessity of acquiring strict invariance. Researchers such as Byrne and Vandenberg & Lance, refer to strict invariance as being too restrictive and not of import (Byrne, 2008) (R.J. Vandenberg & C.E. Lance, 2000). Partial measurement invariance The aforementioned measurement invariance tests build upon one another and with each level of invariance obtained a researcher can build support for the equivalence of a measure across groups and time. Initially researchers testing the invariance of an instrument had two options (1) obtain full measurement invariance or (2) if full measurement invariance is not obtained, abandon further invariance testing. Byrne and colleagues (Byrne, Shavelson, & Muthen, 1989) presented the idea that there could be a middle ground between full measurement invariance and no measurement invariance. That middle ground is partial measurement invariance. Partial measurement invariance is the idea that some invariance is better than noinvariance. What this means is that if, for example, a researcher obtains configural invariance and then proceeds to test metric invariance and finds that they have a decrement in fit, future invariance analyses do not have to be abandoned. Instead they can investigate the source of the misfit and then relax the constraints for specific parameters that are exhibiting misfit. The relaxation of constraints then allows the researcher to recalibrate the model for fit and move on to another level of invariance testing, with the understanding that further analyses are based on the partial invariance of that level (e.g. partial metric invariance leads to partial scalar invariance and so on). 18 The implementation of partial measurement invariance allows analyses to proceed but it also introduces two additional issues (1) how much measurement invariance or lack thereof is acceptable and (2) how does one identify misfit? Byrne and colleagues address both of these issues in the context of metric invariance. Byrne et al. (Byrne, et al., 1989) state that the measurement invariance literature leaves researchers with the impression that ―given a non-invariant pattern of factor loadings (metric invariance), further testing of invariance and the testing for differences in factor mean scores is unwarranted‖(pg. 458). This idea is unfounded when the model specification includes multiple indicators of a construct and at least one item (other than the one that is fixed to 1.00 for identification purposes) is invariant (Byrne, et al., 1989; Muthen & Christoffersson, 1981). Byrne and colleagues provide evidence that partial metric invariance only requires cross-group invariance of zero loadings and some, but not necessarily all of the salient loadings (Byrne, et al., 1989). The second issue with partial measurement invariance focuses on how to decide which constraints need to be relaxed. Generally a researcher relies on substantive reasons when deciding which loadings or intercept constraints to relax across-groups. This information is not always available which means that the researcher must depend on modification indices when respecifying their model. Structural equation modeling software packages such as Mplus provide modification indices that identify parameters with the poorest fit. The values of the modification indices provide information on the 19 estimated decrease in the  2 value (misfit) that would occur if the constrained parameter in question was relaxed (freely estimated). Caution must be taken when using modification indices in order to avoid over fitting of the model which would impair generalizability (Tomarken & Waller, 2003). Steenkamp et al. (SteenKamp & Baumgartner, 1998) state that ―invariance constraints should be relaxed only when modification indices are highly significant (both in absolute magnitude and in comparison with the majority of other modification indices)‖ (pg.81). Changes in alternative indices of overall model fit (CFI, TLI, RMSEA) should be evaluated especially those that take model parsimony into account (Steiger, 1990). As a rule the use of modification indices should be kept to a minimum, by only implementing modifications that would correct severe problems with model fit, this would in turn minimize capitalization on chance and once again maximize the cross-validity of the model (MacCallum, Roznowski, & Necowitz, 1992). The reason for proceeding with partial measurement invariance analyses can be summed up by Horn (1991, p. 125) (J.L. Horn, 1991; J.L. Horn, et al., 1983) ―metric invariance is a reasonable ideal…a condition to be striven for, not one expected to be fully realized…and scientifically unrealistic”(pg.125). The use of partial measurement invariance allows researchers to deal with the realities of working with latent constructs measured by self-reported items all of which have some level of inherent bias, which ultimately influences the equivalence of measures. 20 ITEM RESPONSE THEORY Item response theory (IRT) is less of a theory and more of a collection of mathematical models, mostly non-linear latent variable/trait models which attempt to explain how people respond to items. IRT models present a picture of the performance of each item on an instrument and how the instrument measures the construct of interest in the population being studied. Basic IRT definitions, models and assumptions are presented below: Basic IRT definitions Latent variable In classical test theory the ―latent variable/trait‖ is represented by the ―true score‖ in structural equation modeling it is referred to as the ―latent factor‖ in IRT the latent trait is represented by theta ; theta is the unobservable construct being measured (e.g., depression). Item threshold The item threshold (b) (item location, item difficulty, item severity) provides information on the location of an item along the continuum indicating the level of the underlying variable (e.g. depression) needed to endorse an item (e.g., do you feel like your life is empty?) with a specified probability, typically set at .50 (Reise, 2005). 21 Item discrimination Item discrimination (a,  item slope) describes the strength of an items ability to discriminate among people with trait levels below and above the item threshold- b. The a- parameter can also be interpreted as how related an item is to the trait measured by the instrument (Reise, 2005). Item characteristic curve (ICC) The item characteristic curve models the relationship between a person‘s probability for endorsing an item category (e.g., yes or no) and the level on the construct measured by the instrument. Information function The information function for items (IIF) or scales (SIF) is an index which indicates the range of trait level over which an item or scale is most useful for distinguishing among individuals. ―For any item, the IIF or SIF characterizes the precision of measurement for persons at different levels of , with higher information denoting better precision (lower standard error)(pg.427) (Reise, 2005)‖. Item response theory models for dichotomous data The three commonly used IRT models are the 1(1PL), 2 (2-PL) and the 3parameter logistic model (3-PL). 22 The 1-parameter model is specified as follows: 1 P   1  exp  1.7a  b  Equation 8 where -1.7 is a scaling factor and (a) is the slope/discrimination parameter which is constant for all items (and is often scaled to equal 1). Having ―a‖ scaled to 1 implies that all items on the scale have equal discrimination, which is an assumption of the 1 parameter model. Items can however, discriminate at different places along the continuum; with items that are easy to endorse discriminating among people who are low on the construct of interest (e.g. depression) and items that are hard to endorse discriminating among people who are high on the construct of interest. Theta is the unobservable construct being measured and (b) is the item difficulty/severity which provides information on the location of an item along the continuum indicating the level of the underlying variable needed to endorse an item. Although, the 1 parameter-model is used in many settings, the strict assumption of equal discrimination rarely holds for health measures; instead the 2 parameter model is most commonly applied to dichotomous health data. 23 The 2-parameter model is specified as: P    i 1 Equation 9   1  exp  1.7 a    b   j  i i   Where Pi   the probability of endorsing item i  , a i is the  slope/discrimination parameter for item i which indicates how related a particular item is to the construct being measured. The higher the slope, the more variability in item responses can be attributed to differences in the latent construct (Edwards, 2009). The discrimination parameter is analogous to a factor loading. Parameter bi is the threshold  or severity parameter for item i . The threshold parameter indicates the point along the latent continuum where an individual would have a 50% chance of endorsing an item. The higher the threshold, the higher an individual must be on the latent trait to have a 50% chance of endorsing that item. With the 1- parameter model the only parameter allowed to vary was the threshold (b), in the 2-parameter model both the discrimination (a) and threshold (b) are allowed to vary. The last model presented will be the 3 parameter model, which is primarily used in educational testing. 24 The 3-parameter model is specified as: P   c  1  c  1  exp  1.7 a  b  Equation 10 where ―a‖, ―b‖,  are defined as in the 1 and 2 parameter models. The 3parameter model introduces the ―c‖ parameter or guessing parameter, which is a lower asymptote parameter. The 3-PL simultaneously, estimates the discrimination, difficulty/severity and lower asymptote parameters. With multiple-choice items, questions can be solved by ―guessing‖ and because of this the probability of success is greater than zero, even for persons with lower trait/ability levels. Finally, it should be noted that the (b) parameter, is interpreted differently in the 3-PL model. Item difficulty still occurs at the point of inflection, however, the probability of endorsement is no longer at .50, but rather the inflection point is shifted by the lower asymptote (Embretson, 2000). This type of model is primarily used in educational testing, but is not applicable to psychological or health measurement instruments. The reason the use of the 3-PL model is not used in health measurement is because it has a guessing parameter which is difficult to interpret in the context of health items. 25 Assumptions There are four assumptions that IRT models make (1) monotonicity, (2) unidimensionality, (3) local independence and (4) invariance. Monotonicity implies that as levels of the latent construct increase (absence of depression to severe depression), individuals have a higher probability of endorsing item response categories indicating poorer mental health (higher depression). This assumption is evaluated by examining graphs of summed scale scores compared to item endorsement rates or item means. The unidimensionality assumption means that there is only one common latent variable being measured. This means that no other variable except a person‘s level on ―depression‖ accounts for the variation in responses to the 15 items on the GDS-15 scale. Currently there is no gold standard for assessing whether data is unidimensional for IRT application (Reise, 2005). With that said, the most commonly used approaches for evaluating unidimensionality is to use a combination of exploratory (EFA) and confirmatory (CFA) factor analyses. With exploratory factor analysis a researcher is looking for a large ratio between the first and second Eigen values, which would indicate one primary dimension, with all items loading highly on a single common factor. In addition, after extracting one factor, residuals should be small which would indicate that one dimension accounts for a high percentage of item covariance. The measures of fit for a CFA (CFI, TLI, and RMSEA) provide sufficient evidence for unidimensionality. The assumption of local independence means that once one common factor has been extracted from an item covariance matrix, the residuals are zero; or that after 26 accounting for the latent variable, item responses are independent of one another (Reise, 2005). The fourth assumption of invariance follows the following tenants: 1. An individual‘s position on a latent-trait continuum can be estimated from their responses to any set of items with known item characteristic curves. 2. Item properties do not depend on the characteristics of a particular population. 3. The scale of the trait does not depend on an item set, but instead exits independently (Reise, Ainsworth, & Haviland, 2005). 27 DIFFERENTIAL ITEM FUNCTIONING The purpose of this section is to provide a general overview of various differential item functioning (DIF) detection methods. DIF detection methods involve three elements: (1) item response which may be treated as observed or latent, (2) an estimate of ability (depression) level and (3) subgroup membership such as gender, country of origin, or race/ethnicity. The overarching question in a DIF analysis is how a person‘s item response is related to their level of ability (depression) based on their subgroup membership. To investigate this question, DIF analyses focus on differences in item parameters. In other words, a DIF analysis is concerned with whether or not the likelihood of item or category endorsement is the same across subgroups. DIF methods fall into two categories: (1) IRT based methods and (2) non-IRT based methods. Non-IRT DIF Methods Mantel-Haenszel The Mantel-Haenszel (MH) procedure proposed by Holland and Thayer (Holland & Thayer, 1988) is one of the most widely used nonparametric methods for detecting differential item functioning (DIF). The MH procedure is based on the analysis of contingency tables. Subjects are matched on an observed variable (e.g., total score on the GDS-15), and then counts of subjects in the focal and reference groups endorsing or not endorsing the item are compared. The reference group consists of individuals for whom the test/instrument is expected to favor and the focal group consists of individuals who were at risk of being disadvantaged by the test/instrument. The MH procedure can be 28 used to examine whether within each depression score grouping, the odds of a symptom (endorsement of a particular item) is the same across groups. A common odds ratio (which tests whether or not the likelihood of item symptom response is the same across the depression groupings) also can be used to construct a DIF magnitude measure (J. A. Teresi, Ramirez, Lai, & Silver, 2008). This is accomplished by converting the odds to log odds and applying transformations, which in turn provide interpretable measures of magnitude (J. A. Teresi, et al., 2008). The MH common odds ratio assesses the strength of association in three-way 2X2xJ contingency tables. It estimates how stable the association between two factors is in a series of ―J‖ partial tables. This procedure tests the following null hypothesis P P Ri  Fi H : 0 q q Ri Fi Equation 11 which reflects the odds of endorsing item i for the reference group R, equal to the corresponding odds for the focal group F, Note that q i  1 P i . 29 P Fi q Fi . P Ri q , are Ri The alternative hypothesis is P P Ri   Fi H : iq 1 q Ri Fi P Riq Fi   where i P Fiq Ri provides a Equation 12     is the common odds ratio  i1  . This procedure  2 test statistic as well as an estimator of i across the 2x2xJ tables. The latter is a measure of effect size, how much the data depart from the null hypothesis. Logistic regression Another DIF detection procedure similar to the Mantel-Haenszel method is logistic regression (LR). The LR procedure predicts the probability of endorsement of an item based on e       1   0  PU  1      1  e  0  1     Equation 13 where U is the response to the item,  is the observed ability or symptomatology (depression),   0 is the intercept and 1 is the slope parameter. This formula is the 30 standard LR model for predicting a dichotomous dependent variable from given independent indicators (Steele, et al., 2006; Swaminathan & Rogers, 1990). This model can then be extended to evaluate differences between groups such that Equation 14 y     x totalscore   x grp    x x  0 11 2 2 3 1 2    where y is the item response, group membership,  x x 0 is the intercept, 1 is the total score, 2 is the  1 is the coefficient for the total score,  2 is the coefficient for group membership and  3 is the interaction between the total score and group membership. DIF is said to be present if persons with the same level of depression but from different groups do not have the same probability of endorsing an item. There are two types of DIF: uniform and non-uniform. Looking at equation (14) DIF is not present if the LR curves for the two groups are the same, that is  x 11 11   x . If 12 12   x  x 01  02 ,  12 12 but 01 11 11  02 the curves will be parallel indicating uniform DIF. Uniform DIF is present when the probability of endorsing an item is greater for one group over the other, uniformly across all levels of the construct of interest. 31 Conversely if  01   x  x 02 but 11 11 12 12 the curves are not parallel, indicating non-uniform DIF. Non-uniform DIF is present when there is an interaction between the total score and group membership, which means that the difference in the probabilities of item endorsement for the two groups are not the same at all levels of the construct of interest. ITEM RESPONSE THEORY METHODS DFIT framework The DFIT method developed by Raju and colleagues (Flowers, Oshima TC, & Raju, 1999; Raju, van der Linden WJ, & Fleer, 1995) is based on IRT. Raju‘s DFIT framework has several characteristics: (1) it can be used with both dichotomous and polytomous items, (2) it can handle both unidimensional and multidimensional IRT models, (3) it evaluates DIF at both the item and scale level and (4) it provides two types of DIF: compensatory DIF (CDIF) and non-compensatory DIF (NCDIF). Within the DFIT framework, DIF is defined as the difference between the probabilities of a positive item response for individuals from different groups at the same level of the underlying attribute (Jeanne A. Teresi, 2006) . This framework uses a weighted difference in the conditional item probabilities which is the probability of people endorsing the item at each score for the reference group (women) and the focal group (men). Before the DFIT framework can be applied, however, separate IRT item parameter estimates for a reference group (women) and a focal group (men) must be obtained. Because the two sets of item parameters are obtained from separately estimated 32 IRT models, they must be placed on a common metric before comparisons to evaluate DIF can be made. This process is referred to as linking or equating. Once linked, the two sets of item parameters can be used to make item and scale level assessments of DIF using the DFIT framework. Non-compensatory DIF (NCDIF) To examine the magnitude of DIF Raju et al. (Raju, et al., 1995) and later Flowers et al. (Flowers, et al., 1999) developed the compensatory DIF index (CDIF) as well as the non-compensatory DIF index (NCDIF). According to Raju and colleagues (Oshima TC & Morris, 2008; Raju, et al., 1995), the NCDIF is defined as the average squared distance between the item characteristic functions for the focal and reference groups. For dichotomous IRT models, the gap is defined as the difference in the probability of a correct response, di  s   P  s  P  s  iF iR Equation 15 . NCDIF is defined as the expected value of the square distance, NCDIF i E  2   d  F i s  33 Equation 16 where E F denotes the expectation taken over the  distribution from the focal group. Squaring ( d ) is important so that differences in opposite directions will not cancel each other out. This allows NCDIF to assess both uniform and non-uniform DIF. When item characteristic functions (ICF‘s) differ based on the ―b‖ parameter, this is referred to as uniform DIF. Uniform DIF is present when the probability of endorsing an item or getting an item correct is greater for one group over the other, uniformly across all levels of the construct of interest. Non-uniform DIF occurs when the ―a‖ parameters differ across groups. In this case non-uniform DIF is present when there is an interaction between the ability and group membership, which means that the difference in the probabilities of item endorsement or a correct response for the two groups are not the same at all levels of the construct of interest. When DIF is non-uniform, differences in both directions will contribute to NCDIF. The DFIT framework is able to assess DIF at both the item and test level (DTF) differential test functioning. The DTF is similar to the NCDIF except the curves being compared are test characteristic functions instead of item characteristic functions. D s   T  s  T  s  F R 34 Equation 17 DTF is defined as the expected value of the squared difference between focal and reference groups, where the expectation is taken across the  distribution from the focal group, DTF   E D  s F  2  Equation 18   Despite the similarity in how NCDIF and DTF are defined, the relationship between the two is not straightforward. NCDIF assumes that all items other than the studied item are DIF free, while DTF depends not only on the level of DIF on each item, but also the pattern of DIF across items (Oshima TC & Morris, 2008). As a result, if you removed an item with a large NCDIF it would not necessarily result in a large decrease in DTF. However, the compensatory DIF index does a better job at reflecting an item‘s contribution to DTF. Compensatory DIF (CDIF) Compensatory DIF (CDIF) takes the item covariances into account. CDIF is defined as Equation 19 CDIF     di, D   Cov di, D        l l  di D     where COV stands for covariance. The CDIF index is additive such that: 35 n DTF   CDIFi i 1 Equation 20 The additive nature of CDIF makes it possible for a researcher to investigate the net effect of removing particular items, on DTF. Item response theory likelihood ratio tests The item response theory log-likelihood ratio test (IRTLR) is another IRT-based approach to DIF detection procedure. The IRTLR involves the statistical comparison of two hierarchically nested item response models, (1) a constrained or compact model and (2) an unconstrained or augmented model. The unconstrained model contains all of the parameters of the constrained model, hence the constrained model is said to be hierarchically nested within the unconstrained model. As explained by Thissen et al. (1993): ―the goal of the procedure is to test whether the additional parameters in the unconstrained model is significantly different from zero….The IRTLR test takes the form of   G 2 df   2 log  Likelihoodunconstrained   Likelihoodconstrained   36 Equation 21 where Likelihood[.] represents likelihood of the data given the maximum likelihood estimates of the parameters of the model, and df is the difference between the number of parameters in the unconstrained model and the number of parameters in the constrained model (pg. 73)”. The value of G 2 is assumed to be distributed as  2 df  under the null hypothesis. Thus, if the value of G 2 df is large, representing an unlikely value from a  2 df  distribution, we reject the null hypothesis and the constrained model (Thissen, Steinberg, & Wainer, 1993). The IRTLR method employs the Marginal Maximum Likelihood estimation algorithm developed by Bock & Aitkin (Bock & Aitkin, 1981), which makes it possible for the item parameters to be reliably estimated without using information about the ability distribution and uses likelihood ratio tests to evaluate the statistical differences of models between groups (Thissen, et al., 1993). 37 CHAPTER 3: METHODOLOGY MEASURES Geriatric Depression Scale-Short Form (GDS-15): A 15-item short scale version of the 30 item GDS, it consists of 10 items which indicate the presence of depression when answered positively and 5 items that indicate depression when answered negatively. Each item has a yes or no response format (1 = yes, 0 = no) and five items are reverse scored. A score on the GDS-15 which is less than 6 indicates no depression and scores greater than 6 indicate the presence of depressive symptoms. SAMPLE The sample for this study is made up of 7,573 older adults between the ages of 60 and 102. Data for the present study comes from the Survey on Health, Well-Being and Aging in Latin America and the Caribbean (SABE)(Pelaez, et al., 2004). The SABE study collected data during 1999 and 2000 with the primary purpose of examining health conditions and functional limitations of persons 60 and older in the countries of Argentina, Chile, Cuba, Mexico and Uruguay. The study was conducted in the official language of each country, which is Spanish. The sample came from a population over 60 years of age that resided in private households in each of the urban areas in the respective countries, Argentina (Buenos Aires), Cuba (Havana), Mexico (Mexico City), Chile (Santiago) and Uruguay (Montivedo). The sampling framework for the SABE came from national employment surveys, household surveys, national census and national electoral registries in the 38 respective countries. Data was collected through face to face interviews. All countries in the SABE adhered to the same data collection protocol whereby subjects were only interviewed if they demonstrated that they were cognitively sound. The universe of study was a population aged 60 and older who resided in private households occupied by permanent residents in urban areas of each country. A multistage clustered sample with stratification was employed in a three stage process. The plan for sampling in each country was the following: first the primary sampling unit (PSU) was established; the PSU is a cluster of independent households within predetermined geographic areas. PSU‘s were grouped into either geographic or socioeconomic strata. The sample distribution by geographic or socioeconomic strata was proportional to the size of the elderly population within each country. Secondly, the PSU‘s were then divided into secondary sampling units, (SSU) each containing a smaller number of independent households. These SSU‘s were comprised of tertiary sampling units (TSU) formed by interviewees in the selected households or by single individuals in those countries where only one person was selected out of each household. As such, the household or target individuals constitute the last layer of aggregation in the sample. The first stage in the sampling process led to sampling a predetermined number of PSU‘s which were each selected with probability proportional to the household distribution within each stratum. The second stage of sampling led to the selection of SSU‘s and the third stage of sampling consisted of the selection of households within each SSU. Finally, both secondary sampling units (SSU) and tertiary sampling units 39 (TSU) were selected with equal probabilities within each chosen primary sampling unit (PSU). There were also some country specific sampling design differences. The first is the stratification of the clusters within each country. In some countries stratification was conducted in terms of geography only while in others the strata were defined by geography and aggregate indicators of socioeconomic conditions. The second area of difference was oversampling. In three countries (Cuba, Uruguay and Chile) the samples included oversamples for people age 80 and above. In Cuba and Uruguay an individual in a household who was 80 or above was chosen with a probability of one. In Chile selection of a person among eligible household members was done randomly but if an individual aged 80 or above was present and not chosen by the random process, he/she was also interviewed. The third area of difference was with the secondary sampling unit. In three countries (Cuba, Argentina and Uruguay) only one target individual was selected per household. In Mexico all eligible individuals found in the household were interviewed. 40 The final sample observations broke down in the following way: Argentina (N = 1043), Chile (N = 1301), Cuba (N = 1905), Mexico (N = 1876) and Uruguay (N = 1450). These samples are proportional to the size of the elderly population within each country. The SABE (across all countries) was made up of 54.7 % female and 45.3% male participants. Forty-six percent of the female sample was married while 44% were widows and 10% never married. At the time of the study 71% of males surveyed were married, 21% were widowers and 8% had never married. The men in the sample tended to be younger, with 44% between the ages of 50 and 65 while 27% of women fell within this range. Women made up a larger portion of participants 66-85 years of age (66%) while men made up 52%. In subjects age 86-102, women represented 7% while men made up 4% of this grouping. The larger number of females at the upper end of the age grouping reflects earlier mortality among men. The average age across countries was 70. In the overall sample, 95% of participants had 12 years of education or less while 5% had more than 12 years of education. The average years of education across all countries are 7, Table 2 summarizes these statistics. 41 Table 2 Sample demographics by gender Variable Age Unmarried Married No formal education Elementary – middle school H.S. > H.S. 3308 (43.7%) 4267 (56.3%) 883 (11.8%) 4621 (61.0%) 1304 (17.5%) Women n (%) M (SD) 72 (8.26) 2301 (55.6%) 1838 (44.4%) 451 (11.1%) 2640 (63.8%) 659 (16.2%) 1006 (29.3%) 2428 (70.7%) 431 (12.8%) 1980 (57.7%) 645 (19.1%) 645 (8.7%) 328 (7.9%) 317 (9.4%) n (%) Total M (SD) 70 (9.01) 42 n (%) Men M (SD) 68 (9.31) ANALYSIS Prior to invariance testing exploratory factor analyses (EFA) will be run for each country individually. Because of the lack of consensus on the factorial structure of the GDS-15 and the absence of psychometric work on the instrument in Chile, Argentina, Cuba, Uruguay and Mexico, it is necessary to conduct EFA‘s to determine factor structures. In order to establish measurement invariance a nested sequence of increasingly restrictive CFA models (invariance hypotheses) are tested. These levels of invariance are referred to as configural, metric, scalar and strict invariance The sequence of nested invariance hypotheses are well established in the literature (Cheung & Rensvold, 2002; Reise, Widaman, & Pugh, 1993; Robert J. Vandenberg & Charles E. Lance, 2000). They are based on establishing a baseline model and additively testing hypotheses of metric, scalar, and strict invariance. 43 Multiple-Group CFA Analysis Procedures Configural invariance Configural invariance requires that the same number of factors and pattern of salient factor loadings be equivalent across groups. The baseline model tests the hypothesis of zero-loadings needed to specify a degree of simple structure. The principle of simple structure states that items comprising an instrument should exhibit the same configuration of salient and non-salient factor loadings across groups being compared (Beckstead, Yang, & Lengacher, 2008). The configural (baseline) model can be identified by giving the factor means and variances a scale for each group (men vs. women). This is done by fixing the mean of the factor(s) to zero and fixing a single factor loading to one. Note that there is more than one way to identify the baseline model. A researcher may choose to fix the mean and variance of a factor, or fix the intercept and loading of a reference item. Either approach to identifying the baseline model will have the same degrees of freedom and model fit, only the scaling of the parameters will differ (Bontempo, 2007; Reise, et al., 1993). Configural invariance is used as a baseline model, to which the decrements in fit associated with more constrained nested models are compared against. In other words, the configural model is the model with freely estimated parameters against which subsequent nested models with constrained parameters will be compared against. 44 Summary of configural model constraints: 1. The same indicators are specified in each group. 2. The variance of the factor(s) is fixed to one in the 1st group. 3. The mean of the factor(s) is fixed to zero in the 1st group. 4. The intercept and loading of a reference item is constrained to be equal across groups. 5. All other non-fixed parameters are freely estimated If configural invariance is obtained the next step is a test of metric invariance. Metric invariance Metric invariance implies that items are measured according to the same scale units, in that the factor loadings are equivalent across groups. When an items factorloading is non-invariant, the regression slope relating a score on the item to a score on the latent construct differs across groups. Metric invariance requires that equality constraints be placed on factor loadings across groups, while allowing the factor variances and covariances to be free. Note, for the metric of the factor to be identified, the factor variance or one of the factor loadings must be fixed to 1. If the factor loadings are found to be invariant, this does not mean that they are actually identical because the factor variances and covariances are allowed to vary across groups. Instead, invariant factor loadings in one group are said to be proportionally equivalent to corresponding loadings in the other 45 group (Bontempo & Hofer, 2007). ―Loadings standardized to the common-factor variance would each differ from the corresponding loading in another group by the same proportion—the ratio of the variance in each group‖ (pg.51) (Bontempo, 2007). The factor variances are freely estimated in all but the first group. This condition creates a test of proportionality when equality constraints are imposed on the loadings. Because the metric model is a more constrained model, the fit will be poorer than the configural model. The issue then becomes, ‗is the fit significantly worse‘; if not, metric invariance has been obtained. If metric invariance is not obtained this implies that the factor(s) or groups of items have different meanings across groups. Scalar invariance Scalar invariance requires equality constraints on corresponding factor loadings and item intercepts across groups. This level of invariance requires the fitting of mean and covariance structure models. When assessing configural and metric invariance, only the covariance structures are examined. The scalar invariance model is compared against the metric invariance model and any significant worsening of fit suggests that the hypothesis of equal item intercepts is not supported. What this says, is that group comparisons of observed and factor means, factor variances and covariances may not be defensible. Strict invariance Strict (error) invariance requires that constraints be imposed on unique variances, unique means and factor loadings. Strict invariance implies that the item reliabilities and 46 therefore scale reliabilities are the same across groups (Beckstead, et al., 2008). Bontempo (2007) states that the strict invariance model forces the combined specific and random error components of each variable to be equivalent across groups such that differences in variance across groups are permitted only at the latent variable level. Thus, if a strict invariance model fits the data well, a researcher can be confident that measurement comparisons across groups involving factor mean and factor covariance structures are valid (Bontempo, 2007). Binary Factorial Invariance The instrument used in this study is the GDS-15 which has a yes/no response format. Because of this, multiple-group CFA measurement models with binary indicators require a different parameterization which requires modifications to the aforementioned procedures (Jöreskog & Moustaki, 2001; Roger E. Millsap & Yun-Tein, 2004; B. Muthén, & Asparouhov, T., 2002). Each item on the measure is connected to its respective construct through a latent continuous response variable. This variable is cut by m-1 threshold parameters (where ―m‖ is the number of response options) which produce observed response frequencies. Analyses are then based on a matrix of tetrachoric correlations. The latent response variables require additional scaling factors in order to assess group differences in the common factor mean and variance. To identify the model the following steps must be taken 1. The intercept parameters for all latent response variables must be fixed to zero in the first group. 47 2. Uniqueness variances need to be fixed to one in the first group. 3. The test of strict invariance will require the constraint of fixing uniqueness parameters in both groups. As with any standard multiple-group confirmatory factor analysis, additional constraints are necessary in order to place the common-factor mean and variance on the same metric across groups. There are two approaches presented for setting up these additional constraints (1) the Millsapp and Tein (2004) approach and (2) the Muthen and Asparouhov (2002) approach. The Millsap and Tein approach requires that 1. The first m-1 thresholds be constrained across all groups. 2. A second threshold or uniqueness (in the case of binary items, there would be no 2nd threshold) be constrained for one reference item in each group. The Muthen and Asparouhov approach requires that thresholdsand loadings are constrained in a reduced model and that tests of selected items are conducted against a full model where thresholds and loadings for these items are freed while maintaining model identification through fixing the specific-variance to unity for the selected items (B. Muthén, & Asparouhov, T., 2002). MIMIC models (multiple indicator multiple causes) are suggested as a means of selecting items to be tested, because MIMIC models are sensitive to threshold invariance; modification indices produced by Mplus can also be used, tables 3 through 6 present the sequence of constraints used in the present study. 48 Table 3 Configural invariance constraints Parameter Name Constraints Reference Group Loadings 1  15 Free Thresholds  1   15 Free Residuals  1   15 Fixed to 1 Factor mean   Fixed to 0 for factor 1 & factor 2 Factor variance   Fixed to 1 for factor 1 & factor 2 Focal Group Loadings 1  15 Free Thresholds  1   15 Free Residuals  1   15 Fixed to 1 Factor mean   Fixed to 0 for factor 1 & factor 2 Factor variance   Fixed to 1 for factor 1 & factor 2 (1)- (15) refer to item 1 through item 15 49 Table 4 Metric invariance constraints Parameter Name Constraints Reference Group Loadings 1  15 Held equal Thresholds  1   15 Free Residuals  1   15 Fixed to 1 Factor mean   Fixed to 0 for factor 1 & factor 2 Factor variance   Fixed to 1 for factor 1 & factor 2 Focal Group Loadings 1  15 Held equal Thresholds  1   15 Free Residuals  1   15 Fixed to 1 Factor mean   Fixed to 0 for factor 1 & factor 2 Factor variance   Free for factor 1 & factor 2 (1)- (15) refer to item 1 through item 15 50 Table 5 Scalar invariance constraints Parameter Name Constraints Reference Group Loadings 1  15 Held equal Thresholds  1   15 Held equal Residuals  1   15 Fixed to 1 Factor mean   Fixed to 0 for factor 1 & factor 2 Factor variance   Fixed to 1 for factor 1 & factor 2 Focal Group Loadings 1  15 Held equal Thresholds  1   15 Held equal Residuals  1   15 Fixed to 1 Factor mean   Free for factor 1 & factor 2 Factor variance   Free for factor 1 & factor 2 (1)- (15) refer to item 1 through item 15 51 Table 6 Residual invariance constraints Parameter Name Constraints Reference Group Loadings 1  15 Held equal Thresholds  1   15 Held equal Residuals  1   15 Fixed to 1 Factor mean   Fixed to 0 for factor 1 & factor 2 Factor variance   Fixed to 1 for factor 1 & factor 2 Focal Group Loadings 1  15 Held equal Thresholds  1   15 Held equal Residuals  1   15 Free Factor mean   Free for factor 1 & factor 2 Factor variance   Free for factor 1 & factor 2 (1)- (15) refer to item 1 through item 15 52 Assessing model fit The Mplus program version 5.2, will be used to fit models to a matrix of tetrachoric correlations, using robust weighted least squares (RWLS) estimation (this is the weighted least squares mean and variance adjusted estimator (Flora & Curran, 2004; L. Muthén & B. O. Muthén, 2008). To assess the overall fit of a model to the data global fit indices such as CFI (comparative fit index), TLI (Tucker-Lewis index) chi-square goodness of fit test and RMSEA (root mean square error of approximation) will be used. The comparative fit index (CFI) (Bentler, 1990) is a revised version of the Bentler and Bonnet normed fit index (Bentler & Bonnet, 1980) which adjusts for degrees of freedom and ranges in value of 0.00 to 1.00. The comparative fit index (CFI) compares the model with the baseline model. Hu and Bentler (Hu & Bentler, 1999) suggest that CFI be .95 or greater for good fit, although others have suggested that .90 is also considered to be acceptable fit (Byrne & Campbell, 1999). The Tucker-Lewis index (TLI) indicates where a model lies on a continuum between a baseline model with unrelated observed variables and an ideal model that fits the data perfectly. A value of .95 or greater is also considered good fit for the TLI index. The root-mean-square-error-of approximation or RMSEA is an index of discrepancy between the model and the data per degree of freedom (P. J. Brown, et al., 2007). A value less than .06 suggests close fit between the model and the data (Browne & Cudeck, 1993; Hu & Bentler, 1999). The hypothesis of the chi-square goodness of fit test is that the model with the specified number of factors holds. With large samples, almost all models are rejected and with smaller samples model fit may go undetected. Because 53 of the chi-squares sensitivity to large sample sizes the aforementioned fit indices will be relied upon for this study. 54 IRTLR Analysis Procedure DIF analyses with the IRTLR method (Thissen, 2001) consists of two parts: (1) anchor purification and (2) DIF detection. The presence of possible biased items can lead to inaccurate estimation of depression (ability), which in turn would contaminate the DIF investigation. As such prior to a DIF analysis a subset of DIF-free items should be identified and used to link the two groups being evaluated in the analysis. At large testing companies these anchor items are selected from a pool of established unbiased items (R.E. Millsap & Everson, 1993). When prior anchor items are not available items can be prescreened using procedures such as the Mantel-Haenszel test, logistic regression or MIMIC models. Within IRTLR anchor items are identified through an iterative procedure. This process involves testing every item for DIF as a first step while treating all other items as anchor items. In other words if there were fifteen items on a depression instrument, item 1 would be evaluated for DIF while items 2 through 15 would be considered DIF free ―temporarily‖ (temporary anchor set) in order to link items for both groups (women vs. men). As mentioned before IRTLR involves the estimation of two hierarchically nested item response models which generate relevant  2 difference tests for DIF detection. As such during the first iteration in which all items are considered study items, IRTLR generates several nested model comparisons at least one for each item. For each item a model with all parameter estimates constrained to be equal for the reference and for the 55 focal groups is first compared with a model in which the parameters for the studied item are free to be estimated separately for the 2 groups. For items modeled using the 2PL, the model comparison will have 2 df, one for each parameter constrained in the first model and freed in the second model. If the resulting model comparison has a  2 value greater than or equal to 3.84 (which indicates that at least one parameter might differ between groups at a nominal   0.05 ), the item is classified as displaying DIF in one or more of its parameters, and the individual parameters would then be evaluated (Edelen, Thissen, Teresi, Kleinman, & OcepekWelikson, 2006). Once all items on the instrument have been tested once using all other items as temporary anchor items, the items with potential DIF are removed and the process is repeated until no items are classified as potentially exhibiting DIF. This final set of items is then used as an anchor item set. With anchor items established, the studied items are now retested for DIF relative to the now-specified anchor. If the model comparison test indicates that at least one of the study item‘s parameters might differ between groups at a nominal   0.05 , IRTLR then generates parameter specific model comparison tests so that the source of the DIF can be identified. The  2 values associated with these model comparison tests are generated first for the c-parameter if a 3PL is used, then for the a-parameter if the 2PL or graded response model is used and finally for the b-parameter. For each test the IRTLR lists the item parameter estimates associated with the less constrained model for the reference and 56 for the focal group and provides the focal group overall mean and standard deviation (relative to a N [0,1] distribution for the reference group) (Edelen, et al., 2006). After item parameters with significant DIF have been identified, parameters for a final 2-group model that incorporates the identified DIF can be specified and estimated using MULTILOG (Edelen, et al., 2006; Jeanne A. Teresi, 2006). MULTILOG software generates item parameter estimates and standard errors, as well as summary statistics, item information, reliability estimates and the focal group overall mean and standard deviation (relative to a N [0,1] distribution for the reference group) (Jeanne A. Teresi, 2006; Thissen, W-H., & Bock, 2003). The results of this calibration can be used to interpret the DIF and to assess its impact at the item and scale levels. The IRTLR procedure will be used to assess DIF within each of five countries by gender and between countries. Benjamini-Hochberg Procedure Due to the multiple comparisons associated with the IRTLR procedure, the Benjamini-Hochberg procedure will be used to control the false discovery rate (type 1 error)(Thissen, Steinberg, & Kuang, 2002) . The Benjamini-Hochberg procedure has been used in the reporting of results from the National Assessment of Educational Progress (NAEP) (Braswell, et al., 2001) as well as other research contexts (Edelen, et al., 2006; Steinberg, 2001; Thissen, Steinberg, & Wainer, 1988; Thissen, et al., 1993) such as DIF analyses. The B-H procedure is a sequential approach which has greater power than the Bonferroni adjustment and an easier to implement. 57 As explained by (Steinberg, 2001) observed chi- square p-values are ranked from largest to smallest. The ranks (1 to 15) are used to adjust the critical p-values for statistical inference, according to the following formula: Equation 22  rank  of  observed  p  value     number  of  comparisons     level  of  significance A level of .05 will be used for all comparisons; this procedure controls the false positive rate, so that no more than 5% of the results marked significant for DIF may be in the wrong direction. The total number of comparisons is 15 (15 items) for generating the critical p-values used for each statistical test. For a more detailed treatment of the procedure see (Thissen, et al., 2002) & (Benjamini & Hochberg, 1995). 58 CHAPTER 4: RESULTS INTRODUCTION In this section, findings of the measurement invariance analyses performed on the Geriatric Depression Scale Short Form (GDS-15) are described in detail. The overall goal of this study was to compare two psychometric techniques (1) multiple group confirmatory factor analysis for binary indicators and (2) item response theory likelihood ratio tests, in their ability to evaluate measurement invariance across gender and country of origin. The results are sequenced as follows: (1) descriptive statistics with country gender group comparisons, (2) multiple group confirmatory factor analysis and (3) item response theory likelihood ratio test analyses. Descriptive Statistics: Country-Gender Comparisons The GDS-15 is a 15 item yes/no response format depression screening instrument with scores that can range from 0 to 15. The scoring protocol states that individuals with scores less than 6, no active depressive symptomatology, 6 and above further screening with a medical professional is needed. Across all five countries the average depression score fell below a score of six, which would indicate no active depressive symptomatology see Table 7. Within all countries and across all countries, the mean score on the GDS-15 was higher for women; additional demographics are presented in Table 8. Across all five countries the depression scores were positively skewed, which based on the cutoff value for depression being greater than 6, indicates that a large 59 portion of each sample would not be classified as having depression and only a small group of people would be classified as exhibiting depressive symptoms. GDS-15 total test scores had a skewness of 1.565 in Argentina, 1.403 in Cuba, 1.178 in Chile, 1.345 in Mexico and 1.649 in Uruguay. The skewness of the data was addressed via the Mplus program and the robust weighted least squares (RWLS) estimation procedure. Research by (Flora & Curran, 2004) has shown that the (RWLS) is a ―theoretically appropriate estimation procedure for binary response items and it produces accurate test statistics, parameter estimates and standard errors under both normal and non-normal latent response distributions across all sample sizes and model complexities‖ (p.489) (P. J. Brown, et al., 2007; Flora & Curran, 2004). 60 Table 7 Mean GDS-15 scores Women Men Country Argentina M 2.53 SD 2.9 M 2.29 SD 2.72 Mexico 2.94 3.21 2.35 2.63 Chile 3.96 3.58 3.24 3.10 Cuba 3.27 3.41 1.81 2.41 Uruguay 2.70 3.08 2.02 2.55 Across all countries 3.11 3.30 2.27 2.69 Table 8 Demographic characteristics of countries Argentina Mexico Uruguay Chile Cuba N 1043 1876 1450 1301 1905 Age 71 (7.2) 65 (9.8) 71 (7.3) 72 (8.0) 72 (8.9) 61 Female 63% 73% 63% 66% 63% Male 37% 27% 37% 34% 37% Multiple-Group Confirmatory Factor Analysis Preliminary exploratory factor analyses The first stage of these analyses focused on establishing what the underlying factor structure of the GDS-15 was by fitting exploratory factor analysis models across all five countries (Argentina, Mexico, Cuba, Uruguay and Chile) separately. The Mplus program (Version 5.2) (L. Muthén & B. O. Muthén, 2008), was used to fit models to a matrix of tetrachoric correlations, using robust weighted least squares estimation (RWLS) (Flora & Curran, 2004). An initial EFA analysis was necessary because to date there have only been four factor analytical studies of the GDS-15 (P. J. Brown, et al., 2007; Friedman, et al., 2005; Incalzi, Cesari, Pedone, & Carbonin, 2003; Mitchell, et al., 1993) (only one used multiple group analysis, Brown & Woods), with two studies identifying a 2 factor model and 2 studies identifying a 3 factor model. Brown and colleagues replicated two of the 3 factors that Mitchell and colleagues found initially. With such discordant reporting of the factor structure across studies an EFA was a necessary first step. One, two and three factors were extracted for the GDS-15 within each country, eigenvalues were (Argentina: 7.53, 1.88, 0.90, Mexico: 7.87, 1.42, 0.89, Cuba: 8.57, 1.05, 0.93, Chile: 8.11, 1.01, 0.95, and Uruguay: 8.06, 1.19, 0.93 respectively. Representative scree-plots for each country are presented in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5. Factor loading patterns are presented in Table 9, Table 10, Table 11, 12 and 13. A third factor was also extracted, however, the 1 and 2 factors were selected for further 62 evaluation because they provided the most parsimonious, substantively meaningful solution and because each factor was well represented (more than 3 items loaded on each factor). Across countries either no items loaded on a third factor or there were less than three items loading on the third factor. Examination of the screeplots by country indicated that there was 1 dominant factor and possibly a second factor across countries. To rule out the existence of a one dimensional construct, a one-factor CFA model was estimated for all countries (N=7575) simultaneously and fit was assessed. The one-factor CFA fit the data poorly, with a χ2 (328) = 2167.856, p<.00001, RMSEA = .062, CFI=.935 and TLI=.968. Based on the results of the exploratory factor analyses and the 1-factor total group CFA, separate CFA‘s were run within each country to confirm a final factor structure. A summary of the results of individual 1 and 2 factor CFA‘s within each country can be found in Table 14 and Table 15. These results indicate that for the countries of Argentina, Mexico and Uruguay a one- factor model fit poorly; however for the countries of Chile and Cuba, the one- factor model fit the data adequately. There was an increase in overall global model fit, between the one- factor CFA and the two- factor CFA for the countries of Argentina, Mexico and Uruguay. Based on these results, for the countries of Chile and Cuba a one-factor structure was qualitatively defined as general depressive affect and in the countries of Argentina, Mexico and Uruguay a two-factor structure was qualitatively defined as life satisfaction (v1, v5, v7, v11 and v13) and general depressive affect (v2-v4, v6, v8-v10, v12 and v14-v15). This two-factor pattern of loadings replicates the results of Brown (2007) and two of the three original factors that Mitchell 63 and colleagues found (Mitchell, et al., 1993) and is defined in the same way. After establishing the underlying factor structure, within country invariance analyses by gender was evaluated. 64 Invariance Models by Gender within Country The following section presents the results of the measurement invariance hypotheses with respect to the equivalence of factor loadings (metric invariance), thresholds (scalar invariance), and uniqueness‘ (strict invariance) by gender within the countries of Chile, Cuba, Argentina, Mexico and Uruguay. Chile The extent to which an item factor model measuring geriatric depression (with 15 observed items) exhibited measurement invariance between women and men in the country of Chile was examined using Mplus v. 5.2 (L. Muthén & B. O. Muthén, 2008). WLSMV estimation including a probit link and the THETA parameterization was used to estimate all models (L. Muthén & B. Muthén, 2008). Missing data was handled by maximum likelihood estimation assuming missing at random. WLSMV provides weighted least squares parameter estimates using a diagonal weighted matrix with standard errors and mean- and- variance adjusted chi-squared test statistic that use a full weight matrix (B. Muthén, du Toit, & Spisic, 1997). For nested models the difference between the fit functions is not distributed as chi-square and the degrees of freedom is not the simple difference in free parameters. However, an appropriate chi-square statistic and degrees of freedom are calculated according to formulas in the MPlus Technical Appendices (www.statmodel.com).Thus, model fit statistics describe the fit of the item factor model to the polychoric correlation matrix among the items for each group. 65 Nested model comparisons were conducted using the Mplus chi-square DIFFTEST procedure. Model fit was evaluated with relative fit indices CFI, TLI, and RMSEA. For the CFI and TLI indices values above .95 indicate a good fit. For the RMSEA, a value less than .06 is considered to indicate good fit. For the Chilean gender model global fit indices were adequate and ranged from a CFI of .966 to .983, TLI of .983 to .990 and RMSEA of .037 to .047, all global fit indices are summarized in Table 16. Invariance hypotheses tests Configural A configural invariance model was initially specified in which one factor was estimated simultaneously for Chilean women and men. The factor variance was fixed to 1 and the factor mean was fixed to 0 in each group for identification, such that all item factor loadings and thresholds (1 per item given a binary response option) were then estimated. The residual variances are not uniquely identified in the configural invariance model and as such were all constrained to 1 in both groups. As shown in  2 119   287 .15, p .0000 Table 10, the configural invariance model fit adequately across groups, , RMSEA = .047, CFI = .966, TLI = .983. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between Chilean men and Chilean women, with women as the reference group and men as the focal group. 66 Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The factor variance was fixed to 1 in women for identification but was freely estimated in men; the factor mean was fixed to 0 in both groups for identification. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. The metric invariance model did not result in a decrement in fit, DIFFTEST (12) = 4.640, p = .9689. Modification indices did not suggest any points of localized misfit for the constrained loadings. The fact that the metric invariance hypothesis was supported indicates that the items were related to the latent factor equivalently across groups, or more simply, that the same latent factor was being measured in each group. Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in women for identification, but the factor variance and mean were then estimated for men. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model did not fit significantly worse than the metric invariance model, DIFFTEST (13) = 18.922, p < .1256. 67 The fact that scalar invariance (i.e., ―strong invariance‖) held indicates that all items have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to factor mean differences only. Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model comparison at this step proceeded backwards, such that a model with all residual variances freely estimated in the men was fitted first, and then compared with a model in which the residual variances for the invariant items (v1-v15) were fixed to 1 in the men. The residual variances in the women were all fixed to 1 for identification in both models, and the rest of the model parameters were estimated as described for the last scalar invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal to the women) did not fit significantly worse than the model with those residual variances freed, DIFFTEST (13) = 6.670, p = .9183, indicating that residual variance invariance held for all items. Residual variance invariance (i.e., ―strict invariance‖) being supported indicates that the amount of item variance not accounted for by the factor was the same across Chilean men and women for all items. 68 Summary for Chile In conclusion, these analyses showed that full measurement invariance was obtained across Chilean men and women – that is, the relationships of the items to the latent factor of general depressive affect was equivalent between men and women. In addition, full scalar invariance held which indicates that all items have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to factor mean differences only. Finally, full residual variance invariance (i.e., ―strict invariance‖) being supported indicated that the amount of item variance not accounted for by the factor was the same across groups for all items. Invariance hypotheses tests are summarized in Table 17. . 69 Cuba For the Cuban gender model, global fit indices were adequate and ranged from a CFI of .966 to .981, TLI of .981 to .987 and an RMSEA of .041 to .049, all global fit indices are summarized in Table 18. Invariance hypotheses tests Configural A configural invariance model with a one-factor structure (general depressive affect) was estimated simultaneously for Cuban women and men. The previous protocol for model identification was followed for these analyses. As shown in Table 18, the    348.55, p .0000 , RMSEA configural invariance model had adequate fit  2 113 = .049, CFI = .966, TLI = .981. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between Cuban men and women, with women as the reference group and men as the focal group. Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. The metric invariance model did not result in a decrement in fit, DIFFTEST (12) = 11.06, p = .5236. Support for the metric invariance hypothesis 70 indicates that the items were related to the latent factor equivalently across groups, or more simply, that the same latent factor was being measured in each group. Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in women for identification, but the factor variance and mean were then estimated for men. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model did not fit significantly worse than the metric invariance model, DIFFTEST (13) = 21.95, p < .0561. The fact that scalar invariance (i.e., ―strong invariance‖) held indicates that all items have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to factor mean differences only. Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal to the women) did not fit significantly worse than the model with those residual variances freed, DIFFTEST (13) = 14.60, p = .3326, indicating that residual variance invariance held for all items. 71 Residual variance invariance (i.e., ―strict invariance‖) being supported indicates that the amount of item variance not accounted for by the factor was the same across groups in all items. Summary for Cuba In conclusion, these analyses indicate that full measurement invariance was obtained across Cuban men and women – that is, the relationships of the items to the latent factor of general depressive affect was equivalent between men and women. In addition, full scalar invariance held which indicates that all items have the same expected response for each item threshold at the same absolute level of the trait. Finally, full residual variance invariance (i.e., ―strict invariance‖) being supported indicated that the amount of item variance not accounted for by the factor was the same across groups for all items. Invariance hypotheses tests are summarized in Table 19. 72 Argentina For the Argentinean gender model, global fit indices were adequate and ranged from a CFI of .949 to .962, TLI .972 to .978 and RMSEA .046 to .052. All global fit indices are summarized in Table 20. Invariance hypotheses tests Configural A configural invariance model with a two-factor structure (Life Satisfaction and General Depressive Affect) was estimated simultaneously for Argentinean women and men. As shown in Table 20, the configural invariance model had marginal fit  292  217.88, p .0000, RMSEA = .052, CFI = .949, TLI = .972. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between Argentinean men and women, with women as the reference group and men as the focal group. Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The metric invariance model did not result in a decrement in fit, DIFFTEST (10) = 10.24, p = .5245. Modification indices did not suggest any points of localized misfit for the constrained loadings. The fact that the metric invariance hypothesis was supported indicates that the items were related to the 73 latent factor equivalently across groups, or more simply, that the same latent factor was being measured in each group. Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in women for identification, but the factor variance and mean were then estimated for men. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model fit significantly worse than the metric invariance model, DIFFTEST (12) = 25.700, p < .001. The modification indices suggested that the threshold of item 9 (―do you prefer to stay at home rather than going out and doing new things?‖) was the largest source of misfit and should be freed. After doing so, the partial scalar invariance model did not fit significantly worse than the full metric invariance model, DIFFTEST (11) = 25.7, P<.2799. Support for partial scalar invariance indicates that items 1-8 and 10-15 have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to factor mean differences only. 74 Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal to the women) did not fit significantly worse than the model with those residual variances freed, DIFFTEST (12) = 15.478, p = .2163, indicating that residual variance invariance held for items 1-8 and 10-15. Partial residual variance invariance (i.e., ―strict invariance‖) being supported indicates that the amount of item variance not accounted for by the factor was the same across groups in items 1-8 and 10-15. The residual variance for item 9 was assumed noninvariant because of lack of threshold/scalar invariance found previously, and was not tested. Summary for Argentina In conclusion, these analyses indicate that partial measurement invariance was obtained across Argentinean men and women – that is, full metric invariance was obtained, meaning that the relationships of the items to the latent factor of general depressive affect were equivalent between men and women. Partial scalar invariance was obtained for items 1-8 and 10-15 indicating that these items had the same expected response for each item threshold at the same absolute level of the trait across Argentinean men and women. Finally, partial strict invariance was obtained for items 1-8 and 10-15, indicating that the amount of item variance accounted for by the factor was the same across groups. 75 Based on the lack of full scalar invariance, the observed values for item 9 will differ between men and women in Argentina at a given level of the latent factors. Invariance hypotheses tests are summarized in Table 21. . 76 Mexico For the Mexican gender model, global fit indices were adequate and ranged from a CFI of.969 to .980, TLI .982 to .986 and RMSEA .038 to .043; these results are summarized in Table 22. Invariance hypotheses tests Configural A configural invariance model with a two-factor structure (Life Satisfaction and General Depressive Affect) was estimated simultaneously for Mexican women and men. As shown in Table 16, the configural invariance model had adequate fit  2 108  295 .67 , p .0000 , RMSEA = .043, CFI = .969, TLI = .982. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between Mexican men and women, with women as the reference group and men as the focal group. Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The metric invariance model did not result in a decrement in fit, DIFFTEST (11) = 13.427, p = .2663. The fact that the metric invariance hypothesis was supported indicates that the items were related to the latent factor equivalently across groups, or more simply, that the same latent factor was being measured in each group. 77 Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in women for identification, but the factor variance and mean were then estimated for men. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model fit significantly worse than the metric invariance model, DIFFTEST (12) = 37.123, p < .0002. The modification indices suggested that the threshold of item 8 (―do you often feel helpless?‖) was the largest source of misfit and should be freed. After doing so, the partial scalar invariance model still had significantly worse fit than the full metric invariance model, DIFFTEST (11) = 19.978, P<.0456. The modification indices suggested that the threshold of item 6 (―are you afraid that something bad is going to happen to you?‖) was the largest remaining source of misfit and should be freed. After doing so, the new partial scalar invariance model (with thresholds for items 8 and 6 freed) did not fit significantly worse than the full metric invariance model, DIFFTEST (10) = 12.809, p = .2346. Support for partial scalar invariance indicates that items 1-5, 7 and 9-15 have the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to factor mean differences only. 78 Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal to the women) did not fit significantly worse than the model with those residual variances freed, DIFFTEST (10) = 15.106, p = .1282, indicating that residual variance invariance held for items 1-5, 7 and 9-15. Partial residual variance invariance (i.e., ―strict invariance‖) being supported indicates that the amount of item variance not accounted for by the factor was the same across groups in items 1-5, 7 and 9-15. The residual variance for items 8 and 6 was assumed non-invariant because of lack of threshold/scalar invariance found previously, and was not tested. Summary for Mexico In conclusion, these analyses indicate that partial measurement invariance was obtained across Mexican men and women – that is, full metric invariance was obtained, meaning that the relationships of the items to the latent factors of life satisfaction and general depressive affect were equivalent between men and women. Partial scalar invariance was obtained for items 1-5, 7 and 9-15 indicating that these items had the same expected response for each item threshold at the same absolute level of the trait across Mexican men and women. Finally, partial strict invariance was obtained for items 1-5, 7 and 9-15; indicating that the amount of item variance not accounted for by the factor was the same across groups. 79 Based on the lack of full scalar invariance, the observed values for items 6 and 8 will differ between men and women in Mexico at a given level of the latent factors. Invariance hypotheses tests are summarized in Table 23. . 80 Uruguay For the Uruguayan gender model, global fit indices were adequate and ranged from a CFI of .968 to .978, TLI of .983 to .987 and an RMSEA of .039 to .043, global fit indices are summarized in Table 24. Invariance hypotheses tests Configural A configural invariance model with a two-factor structure (Life Satisfaction and General Depressive Affect) was estimated simultaneously for Uruguayan women and men. As shown in Table 24, the configural invariance model had adequate fit  2 113  262 .91, p .0000 , RMSEA = .043, CFI = .968, TLI = .983. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between Uruguayan men and women, with women as the reference group. Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The metric invariance model did not result in a decrement in fit, DIFFTEST (11) = 14.469, p = .2081. The fact that the metric invariance hypothesis was supported indicates that the items were related to the latent factor equivalently across groups, or more simply, that the same latent factor was being measured in each group. 81 Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in women for identification, but the factor variance and mean were then estimated for men. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model fit significantly worse than the metric invariance model, DIFFTEST (12) = 22.296, p = .0343. The modification indices suggested that the threshold of item 15 (―do you think that most people are better off than you?‖) was the largest source of misfit and should be freed. After doing so, the new partial scalar invariance model did not fit significantly worse than the full metric invariance model, DIFFTEST (11) = 15.647, P<.1547. Support for partial scalar invariance indicates that items 1-14 have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to factor mean differences only. 82 Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal to the women) did not fit significantly worse than the model with those residual variances freed, DIFFTEST (12) = 14.503, p = .2698, indicating that residual variance invariance held for items 1-14. Partial residual variance invariance (i.e., ―strict invariance‖) being supported indicates that the amount of item variance not accounted for by the factor was the same across groups for items 1-14. The residual variance for item 15 was assumed noninvariant because of lack of threshold/scalar invariance found previously, and was not tested. Summary for Uruguay In conclusion, these analyses indicate that partial measurement invariance was obtained across Uruguayan men and women – that is, full metric invariance was obtained, meaning that the relationships of the items to the latent factors of life satisfaction and general depressive affect were equivalent between men and women. Partial scalar invariance was obtained for items 1-14 indicating that these items had the same expected response for each item threshold at the same absolute level of the trait across Uruguayan men and women. Finally, partial strict invariance was obtained for items 1-14, indicating that the amount of item variance not accounted for by the factor was the same across groups. 83 Based on the lack of full scalar invariance, the observed values of item15 will differ between men and women in Uruguay at a given level of the latent factors. Invariance hypotheses tests are summarized in Table 25. . 84 Invariance Models by Cross-Country Comparisons The following section presents the results of the measurement invariance testing with respect to the equivalence of factor loadings ‗metric invariance‘, thresholds „scalar invariance‟, and uniqueness‘ „strict invariance‟ by cross-country comparisons between, Chile and Cuba, Mexico and Uruguay, Mexico and Argentina and Argentina and Uruguay. The cross-country comparisons are based on factor structures within each country, for example in the countries of Chile and Cuba there is a one-factor structure and in Mexico, Cuba and Uruguay there is a two-factor structure with the same pattern of loadings. Chile by Cuba For the Chile by Cuba model, global fit indices were adequate and ranged from a CFI of .963 to .978, TLI of .982 to .988 and RMSEA of .042 to .05. These results are summarized in Table 26. Invariance hypotheses tests Configural A configural invariance model was initially specified in which a single factor (General Depressive Affect) was estimated simultaneously for the countries of Chile and Cuba. The factor variance was fixed to 1 and the factor mean was fixed to 0 in each group for identification, such that all item factor loadings and thresholds (1 per item given a binary response option) were then estimated. 85 The residual variances are not uniquely identified in the configural invariance model and as such were all constrained to 1 in both groups. As shown in Table 26, the configural invariance model had adequate fit across countries,  2 139  637 .50, p .0000 , RMSEA = .049, CFI = .963, TLI = .983. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between the countries of Chile and Cuba, with Chile as the reference group and Cuba as the focal group. Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The factor variance was fixed to 1 in Chile for identification but was freely estimated in Cuba; the factor mean was fixed to 0 in both groups for identification. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. The metric invariance model did result in a decrement in fit, DIFFTEST (12) = 31.31, p = .0018. Modification indices suggested that the lack of a correlation between the errors of item 5 and 7 was the largest source of misfit, so a correlation was added. After doing so the partial metric invariance model did not fit significantly worse than the full configural invariance model, DIFFTEST (11) 17.49, p=.0941. Support for partial metric invariance indicates that items 1-4, 6, and 8-15 were related to the latent factor equivalently across countries (with the exception of items 5 and 7). 86 Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in Chile for identification, but the factor variance and mean was then estimated for Cuba. All factor loadings and item thresholds were constrained equally across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model fit significantly worse than the partial metric invariance model, DIFFTEST (13) = 232.88, p < .0000. Modification indices suggested that the thresholds of item 1 and item 15 were the largest source of misfit and should be freed. After doing so, the partial scalar invariance model still had significantly worse fit than the partial metric invariance model, DIFFTEST (11) = 98.76, p <.0000. The modification indices then suggested that the thresholds of item 5 and 7 were the largest source of misfit and should be freed. After doing so, the new partial scalar invariance model (with the thresholds of items 1, 5, 7 and 15 freed) still had significantly worse fit than the partial metric invariance model, DIFFTEST (19) = 19.05, p=.0248. The modification indices then suggested that the threshold of item 9 was the largest source of misfit and should be freed. After doing, so the new partial scalar invariance model (with the thresholds of 1, 5, 7, 9 and 15 freed) did not fit significantly worse than the partial metric invariance model, DIFFTEST (8) =15.15, p=.0562. Support for partial scalar invariance indicates that items 2-4, 6, 8 and 10-14 have the same expected response for each item threshold at the same absolute level of the trait, 87 or more simply, that the observed differences in the proportion of responses in each category for those items was due to the factor mean differences only. Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model comparison at this step proceeded backwards, such that a model with all residual variances freely estimated in the country of Cuba was fitted first, and then compared with a model in which the residual variances for the invariant items (2-4, 6, 8 and 10-14) were fixed to 1 in Cuba. The residual variances in the country of Chile were all fixed to 1 for identification in both models, and the rest of the model parameters were estimated as described for the last scalar invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal in Cuba) fit significantly worse than the model with those residual variances freed, DIFFTEST (9) = 22.92, p =0.0064. The modification indices suggested that the residual variance for items 2 and 4 were the largest source of remaining misfit and should be freed. After doing so, the new fixed partial residual invariance model did not fit significantly worse than the freed residual invariance model DIFFTEST (7) = 12.22, p = 0.0935. Support for partial residual variance invariance (i.e., ―strict invariance‖) indicates that the amount of item variance not accounted for by the factor was the same across groups in items 3, 6, 8 and 10-14; the residual variance for items 1, 5, 7, 9 and 15 88 were assumed non-invariant because of lack of threshold invariance found previously, and were not tested. Summary for Chile by Cuba In conclusion, these analyses indicate that partial measurement invariance was obtained across Chile and Cuba – that is, the relationships of items 1-4, 6 and 8-15 were related to the latent factor of general depressive affect equivalently between the countries of Chile and Cuba or that the same latent factor was being measured in each group. In addition, for items 2-4, 6, 8 and 10-14 the observed differences in the proportion of responses in each category for these items was due to differences in the factor mean only. Finally, the amount of item variance not accounted for by the factor was the same across groups for items 3, 6, 8 and 10-14. The lack of full metric and scalar invariance means that the observed values of items 1, 5, 7, 9 and 15 will differ between older adults in Chile and Cuba at a given level of the latent factor. Invariance hypotheses tests are summarized in Table 27. 89 Mexico by Uruguay For the Mexico by Uruguay model, global fit indices were adequate and ranged from a CFI of .966 to .978, TLI of .983 to .987 and RMSEA of .037 to .044. These results are summarized in Table 28. Invariance hypotheses tests Configural A configural invariance model was initially specified in which two factors (Life Satisfaction and General Depressive Affect) were estimated simultaneously for the countries of Mexico and Uruguay. The factor variance was fixed to 1 and the factor mean was fixed to 0 in each group for identification, such that all item factor loadings and thresholds (1 per item given a binary response option) were then estimated. The residual variances are not uniquely identified in the configural invariance model and as such were all constrained to 1 in both groups. As shown in Table 28, the   configural invariance model fit well across groups,  2 132  521 .54 , p .0000 RMSEA = .044, CFI = .966, TLI = .983. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between the countries of Mexico and Uruguay, with Uruguay as the reference group and Mexico as the focal group. 90 Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The factor variance was fixed to 1 in Uruguay for identification but was freely estimated in Mexico; the factor mean was fixed to 0 in both groups for identification. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. The metric invariance model did result in a decrement in fit, DIFFTEST (12) = 29.88, p = .0029. Modification indices suggested that adding a correlation between the errors for item 1 and item 3 would improve fit. After doing so the partial metric invariance model still had significantly worse fit than the full configural invariance model, DIFFTEST (11) 19.87, p=.0471. The modification indices suggested that adding a correlation between the errors for item 10 and item 15 would improve fit. After doing so the new partial metric invariance model (with error correlations for items 1, 3, 10 and 15) did not have significantly worse fit than the configural invariance model DIFFTEST (10) 15.16, p=.1261. Support for partial metric invariance indicates that items 2-9 and 11-14 were related to the latent factor equivalently across countries (with the exception of items 1, 3, 10 and 15). 91 Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in Uruguay for identification, but the factor variance and mean was then estimated for Mexico. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model fit significantly worse than the partial metric invariance model, DIFFTEST (12) = 93.70, p < .0001. Modification indices suggested that the thresholds of items 9 and 10 were the largest source of misfit and should be freed. After doing so, the partial scalar invariance model still had significantly worse fit than the partial metric invariance model, DIFFTEST (10) = 48.608, p <.0001. The modification indices then suggested that the thresholds of items 2 and 4 were the largest source of misfit and should be freed. After doing so, the new partial scalar invariance model (with the thresholds of items 9, 10, 2 and 4 freed) still fit significantly worse than the partial metric invariance model, DIFFTEST (9) = 18.46, p=.0301. The modification indices then suggested that the threshold for item 15 was the largest remaining source of misfit and should be freed. After doing so, the new partial scalar invariance model (with the thresholds for items 9, 10, 2, 4 and 15 freed) did not fit significantly worse than the partial metric invariance model, DIFFTEST (8) = 12.88, p=.1158. 92 Support for partial scalar invariance indicates that items 1, 3, 5-8, and 11-14 have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to the factor mean differences only. Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model comparison at this step proceeded backwards, such that a model with all residual variances freely estimated in the country of Mexico was fitted first, and then compared with a model in which the residual variances for the invariant items (1, 3, 5-8, and 11-14) were fixed to 1 in Mexico. The residual variances in the country of Uruguay were all fixed to 1 for identification in both models, and the rest of the model parameters were estimated as described for the last scalar invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal in Uruguay) fit significantly worse than the model with those residual variances freed, DIFFTEST (8) = 16.45, p =0.0363. The modification indices suggested that relaxing the constraints on the residual variance for item 3 would improve fit. After doing so, the new fixed partial residual invariance model did not fit significantly worse than the freed residual invariance model DIFFTEST (7) = 11.487, p = 0.1187. Support for partial residual variance invariance (i.e., ―strict invariance‖) indicates that the amount of item variance not accounted for by the factor was the same across groups in items 1, 5-8, and 11-14); the residual variance for items 2, 93 4, 9, 10 and 15 were assumed non-invariant because of lack of threshold invariance found previously, and were not tested. Summary for Mexico by Uruguay In conclusion, these analyses indicate that partial measurement invariance was obtained across Mexico and Uruguay – that is, the relationships of items 2, 4, 9 and 11-14 (sans the error correlations of items 1 and 3 and 10 and 15), to the latent factors of life satisfaction and general depressive affect were equivalent between the countries of Mexico and Uruguay. The same expected response for the item thresholds of questions 1, 3, 5-8, and 11-15 were the same at absolute level of the trait. Finally, the amount of item variance not accounted for by the factor was the same across groups for items 1, 5-8, and 11-14. The lack of full metric and scalar invariance means that the observed values of items 1, 2, 3, 4,9,10 and 15 will differ between older adults in Mexico and Uruguay at a given level of the latent factor. Invariance hypotheses tests are summarized in Table 29. 94 Mexico by Argentina For the Mexico by Argentina model global fit indices were adequate and ranged from a CFI of .961 to .974, TLI of .979 to .984 and RMSEA of .040 to .046, these results are summarized in Table 30. Invariance hypotheses tests Configural A configural invariance model was initially specified in which two factors (Life Satisfaction and General Depressive Affect) were estimated simultaneously for the countries of Mexico and Argentina. The factor variance was fixed to 1 and the factor mean was fixed to 0 in each group for identification, such that all item factor loadings and thresholds (1 per item given a binary response option) were then estimated. The residual variances are not uniquely identified in the configural invariance model and as such were all constrained to 1 in both groups. As shown in Table 30, the configural invariance model had adequate fit across countries,  2 122  483 .26, p .0000 RMSEA = .046, CFI = .961, TLI = .979. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between the countries of Mexico and Argentina, with Argentina as the reference group and Mexico as the focal group. 95 Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The factor variance was fixed to 1 in Argentina for identification but was freely estimated in Mexico; the factor mean was fixed to 0 in both groups for identification. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. The metric invariance model did result in a decrement in fit, DIFFTEST (11) = 22.26, p = .0224. Modification indices suggested that adding a correlation between the errors for item 14 and item 15 would improve fit. After doing so the partial metric invariance model did not fit significantly worse than the full configural invariance model, DIFFTEST (11) 17.30, p=.0993. Support for partial metric invariance indicates that items 1-13 were related to the latent factor equivalently across countries (with the exception of items 14 and 15). Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in Argentina for identification, but the factor variance and mean was then estimated for Mexico. All factor loadings and item thresholds were constrained equally across groups; all residual variances were still constrained equal to 1 in both groups. The 96 full scalar invariance model fit significantly worse than the partial metric invariance model, DIFFTEST (12) = 134.60, p < .0001. Modification indices suggested that the threshold of item 10 was the largest source of misfit and should be freed. After doing so, the partial scalar invariance model still had significantly worse fit than the partial metric invariance model, DIFFTEST (11) = 88.74, p <.0000. The modification indices then suggested that the thresholds of items 6 and 15 were the largest source of misfit and should be freed. After doing so, the new partial scalar invariance model (with the thresholds of items 6, 10, and 15 freed) still fit significantly worse than the partial metric invariance model, DIFFTEST (10) = 33.21, p=.0003. The modification indices then suggested that the thresholds for items 5 and 8 were the largest remaining source of misfit and should be freed. After doing so, the new partial scalar invariance model (with the thresholds for items 5, 6, 8, 10 and 15 freed) did not fit significantly worse than the partial metric invariance model, DIFFTEST (8) = 9.51, p=.3011. Support for partial scalar invariance indicates that items 1-4, 7, 9 and 11-14 have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to the factor mean differences only. 97 Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model comparison at this step proceeded backwards, such that a model with all residual variances freely estimated in the country of Mexico was fitted first, and then compared with a model in which the residual variances for the invariant items (1-4, 7, 9, and 11-14) were fixed to 1 in Mexico. The residual variances in the country of Argentina were all fixed to 1 for identification in both models, and the rest of the model parameters were estimated as described for the last scalar invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal in Argentina) fit significantly worse than the model with those residual variances freed, DIFFTEST (12) = 22.35, p =0.0337. The modification indices suggested that relaxing the constraints on the residual variance for item 13 would improve fit. After doing so, the new fixed partial residual invariance model did not fit significantly worse than the freed residual invariance model DIFFTEST (12) = 16.78, p = 0.1144. Support for partial residual variance invariance (i.e., ―strict invariance‖) indicates that the amount of item variance not accounted for by the factor was the same across groups in items 1-4, 7, and 11-12 and 14); the residual variance for items 5, 6, 8,10 and 15 were assumed non-invariant because of lack of threshold invariance found previously, and were not tested. 98 Summary for Mexico by Argentina In conclusion, these analyses indicate that partial measurement invariance was obtained across Mexico and Argentina – that is, the relationships of items 1-13 to the latent factors of life satisfaction and general depressive affect were equivalent between the countries of Mexico and Argentina. In addition, for items 1-4, 7, 9 and 11-14 the observed differences in the proportion of responses in each category for these items was due to differences in the factor mean only. Finally, the amount of item variance not accounted for by the factor was the same across groups for items 1-4, 7, and 11-12 and 14. The lack of full metric and scalar invariance means that the observed values of items 5, 6, 8, 10, 14 and 15 will differ between older adults in Mexico and Argentina at a given level of the latent factor. Invariance hypotheses tests are summarized in Table 31. . 99 Uruguay by Argentina For the Uruguay by Argentina model, global fit indices were adequate and ranged from a CFI of .963 to .976, TLI of .980 to .984 and RMSEA of .040 to .045, these results are summarized in Table 32. Invariance hypotheses tests Configural A configural invariance model was initially specified in which two factors (Life Satisfaction and General Depressive Affect) were estimated simultaneously for the countries of Uruguay and Argentina. The factor variance was fixed to 1 and the factor mean was fixed to 0 in each group for identification, such that all item factor loadings and thresholds (1 per item given a binary response option) were then estimated. The residual variances are not uniquely identified in the configural invariance model and as such were all constrained to 1 in both groups. As shown in Table 32, the configural invariance model had adequate fit across countries,  2 123  434 .28, p .0000 , RMSEA = .045, CFI = .963, TLI = .980. The analysis proceeded by applying parameter constraints in successive models to examine potential decreases in fit resulting from measurement non-invariance between the countries of Uruguay and Argentina, with Uruguay as the reference group and Argentina as the focal group. 100 Metric Equality of the unstandardized item factor loadings between groups was then examined in a metric invariance model. The factor variance was fixed to 1 in Uruguay for identification but was freely estimated in Argentina; the factor mean was fixed to 0 in both groups for identification. All factor loadings were constrained equal across groups, all item thresholds were estimated, and all residual variances were constrained to 1 across groups. The metric invariance model did result in a decrement in fit, DIFFTEST (12) = 50.16, p = .0000. Modification indices suggested that adding a correlation between the errors for item 3 and item 4 would improve fit. After doing so the partial metric invariance model did not fit significantly worse than the full configural invariance model, DIFFTEST (10) 11.50, p=.3198. Support for partial metric invariance indicates that items 1, 2 and 5-15 were related to the latent factor equivalently across countries (with the exception of items 3 and 4). 101 Scalar Equality of the unstandardized item thresholds across groups was then examined in a scalar invariance model. The factor variance and mean were fixed to 1 and 0, respectively, in Uruguay for identification, but the factor variance and mean was then estimated for Argentina. All factor loadings and item thresholds were constrained equal across groups; all residual variances were still constrained equal to 1 in both groups. The full scalar invariance model fit significantly worse than the partial metric invariance model, DIFFTEST (12) = 55.60, p < .0000. Modification indices suggested that the threshold of item 15 was the largest source of misfit and should be freed. After doing so, the partial scalar invariance model still had significantly worse fit than the partial metric invariance model, DIFFTEST (11) = 28.95, p <.0023. The modification indices then suggested that the threshold of item 12 was the largest source of misfit and should be freed. After doing so, the new partial scalar invariance model (with the thresholds of items 12 and 15 freed) did not fit significantly worse than the partial metric invariance model, DIFFTEST (10) = 16.38, p=.0893. Support for partial scalar invariance indicates that items 1-11 and 13-14 have the same expected response for each item threshold at the same absolute level of the trait, or more simply, that the observed differences in the proportion of responses in each category for those items was due to the factor mean differences only. 102 Strict Equality of the unstandardized residual variances across groups was then examined in a residual variance invariance model. The model comparison at this step proceeded backwards, such that a model with all residual variances freely estimated in the country of Argentina was fitted first, and then compared with a model in which the residual variances for the invariant items (1-11 and 13-14) were fixed to 1 in Argentina. The residual variances in the country of Uruguay were all fixed to 1 for identification in both models, and the rest of the model parameters were estimated as described for the last scalar invariance model. The model with the residual variances for invariant items constrained to 1 (to be equal in Argentina) fit significantly worse than the model with those residual variances freed, DIFFTEST (11) = 20.94, p =0.0340. The modification indices suggested that relaxing the constraints on the residual variance for item 9 would improve fit. After doing so, the new fixed partial residual invariance model did not fit significantly worse than the freed residual invariance model DIFFTEST (11) = 14.92, p = 0.1849. Support for partial residual variance invariance (i.e., ―strict invariance‖) indicates that the amount of item variance not accounted for by the factor was the same across groups in items 1-8, 10, 11, 13 and 14); the residual variance for items 12 and 15 were assumed non-invariant because of lack of threshold invariance found previously, and were not tested. 103 Summary for Uruguay by Argentina In conclusion, these analyses indicate that partial measurement invariance was obtained across Uruguay and Argentina – that is, the relationships of items 1, 2 and 5-15 to the latent factors of life satisfaction and general depressive affect were equivalent between the countries of Uruguay and Argentina. In addition, for items 1-11and 13-14 the observed differences in the proportion of responses in each category for these items was due to differences in the factor mean only. Finally, the amount of item variance not accounted for by the factor was the same across groups for items 1-8, 10, 11, 13 and 14. The lack of full metric and scalar invariance means that the observed values of items 3, 4, 12 and 15 will differ between older adults in Uruguay and Argentina at a given level of the latent factor. Invariance hypotheses tests are summarized in Table 33. . 104 Multiple-Group CFA Invariance Summary The analyses here represent the most robust examination of the measurement properties of the GDS-15 in older adults in Latin America and the Caribbean. The GDS15‘s binary response format was appropriately modeled with EFA and CFA models in Mplus. Multiple group binary CFA models were used to test hypotheses metric, scalar and strict invariance across gender and country of origin in older adults. EFA and CFA analyses found support for a one-dimensional structure of general depressive affect in the countries of Chile and Cuba. While in the countries of Argentina, Mexico and Uruguay a two-factor model defined as life satisfaction and general depressive affect best reflected the data, and replicated previous work by (Brown and Woods, 2007). Across all countries by gender metric invariance was supported which indicates that the relationships of items to their latent factors were equivalent between men and women. Full measurement invariance by gender was obtained in the countries of Chile and Cuba, which indicates that within these countries the GDS-15 performs equivalently across men and women at all, levels of invariance. In addition, because full metric and scalar invariance was obtained, group comparisons of the latent mean of ―general depressive affect‖ can be conducted. Across the countries of Argentina, Mexico and Uruguay partial scalar invariance was obtained, which means that for items 9 (Argentina), 6 & 8 (Mexico) and 15 (Uruguay) there is ―something‖ other than the factors that is causing the thresholds to differ and that ―something‖ is related to gender. Although full metric invariance was obtained across the countries of Argentina, Mexico 105 and Uruguay, a lack of full scalar invariance indicates that men and women within these countries will differ in the threshold parameter of the item exhibiting misfit. This means that all predicted observed scores will differ at various levels of the latent factors of life satisfaction and general depressive affect. There were no country by country comparisons that obtained full measurement invariance. For the country by country comparison of Cuba by Chile, 5 out of 15 items contributed to the lack of scalar invariance (1, 5, 7, 9, and 15), which indicates that comparisons based on these items would be suspect. In the country by country comparison of Mexico and Uruguay 5 out 15 items contributed to a lack of scalar invariance (2, 4, 9, 10 and 15) also indicating that group comparisons which include these items would be suspect. The comparison of Mexico and Argentina also had 5 items that contributed to lack of scalar invariance (5, 6, 8, 10, and 15) while the comparison of Argentina and Uruguay had 2 items that contributed to lack of scalar invariance (12 and 15). Across the country by country comparisons, item 9 (―do you prefer to stay at home, rather than going out and doing things‖) and 15 (―do you think that most people are better off than you are‖) presented most often as having lack of threshold invariance. A direct cause of the invariance in items 9 and 15 is difficult to ascertain because there are countless factors that may influence the responses to these items. However, the wording of these items may tap into constructs that are unrelated to depression; for example with item 9 (―do you prefer to stay at home, rather than going out and doing things‖) could relate to a lack of security/safety in an individuals‘ environment, such as living in a high crime 106 neighborhood, thus endorsement of this item might be a reflection of the community you live in and not you level of depression. Similarly, with item 15 (―do you think that most people are better off than you are‖) persons might interpret this item as being related to socioeconomic position and not depression. 107 IRTLR-DIF Analysis In the current study, IRT likelihood-ratio tests were used to test for invariance of item response parameters across gender and country of origin for the Geriatric Depression Scale Short Form (GDS-15). The IRTLR procedure uses a 2-parameter logistic item response model. This procedure requires that members from each group under study are matched on an estimate of theta (i.e. depression). In order to get a valid estimate of theta, it is best to match the groups using items that are DIF-free. This set of items is referred to as an anchor set and, if there is no prior knowledge that there are scale items that are DIF-free, then an item purification process must be used. The IRTLR-DIF procedure was used to first identify anchor items and then to test the studied items for DIF relative to these anchor items. The Benjimini-Hochberg procedure was used to control for multiple comparisons in determining statistically significant DIF. MULTILOG software was then used to obtain the final parameter estimates for the groups under study, while modeling the DIF that was significant. The first step in the analysis was to ensure that the measure under study was sufficiently unidimensional. There is currently no single gold-standard procedure for determining whether a data-set is sufficiently unidimensional for IRT analyses (Reise, 2005). With that said, there are several commonly used approaches for evaluating unidimensionality: (1) examination of screeplots for a dominant first factor, (2) large st ratio of 1 to 2 nd eigenvalues (3 to 1) and (3) dominant first factor accounts for at least 20% of the variance in the survey items and (4) the factor with the second largest 108 eigenvalue only explains a small amount of the variance present (Reckase, 1979; Reise, 2005). Based on the previous MGCFA analysis, screeplots for all five countries indicate st that there is a dominant first factor, and the ratio of 1 to 2 requisite ratio of (3 to 1) (Argentina ratio of 1 st of 1 (7.87) to 2 st nd st nd eigenvalues meets the (7.53) to 2nd (1.88) = 4.0) (Mexico ratio st (1.42) = 5.5) (Cuba ratio of 1 (8.57) to 2 ratio of 1 (8.11) to 2 nd (1.01) = 8.0) (Uruguay ratio of nd (1.05) = 8.1) (Chile 1st (8.06) to 2nd (1.19) = 6.7), so the GDS-15 can be considered sufficiently unidimensional in all countries. 109 IRTLR-DIF by Gender The following section presents the results of the IRTLR-DIF analysis within each country by gender and then between countries. Chile DIF was assessed within the country of Chile with respect to gender, with women as the reference group and men as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 14 DIF-free items using a p-value of .05, and one item was identified as having potential non-uniform DIF with respect to gender. Item 13 ―do you feel full of energy‖, showed significant non-uniform DIF using IRTLR, prior to, but not after the BH-Adjustment. Although, item 13 lacked significance after the BH-Adjustment, the discrimination and location parameters were estimated freely (not constrained) for men and women in the MULTILOG analysis. Final 2PL model estimates showed that for the anchor items in Table 34,, items 18 and 10-14 discriminated well with a high value of 2.63 for item 8 and a low value of .54 for item 9, as a group they covered a wide range of depression severity (b-parameters range from -.05 to 1.56) implying that they were well suited to serve as an anchor set. The right most column of Table 35, lists the chi-square values and associated probabilities for the IRTLR nested model comparison tests of a and b DIF for item 13, Figure 6 displays the item characteristic curve for item 13 by gender. The test information function for Chile, suggests that the GDS-15 is best differentiating for individuals between (-.5 < ϴ < 2) on the depression continuum and is less differentiating 110 for individuals on the lower end, evidenced by the large spread of measurement errors for individuals with very low scores on the latent trait, see Figure 7. 111 Cuba DIF was assessed within the country of Cuba with respect to gender, with women as the reference group and men as the focal group. Using all other items as tentative anchor, the initial IRTLR-DIF procedure identified 13 DIF-free items, while two items 7 and 12, exhibited potential uniform DIF: item 7 ―do you feel happy most of the time‖ and item 12 ―do you feel worthless the way you are now‖, were then retested for DIF relative to the anchor item set. After the second item purification with IRTLR-DIF, items 7 and 12 remained significant for uniform DIF prior to the BH-Adjustment, but not after the BH-Adjustment. Parameters for both items were estimated freely despite their non-BHAdjustment significance. Final 2PL model estimates showed that the anchor items in Table 36 discriminated well (i.e. were strongly related to the underlying construct) with a high value of 2.91 for item 3 and low value of .92 for item 9 and as a group the anchor set covered a wide range of depression severity (-.01 to 1.26). The right most column of Table 37, lists chi-square values and associated probabilities for the IRTLR nested model comparison tests of non-uniform and uniform DIF for the two studied items 7 and 12. Figure 8 and Figure 9 display item characteristic curves for items 7 and 12 by gender. The test information function indicates that this scale better differentiates between individuals who are in the middle range of -1 <ϴ <2 on the latent trait continuum and is less differentiating at either extreme. Figure 10 also shows that there are relatively large measurement errors for individuals with very low scores on the latent trait. 112 Argentina DIF was assessed by gender in the country of Argentina with women as the reference group and men as the focal group. The anchor items, for the gender DIF analysis in the country of Argentina were moderately discriminating in a narrow range of difficulty parameters (a parameters range from 1.13 to 2.94 and b parameters ranged from .44 to 2.00); results are summarized in Table 38. The analysis based on gender in the country of Argentina initially identified 10 DIF-free anchor items and 5 with potential DIF; however, after purification 4 items with DIF were identified, and all but one with uniform DIF. Application of the BHAdjustment revealed that none of the four items evidenced DIF. Results, are summarized in Table 39 and Figure 11 through Figure 14 , display the item characteristic curves for items 9, 11, 12 and 15 by gender. The test information function indicates that the scale differentiates best for individuals who are above average on the latent trait, and least for those on the lower end of the continuum as evidenced by the large spread of measurement errors; see Figure 15. 113 Mexico DIF was assessed by gender in the country of Mexico with women as the reference group and men as the focal group. The analysis based on gender in the country of Mexico initially identified 11 DIF-free anchor items and 4 with potential DIF. The anchor item set was moderately discriminating with a high value of 2.76 for item 3 and a low value of 1.22 for item 9, in addition there was a wide range of severity (difficulty) parameters, see Table 40. After item purification there were only two items exhibiting uniform DIF, item 6 ―are you afraid something bad is going to happen to you‖ and item 8 ―do you often feel helpless‖. After the BH-Adjustment both items 6 and 8 remained statistically significant for uniform DIF. As shown in Figure 16 and Figure 17, items 6 and 8 were both more severe indicators (difficult to endorse) for men than women, so it would require a higher amount of depression for men to endorse either of these items (item 6 Men: a=1.54, b=.39, Women: a=1.54, b=.00), (item 8 Men a=3.16, b=.72, Women: a=3.16, b=.00); results are summarized in Table 40 and Table 41. In addition, the test information plot shows that the scale differentiates better for individuals who are just above average on the depression continuum and is less differentiating at either extreme as evidenced by the spread of the measurement errors; see Figure 18. 114 Uruguay DIF was assessed by gender in the country of Uruguay with women as the reference group and men as the focal group. The analysis based on gender in the country of Uruguay initially identified 10 DIF-free anchor items and 5 with potential DIF; see Table 42. However, after item purification there were only 4 items exhibiting nonuniform DIF, item 5 “are you in good spirits most of the time”, item 6 “are you afraid something is going to happen to you”, item 12 “do you feel worthless the way you are now” and item 14 “do you feel that your situation is hopeless”. Items 5,6,12 and 14 evidenced DIF prior to the BH-Adjustment, but after the adjustment only item 5 and 14 continued to exhibit non-uniform DIF, see Table 43. For item 5 men had a higher difficulty parameter than women (men a=1.76, b=1.07 and women a=2.81, b=1.02) indicating that it was more difficult or required slightly more depression for men to endorse this item, while women had a larger a parameter, indicating a stronger relationship with the item and the underlying construct relative to gender, see Table 43. Item 5 is one of five negatively coded items (1, 5, 7, 11, 13) on the GDS-15; what this means is that all but five items on the GDS-15 are coded as 0=no and 1=yes; for the other five items the coding is 0=yes and 1=no. So in the case of item 5, if an individual endorses this item with the GDS-15 coding scheme, the interpretation of item 5 is the following: men find it harder to endorse “I am not in good spirits most of the time” relative to women, however based on the b parameter for women, they also find it difficult to endorse this item, see Table 43 and Figure 19. 115 For item 14, men also have a harder time endorsing ―do you feel that your situation is hopeless” relative to women (men a=2.34, b=1.17, women a=2.90, b=1.14), see Table 43 and Figure 22. The test information plot, indicates that for men and women in Uruguay, the GDS-15 does a better job of differentiating between individuals in the middle to the upper end of the continuum 0 < ϴ < 2; in addition, measurement errors were larger for individuals on the lower end of the continuum, see Figure 23. With respect to the anchor item set used, all of the items provided modest discrimination with the highest value being 3.33 for item 7 and the lowest value being .89 for item 9. Difficulty parameters ranged from -.16 to 1.93, but tended to cluster between .72 and 1.05, see Figure 42. A summary of the aforementioned DIF analyses of the GDS-15 by gender is provided in Table 44 to Table 46. 116 IRTLR-DIF by Cross Country Comparisons Chile by Cuba DIF was assessed between the countries of Chile and Cuba, with Chile as the reference group and Cuba as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 7 DIF-free anchor items see Table 47. Two out of 8 items originally identified with DIF evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment, both of items showed DIF: item 4 ―do you often get bored‖ (uniform) and item 8 ―do you often feel helpless‖ (uniform), see Table 48. For both indicators older adults in the country of Chile found it more difficult to endorse these items (item 4 Chile: a=3.19, b=.08 Cuba: a=3.19, b=.00) (item 8 Chile: a=2.83, b=.56, Cuba: a=2.83, b=.00). In other words, the location parameter is lower for older adults in Chile, thus the DIF in this item indicates that given the same level of overall depression, it takes more depression for Chileans to endorse these items, see Figure 24 and Table 48. The test information plot showed that across the countries of Chile and Cuba the GDS-15 is better at differentiating older adults between -1 < ϴ < 2 on the depression continuum and less differentiating for individuals with lower scores on the latent trait, evidenced by the larger spread of measurement errors; see Figure 26. The final anchor item set of 13 indicators, were moderately discriminating (strong relationship to the underlying construct), with a high value of 2.89 for item 3 and a low 117 value of .83 for item 9 having the lowest value; in addition the location parameters clustered in a narrow range -.82 to 1.22; see Table 47. Mexico by Uruguay DIF was assessed between the countries of Mexico and Uruguay, with Uruguay as the reference group and Mexico as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 6 DIF-free anchor items, see Table 49. Five out of 9 items originally identified with DIF evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment 4 of the 5 items showed DIF, all uniform DIF: item 6 ―are you afraid something bad is going to happen to you, item 7 ―do you feel happy most of the time, item 9 ―do you prefer to stay at home, rather than going out and doing things and item 12 ―do you feel worthless the way you are now‖; see Table 50. All DIF items were highly discriminating relative to item 9. For items 6 and 12 older adults from Uruguay had to have more depression to endorse these items than older adults from Mexico (item 6: Uruguay a=1.66, b=.29, Mexico a=1.66, b=.00; item 12: Uruguay a=3.11, b=.66, Mexico a=3.11, b=.00). Alternatively, given the same level of overall depression, DIF in these items (6 and 12) indicates that older adults in Mexico find it easier to endorse these items. For item 7, the interpretation of the question is based on the reverse coding scheme for items (1, 5, 7, 11 and 13), with that said, item 7 should be interpreted as ―I do not feel happy most of the time‖. Consequently, older adults from Uruguay find it more difficult to endorse this item (item 7: Uruguay a=3.39, b=.21, Mexico a=3.39, b=.00). Older adults from Mexico found it more difficult to endorse 118 item 9, while persons from Uruguay found it easier to endorse (item 9 Uruguay a=.85, b= -.70, Mexico a=.85, b=.00); see Figure 27 through Figure 30 and Table 50. Examination of the test information plot for the GDS-15 across the countries of Mexico and Uruguay, shows that the instrument differentiates best for older adults in the middle of the distribution and not at either extreme -1.50 < ϴ < 1.75; see Figure 31. In addition the measurement errors for individuals scoring lower on the continuum are large, indicating poor differentiation. With respect to the anchor item set, all ten items were moderately discriminating with a high value of 3.21 for item 8 and a low value of 1.27 for items 10 and 15; in addition the location parameters clustered around the upper range of the distribution .10 to 1.03; see Table 49. 119 Mexico by Argentina DIF was assessed between the countries of Mexico and Argentina, with Argentina as the reference group and Mexico as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 6 DIF-free anchor items; see Table 51. Five out of 9 items originally identified with DIF evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment all five items showed DIF, two with non-uniform and three with uniform: item 2 ―do you feel that your life is empty‖ (uniform DIF), item 7 ―do you feel happy most of the time (non-uniform DIF), item 9 ―do you prefer to stay at home, rather than going out and doing things (uniform DIF), item 11 ―do you think it is wonderful to be alive now‖ (non-uniform DIF) and item 12 ―do you feel worthless the way you are now (uniform DIF); see Table 52. For older Mexicans with average standing on the latent trait, it was easier to endorse items 2 and 12 (item 2 Mexico a=1.88, b=.00, Argentina a=1.88, b=.29; item 12 Mexico a=2.54, b=.00, Argentina a=2.54, b=.69), while item 9 was easier for persons from Argentina to endorse (item 9 Mexico a=.98, b=.00, Argentina a=.98, b=-46); see Table 52. Item 7 “do you Not feel happy most of the time” (reverse scored and interpreted), had higher discrimination (stronger relationship to the construct) but a smaller location parameter for older adults in Argentina. In Mexico, item 7 had smaller discrimination but a higher location parameter than Argentina. Thus, older adults in Mexico find it more difficult to endorse item 7 than persons from Argentina (Mexico a=2.27, b=.72, Argentina a=2.60, b=33); see Table 52. . 120 For item 11 ―do you think it is wonderful to be alive now‖, persons from Argentina had higher discrimination but lower b-parameters relative to persons from Mexico. Older adults from Mexico had lower discrimination but a larger location parameter relative to persons from Argentina (Argentina a=1.83, b=1.01, Mexico a=1.78, b=1.62). Item 11 is interpreted under the same coding scheme as item 7, so the interpretation is actually “do you not think it is wonderful to be alive now”, and persons from Mexico find it more difficult to endorse this item than people from Argentina; see Table 52 and Figure 32 to Figure 36. Examination of the test information plot for the GDS-15 across the countries of Mexico and Argentina, shows that the instrument differentiates best for older adults between -1 < ϴ < 1.75 and not for persons on the lower end of the continuum, as evidenced by large measurement errors, see Figure 37. With respect to the anchor item set, all ten items were strongly related to the underlying construct, with a high value of 3.01 for item 8 and a low value of 1.16 for item 15; in addition the location parameters were widely spread across the continuum .16 to .93; see Table 51. 121 Argentina by Uruguay DIF was assessed between the countries of Argentina and Uruguay, with Argentina as the reference group and Uruguay as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 8 DIF-free anchor items; see Table 53. Four out of 7 items originally identified with DIF evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment four items showed DIF: item 2 ―do you feel that your life is empty‖ (uniform), item 6 ―are you afraid something bad is going to happen to you‖ (non-uniform), item 7 ―do you not feel happy most of the time‖ (non-uniform) and item 11 ―do you not think it is wonderful to be alive now‖ (non-uniform); see Table 54. For item 2, older adults from Uruguay with average standing on the latent trait depression find it easier to endorse this item relative to persons from Argentina (Uruguay a=1.67, b=.00 Argentina a=1.67, b=.70). For item 6, the discrimination parameter was larger in the country of Uruguay, indicating a stronger relationship to the construct than in the country of Argentina, but a smaller location parameter. While in Argentina the discrimination parameter is smaller and the location parameter is larger, indicating that persons from Argentina require larger amounts of theta/depression to endorse this item (Argentina: a=1.07, b=1.23, Uruguay: a=1.54, b=.77). Item 7 ―do you not feel happy most of the time‖ is highly discriminating especially in the country of Uruguay (a=3.37, b=.66), but the location parameter is smaller relative to Argentina (a=2.66, b=.68), indicating that it is easier for older adults from Uruguay to endorse item 7. Finally, item 11 ―do you not think it is wonderful to be alive now‖ is more discriminating in the 122 country of Uruguay (a=2.45, b=1.28) than Argentina (a=1.86, b=1.36), but has a smaller location parameter which indicates that endorsement of item 11 was easier for older adults from Uruguay, see Figure 38 to Figure 41 and Table 54.. The test information plot for the countries of Uruguay and Argentina differentiates best for individuals higher on the latent continuum and less so for individuals lower on the latent trait continuum, evidenced by large measurement errors; see Figure 42. The anchor item set used had large discrimination parameters, indicating a strong relationship between the items and the underlying construct, with item 9 and 15 having the smallest discrimination values. The location parameters clustered around a narrow range -.20 to 1.94; see Table 53. 123 Argentina by Chile DIF was assessed between the countries of Argentina and Chile, with Argentina as the reference group and Chile as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 7 DIF-free anchor items, see Table 55. Two out of 8 items (item 2 and item 7) originally identified with DIF, evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment only one item showed DIF: item 2 ―do you feel that your life is empty‖ (uniform), see Table 56. Item 2 was easier to endorse for older adults from Chile (a=1.39, b=.00) relative to Argentina (a=1.39, b=.09), evidenced by the item characteristic curves for this item, see Figure 43. The item characteristic curve for item 7 is presented in Figure 44; item 7 did not have significant DIF after the BH- Adjustment. Across the countries of Argentina and Chile, the GDS-15 best differentiates for individuals above average on the depression continuum, and less so for individuals on the lower end of the continuum, as evidenced by large measurement errors; see Figure 45. The anchor item set used for this analysis, had discrimination parameters which were strongly related to the underlying construct, with a high value of 2.66 for item 8 and a low value of .83 for item 9. Location parameters also reflect a wide range of values .04 to 1.45; see Table 55. 124 Argentina by Cuba DIF was assessed between the countries of Argentina and Cuba, with Argentina as the reference group and Cuba as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF procedure identified 9 DIF-free anchor items; see Table 57. Three out of 6 items originally identified with DIF, evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment all three items showed DIF: item 2 ―have you dropped many of your activities and interests‖ (uniform), item 5 ―are you not in good spirits most of the time‖ (non-uniform) and item 7 ―do you not feel happy most of the time‖ (non-uniform); see Table 58. Item 2 was easier for older adults from Cuba to endorse (a=1.66, b=.00) than older adults from Argentina (a=1.66, b=.65). Item 5 ―are you not in good spirits most of the time‖, was more discriminating for older adults from Cuba, but also had a smaller location parameter indicating that it was easier for Cubans to endorse item 5 than Argentineans (Cuba: a=2.47, b=.75; Argentina a=2.09, b=.89). Finally, for item 7 ―do you not feel happy most of the time‖, Cubans had larger discrimination parameters and smaller location parameters indicating that the item was easier to endorse than for older adults from Argentina (Cuba: a= 2.90, b=.49; Argentina a=2.63, b=.62); see Figure 46 to Figure 48 and Table 58. The test information plot for the GDS-15 across the countries of Cuba and Argentina indicates that the instrument differentiates best for persons above average on the latent trait continuum and less so, for persons on the lower end, evidenced by larger measurement errors; see Figure 49. 125 With respect to the anchor item set, all items had modest discrimination values, with a high value of 2.80 (item 8) and a low value of .99 (item 9) and as group the location parameters covered a wide range -.35 to 1.64; see Table 57. Mexico by Chile DIF was assessed between the countries of Mexico and Chile, with Chile as the reference group and Mexico as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF identified 9 DIF-free anchor items see Table 59. Five out of 6 items originally identified with DIF, evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment only 3 of the 5 items showed DIF: item 7 ―do you not feel happy most of the time‖ (uniform), item 8 ―do you often feel helpless‖ (non-uniform) and item 12 ―do you feel worthless the way you are now (uniform); see Table 60. Older adults from Mexico found both items 7 and 12 easier to endorse (item 7 Mexico: a=3.15, b=.00, Chile: a=3.15, b=.40), (item 12 Mexico: a=3.09, b=.00, Chile a=3.09, b=.51) than older adults from Chile; see Figure 50, Figure 53 and Table 60. For item 8, Chileans had smaller discrimination parameters but larger location parameters (item 8 Chile a=2.97, b=.44), indicating that this item was difficult to endorse relative to Mexicans. Mexicans had higher discrimination parameters for item 8 but smaller location parameters, indicating that this item was easier to endorse (item 8 Mexico: a=3.46, b=.14), see Figure 51 and Table 60. 126 The test information plot for the GDS-15 across the countries of Cuba and Chile indicates that the instrument differentiates best for individuals in the middle of the distribution but not at either extreme see Figure 55. With respect to the anchor item set, all items had modest discrimination with a high value of 2.98 for item 3 and a low value of 1.01 for item 9; as a group the location parameters were clustered in a narrow range .69 to .67; see Table 59. 127 Mexico by Cuba DIF was assessed between the countries of Mexico and Cuba, with Mexico as the reference group and Cuba as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF identified 8 DIF-free anchor items; see Table 61. Six out of 7 items originally identified with DIF, evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment only 5 of the 6 items showed DIF: item 2 ―have you dropped many of your activities and interests‖ (uniform), item 4 ―do you often get bored‖ (uniform), item 7 ―do you not feel happy most of the time‖ (uniform), item 12 ―do you feel worthless the way you are now‖ (uniform) and item 14 ―do you feel that your situation is hopeless‖ (uniform); see Table 62. Older adults in Cuba found items 2, 4, 7, 12 and 14 easier to endorse than older adults from Mexico (item 2 Cuba a=1.91, b=.00, Mexico a=1.91, b=.10), (item 4 Cuba a=2.98, b=.00, Mexico a=2.98, b=.03), (item 7 Cuba a=2.95, b=.00, Mexico a=2.95, b=.39), (item 12 Cuba a=2.99, b=.00, Mexico a=2.99, b=.35) and (item 14 Cuba a=3.15, b=.00, Mexico a=3.15, b=.23); see Figure 56 to Figure 61 and Table 62. The test information plot for the GDS-15 across the countries of Mexico and Cuba indicates that the instrument differentiates best for individuals in the middle of the distribution but not at either extreme, see Figure 62. With respect to the anchor item set, all items had modest discrimination with a high value of 3.21 for item 8 and a low value of 1.16 for item 9, as a group the location parameters were clustered in a narrow range -.68 to .64; see Table 62. 128 Uruguay by Chile DIF was assessed between the countries of Uruguay and Chile, with Chile as the reference group and Uruguay as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF identified 7 DIF-free anchor items; see Table 63. Four out of 8 items originally identified with DIF evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment, only 3 of the 4 items showed DIF: item 2 ―have you dropped many of your activities and interests‖ (non-uniform), item 10 ―do you feel that you have more problems with memory than most‖ (nonuniform) and item 14 ―do you feel that your situation is hopeless‖ (uniform), see Table 64. Item 2 had a higher discrimination parameter for older adults from Uruguay and a smaller location parameter indicating that this item was easier to endorse, while older adults from Chile had lower discrimination parameters but higher location parameters indicating more difficulty in endorsing this item (Uruguay a=1.71, b=.30, Chile a=1.27, b=.36); see Table 64 and Figure 63. Item 10 was easier to endorse for Chileans than persons from Uruguay, in addition item 10 more discriminating in Chile than Uruguay (Chile a=1.39, b=.92, Uruguay a=1.15, b=1.45) see Table 64 and Figure 65. Finally, for item 14 older adults from Uruguay found this item easier to endorse than Chileans (Uruguay a=2.61, b=.00, Chile a=2.61, b=.57), see Figure 66 and Table 64. The test information plot for the GDS-15 across the countries of Uruguay and Chile indicates that the instrument differentiates best for individuals above average on the latent trait distribution but not at the lower end see Figure 67. 129 With respect to the anchor item set, all items had modest discrimination with a high value of 3.05 for item 7 and a low value of 1.02 for item 15; as a group the location parameters were clustered in a narrow range of -.73 to .80; see Table 63. 130 Uruguay by Cuba DIF was assessed between the countries of Uruguay and Cuba, with Uruguay as the reference group and Cuba as the focal group. Using all other items as a tentative anchor, the initial IRTLR-DIF identified 9 DIF-free anchor items see Table 65. Three out of 6 items originally identified with DIF evidenced DIF after item purification, but before the BH-Adjustment. After the BH-Adjustment all three items showed DIF: item 4 ―do you often get bored‖ (uniform), item 6 ―are you afraid something bad is going to happen to you‖ (non-uniform) and item 8 ―do you often feel helpless‖ (non-uniform), see Table 66. Item 4 was easier for older adults from Cuba to endorse than Uruguay (Cuba a=2.51, b=.00, Uruguay a=2.51, b=.39). For item 6, older adults form Uruguay had higher discrimination parameters but lower location parameters indicating that this item was easier to endorse relative to Cubans (item 6 Uruguay a=1.54, b=.65, Cuba a=1.52, b=.67). Finally for item 8, older adults from Uruguay had higher discrimination parameters but lower location parameters indicating that this item was easiest to endorse relative to Cubans (item 8 Uruguay a=2.96, b=.72, Cuba a=2.77, b=.90); see Figure 68 to Figure 70 and Table 66. The test information plot for the GDS-15 across the countries of Uruguay and Cuba indicates that the instrument differentiates best for individuals above average on the latent trait distribution but not at the lower end; see Figure 71. With respect to the anchor item set, all items had modest discrimination with a high value of 3.08 for item 7 and a low value of .92 for item 9; as a group the location parameters were widely spread -.44 to 131 .1.52, see Table 65. A summary of the aforementioned DIF analyses of the GDS-15 by country of origin is provided in Table 67 through Table 72. 132 IRTLR-DIF Summary DIF analyses by gender showed that men had more difficulty endorsing item 5 ―are you not in good spirits most of the time‖, item 6 ―are you afraid that something bad is going to happen to you‖, item 8 ―do you often feel helpless‖ and item 14 ―do you feel that your situation is hopeless‖, than women across countries. With the exception of Mexico, item 9 ―do you prefer to stay at home, rather than going out and doing things‖, consistently had the lowest discrimination values across Chile, Cuba, Argentina and Uruguay. This indicates that this item consistently provided lower levels of information and if the GDS-15 were to be revised or shortened, item 9 would be a candidate item to be removed from the scale. In addition, test information plots across countries indicate a need for items that can differentiate between individuals on the lower end of the continuum. DIF analyses with country to country comparisons showed that item 2 ―have you dropped many of your activities and interests‖, item 4 ―do you often get bored‖, item 6 ―are you afraid something bad is going to happen to you‖, item 7 ―do you not feel happy most of the time‖ and item 12 ―do you feel worthless the way you are now‖, were consistently more difficult for older adults from Mexico to endorse. In addition, older adults from Argentina found it difficult to endorse item 2 ―have you dropped many of your activities and interests‖, item 6 ―are you afraid something bad is going to happen to you‖, item 7 ―do you not feel happy most of the time‖, item 9 ―do you prefer to stay at home, rather than going and doing things and item 11 ―do you not think it is wonderful to be alive now‖, across several country comparisons. 133 Like the gender DIF analyses, item 9 tended to have the lowest discrimination values across country to country comparisons, which means this item provided little information, and were the scale to be revised, consideration should be given to its removal. In addition just like in the gender DIF analyses, the test information plots for the country to country comparisons indicate overall, that the GDS-15 is best at differentiating between individuals who are high on the continuum of depression, but could benefit from items that can differentiate between individuals across the complete continuum of depression. Full invariance on discrimination and difficulty parameters were not obtained within any country. With that said, full invariance is generally not a realistic expectation. Instead, these results can be seen as a means of understanding how the scale behaves at the item and scale level across gender and country of origin. At the item level, the results provide researchers with information on how items are related to the underlying construct via high or low a-parameter estimates, and over what levels of the latent trait the item best discriminates among groups. This DIF analysis provides insight into what items on the GDS-15 might be problematic based on gender or country of origin; in addition these results can be used to help in the interpretation of the depressive etiology underlying group differences in response patterns to particular items. At the scale level, end users such as clinicians, nurses or social workers, can gain insight into the measurement precision of the scale over the levels of the depressive continuum. Because the GDS-15 was designed to identify the presence of depression, i.e. individuals with high levels of depression, the test information curves for 134 the scale is expected to function in the upper level of ϴ. This pattern was observed across gender and cross-comparison analyses. Finally, because of the relationship between physical and mental health evidenced by the work of (Ayotte, Yang, & Jones, 2010; A. T. F. Beekman, Kriegsman, Deeg, & Tilburg, 1995; A.T.F. Beekman, et al., 1997; Berkman, et al., 1986; Braam, et al., 2005; Ormel, et al., 1997) and others, future DIF work on the GDS-15 should assess item and test functioning in relationship to chronic disease and functional disability associated with aging. Finding out which items differentiate best among older adults with functional disabilities and chronic illnesses can inform the development of group specific depression interventions. 135 CHAPTER 5: DISCUSSION This chapter presents a summary of the major findings of the current study, an evaluation of practical implications of these findings as well as suggestions for future research. Limitations of the current study are also discussed. GENERAL OVERVIEW Many disciplines such as psychology, sociology, education and epidemiology use group comparisons to investigate differences in psychological phenomena, social systems, learning styles and chronic disease. These comparisons explicitly or not are predicated on the assumption that the measures used to evaluate differences are the same conceptually across groups of interest. Unfortunately measurement invariance is often assumed and not tested. This can lead to spurious conclusions, which can ultimately affect educational outcomes, health outcomes or policy decisions. Testing the assumption of measurement invariance provides support for the suitability of an instrument for different populations. The assessment of measurement invariance can improve the quality of research in many disciplines. The need for making sure that an instrument is functioning in the same way for different groups was the driving force behind the current study. The current study used two commonly used frameworks for assessing measurement invariance (1) structural equation modeling (SEM) and (2) item response theory (IRT). The two approaches used to test the measurement invariance of the GDS15 within the countries of Chile, Cuba, Argentina, Mexico and Uruguay were multiple 136 group confirmatory factor analysis (MGCFA) and item response theory likelihood ratio tests (IRTLR-DIF). The multiple group approach was used to evaluate the invariance of factor loadings, thresholds and uniqueness‘s and the IRTLR-DIF approach was used to assess the invariance of discrimination and difficulty parameters. Major Findings Multiple Group Confirmatory Factor Analyses Gender Previous psychometric research on the GDS-15 was limited and provided a lack of consensus on the underlying structure of the instrument (P. J. Brown, et al., 2007; Friedman, et al., 2005; Incalzi, et al., 2003; D. W. L. Lai, et al., 2005; Mitchell, et al., 1993; Onishi, et al., 2006; Schreiner, et al., 2001). As such, the current study first evaluated the underlying structure of the instrument within each country. Exploratory factor analyses and CFA‘s within each country showed that a onefactor model provided the best fit in the countries of Chile and Cuba and this factor was defined as general depressive affect. Within the countries of Argentina, Mexico and Uruguay, a two-factor model fit the data best and these factors were defined as general depressive affect and life satisfaction. These two factors replicate the work of (P. J. Brown, et al., 2007) and (Mitchell, et al., 1993). As discussed in chapters 2 and 3, in order to establish measurement invariance a nested sequence of increasingly restrictive CFA models (invariance hypotheses) are tested. These levels of invariance are referred to as configural, metric, scalar and strict invariance. The sequence of nested invariance hypotheses are well established in the literature (Cheung & Rensvold, 2002; Reise, et al., 1993; Robert J. Vandenberg & 137 Charles E. Lance, 2000). They are based on establishing a baseline/configural model and additively testing hypotheses of metric, scalar, and strict invariance. MGCFA models of the GDS-15 did not support full measurement invariance of all parameters for men and women across countries. A summary of results from invariance testing is presented in Table 16 through Table 24. There was adequate fit by gender across countries for the baseline/configural models. Within the countries of Chile and Cuba a one-factor model fit the data best and full measurement invariance was obtained. These results provide support for the measurement equivalence of the GDS-15 by gender in the countries of Chile and Cuba, which means that group comparisons of the latent factor mean can be supported. Across the countries of Argentina, Mexico and Uruguay by gender, a two-factor model fit the data best; however, support for full measurement invariance by gender was not supported. Within each of the three countries, metric invariance held, indicating that that there are similar interpretations of the factors across men and women within the countries of Argentina, Mexico and Uruguay. For the countries of Argentina, Mexico and Uruguay, partial scalar invariance was obtained. In Argentina the threshold for item 9 “do you prefer to stay at home rather than going out and doing new things” was noninvariant, in Mexico the thresholds for items 6 “are you afraid that something bad is going to happen to you” and Item 8 “do you often feel helpless” were non-invariant and in Uruguay the threshold for item 15 “do you think that most people are better off than you are” was non-invariant. 138 Why are these particular items non-invariant? Substantive reasons for non-invariance are difficult to ascertain and speculative at best. In the MGCFA gender analysis items 9, 6 and 8, and 15 displayed non-invariance in Argentina, Mexico and Uruguay respectively. An examination of the item content suggests that these items speak to a sense of vulnerability with phrases and words like ―being afraid‖, ―helpless‖, ―prefer to stay at home‖ and ―people are better off than you‖. Cultures have their own gender roles/expectations, in cultures with rigid gender roles that do not allow or support men being ―vulnerable‖ or women being ―independent‖, questions with content that go against these cultural norms may be difficult for individuals to endorse. Work by Djernes, Zunzunegui and colleagues (Djernes, 2006; Zunzunegui, Alvarado, Beland, & Vissandjee, 2009) have found that older women in Latin America and the Caribbean had a 63% higher odds for depressive symptoms compared with older men and one of the main predictors of depressive disorders and depressive symptom cases is gender. In Latin America women generally have lower levels of education than men and they are not encouraged to be socially or economically independent, which leads to economic disadvantages in later life; the sociocultural context that creates financial stress in later life may also be a predictor of depression (Zunzunegui, et al., 2009). In addition, the content of these items could also be tapping into constructs that are unrelated to depression; for example with item 9 (―do you prefer to stay at home, rather than going out and doing things‖) could relate to living in an unsafe community which would influence whether a person felt safe enough to go out by themselves. Thus, 139 endorsement of this item might be a reflection of the community you live in and not your level of depression. Cross-country comparisons Country by country comparisons were based on having the same configural pattern of loadings. With that said, country by country comparisons were the following: Chile by Cuba (1-factor model general depressive affect), Mexico by Uruguay (2-factor model general depressive affect and life satisfaction), Mexico by Argentina (2-factor model general depressive affect and life satisfaction) and Uruguay by Argentina (2-factor model general depressive affect and life satisfaction). Full measurement invariance was not obtained in the Chile by Cuba comparison. Partial metric invariance was obtained for the cross-country comparison of Chile and Cuba. To obtain partial metric invariance the errors of item 5 “are you in good spirits most of the time” and item 7 “do you feel happy most of the time” needed to be correlated. This indicates that there is some additional shared multidimensionality between these two items. Partial scalar invariance was obtained by freeing the thresholds for items 1 “are you basically satisfied with your life”, item 5 “are you in good spirits most of the time” and item 7 “do you feel happy most of the time” and item 15 “do you think that most people are better off than you are”. The cross-country comparison of Mexico by Uruguay did not obtain full measurement invariance. To obtain partial metric invariance the errors of items 1 “are you basically satisfied with your life”, item 3 “do you feel that your life is empty”, item 10 “do feel 140 you have more problems with memory than most” and item 15 “do you think that most people are better off than you are” needed to be correlated; this indicates that there is additional multidimensionality amongst these items. Partial scalar invariance was obtained by relaxing the threshold constraints of items 2 “have you dropped many of your activities and interests”; item 4 “do you often get bored”, item 9 “do you prefer to stay at home rather than going out and doing new things”, item 10 “do you feel you have more problems with memory than most” and item 15 “do you think that most people are better off than you are”. The cross-country comparison of Mexico by Argentina obtained partial metric invariance by correlating the errors of item 14 “ do you feel that your situation is hopeless” and item 15 “do you think that most people are better off than you are”. Partial scalar invariance was obtained by relaxing the threshold constraints of items 5 “are you in good spirits most of the time”, item 6 “are you afraid that something bad is going to happen to you”, item 8 “do you often feel helpless”, item 10 “do you feel you have more problems with memory than most” and item 15 “do you think that most people are better off than you are”. Finally the cross-country comparison of Uruguay by Argentina did not obtain full measurement invariance. Partial metric invariance was obtained by correlating the errors of items 3 “do you feel that your life is empty‖ and 4 “do you often get bored”. Partial Scalar invariance was obtained by relaxing the threshold constraints of items 12 “do you feel pretty worthless the way you are now” and 15 “do you think that most people are better off than you are”. 141 Comparing the results of the gender MGCFA and the cross-group comparison MGCFA revealed that item 9 ―do you prefer to stay at home rather than going out‖ and item 15 ―do you think that most people are better off than you displayed non-invariance across both testing situations. It is difficult to interpret why these two items exhibit misfit within countries by gender and with cross-group comparisons, they don‘t share similar wording and the content of the items are getting at two different things (1) withdrawal and (2) self-esteem. Finally, the results of the cross-country invariance testing between Mexico, Argentina, Uruguay, Chile and Cuba reflect an important point made by Byrne and Watkins (2003), which is that “although the factorial structure of a measuring instrument may yield a similar pattern when tested within each of two or more groups, such findings represent no guarantee that the instrument will operate equivalently across these groups”(pg. 156, pg. 556) (Byrne & Campbell, 1999; Byrne & Watkins, 2003). This statement can be expanded to also include the idea that although an instrument such as the GDS-15 may be administered in the same language, in this case Spanish, there can be linguistic and cultural nuances that contribute to a lack of invariance. Non-invariance in the cross-country comparisons and gender comparisons is most likely due to translation errors. The use of the GDS-15 in the SABE study, involved no back-translation protocol, which means that the English version of the GDS-15 would have had to have been translated into Spanish and then back into English in order to assess whether the content of the back translation matched the original instrument 142 conceptually. These five countries share a common language but their cultures may be very different; based on history, economics and ethnic make-up, all of which can influence the interpretation and meaningfulness of constructs and items. 143 Major Findings Item Response Theory Likelihood Ratio Tests-DIF The second framework used in the assessment of measurement invariance was IRTLR-DIF. Item response theory likelihood ratio tests were used to assess the invariance of discrimination and difficulty parameters across gender and cross-country comparisons. In other words the primary research question to be answered in this DIF analysis is “how is item response related to level of depression and subgroup membership” (i.e. gender or country of origin). The GDS-15 was put through an item purification process with the goal of identifying a core group of DIF free items (anchor items). After the anchor items were identified, a 2PL model which accounted for items with DIF was estimated for groups of interest simultaneously. The first step in the analysis was to evaluate whether the assumption of unidimensionality had been met. For the GDS-15 exploratory factor analyses in each of the five countries found a dominant first factor which accounted for the majority of the covariance of item responses, which was satisfactory for meeting the assumption of essential unidimensionality. This indicated that the item content within the dominant factor reflected the scale content. The two-parameter logistic IRT model was used to model item responses for the GDS-15. The 2PL model allows for variation in both the discrimination and difficulty parameters among items on a scale. Items on a scale are evaluated by examining item characteristic, information and standard error of measurement curves. Examination of the item characteristic and information curves, allows a researcher to identify low and 144 high discriminating items as well as identify the range over the underlying trait in which an item is most effective. The test information curve and standard error of measurement curve provide a picture of the measurement precision of a scale across the latent trait continuum. The GDS-15 is a depression screening instrument designed to identify the presence of depression over the past week. As such the test information curves are expected to function in the upper end of the theta distribution. This pattern was observed across gender and cross-country comparisons in this study. An item is said to exhibit DIF if two individuals from distinct groups have the same standing on the underlying trait, but have different probabilities of endorsing an item or getting an item correct. For example a Cuban man and Cuban woman with the same level of depressive symptomatology (theta) should have the same chance of responding ―yes‖ to the item “I believe it is wonderful to be alive now”. If this is not the case, the item is said to be exhibiting DIF. What this means is that responses to an item with DIF are not equivalent across the groups under study, which can lead to potentially misleading group comparisons. An item purification procedure was employed in this study to identify DIF-free items or anchor items as well as items exhibiting DIF after the purification procedure and the BH-Adjustment. After item parameters with significant DIF have been identified, parameters for a final 2-group model that incorporates the identified DIF can be specified and estimated using MULTILOG. 145 Gender Depression items that showed DIF with respect to gender, even after the BHAdjustment indicate that the instrument is performing differently based on gender. In the gender DIF analysis items 5, 6, 8 and 14 were classified as having significant DIF most often after the BH-Adjustment. In the IRTLR-DIF analysis Items 6 and 8 were both more severe indicators (difficult to endorse) for men than women, so it would require a higher amount of depression for men to endorse either of these items (item 6 Men: a=1.54, b=.39, Women: a=1.54, b=.00), (item 8 Men a=3.16, b=.72, Women: a=3.16, b=.00). For item 5 men had a higher difficulty parameter than women (men a=1.76, b=1.07 and women a=2.81, b=1.02) indicating that it was more difficult or required slightly more depression for men to endorse this item, while women had a larger a parameter, indicating a stronger relationship with the item and the underlying construct relative to gender. For item 14, men also find it more difficult to endorse ―do you feel that your situation is hopeless” relative to women (men a=2.34, b=1.17, women a=2.90, b=1.14). Of note is that item 6 and item 8 were also classified as having gender noninvariance in the MGCFA analysis. Results of the gender DIF analysis indicate that just like the MGCFA analysis, items with content that could be interpreted as indicating vulnerability is also more difficult for men to endorse. Research on the relationship between gender and depression has found that even when men and women have similar levels of depression; women tend to report having more depressive symptoms (Angst, 1992). 146 Cross-country comparison The two cross-country comparisons with the largest number of items exhibiting DIF after the BH-Adjustment were Argentina by Mexico with five items (2,7,9,11,12) and Mexico by Cuba with five items (2,4,7,12,14). There was no meaningful pattern of DIF found across the ten cross-group comparisons, for example in Chile by Cuba items 4 and 8 were significant, with Mexico by Uruguay items 6, 7, 9 and 12 were significant and in Argentina by Uruguay items 2, 6, 7 and 11 were significant. What is evident is that DIF is context specific, in other words just because item 2 is significant for DIF in the Uruguay by Argentina comparison this does not mean that it will be significant in the Mexico by Uruguay comparison (which it was not). Overall the IRTLR-DIF procedure identified the same items as exhibiting DIF as those in the MGCFA analysis; however, they were not necessarily in the same crossgroup comparison as in the MGCGA procedure and after adjusting for multiple comparisons they were not always significant for DIF. In the cross-country comparison IRTLR-DIF analysis, items 2, 7 and 12 were classified as having significant DIF most often after the BH-Adjustment. With such a disparate pattern of items exhibiting DIF the only interpretable reason for item misfit might be attributable to instrument translation issues as well as a range of other possible cultural, historical, economic and ethnic differences in these countries. 147 Research and Clinical Implications For each country in the study, items were identified as exhibiting differential item functioning. There are several actions a researcher can take in order to deal with DIF items. The first step would be to qualitatively assess why the item is functioning differently between groups with a content expert. This could be in the form of backtranslating the DIF items, to determine whether people from different countries might be interpreting the item content differently, or are they responding to items based on cultural mores‘ or is the construct ill-defined, does it lack meaningfulness in certain groups. The second step would be to remove the problematic items, rewrite the items or score the item(s) differently based on group membership. For researchers in the instrument development stage, dropping DIF items may be the most efficient thing to do, but for an existing measure, dropping items may not be a good alternative. With respect to the GDS-15 and other health measures they tend to be short scales to begin with, so dropping items may reduce reliability and validity. For researchers who are strictly interested in group comparisons, dropping the items that exhibit DIF and using the DIFfree items to make group comparisons may also be a viable option. If DIF with respect to gender was found, then a researcher may consider accounting for that DIF by adjusting scores with respect to gender. The issue with this approach is that if we used a gender adjusted score that accounted for men not endorsing items that indicate vulnerability, the DIF effect is an average over all men. So that means that there will be some men who are more sensitive and aware of their feelings and others who are not. So the ―correction‖ for gender may be correct on average, but in adjusting 148 scores this way we may be running the risk of creating an illusion of a ―gender fair‖ depression instrument, while in reality we might have just replaced one form of bias with another form of bias. So where does this leave the clinician who wants to use this instrument with a population of adults in Latin America and the Caribbean? There is no clear guidance, but based on the results of the current study caution should be used in any cross-country or gender comparisons. Further research needs to be done with GDS-15 in Latin America and Caribbean before firm recommendations for its use can be made. 149 Limitations and Future Directions Several of the more important study limitations merit attention. Of central concern is that there may be alternative factor structures for the GDS-15 that have not been explored, but might fit the data equally well for other sub-groups within the data, such as age-cohort groupings. In addition, the sampling framework for the SABE study focused specifically on older adults who resided in urban centers. With that said, the results presented in the current study may only be generalizable to individuals from urban communities and not rural communities. Limitations with respect to the study design are that cross-sectional data such as the SABE does not allow us to measure and document changes in depression longitudinally which could help researchers better understand and illustrate the complexities of depression in later life. Limitations with respect to the factor structure(s) of the GDS-15 are that the GDS-15 has five items that must be reverse scored, because they are negatively worded items. The two-factor model found in the countries of Argentina, Mexico and Uruguay was defined as (1) general depressive affect and (2) life satisfaction. The life satisfaction factor was comprised of items 1, 5, 7, 11 and 13. All of these items are reverse scored (negatively worded) items that loaded on the second factor (life satisfaction). Research by (Chen, Rendina-Gobioff, & Dedrick, 2010; Roszkowski & Soven, 2010; Schriesheim, Eisenbach, & Hill, 1991) suggests that the inclusion of positive and negative items in the same scale can make constructs conceptualized as unidimensional appear multidimensional (e.g., positively and negatively worded items may form two separate factors). As such the second factor found in Argentina, Mexico and Uruguay may be a 150 result of the negative wording of the items on the GDS-15 scale. Further analyses, such as parallel analysis may be necessary to investigate how many factors should be retained. Parallel analysis is a method based on the generation of random variables to determine the number of factors to retain. Parallel analysis, compares the observed eigenvalues extracted from the correlation matrix to be analyzed with those obtained from uncorrelated normal variables (J. L. Horn, 1965). Finally, the sensitivity and specificity of the GDS-15 accounting for DIF and not accounting for DIF could not be evaluated with the current data. Sensitivity and Specificity analyses require a comparison to a gold standard criterion of depression. The SABE study only has one measure of depression, the GDS-15. Future research should involve the assessment of sensitivity and specificity based on DIF analyses, crosstabulating the GDS-15 scaled score against a gold standard depression scale score such as the CESD. Assessing the sensitivity and specificity of the GDS-15 with respect to a DIF analyses can inform health policy. It is important to find out how well the GDS-15 is at identifying individuals with and without depression, in order to appropriately treat depression in the elderly. This information would aid in avoiding under treatment or overtreatment of depressive conditions, which would ultimately save money and lives. Future directions for work with the GDS-15 would be to move away from the assumption that the items that comprise the existing instrument represent the ―best 15 items‖. Initial development of the GDS-15 did not involve rigorous psychometric work. Moving forward I suggest that the original 100 item GDS go through a factor analytical 151 and IRT analysis in order to select items that will be most effective along the depression continuum, in order to ultimately, build a better GDS-15. Although invariance analyses of the kind used in this study should be applied at the time of translation, when changes can be made to items in order to eliminate or minimize DIF, this study adds to the methodological literature by illustrating both SEM and IRT based procedures for examining measurement invariance with an existing instrument. Finally, these analyses reflect the first rigorous psychometric assessment of the GDS-15 in the countries of Argentina, Cuba, Uruguay, Chile and Mexico. 152 Table 9 One factor EFA Chile GDS-15 Items V7: Do you feel happy most of the time V8: Do you feel helpless V12: Do you feel pretty worthless the way you are now V14: Do you feel that your situation is hopeless V3: Do you feel that your life is empty V4: Do you often get bored V1: Are you basically satisfied with your life V5: Are you in good spirits most of the time V13: Do you feel full of energy V11: Do you think its wonderful to be alive now V10: Do you feel you have more problems with memory than most V6: Are you afraid that something bad is going to happen to you V2:Have you dropped many of your activities and interests V15: Do you think that most people are better off than you V9: Do you prefer to stay at home rather than going out and doing new things 153 Factor General Depressive Affect 0.85 0.84 0.83 0.82 0.81 0.80 0.79 0.79 0.75 0.74 0.61 0.59 0.59 0.49 0.32 Table 10 One factor EFA Cuba GDS-15 Items V4: Do you often get bored V7: Do you feel happy most of the time V3: Do you feel that your life is empty V5: Are you in good spirits most of the time V8: Do you feel helpless V1: Are you basically satisfied with your life V14: Do you feel that your situation is hopeless V12: Do you feel pretty worthless the way you are now V13: Do you feel full of energy V11: Do you think its wonderful to be alive now V2:Have you dropped many of your activities and interests V15: Do you think that most people are better off than you V6: Are you afraid that something bad is going to happen to you V10: Do you feel you have more problems with memory than most V9: Do you prefer to stay at home rather than going out and doing new things 154 Factor General Depressive Affect 0.87 0.87 0.84 0.82 0.82 0.82 0.80 0.79 0.76 0.73 0.66 0.64 0.63 0.54 0.46 Table 11 Two factor EFA Argentina Factors GDS-15 Items V11: Do you think its wonderful to be alive now V7: Do you feel happy most of the time V5: Are you in good spirits most of the time V13: Do you feel full of energy V1: Are you basically satisfied with your life V3: Do you feel that your life is empty V12: Do you feel pretty worthless the way you are now V2:Have you dropped many of your activities and interests V4: Do you often get bored V9: Do you prefer to stay at home rather than going out and doing new things V14: Do you feel that your situation is hopeless V8: Do you feel helpless V6: Are you afraid that something bad is going to happen to you V15: Do you think that most people are better off than you V10: Do you feel you have more problems with memory than most 155 Life Satisfaction 0.92 0.87 0.79 0.71 0.47 General Depressive Affect 0.80 0.80 0.76 0.75 0.70 0.65 0.64 0.60 0.54 0.48 Table 12 Two factor EFA Mexico Factors GDS Items V7: Do you feel happy most of the time V5: Are you in good spirits most of the time V1: Are you basically satisfied with your life V13: Do you feel full of energy V11: Do you think its wonderful to be alive now V14: Do you feel that your situation is hopeless V10: Do you feel you have more problems with memory than most V8: Do you feel helpless V9: Do you prefer to stay at home rather than going out and doing new things V12: Do you feel pretty worthless the way you are now V15: Do you think that most people are better off than you V3: Do you feel that your life is empty V6: Are you afraid that something bad is going to happen to you V4: Do you often get bored V2:Have you dropped many of your activities and interests 156 Life Satisfaction 0.95 0.83 0.66 0.64 0.54 General Depressive Affect 0.80 0.74 0.73 0.69 0.69 0.66 0.61 0.55 0.52 0.48 Table 13 Two factor EFA Uruguay Factors GDS Items V7: Do you feel happy most of the time V5: Are you in good spirits most of the time V1: Are you basically satisfied with your life V11: Do you think its wonderful to be alive now V13: Do you feel full of energy V12: Do you feel pretty worthless the way you are now V9: Do you prefer to stay at home rather than going out and doing new things V3: Do you feel that your life is empty V2:Have you dropped many of your activities and interests V4: Do you often get bored V14: Do you feel that your situation is hopeless V8: Do you feel helpless V10: Do you feel you have more problems with memory than most V15: Do you think that most people are better off than you V6: Are you afraid that something bad is going to happen to you 157 Life Satisfaction 0.92 0.85 0.63 0.62 0.45 General Depressive Affect 0.80 0.67 0.67 0.66 0.65 0.60 0.59 0.59 0.51 0.43 Table 14 One factor CFA by Country Country Argentina Chile Cuba Mexico Uruguay Chi-Square 377.89 269.943 367.103 518.617 349.851 df 52 70 69 64 66 CFI 0.89 0.96 0.96 0.93 0.94 TLI 0.93 0.98 0.98 0.96 0.97 158 RMSEA 0.08 0.04 0.05 0.06 0.05 Table 15 Two- factor CFA by country Country Argentina Mexico Uruguay Chi-Square 200.334 287.968 234.107 df 57 66 66 CFI 0.95 0.96 0.97 TLI 0.97 0.98 0.98 RMSEA 0.05 0.04 0.04 ***Two factor models were not estimated for the countries of Chile and Cuba because a one factor model was more parsimonious and provided better fit** 159 Table 16 Model fit for configural and nested models Chile by gender Model # Free Parms Chi-Square Chi-Square Chi-Square Value DF p-value CFI TLI RMSEA Estimate 1a. Configural 60 287.156 119 <.0000 0.966 0.983 0.047 2a. Metric 46 176.826 93 <.0000 0.983 0.989 0.038 3a. Scalar 32 189.152 102 <.0000 0.982 0.99 0.037 4a. Residuals Free 47 267.718 123 <.0000 0.969 0.985 0.044 4b. Residuals Fixed 32 189.152 101 <.0000 0.982 0.99 0.037 160 Table 17 Invariance hypothesis tests Chile by gender Chi-Square DIFFTESTValue df Chi-Square p-value 1a. Metric vs. Configural 4.640 12 0.9689 2a. Scalar vs. Metric 18.922 13 0.1256 3a. Residual fixed vs. Residual Free 6.670 13 0.9183 Model 161 Table 18 Model fit for configural and nested models Cuba by gender Model # Free Parm Chi-Square Chi-Square Chi-Square Value DF p-value CFI TLI RMSEA Estimate 1a. Configural 60 348.551 113 <.0000 0.966 0.981 0.049 2a. Metric 46 222.800 88 <.0000 0.981 0.986 0.042 3a. Scalar 32 235.256 95 <.0000 0.980 0.987 0.041 4a. Residuals Free 47 304.784 109 <.0000 0.972 0.984 0.046 4b. Residuals Fixed 32 235.256 95 <.0000 0.980 0.987 0.041 162 Table 19 Invariance hypothesis tests Cuba by gender Chi-Square DIFFTESTValue df 1a. Metric vs. Configural 11.062 12 0.5236 2a. Scalar vs. Metric 21.955 13 0.0561 3a. Residual fixed vs. Residual Free 14.606 13 0.3326 Model Chi-Square p-value 163 Table 20 Model fit for configural and nested models Argentina by gender Model # Free Parm Chi-Square Chi-Square Chi-Square Value DF p-value CFI TLI RMSEA Estimate 1a. Configural 62 217.883 92 <.0000 0.949 0.972 0.052 2a. Metric 49 179.440 85 <.0000 0.962 0.977 0.047 3a. Scalar1 36 192.255 91 <.0000 0.959 0.977 0.047 3b. Scalar2 (v9) 37 187.398 91 <.0000 0.961 0.978 0.046 4a. Residuals Free 52 207.339 93 <.0000 0.953 0.974 0.049 4b. Residuals Fixed 38 190.319 91 <.0000 0.960 0.977 0.047 164 Table 21 Invariance hypothesis tests Argentina by gender Chi-Square DIFFTESTValue df Chi-Square p-value 1a. Metric vs. Configural 10.248 10 0.5245 2a. Scalar1 vs. Metric 25.700 12 0.0118 2b. Scalar2 vs. Metric 13.209 11 0.2799 3a. Residual fixed vs. Residual Free 15.478 12 0.2163 Model 165 Table 22 Model fit for configural and nested models Mexico by gender Model # Free Parm Chi-Square Chi-Square Chi-Square Value DF p-value CFI TLI RMSEA Estimate 1a. Configural 62 295.673 108 <.0000 0.969 0.982 0.043 2a. Metric 49 203.898 85 <.0000 0.980 0.986 0.039 3a. Scalar1 36 224.634 92 <.0000 0.978 0.985 0.039 3b. Scalar2 (v8) 37 217.584 92 <.0000 0.979 0.986 0.038 3c. Scalar3 (v8 & v6) 38 212.943 91 <.0000 0.980 0.986 0.038 4a. Residuals Free 53 278.663 109 <.0000 0.972 0.984 0.041 4b. Residuals Fixed 40 219.026 92 <.0000 0.979 0.986 0.039 166 Table 23 Invariance hypothesis tests Mexico by gender Chi-Square DIFFTESTValue df Chi-Square p-value 1a. Metric vs. Configural 13.427 11 0.2663 2a. Scalar1 vs. Metric 37.123 12 0.0002 2b. Scalar2 vs. Metric 19.978 11 0.0456 2c. Scalar3 vs. Metric 12.809 10 0.2346 3a. Residual fixed vs. Residual Free 15.106 10 0.1282 Model 167 Table 24 Model fit for configural and nested models Uruguay by gender Model # Free Parm Chi-Square Chi-Square Chi-Square Value DF p-value CFI TLI RMSEA Estimate 1a. Configural 62 262.916 113 <.0000 0.968 0.983 0.043 2a. Metric 49 197.551 94 <.0000 0.978 0.986 0.039 3a. Scalar1 36 210.742 101 <.0000 0.976 0.986 0.039 3b. Scalar2 (v15) 37 206.335 100 <.0000 0.977 0.987 0.038 4a. Residuals Free 52 257.916 113 <.0000 0.969 0.984 0.042 4b. Residuals Fixed 38 216.483 103 <.0000 0.976 0.986 0.039 168 Table 25 Invariance hypothesis tests Uruguay by gender Chi-Square DIFFTESTValue df Chi-Square p-value 1a. Metric vs. Configural 14.469 11 0.2081 2a. Scalar1 vs. Metric 22.296 12 0.0343 2b. Scalar2 vs. Metric 15.647 11 0.1547 3a. Residual fixed vs. Residual Free 14.503 12 0.2698 Model 169 Table 26 Model fit for configural and nested models Chile by Cuba Model 1. Configural 2a. Metric1 2b. Metric2 (Ecov V5&V7) 3a. Scalar1 3b. Scalar2 (V1 & V15) 3c. Scalar3 (V7 & V5) 3d. Scalar4 (V9) 4a. Residuals Free 4b. Residual Fixed1 4c. Residual Fixed2 (V2 & V4) # Free Parm 60 46 47 33 35 37 38 53 43 45 Chi-Square Chi-Square Chi-Square Value DF p-value 637.503 415.797 388.958 522.949 446.245 403.223 399.612 547.288 422.023 425.809 139 102 102 111 109 108 107 141 115 118 170 <.0000 <.0000 <.0000 <.0000 <.0000 <.0000 <.0000 <.0000 <.0000 <.0000 CFI TLI RMSEA Estimate 0.963 0.977 0.979 0.970 0.975 0.978 0.978 0.970 0.977 0.977 0.983 0.985 0.987 0.982 0.985 0.987 0.987 0.986 0.987 0.988 0.049 0.045 0.043 0.050 0.045 0.043 0.043 0.044 0.042 0.042 Table 27 Invariance hypothesis tests Chile by Cuba Model 1a. Metric1 vs. Configural 2a. Metric2 vs. Configural 2b. Scalar1 vs. Metric2 3a. Scalar2 vs. Metric2 3b. Scalar3 vs. Metric2 3c. Scalar4 vs. Metric2 4a. Residual Fixed1 vs. Residual Free 4b. Residual Fixed2 vs. Residual Free Chi-Square DIFFTESTValue 31.317 17.493 232.881 98.760 19.051 15.153 22.926 12.222 df 12 11 13 11 9 8 9 7 Chi-Square p-value 0.0018 0.0941 0.0000 0.0000 0.0248 0.0562 0.0064 0.0935 171 Table 28 Model fit for configural and nested models Mexico by Uruguay # Free Parm Chi-Square Value ChiSquare DF ChiSquare p-value CFI TLI RMSEA Estimate 1a. Configural 62 521.543 132 <.0000 0.966 0.983 0.044 2a. Metric1 49 383.299 106 <.0000 0.976 0.985 0.040 2b. Metric2 (Ecorr v1&v3) 50 365.427 106 <.0000 0.977 0.986 0.039 2c. Metric3 (Ecorr v1,v3, v10,v15) 51 353.656 105 <.0000 0.978 0.987 0.038 3a. Scalar1 38 414.987 114 <.0000 0.974 0.985 0.040 3b. Scalar2 (v9&v10) 40 385.680 112 <.0000 0.976 0.986 0.038 3c. Scalar3 (v2, v4,v9,v10) 42 367.836 111 <.0000 0.978 0.987 0.037 3d. Scalar4 (v2, v4,v9,v10,v15) 43 363.617 110 <.0000 0.978 0.987 0.037 4a. Residuals Free 58 485.083 134 <.0000 0.969 0.985 0.040 4b. Residuals Fixed1 (ResVar v2, v4, v9, v10) 48 413.295 121 <.0000 0.974 0.986 0.038 4c. Residuals Fixed2 (ResVar v2,v4,v9, v10, v3) 48 413.351 122 <.0000 0.975 0.986 0.038 Model 172 Table 29 Invariance hypothesis tests Mexico by Uruguay Chi-Square DIFFTESTValue df Chi-Square p-value 1a. Metric1 vs. Configural 29.882 12 0.0029 1b. Metric2 vs. Configural 19.873 11 0.0471 1c. Metric3 vs. Configural 15.168 10 0.1261 2a. Scalar1 vs. Metric3 93.703 12 0.0000 2b. Scalar2 vs. Metric3 48.608 10 0.0000 2c. Scalar3 vs. Metric3 18.469 9 0.0301 2d. Scalar4 vs. Metric3 12.886 8 0.1158 3a. Residual fixed1 vs. Residual Free 16.453 8 0.0363 3b. Residual fixed2 vs. Residual Free 11.487 7 0.1187 Model 173 Table 30 Model fit for configural and nested models Mexico by Argentina Model # Free Parm Chi-Square Chi-Square Chi-Square Value DF p-value CFI TLI RMSEA Estimate 1. Configural 62 483.263 122 <.0000 0.961 0.979 0.046 2a. Metric1 49 353.048 100 <.0000 0.973 0.982 0.042 2b. Metric2 (Ecorr v14&v15) 50 341.644 100 <.0000 0.974 0.983 0.041 3a. Scalar1 37 420.268 108 <.0000 0.966 0.980 0.045 3b. Scalar2 (v10) 38 393.026 107 <.0000 0.969 0.981 0.043 3c. Scalar3 (v6, v10,v15) 40 361.707 106 <.0000 0.972 0.983 0.041 3d. Scalar4 (v5,v6,v8,v10,v15) 42 350.199 105 <.0000 0.974 0.984 0.040 4a. Residuals Free 57 462.993 124 <.0000 0.963 0.981 0.044 4b. Residuals Fixed1 42 350.199 105 <.0000 0.974 0.984 0.040 4c. Residuals Fixed2 (ResVarV13) 43 348.843 106 <.0000 0.974 0.984 0.040 174 Table 31 Invariance hypothesis tests Mexico by Argentina Chi-Square DIFFTESTValue df Chi-Square p-value 1a. Metric1 vs. Configural 22.264 11 0.0224 1b. Metric2 vs. Configural 17.300 11 0.0993 2a. Scalar1 vs. Metric2 134.600 12 0.0000 2b. Scalar2 vs. Metric2 88.740 11 0.0000 2c. Scalar3 vs. Metric2 33.213 10 0.0003 2d. Scalar4 vs. Metric2 9.510 8 0.3011 3a. Residual fixed1 vs. Residual Free 22.357 12 0.0337 3b. Residual fixed2 vs. Residual Free 16.785 11 0.1144 Model 175 Table 32 Model fit for configural and nested models Uruguay by Argentina # Free Parm Chi-Square Value ChiSquare DF ChiSquare p-value CFI TLI RMSEA Estimate 1a. Configural 62 434.286 123 <.0000 0.963 0.980 0.045 2a. Metric1 49 321.474 101 <.0000 0.974 0.983 0.042 2b. Metric2 (Ecorr v3&v4) 50 300.557 100 <.0000 0.976 0.984 0.040 3a. Scalar1 37 335.222 108 <.0000 0.973 0.983 0.041 3b. Scalar2 (v15) 38 322.393 108 <.0000 0.975 0.984 0.040 3c. Scalar3 (v12,v15) 39 314.153 107 <.0000 0.975 0.985 0.040 4a. Residuals Free 54 396.452 122 <.0000 0.967 0.982 0.043 4b. Residuals Fixed1 41 334.306 110 <.0000 0.973 0.984 0.041 4c. Residuals Fixed2 (ResVar v9,v12,v15) 42 334.422 112 <.0000 0.974 0.984 0.040 Model 176 Table 33 Invariance hypothesis tests Uruguay by Argentina Chi-Square DIFFTESTValue df ChiSquare p-value 1a. Metric1 vs. Configural 50.169 12 0.0000 1b. Metric2 vs. Configural 11.502 10 0.3198 2a. Scalar1 vs. Metric2 55.601 12 0.0000 2b. Scalar2 vs. Metric2 28.958 11 0.0023 2c. Scalar3 vs. Metric2 16.380 10 0.0893 3a. Residual fixed1 vs. Residual Free 20.943 11 0.0340 3b. Residual fixed2 vs. Residual Free 14.925 11 0.1859 Model 177 Table 34 Item parameters and standard errors for anchor items Chile by Gender Item 1 2 3 4 5 6 7 8 9 10 11 12 14 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Do you often get bored Are you in good spirits most of the time Are you afraid that something bad is going to happent to you Do you feel happy most of the time Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think it is wonderful to be alive now Do you feel worthless the way you are now Do you feel that your situation is hopeless Do you think that most people are better off than you are 178 a 2.22 (.23) 1.18 (.12) 2.41 (.20) 2.30 (.19) 2.18 (.20) 1.26 (.12) 2.59 (.23) 2.63 (.24) .54 (.08) 1.29 (.14) 1.86 (.19) 2.44 (.22) 2.31 (.21) .83 (.11) b 1.15 (.08) .59 (.09) .34 (.05) .33 (.05) .97 (.07) .56 (.08) .81 (.06) .84 (.06) -1.13 (.21) 1.19 (.13) 1.56 (.13) .96 (.06) .83 (.07) -.05 (.11) Table 35 Item parameters and standard errors for items exhibiting DIF Chile by Gender Item 13 Content Do you feel full of energy Group a Men 1.68 (.26) Women 2.00 (.23) Item 13 did not exhibit DIF after using the B-H multiple comparisons adjustment 179 b .74 (.15) .79 (.08) Tests for DIF: χ2 (P) a Dif b Dif 3.1 (.078) 1.5 (.220) Table 36 Item parameters and standard errors for anchor items Cuba by Gender Item 1 2 3 4 5 6 8 9 10 11 13 14 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Do you often get bored Are you in good spirits most of the time Are you afraid that something bad is going to happent to you Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think it is wonderful to be alive now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 180 a 2.68 (.20) 1.62 (.12) 2.91 (.21) 3.32 (.22) 2.54 (.19) 1.56 (.12) 2.88 (.23) .92 (.08) 1.18 (.12) 2.23 (.22) 2.11 (.16) 2.60 (.21) 1.43 (.11) b .30 (.05) .29 (.07) -.01(.04) -.02 (.03) .33 (.05) .29 (.07) .50 (.05) -.85 (.08) 1.06 (.16) 1.26 (.12) .51 (.07) .59 (.06) .21 (.08) Table 37 Item parameters and standard errors for items exhibiting DIF Cuba by Gender Item Do you feel worthless the way you are now a b Men 3.10 (.20) .13 (.07) 3.10 (.20) .00 (.00) Men 2.46 (.17) .46 (.09) Women 12 Do you feel happy most of the time Group Women 7 Content 2.46 (.17) .00 (.00) Tests for DIF: χ2 (P) a Dif b Dif Items 7 and 12 did not exhibit DIF After using the B-H multiple comparisons adjustment 181 2.3 (.129) 4.1 (.042) 2.8 (.094) 3.5 (.061) Table 38 Item parameters and standard errors for anchor items Argentina by Gender Item a Content b 1 Are you basically satisfied with your life 2.94 (.31) .44 (.06) 2 Have you dropped many of your activities and interests 1.67 (.18) .49 (.10) 3 Do you feel that your life is empty 2.59 (.28) .43 (.07) 4 Do you often get bored 1.97 (.22) .49 (.09) 5 Are you in good spirits most of the time 2.19 (.25) .71 (.09) 6 Are you afraid that something bad is going to happent to you 1.13 (.16) .97 (.19) 7 Do you feel happy most of the time 2.82 (.30) .44 (.07) 8 Do you often feel helpless 2.79 (.35) .80 (.09) 10 Do you feel that you have more problems with memory than most 1.03 (.21) 2.00 (.43) 13 Do you feel full of energy 1.47 (.17) .54 (.11) 14 Do you feel that your situation is hopeless 2.18 (.27) .84 (.12) 182 Table 39 Item parameters and standard errors for items exhibiting DIF Argentina by Gender Tests for DIF: χ2 (P) Group a b a Dif b Dif Men .87 (.10) .02 (.19) 0.7 (.402) 3.6 (.057) Women .87 (.10) .00 (.00) Men 2.23 (.60) .97 (.27) 6.0 (.014) 0.0 (1.000) Women 1.82 (.34) 1.17 (.20) Men 2.22 (.20) .72 (.12) 1.0 (.317) 4.0 (.045) Women 2.22 (.20) .00 (.00) Men 1.00 (.13) Women 1.00 (.13) No items exhibited DIF after using the B-H multiple comparisons adjustment -.26 (.18) .00 (.00) 0.0 (1.000) 6.1 (.013) Item 9 11 12 15 Content Do you prefer to stay home rather than going out and doing things Do you think its wonderful to be alive now Do you feel worthless the way you are now Do you think that most people are better off than you are 183 Table 40 Item parameters and standard errors for anchor items Mexico by Gender Item 1 2 3 4 5 7 9 10 11 12 13 14 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Do you often get bored Are you in good spirits most of the time Do you feel happy most of the time Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think it is wonderful to be alive now Do you feel worthless the way you are now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 184 a 2.14 (.19) 1.48 (.12) 2.76 (.19) 2.40 (.17) 1.77 (.15) 2.13 (.18) 1.22 (.10) 1.26 (.11) 1.68 (.19) 2.21 (.18) 1.63 (.13) 2.38 (.18) 1.31 (.11) b 1.00 (.08) .50 (.07) .29 (.04) .36 (.05) .99 (.09) .85 (.07) -.31 (.06) .79 (.10) 1.78 (.18) .78 (.07) .83 (.09) .62 (.06) .30 (.07) Table 41 Item parameters and standard errors for items exhibiting DIF Mexico by Gender Item 6 Content Are you afraid that something bad is going to happent to you Group a 1.54 (.10) 8 Do you often feel helpless Women Items bolded were significant after BH-Adjustment 185 .72 (..08) 3.16 (.20) Men .00 (.00) 3.16 (.20) Women .39 (.10) 1.54 (.10) Men b .00 (.00) Tests for DIF: χ2 (P) a Dif b Dif 2.8 (.094) 7.2 (.007) 0.6 (.438) 24.3 (.000) Table 42 Item parameters and standard errors for anchor items Uruguay by Gender Item 1 2 3 4 7 8 9 10 11 13 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Do you often get bored Do you feel happy most of the time Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think it is wonderful to be alive now Do you feel full of energy Do you think that most people are better off than you are 186 a 2.12 (.19) 1.56 (.14) 2.04 (.17) 1.85 (.16) 3.33 (.30) 2.86 (.26) .89 (.09) 1.08 (.15) 2.42 (.27) 1.67 (.15) .96 (.11) b 1.05 (.07) .72 (.07) .90 (.07) .74 (.07) .79 (.05) .99 (.06) -.16 (.09) 1.93 (.24) 1.42 (.10) .92 (.08) .80 (.13) Table 43 Item parameters and standard errors for items exhibiting DIF Uruguay by Gender Item Tests for DIF: χ2 (P) a Dif b Dif Content Group a b 5 Are you in good spirits most of the time Men Women 1.76 (.30) 2.81 (.31) 1.07 (.17) 1.02 (.07) 5.9 (.015) 3.3 (.069) 6 Are you afraid that something bad is going to happent to you Men Women 1.16 (.22) 1.66 (.17) 1.27 (.27) .81 (.09) 2.3 (.129) 1.8 (.179) 12 Do you feel worthless the way you are now Men Women 2.93 (.59) 2.21 (.30) 1.25 (.14) 1.44 (.12) 4.0 (.045) 0.1 (.751) 14 Do you feel that your situation is hopeless Men Women 2.34 (.43) 2.90 (.35) 1.17 (.16) 1.14 (.07) 11.4 (.000) 1.1 (.294) Items 6 and 12 did not exhibit DIF After using the B-H multiple comparisons adjustment Items bolded were significant after BH-Adjustment 187 Table 44 Summary of DIF analyses of the GDS-15 gender anchor items Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happen to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are **√=Anchor Items by Country** 188 Argentina √ √ √ √ √ √ √ √ √ √ Chile √ √ √ √ √ √ √ √ √ √ √ √ √ √ Cuba √ √ √ √ √ √ Mexico √ √ √ √ √ Uruguay √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Table 45 Summary of DIF analyses of the GDS-15: gender type of DIF Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happen to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are **NU=Non-Uniform DIF, U=Uniform DIF** 189 Argentina Type of DIF, if Present Chile Type of DIF, if Present Cuba Type of DIF, if Present Mexico Type of DIF, if Present Uruguay Type of DIF, if Present U NU NU U U U NU U U NU NU NU U Table 46 Summary of DIF analyses of the GDS-15 gender BH-Adjustment Argentina DIF After BH Adjustment Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happent to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are Chile DIF After BH Adjustment Cuba DIF After BH Adjustment Mexico DIF Uruguay DIF After BH After BH Adjustment Adjustment yes yes no no yes no no no no no no yes no ** Yes=significant after Benjimini-Hochberg Adjustment, No=non-significant after adjustment** 190 Table 47 Item parameters and standard errors for anchor items Chile by Cuba Item 1 2 3 5 6 7 9 10 11 12 13 14 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Are you in good spirits most of the time Are you afraid that something bad is going to happen to you Do you feel happy most of the time Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think it is wonderful to be alive now Do you feel worthless the way you are now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 191 a 2.43 (.15) 1.29 (.13) 2.89 (.15) 2.45 (.15) 1.38 (.13) 2.88 (.26) .83 (.06) 1.34 (.10) 2.06 (.23) 2.56 (.16) 2.15 (.13) 2.61 (.16) 1.28 (.08) b .62 (.04) .34 (.09) .10 (.03) .55 (.04) .32 (.08) .53 (.05) -.82 (.07) .99 (.09) 1.22 (.11) .69 (.05) .55 (.05) .62 (.04 .11 (.06) Table 48 Item parameters and standard errors for items exhibiting DIF for Chile by Cuba Item a b Chile 3.19 (.16) .08 (.03) 3.19 (.16) .00 (.00) Chile 2.87 (.14) .56 (.06) Cuba 8 Group Cuba 4 Content 2.87 (.14) .00 (.00) Tests for DIF: χ2 (P) a Dif b Dif Do you often get bored Do you often feel helpless **Items bolded were significant after BH-Adjustment** 192 -0.0 (1.000) 23 (.0000) 0.3 (.584) 16.4 (.0001) Table 49 Item parameters and standard errors for anchor items Mexico by Uruguay Item 1 2 4 5 8 10 11 13 14 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you often get bored Are you in good spirits most of the time Do you often feel helpless Do you feel that you have more problems with memory than most Do you think it is wonderful to be alive now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 193 a 2.41 (.13) 1.69 (.09) 2.40 (.12) 2.29 (.15) 3.21 (.19) 1.27 (.09) 2.25 (.16) 1.88 (.10) 2.71 (.17) 1.27 (.08) b .55 (.05) .18 (.05) .11 (.03) .51 (.05) .25 (.03) .78 (.09) 1.03 (.08) .40 (.05) .41 (.04) .10 (.06) Table 50 Item parameters and standard errors for items exhibiting DIF for Mexico by Uruguay Tests for DIF: χ2 (P) Item Content Group a b a Dif b Dif 3 Do you feel that your life is empty Uruguay Mexico 2.20 (.17) 3.26 (.23) .30 (.06) .03 (.03) 5.4(.020) 1.0 (.317) 6 Are you afraid that something bad is going to happen to you Uruguay Mexico 1.66 (.08) 1.66 (.08) .29 (.06) .00 (.00) 1.1 (.294) 36.6 (.000) 7 Do you feel happy most of the time Uruguay Mexico 3.39 (.17) 3.39 (.17) .21 (.04) .00 (.00) 2.9 (.089) 11.1 (.001) 9 Do you prefer to stay at home, rather than going out and doing things Uruguay Mexico .85 (.06) .85 (.06) -.70 (.09) .00 (.00) 2.9 (.089) 7.6 (.006) 12 Do you feel worthless the way you are now 3.11 (.16) 3.11 (.16) .66 (.05) .00 (.00) 1.4 (.237) 42.1 (.000) Uruguay Mexico **Item 3 did not exhibit DIF using the B-H multiple comparisons procedure** **Items bolded were significant after BH-Adjustment** 194 Table 51 Item parameters and standard errors for anchor items Argentina by Mexico Item a Content b 1 Are you basically satisfied with your life 2.30 (0.16) 0.70 (0.05) 3 Do you feel that your life is empty 2.86 (0.17) 0.23 (0.03) 4 Do you often get bored 2.37 (0.14) 0.28 (0.04) 5 Are you in good spirits most of the time 1.91 (0.14) 0.79 (0.07) 6 Are you afraid that something bad is going to happent to you 1.37 (0.09) 0.37 (0.06) 8 Do you often feel helpless 3.01 (0.19) 0.44 (0.04) 10 Do you feel that you have more problems with memory than most 1.25 (0.10) 0.93 (0.10) 13 Do you feel full of energy 1.55 (0.10) 0.64 (0.07) 14 Do you feel that your situation is hopeless 2.41 (0.16) 0.57 (0.05) 15 Do you think that most people are better off than you are 1.16 (0.08) 0.16 (0.07) 195 Table 52 Item parameters and standard errors for items exhibiting DIF for Argentina by Mexico Tests for DIF: χ2 (P) a Dif b Dif 35.4 (.000) 1.3 (.254) Item 2 Content Do you feel that your life is empty Group Argentina Mexico a 1.88 (0.10) 1.88 (0.10) b 0.29 (0.07) .00 (.00) 7 Do you feel happy most of the time Argentina Mexico 2.60 (0.29) 2.27 (0.19) 0.33 (0.07) 0.72 (0.07) 5.1 (.024) 8.6 (.003) 9 Do you prefer to stay at home, rather than going out and doing things Argentina Mexico 0.98 (0.07) 0.98 (0.07) -0.46 (0.10) 0.00 (0.00) 0.1 (.752) 7.5 (.006) Argentina Mexico 1.83 (0.25) 1.78 (0.20) 1.01 (0.17) 1.62 (0.17) 6.9 (.009) 0.4 (.527) Argentina Mexico **Items bolded were significant after BH-Adjustment** 2.54 (0.13) 2.54 (0.13) 0.69 (0.07) .00 (.00) 1.4 (.237) 26.9 (.000) 11 Do you think it is wonderful to be alive now 12 Do you feel worthless the way you are now 196 Table 53 Item parameters and standard errors for anchor items Argentina by Uruguay Item 1 3 4 5 8 9 10 12 13 14 15 Content Are you basically satisfied with your life Do you feel that your life is empty Do you often get bored Are you in good spirits most of the time Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you feel worthless the way you are now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 197 a 2.35 (.16) 2.20 (.15) 1.90 (.13) 2.25 (.16) 2.81 (.21) .98 (.08) 1.06 (.12) 2.32 (.20) 1.57 (.11) 2.43 (.18) .91 (.08) b .83 (.05) .73 (.05) .65 (.05) .92 (.05) .92 (.05) -.20 (.06) 1.94 (.21) 1.21 (.07) .78 (.07) 1.04 (.06) .55 (.10) Table 54 Item parameters and standard errors for Items exhibiting DIF for Argentina by Uruguay Item 2 Content Do you feel that your life is empty 6 Are you afraid that something bad is going to happent to you 7 Do you feel happy most of the time 11 Do you think it is wonderful to be alive now Tests for DIF: χ2 (P) a Dif b Dif 8.1 (.004) 2.5 (.113) Group Argentina Uruguay a 1.67 (.10) 1.67 (.10) b .70 (.07) .00 (.00) Argentina Uruguay 1.07 (.15) 1.54 (.14) 1.23 (.20) .77 (.08) 9.4 (.002) 2.8 (.094) Argentina Uruguay 2.66 (.29) 3.37 (.30) .68 (.07) .66 (.05) 7.2 (.007) 0.1 (.751) Argentina Uruguay 1.86 (.28) 2.45 (.27) 1.36 (.16) 1.28 (.10) 10.7 (.001) 0.1 (.751) **Items bolded were significant after BH-Adjustment** 198 Table 55 Item parameters and standard errors for anchor items Argentina by Chile Item 1 3 4 5 6 8 9 10 11 12 13 14 15 Content Are you basically satisfied with your life Do you feel that your life is empty Do you often get bored Are you in good spirits most of the time Are you afraid that something bad is going to happen to you Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think that it is wonderful to be alive now Do you feel worthless the way you are now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 199 a 2.06 (.14) 2.49 (.16) 2.17 (.14) 2.01 (.13) 1.23 (.09) 2.66 (.19) .83 (.07) 1.25 (.11) 1.70 (.15) 2.35 (.17) 1.53 (.10) 2.22 (.16) .86 (.08) b .90 (.06) .32 (.04) .33 (.04) .87 (.06) .63 (.08) .79 (.05) -.65 (.08) 1.31 (.12) 1.45 (.11) .89 (.06) .68 (.07) .79 (.06) -.04 (.09) Table 56 Item parameters and standard errors for items exhibiting DIF for Argentina by Chile Item 2 Content Do you feel that your life is empty 7 Group Argentina Chile a 1.39 (.09) 1.39 (.09) b .56 (.09) .00 (.00) Do you feel happy most of the time Argentina 2.46 (.25) Chile 2.60 (.23) **Item 7 did not exhibit DIF using the B-H multiple comparisons procedure** **Items bolded were significant after BH-Adjustment** .46 (.08) .74 (.06) 200 Tests for DIF: χ2 (P) a Dif b Dif 60.6 (.000) 0.1 (.751) 5.2 (.023) 0.0 (1.000) Table 57 Item parameters and standard errors for anchor items Argentina by Cuba Item 1 3 4 6 8 9 10 11 12 13 14 15 Content Are you basically satisfied with your life Do you feel that your life is empty Do you often get bored Are you afraid that something bad is going to happen to you Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you think that it is wonderful to be alive now Do you feel worthless the way you are now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 201 a 2.64 (.17) 2.76 (.16) 2.72 (.16) 1.39 (.10) 2.80 (.20) .99 (.07) 1.14 (.11) 1.89 (.17) 2.30 (.17) 1.68 (.12) 2.35 (.17) 1.17 (.09) b .68 (.04) .45 (.03) .44 (.03) .80 (.07) .93 (.04) -.35 (.06) 1.64 (.15) 1.62 (.11) 1.04 (.06) .88 (.07) 1.01 (.06) .52 (.07) Table 58 Item parameters and standard errors for items exhibiting DIF for Argentina by Cuba Item 7 Are you in good spirits most of the time Do you feel happy most of the time a b Argentina 1.66 (.09) .65 (.08) 1.66 (.09) .00 (.00) Argentina 2.09 (.24) .89 (.10) Cuba 5 Group Cuba 2 Content Have you dropped many of your activities and interests 2.47 (.19) .75 (.05) Argentina Cuba 2.63 (.29) 2.90 (.22) .62 (.07) .49 (.04) Tests for DIF: χ2 (P) a Dif b Dif **Items bolded were significant after BH-Adjustment** 202 1.1 (.294) 10.6 (.001) 5.5 (.019) 1.5 (.220) 13 (.000) 0.0 (1.000) Table 59 Item parameters and standard errors for anchor items Mexico by Chile Item 1 2 3 4 5 6 9 10 13 14 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Do you often get bored Are you in good spirits most of the time Are you afraid that something bad is going to happen to you Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you feel full of energy Do you feel that your situation is hopeless 203 a 2.46 (.16) 1.51 (.10) 2.98 (.16) 2.71 (.15) 2.22 (.14) 1.48 (.09) 1.01 (.07) 1.35 (.10) 2.01 (.12) 2.60 (.15) b .67 (.05) .22 (.05) .01 (.03) .04 (.03) .59 (.05) .11 (.05) -.69 (.05) .63 (.07) .42 (.05) .37 (.04) Table 60 Item parameters and standard errors for items exhibiting DIF for Mexico by Chile Tests for DIF: χ2 (P) a Dif b Dif 10.8 (001) 0.4 (527) Item 7 Content Do you feel happy most of the time Group Chile Mexico a 3.15 (.16) 3.15 (.16) b .40 (.04) .00 (.00) 8 Do you often feel helpless Chile Mexico 2.97 (.27) 3.46 (.26) .44 (.05) .14 (.04) 4.8 (.028) 3.3 (.069) 11 Do you think that it is wonderful to be alive now Chile Mexico 2.08 (.23) 1.94 (.22) 1.10 (.11) 1.31 (.15) 4.4 (.035) 0.4 (.527) 12 Do you feel worthless the way you are now Chile Mexico 3.09 (.15) 3.09 (.15) .51 (.04) .00 (.00) 3.4 (.065) 4.9 (.026) 15 Do you think that most people are better off than you are Chile 1.30 (.08) Mexico 1.30 (.08) **Items 11 and 15 did not exhibit DIF using the B-H multiple comparisons procedure** **Items bolded were significant after BH-Adjustment** -.36 (.07) .00 (.00) 0.2 (.654) 4.9 (.026) 204 Table 61 Item parameters and standard errors for anchor items Mexico by Cuba Item 1 3 5 6 8 9 10 13 15 Content Are you basically satisfied with your life Do you feel that your life is empty Are you in good spirits most of the time Are you afraid that something bad is going to happen to you Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more problems with memory than most Do you feel full of energy Do you think that most people are better off than you are 205 a 2.64 (.16) 3.14 (.16) 2.36 (.13) 1.60 (.09) 3.21 (.17) 1.16 (.07) 1.35 (.09) 2.09 (.12) 1.53 (.09) b .39 (.04) -.04 (.02) .39 (.04) .10 (.04) .24 (.03) -.68 (.04) .64 (.08) .40 (.05) .04 (.05) Table 62 Item parameters and standard errors for items exhibiting DIF for Mexico by Cuba Item Tests for DIF: χ2 (P) a Dif b Dif 2 Content Have you dropped many of your activities and interests Group a b Mexico Cuba 1.91 (.08) 1.91 (.08) .10 (.04) .00 (.00) 3.3 (.069) 12.9 (000) 4 Do you often get bored Mexico Cuba 2.98 (.14) 2.98 (.14) .03 (.03) .00 (.00) 1.1 (.294) 6.4 (.011) 7 Do you feel happy most of the time Mexico Cuba 2.95 (.13) 2.95 (.13) .39 (.04) .00 (.00) 0.0 (1.000) 12 (.000) 11 Do you think that it is wonderful to be alive now Mexico Cuba 1.96 (.21) 2.45 (.25) 1.25 (.15) 1.07 (.11) 3.6 (.057) 0.4 (.527) 12 Do you feel worthless the way you are now Mexico Cuba 2.99 (.14) 2.99 (.14) .35 (.04) .00 (.00) 1.6 (.205) 27.9 (.000) 14 Do you feel that your situation is hopeless Mexico Cuba 3.15 (.15) 3.15 (.15) .23 (.03) .00 (.00) 0.0 (1.000) 22.8 (.000) **Item 11 did not exhibit DIF using the B-H multiple comparisons procedure** **Items bolded were significant after BH-Adjustment** 206 Table 63 Item parameters and standard errors for anchor items Uruguay by Chile Item 1 4 5 6 7 8 9 11 12 13 15 Content Are you basically satisfied with your life Do you often get bored Are you in good spirits most of the time Are you afraid that something bad is going to happen to you Do you feel happy most of the time Do you often feel helpless Do you prefer to stay at home, rather than going out and doing things Do you think that it is wonderful to be alive now Do you feel worthless the way you are now Do you feel full of energy Do you think that most people are better off than you are 207 a 2.22 (0.15) 2.26 (0.13) 2.36 (0.15) 1.51 (0.10) 3.05 (0.19) 2.94 (0.19) 0.81 (0.06) 2.21 (0.18) 2.61 (0.18) 1.89 (0.12) 1.02 (0.08) b 0.77 (0.05) 0.21 (0.04) 0.66 (0.05) 0.41 (0.05) 0.47 (0.04) 0.57 (0.04) -0.73 (0.07) 1.13 (0.08) 0.80 (0.05) 0.51 (0.05) 0.09 (0.07) Table 64 Item parameters and standard errors for items exhibiting DIF for Uruguay by Chile Item Content Have you dropped many of your activities and interests Group a b Chile 1.27 (0.13) 0.36 (0.09) Uruguay 1.71 (0.15) 0.30 (0.07) Chile Uruguay 2.60 (0.22) 2.24 (0.19) Chile Tests for DIF: χ2 (P) a Dif b Dif 3 Do you feel that your life is empty 10 Do you feel that you have more problems with memory than most 14 Do you feel that your situation is hopeless 4.6 (.031) 41.3 (.000) 0.13 (0.04) 0.46 (0.06) 2.9 (.089) 2.3 (.129) 1.39 (0.15) 0.92 (0.12) 4.6 (.032) 18.5 (.000) Uruguay 2 1.15 (0.16) 1.45 (0.23) Chile Uruguay 2.61 (0.14) 2.61 (0.14) 0.57 (0.05) 0.00 (0.00) 1.3 (.254) 26.7 (.000) **Item 3 did not exhibit DIF using the B-H multiple comparisons procedure** **Items bolded were significant after BH-Adjustment** 208 Table 65 Item parameters and standard errors for anchor items Uruguay by Cuba Item 1 2 3 5 7 9 10 11 12 13 14 15 Content Are you basically satisfied with your life Have you dropped many of your activities and interests Do you feel that your life is empty Are you in good spirits most of the time Do you feel happy most of the time Do you prefer to stay at home, rather than going out and doing things Do you feel that you have more memory problems than most Do you think that it is wonderful to be alive now Do you feel worthless the way you are now Do you feel full of energy Do you feel that your situation is hopeless Do you think that most people are better off than you are 209 a 2.40 (0.14) 1.58 (0.10) 2.47 (0.13) 2.44 (0.15) 3.08 (0.18) 0.92 (0.06) 1.15 (0.10) 2.21 (0.18) 2.36 (0.16) 1.82 (0.11) 2.61 (0.17) 1.21 (0.08) b 0.73 (0.04) 0.58 (0.05) 0.47 (0.03) 0.73 (0.04) 0.50 (0.03) -0.44 (0.06) 1.52 (0.13) 1.45 (0.08) 1.06 (0.06) 0.81 (0.06) 0.94 (0.05) 0.56 (0.07) Table 66 Item parameters and standard errors for items exhibiting DIF for Uruguay by Cuba Item 4 Content Do you often get bored 6 Are you afraid that something bad is going to happen to you 8 Do you often feel helpless Group Uruguay Cuba Uruguay Cuba Uruguay Cuba **Items bolded were significant after BH-Adjustment** 210 a 2.51 (0.12) 2.51 (0.12) b 0.39 (0.04) 0.00 (0.00) 1.54 (0.14) 0.65 (0.08) 1.52 (0.13) 0.67 (0.07) 2.96 (0.27) 2.77 (0.25) 0.72 (0.05) 0.90 (0.05) Tests for DIF: χ2 (P) a Dif b Dif 12.7 (.000) 0.0 (1.000) 4.6 (.031) 8.9 (.002) 20.4 (.000) 0.1 (.752) Table 67 Summary of DIF analyses of the GDS-15: country by country anchor items Mexico/ Argentina/ Argentina/ Chile/Cuba Uruguay Mexico Uruguay Argentina/Chile Anchor Items Anchor Items Anchor Items Anchor Items Anchor Items √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happen to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are ** √ =Anchor Items by Country** 211 Table 68 Summary of DIF analyses of the GDS-15: country by country anchor items Argentina/ Mexico/ Mexico/ Uruguay / Uruguay/ Cuba Chile Cuba Chile Cuba Anchor Items Anchor Items Anchor Items Anchor Items Anchor Items √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happent to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are ** √ =Anchor Items by Country** 212 Table 69 Summary of DIF analyses of the GDS-15: country by country type of DIF Mexico/ Argentina/Me Argentina/ Argentina/ Chile/Cuba Uruguay xico Uruguay Chile Type of DIF, Type of DIF, Type of DIF, Type of DIF, Type of DIF, if Present if Present if Present if Present if Present Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happen to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are **NU=Non-Uniform DIF, U=Uniform DIF** U U U U U NU NU NU NU U U U NU U NU U U 213 NU Table 70 Summary of DIF analyses of the GDS-15: country by country type of DIF Argentina/Cu Mexico/ Mexico/ Uruguay / Uruguay/ ba Chile Cuba Chile Cuba Type of DIF, Type of DIF, Type of DIF, Type of DIF, Type of DIF, if Present if Present if Present if Present if Present Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happent to you v7. Do you feel happy most of the time v8. Do you often feel helpless v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are **NU=Non-Uniform DIF, U=Uniform DIF** U U NU NU U U NU NU NU U NU U NU U NU U NU NU U U 214 U Table 71 Summary of DIF analyses of the GDS-15: country by country BH-Adjustment Argentina/ Chile/Cuba Mexico/ Mexico DIF After Uruguay DIF DIF After BH After BH BH Adjustment Adjustment Adjustment Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests v3. Do you feel that your life is empty no v4. Do you often get bored yes v5. Are you in good spirits most of the time v6. Are you afraid that something bad is going to happen to you yes v7. Do you feel happy most of the time yes v8. Do you often feel helpless yes v9. Do you prefer to stay at home, rather than going out and doing things yes v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now v12. Do you feel worthless the way you are now yes v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are ** Yes=significant after Benjimini-Hochberg Adjustment, No=non-significant after adjustment** 215 Argentina/ Uruguay DIF After BH Adjustment Argentina/ Chile DIF After BH Adjustment yes yes no yes yes yes no yes yes yes yes Table 72 Summary of DIF analyses of the GDS-15: country by country BH-Adjustment Argentina/Cu Mexico/ ba Chile DIF After DIF After BH BH Adjustment Adjustment Item Content v1. Are you basically satisfied with your life v2. Have you dropped many of your activities and interests yes v3. Do you feel that your life is empty v4. Do you often get bored v5. Are you in good spirits most of the time yes v6. Are you afraid that something bad is going to happent to you v7. Do you feel happy most of the time yes yes v8. Do you often feel helpless yes v9. Do you prefer to stay at home, rather than going out and doing things v10. Do you feel that you have more problems with memory than most v11. Do you think it is wonderful to be alive now no v12. Do you feel worthless the way you are now yes v13. Do you feel full of energy v14. Do you feel that your situation is hopeless v15. Do you think that most people are better off than you are no ** Yes=significant after Benjimini-Hochberg Adjustment, No=non-significant after adjustment** 216 Mexico Cuba DIF After BH Adjustment Uruguay/ Chile DIF After BH Adjustment yes yes no yes Uruguay/ Cuba DIF After BH Adjustment yes yes yes yes yes no yes yes yes Table 73 Descriptive statistics and correlations for the GDS-15 Argentina V1 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 M SD V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 0.60 0.68 0.47 0.65 0.45 0.76 0.64 0.40 0.26 0.64 0.61 0.56 0.67 0.41 0.16 0.37 0.54 0.51 0.36 0.50 0.46 0.47 0.55 0.32 0.16 0.64 0.32 0.55 0.28 0.21 0.41 0.78 0.50 0.46 0.63 0.75 0.45 0.33 0.39 0.65 0.28 0.61 0.34 0.18 0.38 0.46 0.37 0.50 0.61 0.48 0.37 0.39 0.60 0.35 0.48 0.40 0.19 0.40 0.24 0.83 0.54 0.27 0.28 0.73 0.46 0.64 0.47 0.27 0.14 0.34 0.26 0.45 0.34 0.32 0.23 0.43 0.20 0.42 0.29 0.19 0.39 0.70 0.26 0.27 0.79 0.50 0.72 0.53 0.18 0.17 0.38 0.46 0.39 0.55 0.64 0.43 0.67 0.39 0.10 0.30 0.32 0.12 0.52 0.29 0.41 0.36 0.44 0.50 0.14 0.42 0.34 0.42 0.32 0.09 0.29 0.35 0.68 0.49 0.05 0.09 0.28 0.40 0.69 0.43 0.10 0.31 0.37 0.23 0.37 0.22 0.11 0.37 0.41 0.32 0.48 217 V14 V15 Table 74 Descriptive statistics and correlations for the GDS-15 Chile V1 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 M SD V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 0.52 0.65 0.63 0.60 0.40 0.76 0.62 0.16 0.41 0.67 0.58 0.58 0.62 0.39 0.14 0.35 0.37 0.45 0.46 0.37 0.46 0.41 0.23 0.39 0.42 0.54 0.53 0.50 0.21 0.33 0.47 0.76 0.62 0.51 0.65 0.75 0.27 0.44 0.52 0.62 0.44 0.62 0.38 0.33 0.47 0.69 0.43 0.67 0.66 0.26 0.47 0.57 0.64 0.51 0.57 0.31 0.34 0.47 0.48 0.72 0.61 0.17 0.50 0.58 0.59 0.62 0.60 0.39 0.18 0.38 0.55 0.59 0.21 0.34 0.21 0.41 0.43 0.47 0.30 0.32 0.46 0.71 0.20 0.39 0.63 0.60 0.67 0.62 0.43 0.20 0.40 0.27 0.44 0.53 0.72 0.57 0.70 0.40 0.19 0.39 0.25 0.24 0.30 0.27 0.34 0.11 0.61 0.48 0.51 0.57 0.52 0.51 0.40 0.20 0.40 0.63 0.63 0.62 0.40 0.10 0.30 0.64 0.78 0.42 0.17 0.38 0.62 0.33 0.47 0.24 0.20 0.48 0.42 0.40 0.49 218 V14 V15 Table 75 Descriptive statistics and correlations for the GDS-15 Mexico V1 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 M SD V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 0.52 0.68 0.59 0.64 0.35 0.72 0.57 0.32 0.30 0.56 0.53 0.58 0.55 0.39 0.11 0.32 0.51 0.52 0.40 0.36 0.41 0.48 0.43 0.36 0.28 0.50 0.48 0.46 0.39 0.25 0.43 0.70 0.54 0.46 0.56 0.73 0.49 0.45 0.53 0.65 0.52 0.67 0.42 0.24 0.43 0.63 0.53 0.59 0.67 0.46 0.46 0.41 0.54 0.46 0.59 0.39 0.24 0.42 0.37 0.78 0.54 0.24 0.33 0.50 0.39 0.58 0.42 0.33 0.14 0.35 0.50 0.60 0.40 0.40 0.27 0.44 0.31 0.50 0.36 0.30 0.46 0.62 0.28 0.27 0.61 0.47 0.70 0.51 0.38 0.14 0.35 0.51 0.52 0.52 0.66 0.48 0.71 0.50 0.19 0.40 0.43 0.19 0.40 0.31 0.43 0.40 0.46 0.50 0.34 0.50 0.29 0.51 0.47 0.22 0.41 0.58 0.57 0.51 0.35 0.05 0.22 0.58 0.70 0.52 0.15 0.35 0.46 0.37 0.60 0.18 0.17 0.31 0.38 0.38 0.46 219 V14 V15 Table 76 Descriptive statistics and correlations for the GDS-15 Cuba V1 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 M SD V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 0.56 0.70 0.74 0.65 0.49 0.77 0.67 0.29 0.29 0.66 0.58 0.56 0.58 0.53 0.18 0.39 0.54 0.59 0.48 0.48 0.47 0.47 0.43 0.37 0.44 0.56 0.51 0.51 0.48 0.24 0.42 0.81 0.63 0.50 0.70 0.75 0.41 0.41 0.61 0.58 0.48 0.68 0.49 0.26 0.44 0.71 0.57 0.71 0.70 0.42 0.41 0.60 0.65 0.63 0.65 0.49 0.26 0.44 0.49 0.83 0.62 0.34 0.44 0.63 0.57 0.63 0.59 0.47 0.18 0.38 0.54 0.59 0.34 0.32 0.30 0.46 0.43 0.52 0.44 0.24 0.43 0.70 0.32 0.39 0.60 0.54 0.65 0.62 0.57 0.23 0.42 0.32 0.45 0.60 0.62 0.55 0.71 0.53 0.14 0.34 0.33 0.14 0.36 0.36 0.35 0.32 0.52 0.50 0.40 0.52 0.51 0.41 0.43 0.15 0.36 0.68 0.62 0.56 0.36 0.05 0.22 0.73 0.72 0.52 0.13 0.33 0.64 0.47 0.58 0.16 0.13 0.26 0.37 0.33 0.44 220 V14 V15 Table 77 Descriptive statistics and correlations for the GDS-15 Uruguay V1 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 M SD V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 0.55 0.58 0.48 0.66 0.48 0.75 0.66 0.26 0.31 0.69 0.45 0.48 0.62 0.34 0.16 0.37 0.49 0.55 0.41 0.50 0.50 0.57 0.42 0.36 0.40 0.55 0.47 0.49 0.38 0.26 0.44 0.72 0.53 0.37 0.63 0.70 0.34 0.40 0.54 0.62 0.38 0.63 0.33 0.19 0.39 0.57 0.45 0.58 0.60 0.41 0.35 0.51 0.57 0.45 0.54 0.30 0.24 0.43 0.53 0.84 0.61 0.28 0.29 0.70 0.48 0.60 0.55 0.30 0.16 0.36 0.57 0.63 0.31 0.42 0.37 0.49 0.37 0.56 0.24 0.23 0.42 0.74 0.26 0.30 0.72 0.61 0.69 0.70 0.40 0.18 0.38 0.39 0.34 0.65 0.73 0.45 0.69 0.35 0.15 0.36 0.30 0.26 0.43 0.39 0.38 0.34 0.49 0.50 0.47 0.48 0.33 0.43 0.37 0.12 0.33 0.59 0.59 0.67 0.38 0.08 0.28 0.55 0.71 0.44 0.09 0.29 0.59 0.43 0.54 0.21 0.12 0.30 0.41 0.33 0.46 221 V14 V15 Table 78 GDS-15 cutoff scores by country Country GDS<6 GDS>6 Argentina 85% 15% Chile 73% 27% Cuba 80% 20% Mexico 80% 20% Uruguay 85% 15% Across all countries 81% 19% ***Scores below 6 indicate no depression, scores greater than 6 indicate depression Table 79 GDS-15 cutoff scores by gender and country GDS <6 GDS ≥6 Women Men Women Men Argentina 84% 87% 16% 13% Chile 70% 79% 30% 21% Cuba 75% 88% 25% 12% Mexico 79% 85% 21% 15% Uruguay 82% 90% 18% 9% Across all countries 77% 86% 23% 14% ***Scores below 6 indicate no depression, scores greater than 6 indicate depression Country 222 Table 80 Countries with the most difficulty endorsing items Country Item Mexico 2 have you dropped many of your activities and interests 4 do you often get bored 6 are you afraid something bad is going to happen to you 7 do you not feel happy most of the time 12 do you feel worthless the way you are now 2 have you dropped many of your activities and interests 6 are you afraid something bad is going to happen to you 7 do you not feel happy most of the time 9 do you prefer to stay at home, rather than going and doing things 11 do you not think it is wonderful to be alive now Argentina Content 223 Figure 1 Chile scree-plot 224 Figure 2 Cuba scree-plot 225 Figure 3 Argentina scree-plot 226 Figure 4 Mexico scree-plot 227 Figure 5 Uruguay scree-plot 228 Figure 6 Chile by gender 1 0.9 0.8 0.7 0.6 0.5 Chile Men Item 13 a=1.68, b=.74 0.4 Chile Women Item 13 a=2.00, b=.79 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 229 Figure 7 Test information curve Chile by gender: for interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation Test Information and Measurement Error 30 0.94 25 0.78 20 Information 15 0.47 10 0.31 5 0 -3 -2 -1 0 Scale Score 230 1 2 3 0.16 Standard Error 0.62 Figure 8 Cuba by gender item 7 1 0.9 0.8 0.7 0.6 0.5 Cuba Men Item 7 a=3.10, b=.13 0.4 Cuba Women Item 7 a=3.10, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 231 Figure 9 Cuba by gender item 12 1 0.9 0.8 0.7 0.6 0.5 Cuba Men Item 12 a=2.46, b=.46 0.4 Cuba Women Item 12 a=2.46, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 232 Figure 10 Test information curve Cuba by gender Test Information and Measurement Error 40 0.93 0.77 30 Information 20 0.45 10 0.30 0 -3 -2 -1 0 Scale Score 233 1 2 3 0.14 Standard Error 0.61 Figure 11 Argentina by gender item 9 1 0.9 0.8 0.7 0.6 0.5 Argentina Men Item 9 a= .87, b= .02 0.4 Argentina Women Item 9 a=.87, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 234 Figure 12 Argentina by gender item 11 1 0.9 0.8 0.7 0.6 0.5 Argentina Men Item 11 a=2.23, b=.97 0.4 Argentina Women Item 11 a=1.82, b=1.17 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 235 Figure 13 Argentina by gender item 12 1 0.9 0.8 0.7 0.6 0.5 Argentina Men Item 12 a=2.22, b=.72 0.4 Argentina Women Item 12 a=2,22, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 236 Figure 14 Argentina by gender item 15 1 0.9 0.8 0.7 0.6 0.5 Argentina Men Item 15 a=1.00, b= -.26 0.4 Argentina Women Item 15 a=1.00, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 237 Figure 15 Test information curve Argentina by gender Test Information and Measurement Error 40 1.11 0.92 0.73 20 0.54 10 0.34 0 -3 -2 -1 0 Scale Score 238 1 2 3 0.15 Standard Error Information 30 Figure 16 Mexico by gender item 6 1 0.9 0.8 0.7 0.6 0.5 Mexico Men Item 6 a=1.54, b=.39 0.4 Mexico Women Item 6 a=1.54, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 239 Figure 17 Mexico by gender item 8 1 0.9 0.8 0.7 0.6 0.5 Mexico Men Item 8 a=3.16, b=.72 0.4 Mexico Women Item 8 a=3.16, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 240 Figure 18 Test information curve Mexico by gender Test Information and Measurement Error 30 0.96 25 0.80 20 Information 15 0.48 10 0.32 5 0 -3 -2 -1 0 Scale Score 241 1 2 3 0.16 Standard Error 0.64 Figure 19 Uruguay by gender item 5 1 0.9 0.8 0.7 0.6 0.5 Uruguay Men Item 5 a=1.76, b=1.07 0.4 Uruguay Women Item 5 a=2.81 b=1.02 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 242 Figure 20 Uruguay by gender item 6 1 0.9 0.8 0.7 0.6 0.5 Uruguay Men Item 6 a=1.16, b=1.27 0.4 Uruguay Women Item 6 a=1.66, b=.81 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 243 Figure 21 Uruguay by gender item 12 1 0.9 0.8 0.7 0.6 0.5 Uruguay Men Item 12 a=2.93, b=1.25 0.4 Uruguay Women Item 12 a=2.21, b=1.44 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 244 Figure 22 Uruguay by gender item 14 1 0.9 0.8 0.7 0.6 0.5 Uruguay Men Item 14 a=2.34, b=1.17 0.4 Uruguay Women Item 14 a=2.9, b=1.14 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 245 Figure 23 Test information curve Uruguay by gender Test Information and Measurement Error 40 1.10 0.91 0.72 20 0.53 10 0.34 0 -3 -2 1 0 Scale Score 246 1 2 3 0.15 Standard Error Information 30 Figure 24 Chile by Cuba item 4 1 0.9 0.8 0.7 0.6 0.5 Item 4 Chile a=3.19 b=.08 Item 4 Cuba a=3.19, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 247 3 Figure 25 Chile by Cuba item 8 1 0.9 0.8 0.7 0.6 Item 8 Chile a=2.83, b=.56 0.5 Item 8 Cuba a=2.83, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 248 Figure 26 Chile by Cuba test information curve Test Information and Measurement Error 40 0.98 0.82 30 Information 20 0.48 10 0.31 0 -3 -2 -1 0 1 Scale Score 249 2 3 0.14 Standard Error 0.65 Figure 27 Mexico by Uruguay item 6 1 0.9 0.8 0.7 0.6 0.5 Item 6 Uruguay a=1.66,b=.29 0.4 Item 6 Mexico a=1.66, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 250 Figure 28 Mexico by Uruguay item 7 1 0.9 0.8 0.7 0.6 Item 7 Uruguay a=3.39, b=.21 0.5 Item 7 Mexico a=3.39, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 251 3 Figure 29 Mexico by Uruguay item 9 1 0.9 0.8 0.7 0.6 0.5 Item 9 Uruguay a=.85, b=.07 0.4 Item 9 Mexico a=.85, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 252 Figure 30 Mexico by Uruguay item 12 1 0.9 0.8 0.7 0.6 0.5 Item 12 Uruguay a=3.11, b=.66 0.4 Item 12 Mexico a=3.11, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 253 Figure 31 Mexico by Uruguay test information curve Test Information and Measurement Error 40 0.93 0.77 30 Information 20 0.45 10 0.30 0 -3 -2 -1 0 Scale Score 1 254 2 3 0.14 Standard Error 0.61 Figure 32 Mexico by Argentina item 2 1 0.9 0.8 0.7 0.6 0.5 Item 2 Argentina a=1.88, b=.29 0.4 Item 2 Mexico a=1.88, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 255 Figure 33 Mexico by Argentina item 7 1 0.9 0.8 0.7 0.6 0.5 Item 7 Argentina a=2.6, b=.33 0.4 Item 7 Mexico a=2.27,b=.72 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 256 Figure 34 Mexico by Argentina item 9 1 0.9 0.8 0.7 0.6 0.5 Item 9 Argentina a=.98, b=-.46 0.4 Item 9 Mexico a=.98, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 257 Figure 35 Mexico by Argentina item 11 1 0.9 0.8 0.7 0.6 0.5 Item 11 Argentina a=1.83, b=1.01 0.4 Item 11 Mexico a=1.78, b=1.62 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 258 Figure 36 Mexico by Argentina item 12 1 0.9 0.8 0.7 0.6 0.5 Item 12 Argentina a=2.54, b=.69 0.4 Item 12 Mexico a=2.54, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 259 Figure 37 Mexico by Argentina test information curve Test Information and Measurement Error 40 1.09 0.91 30 Information 20 0.53 10 0.34 0 -3 -2 -1 0 Scale Score 260 1 2 3 0.15 Standard Error 0.72 Figure 38 Argentina by Uruguay item 2 1 0.9 0.8 0.7 0.6 0.5 Item 2 Argentina a=1.67, b=.70 0.4 Item 2 Uruguay a=1.67, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 261 Figure 39 Argentina by Uruguay item 6 1 0.9 0.8 0.7 0.6 0.5 Item 6 Argentina a=1.07, b=1.23 0.4 Item 6 Uruguay a=1.54, b=.77 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 262 Figure 40 Argentina by Uruguay item 7 1 0.9 0.8 0.7 0.6 0.5 Item 7 Argentina a=2.66, b=.68 0.4 Item 7 Uruguay a=3.37, b=.66 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 263 3 Figure 41 Argentina by Uruguay item 11 1 0.9 0.8 0.7 0.6 0.5 Item 11 Argentina a=1.86, b=1.36 0.4 Item 11 Uruguay a=2.45, b=1.28 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 264 Figure 42 Argentina by Uruguay test information curve Test Information and Measurement Error 30 0.90 25 0.75 20 Information 15 0.45 10 0.30 5 0 -3 -2 -1 0 Scale Score 265 1 2 3 0.15 Standard Error 0.60 Figure 43 Argentina by Chile item 2 1 0.9 0.8 0.7 0.6 Item 2 Argentina a=1.39, b=.56 0.5 Item 2 Chile a=1.39, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 266 Figure 44 Argentina by Chile item 7 1 0.9 0.8 0.7 0.6 Item 7 Argentina a=2.46, b=.46 0.5 Item 7 Chile a=2.6, b=.74 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 267 Figure 45 Argentina by Chile test information curve Test Information and Measurement Error 30 0.96 25 0.80 20 Information 15 0.48 10 0.32 5 0 -3 -2 -1 0 Scale Score 268 1 2 3 0.16 Standard Error 0.64 Figure 46 Argentina by Cuba item 2 1 0.9 0.8 0.7 0.6 Item 2 Argentina a=1.66, b=.65 0.5 Item 2 Cuba a=1.66, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 269 Figure 47 Argentina by Cuba item 5 1 0.9 0.8 0.7 0.6 Item 5 Argentina a=2.09, b=.89 0.5 Item 5 Cuba a=2.47, b=.75 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 270 Figure 48 Argentina by Cuba item 7 1 0.9 0.8 0.7 0.6 Item 7 Argentina a=2.63, b=.62 0.5 Item 7 Cuba a=2.9, b=.49 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 271 Figure 49 Argentina by Cuba test information curve Test Information and Measurement Error 40 1.11 0.91 30 Information 20 0.53 10 0.34 0 -3 -2 -1 0 Scale Score 272 1 2 3 0.15 Standard Error 0.72 Figure 50 Mexico by Chile item 7 1 0.9 0.8 0.7 0.6 Item 7 Chile a=3.15, b=.40 0.5 Item 7 Mexico a=3.15, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 273 Figure 51 Mexico by Chile item 8 1 0.9 0.8 0.7 0.6 Item 8 Chile a=2.97, b=.44 0.5 Item 8 Mexico a=3.46, b=.14 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 274 Figure 52 Mexico by Chile item 11 1 0.9 0.8 0.7 0.6 0.5 Item 11 Chile a=2.08, b=1.1 0.4 Item 11 Mexico a=1.94, b=1.31 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 275 Figure 53 Mexico by Chile item 12 1 0.9 0.8 0.7 0.6 0.5 Item 12 Chile a=3.09, b=.51 0.4 Item 12 Mexico a=3.09, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 276 3 Figure 54 Mexico by Chile item 15 1 0.9 0.8 0.7 0.6 0.5 Item 15 Chile a=1.3, b=.36 0.4 Item 15 Mexico a=1.3, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 277 Figure 55 Mexico by Chile test information curve Test Information and Measurement Error 40 0.92 0.76 30 Information 20 0.45 10 0.29 0 -3 -2 -1 0 Scale Score 278 1 2 3 0.14 Standard Error 0.61 Figure 56 Mexico by Cuba item 2 1 0.9 0.8 0.7 0.6 Item 2 Mexico a=1.91, b=.10 0.5 Item 2 Cuba a=1.91, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 279 Figure 57 Mexico by Cuba item 4 1 0.9 0.8 0.7 0.6 Item 4 Mexico a=2.98, b=.03 0.5 Item 4 Cuba a=2.98, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 280 Figure 58 Mexico by Cuba item 7 1 0.9 0.8 0.7 0.6 Item 7 Mexico a=2.95, b=.39 0.5 Item 7 Cuba a=2.95, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 281 3 Figure 59 Mexico by Cuba item 11 1 0.9 0.8 0.7 0.6 0.5 Item 11 Mexico a=1.96, b=1.25 0.4 Item 11 Cuba a=2.45, b=1.07 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 282 Figure 60 Mexico by Cuba item 12 1 0.9 0.8 0.7 0.6 0.5 Item 12 Mexico a=2.99, b=.35 0.4 Item 12 Cuba a=2.99, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 283 Figure 61 Mexico by Cuba item 14 1 0.9 0.8 0.7 0.6 0.5 Item 14 Mexico a=3.15, b=.23 0.4 Item 14 Cuba a=3.15, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 284 Figure 62 Mexico by Cuba test information curve Test Information and Measurement Error 40 0.84 30 0.66 20 0.48 10 0.31 Information 1.02 0 -3 -2 -1 0 Scale Score 285 1 2 3 0.13 Standard Error 50 Figure 63 Uruguay by Chile item 2 1 0.9 0.8 0.7 0.6 Item 2 Chile a=1.27, b=.36 0.5 Item 2 Uruguay a=1.71, b=.3 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 286 Figure 64 Uruguay by Chile item 3 1 0.9 0.8 0.7 0.6 Item 3 Chile a=2.6, b=.13 0.5 Item 3 Uruguay a=2.24, b=.46 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 287 Figure 65 Uruguay by Chile item 10 1 0.9 0.8 0.7 0.6 Item 10 Chile… Item 10 Uruguay… 0.5 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 288 Figure 66 Uruguay by Chile item 14 1 0.9 0.8 0.7 0.6 0.5 Item 14 Chile a=2.61, b=.57 0.4 Item 14 Uruguay a=2.61, b=.00 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 289 Figure 67 Uruguay by Chile test information curve Test Information and Measurement Error 40 1.03 0.85 30 Information 20 0.50 10 0.32 0 -3 -2 -1 0 Scale Score 290 1 2 3 0.15 Standard Error 0.68 Figure 68 Uruguay by Cuba item 4 1 0.9 0.8 0.7 0.6 Item 4 Uruguay a=2.51, b=.39 0.5 Item 4 Cuba a=2.51, b=.00 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 291 Figure 69 Uruguay by Cuba item 6 1 0.9 0.8 0.7 0.6 0.5 Item 6 Uruguay a=1.54, b=.65 0.4 Item 6 Cuba a=1.52, b=.67 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 292 Figure 70 Uruguay by Cuba item 8 1 0.9 0.8 0.7 0.6 Item 8 Uruguay a=2.96, b=.72 0.5 Item 8 Cuba a=2.77, b=.90 0.4 0.3 0.2 0.1 0 -3 -2.5 -2 -1.5 -1 0 1 1.5 2 2.5 3 293 Figure 71 Uruguay by Cuba test information curve Test Information and Measurement Error 40 1.08 0.89 30 Information 20 0.52 10 0.33 0 -3 -2 -1 0 Scale Score 294 1 2 3 0.15 Standard Error 0.71 Appendix 295 GERIATRIC DEPRESSION SCALE SHORT FORM 1. Are you basically satisfied with your life? 2. Have you dropped many of your activities and interests? 3. Do you feel that your life is empty? 4. Do you often get bored? 5. Are you in good spirits most of the time? 6. Are you afraid that something bad is going to happen to you? 7. Do you feel happy most of the time? 8. Do you often feel helpless? 9. Do you prefer to stay at home, rather than going out and doing new things? 10. Do you feel that you have more problems with memory than most? 11. Do you think that it is wonderful to be alive now? 12. Do you feel worthless the way you are now? 13. Do you feel full of energy? 14. Do you feel that your situation is hopeless? 15. Do you think that most people are better off than you are? 296 GERIATRIC DEPRESSION SCALE SHORT FORM SPANISH VERSION 1. En general , est· satisfecho/a con su vida? 2. Ha abandonado muchas de sus tareas habituales y aficiones? 3. Siente que su vida est· vacÌa? 4. Se siente con frecuencia aburrido/a? 5. Se encuentra de buen humor la mayor parte del tiempo? 6. Teme que algo malo pueda ocurrirle? 7. Se siente feliz la mayor parte del tiempo? 8. Con frecuencia se siente desamparado/a, desprotegido/a? 9. Prefiere usted quedarse en casa, m·s que salir y hacer cosas nuevas? 10. Cree que tiene m·s problemas de memoria que. la mayorÌa de la gente? 11. En estos momentos, piensa que es estupendo estar vivo? 12. Actualmente se siente una in_til? 13. Se siente lleno la de energÌa? 14. Se siente sin esperanza en este momento? 15. Piensa que la mayorÌa de la gente est· en mejor situaciÛn que usted? 297 References 298 References Adams, K. B. (2001). Depressive symptoms, depletion, or developmental change? Withdrawal, apathy, and lack of vigor in the Geriatric Depression Scale. The Gerontologist, 41(6), 768-777. Adams, K. B., Matto, H. C., & Sanders, S. (2004). Confirmatory Factor Analysis of the Geriatric Depression Scale. The Gerontologist, 44(6), 818-826. Angst, J. (1992). How predictable is depressive illness? In S. Montgomery & F. Rouillon (Eds.), Long-term treatment of depression (pp. 1-15). New York, NY: Wiley. Ayotte, B. J., Yang, F. M., & Jones, R. N. (2010). Physical Health and Depression: A Dyadic Study of Chronic Health Conditions and Depressive Symptomatology in Older Adult Couples. Journal of Gerontology: Psychological Sciences, 65B(4), 438-448. Baumgartner, H., & SteenKamp, J.-B. E. M. (2001). Response Styles in Marketing Research:A Cross-National Investigation. Journal of Marketing Research, 38(2), 143-156. Beckstead, J. W., Yang, C. Y., & Lengacher, C. A. (2008). Assessing cross-cultural validity of scales: A methodological review and illustrative example. International Journal of Nursing Studies, 45(1), 110-119. Beekman, A. T. F., Kriegsman, D. M. W., Deeg, D. J. H., & Tilburg, W. (1995). The association of physical health and depressive symptoms in the older population: age and sex differences. Social Psychiatry and Psychiatric Epidemiology, 30(1), 32-38. Beekman, A. T. F., Penninx, B. W. J. H., Deeg, D. J. H., Ormel, J., Braam, A. W., & Van Tilburg, W. (1997). Depression and physical health in later life: results from the Longitudinal Aging Study Amsterdam (LASA). Journal of Affective Disorders, 46, 219-231. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300. Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238-246. Bentler, P. M., & Bonnet, D. G. (1980). Significance tests and goodness of fit on the analysis of covariance structures. Psychological Bulletin, 88, 588-606. 299 Berkman, L. F., Berkman, C. S., Kasl, S., Freeman, D. H., Leo, L., Ostfeld, A. M., et al. (1986). Depressive Symptoms in Relation to Physical Health and Functioning in the Elderly American Journal of Epidemiology, 124(3), 372-388. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-460. Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley. Bontempo, D. E. (2007). Polytomous factor analytic models in developmental research. Dissertation Abstracts International: Section B: The Sciences and Engineering, 67(8-B), 4754. Bontempo, D. E., & Hofer, S. M. (2007). Assessing factorial invariance in cross-sectional and longitudinal studies. In A. D. v. D. Ong, Manfred H (Ed.), Oxford handbook of methods in positive psychology. Series in positive psychology (pp. 153-175). New York, NY: Oxford University Press. Bontempo, D. E., Hofer, S. M., Mackinnon, A., Gray, K., Einfeld, S., Tonge, B., et al. (2008). Factor structure of the Developmental Behavior Checklist using confirmatory factor analysis of polytomous items. Journal of Applied Measurement, 9(3), 265-280. Braam, A. W., Prince, M. J., Beekman, A. T. F., Delespaul, P., Dewey, M. E., Geerlings, S. W., et al. (2005). Physical health and depressive symptoms in older Europeans: Results from EURODEP. British Journal of Psychiatry, 187, 35-42. Braswell, J. S., Lutkus, A. D., Grigg, W. S., Santapau, S. L., Tay-Lim, B., & Johnson, M. (2001). The nation’s report card: Mathematics 2000. NCES 2001-517. Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics. Brink, T., Yesavage, J. A., Lum, O., Heersema, P. H., Adey, M., & Rose, T. L. (1982). Screening Tests For Geriatric Depression. Clinical Gerontologist, 1(1), 37-43. Brown, L. M., & Schinka, J. A. (2005). Development and initial validation of a 15-item informant version of the Geriatric Depression Scale. [Peer Reviewed]. International Journal of Geriatric Psychiatry, 20(10), 911-918. doi: 10.1002/gps.1375 Brown, P. J., Woods, C. M., & Storandt, M. (2007). Model stability of the 15-item Geriatric Depression Scale across cognitive impairment and severe depression. Psychology and Aging, 22(2), 372-379. 300 Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. B. J. S. Long (Ed.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage. Byrne, B. M. (2008). Testing for multigroup equivalence of a measuring instrument: A walk through the process. Psicothema, 20(4), 872-882. Byrne, B. M., & Campbell, T. L. (1999). Cross-cultural comparisons and the presumption of equivalent measurement and theoretical structure - A look beneath the surface. Journal of Cross-Cultural Psychology, 30(5), 555-574. Byrne, B. M., Shavelson, R. J., & Muthen, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456-466. Byrne, B. M., & Watkins, D. (2003). The issue of measurement invariance revisited. Journal of Cross-Cultural Psychology, 34(2), 155-175. Chaaya, M., Sibai, A. M., El Roueiheb, Z., Chemaitelly, H., Chahine, L. M., Al-Amin, H., et al. (2008). Validation of the Arabic version of the short Geriatric Depression Scale (GDS-15). International Psychogeriatrics, 20(3), 571-581. Chau, J., Martin, C. R., Thompson, D. R., Chang, A. M., & Woo, J. (2006). Factor structure of the Chinese version of the Geriatric Depression Scale. Psychology, Health & Medicine, 11(1), 48-59. Chen, Y. H., Rendina-Gobioff, G., & Dedrick, R. F. (2010). Factorial Invariance of a Chinese Self-Esteem Scale for Third and Sixth Grade Students: Evaluating Method Effects Associated with Positively and Negatively Worded Items. The International Journal of Educational and Psychological Assessment, 6(1), 21-35. Cheng, S.-T., & Chan, A. C. M. (2004). A Brief Version of the Geriatric Depression Scale for the Chinese. Psychological Assessment, 16(2), 182-186. Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255. Coleman, R. M., Miles, L. E., Guilleminault, C., Zarcone, V. P., Van Der Hoed, J., & Dement, W. C. (1981). Sleep-wake disorders in the elderly: a polysomnographic analysis. J. Am. Geriat. Soc. , 29, 289–296. Djernes, J. K. (2006). Prevalence and predictors of depression in populations of elderly: a review. Acta Psychiatrica Scandinavica, 113(5), 372-387. 301 Eberhardt, M. S., & Pamuk, E. P. (2004). The importance of place of residence: Examining health in rural and nonrural areas. American Journal of Public Health, 94, 1682-1686. Edelen, M. O., Thissen, D., Teresi, J. A., Kleinman, M., & Ocepek-Welikson, K. (2006). Identification of Differential Item Functioning Using Item Response Theory and the Likelihood-Based Model Comparison Approach: Application to the MiniMental State Examination. Medical Care. Special Issue: Measurement in a multiethnic society, 44(11, Suppl 3), S134-S142. Edwards, M. C. (2009). An Introduction to Item Response Theory Using the Need for Cognition Scale. Social and Personality Psychology Compass, 3(4), 507-529. Embretson, S. E. R., S.P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum. Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466-491. Flowers, C., Oshima TC, & Raju, N. (1999). A description and demonstration of the polytomous DFIT framework. Applied Psychological Measurement, 23, 309-326. Friedman, B., Heisel, M. J., & Delavan, R. L. (2005). Psychometric Properties of the 15Item Geriatric Depression Scale in Functionally Impaired, Cognitively Intact, Community-Dwelling Elderly Primary Care Patients. Journal of the American Geriatrics Society, 53(9), 1570-1576. Ganguli, M., Dube, S., Johnston, J. M., Pandav, R., Chandra, V., & Dodge, H. H. (1999). Depressive symptoms, cognitive impairment and functional impairment in a rural elderly population in India: A Hindi version of the Geriatric Depression Scale International Journal of Geriatric Psychiatry, 14(10), 807-820. Hartley, D. (2004). Rural health disparities, population health, and rural culture. American Journal of Public Health, 94, 1675-1678. Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item Response Theory and Health Outcomes Measurement in the 21st century. Medical Care, 28(9SII), II-28-II-42. Holland, P. W., & Thayer, D. T. (1988). Differential Item Performance and the MantelHaenszel Procedure. In H. Wainer & H. I. Braun (Eds.), Test Validity (pp. 129145). Hillsdale, NJ: Lawrence Erlbaum Associates. Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185. 302 Horn, J. L. (1991). Comments on Issues in Factorial Invariance. In L. M. C. a. J. L. Horn (Ed.), Best Methods for the Analysis of Change (pp. 114-125). Washington, DC: American Psychological Association. Horn, J. L., McArdle, J. J., & Mason, R. (1983). When is invariance not invariant: A practical scientist's look at the ethereal concept of factor invariance. Southern Psychologist, 1, 179-188. Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55. Incalzi, R. A., Cesari, M., Pedone, C., & Carbonin, P. U. (2003). Construct validity of the 15-item Geriatric Depression Scale in older medical inpatients. Journal of Geriatric Psychiatry and Neurology, 16(1), 23-28. Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409-426. Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36(3), 347387. Lai, D., Tong, H. M., Zeng, Q., & Xu, W. Y. (2010). The factor structure of a Chinese Geriatric Depression Scale-SF: use with alone elderly Chinese in Shanghai, China. International Journal of Geriatric Psychiatry, 25(5), 503-510. Lai, D. W. L., Fung, T. S., & Yuen, C. T. Y. (2005). The Factor Structure of A Chinese Version of The Geriatric Depression Scale. International Journal of Psychiatry in Medicine, 35(2), 137-148. MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model Modifications in Covariance Structure Analysis: The Problem of Capitalization on Chance. Psychological Bulletin, 111 (May), 490-504. Malakouti, S. K., Fatollahi, P., Mirabzadeh, A., Salavati, M., & Zandi, T. (2006). Reliability, validity and factor structure of the GDS-15 in Iranian elderly. International Journal of Geriatric Psychiatry, 21, 588-593. Meredith, W. (1993). Measurement Invariance, Factor-Analysis and Factorial Invariance. Psychometrika, 58(4), 525-543. Millsap, R. E. (1995). Measurement invariance, predictive invariance, and the duality paradox. Multivariate Behavioral Research, 30(4), 577-605. 303 Millsap, R. E. (1997). Invariance in measurement and prediction: Their relationship in the single-factor case. Psychological Methods, 2(3), 248-260. Millsap, R. E., & Everson, H. T. (1993). Methodology review: statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297-334. Millsap, R. E., & Yun-Tein, J. (2004). Assessing Factorial Invariance in OrderedCategorical Measures. Multivariate Behavioral Research, 39(3), 479-515. Mitchell, J., Matthews, H. F., & Yesavage, J. A. (1993). A Multidimensional Examination of Depression Among the Elderly. Research on Aging, 15(2), 198219. Muthén, B., & Asparouhov, T. (2002). Latent Variable Analysis With Categorical Outcomes: Multiple-Group And Growth Modeling In Mplus. . Mplus Web Notes: No. 4 Version 5. Retrieved from http://www.statmodel.com/download/webnotes/CatMGLong.pdf website: Muthen, B., & Christoffersson, A. (1981). SIMULTANEOUS FACTOR-ANALYSIS OF DICHOTOMOUS-VARIABLES IN SEVERAL GROUPS. Psychometrika, 46(4), 407-419. Muthén, B., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished manuscript. Muthén, L., & Muthén, B. (2008). Mplus user's guide ((5th ed.) ed.). Los Angeles: Muthén & Muthén. Muthén, L., & Muthén, B. O. (2008). Mplus user's guide ((5th ed.) ed.). Los Angeles: Muthén & Muthén. Myers, M. B., Calantone, R. J., Page, T. J., & Taylor, C. R. (2000). Academic insights: An application of multiple-group causal models in assessing cross-cultural measurement equivalence. Journal of International Marketing, 8(4), 108-121. Onishi, J., Suzuki, Y., Umegaki, H., Kawamura, T., Iguchi, A., & Endo, H. (2006). A Comparison of Depressive Mood of Older Adults in a Community, Nursing Homes, and a Geriatric Hospital: Factor Analysis of Geriatric Depression Scale. Journal of Geriatric Psychiatry and Neurology, 19(1), 26-31. Ormel, J., Kempen, G. I. J. M., Penninx, B. W. J. H., Brilman, E. I., Beekman, A. T. F., & Van Sonderen, E. (1997). Chronic medical conditions and mental health in older people: disability and psychosoical resources mediate specific mental health effects. Psychological Medicine, 27, 1065-1077. 304 Oshima TC, & Morris, S. (2008). Raju's Differential Functioning of Items and Tests (DFIT). NCME Instructional Module, Fall 2008, 43-50. Parmelee, P. A., & Katz, I. R. (1990). Geriatric Depression Scale. Journal of the American Geriatrics Society, 38(12), 1379. Parmelee, P. A., Lawton, M. P., & Katz, I. R. (1989). Psychometric properties of the Geriatric Depression Scale among the institutionalized aged. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1(4), 331-338. Pelaez, M., Palloni, A., Albala, C., Alfonso, J. C., Ham-Chande, R., Hennis, A., et al. (2004). SABE - Survey on Health, Well-Being, and Aging in Latin America and The Caribbean, 2000. Ann Arbor, MI: Inter-university Consortium for Political and Social Research. Pocinho, M. T. S., Farate, C., Dias, C. A., Yesavage, J. A., & Lee, T. T. (2009). Clinical and psychometric validation of the Geriatric Depression Scale (GDS) for Portuguese elders. Clinical Gerontologist, 32(2), 223-236. Raju, N., van der Linden WJ, & Fleer, P. (1995). IRT-based internal measures of differential item functioning of items and tests. Applied Psychological Measurement, 19, 353-368. Ramirez, M., Ford, M. E., Stewart, A. L., & Teresi, J. A. (2005). Measurement Issues in Health Disparities Research. Health Services Research, 40(5 Part 2), 1640-1657. Reckase, M. D. (1979). Unifactor latent trait models applied to multi-factor tests: Results and implications. Journal of Educational Measurement, 4, 207-230. Reise, S. P. (2005). Item response theory and its applications for cancer outcomes measurement. In J. Lipscomb, C. C. Gotay & C. Snyder (Eds.), Outcomes assessment in cancer: Measures, methods, and applications (pp. 425-444). New York, NY: Cambridge University Press. Reise, S. P., Ainsworth, A. T., & Haviland, M. G. (2005). Item response theory Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14(2), 95-101. Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory Factor-Analysis And Item Response Theory - 2 Approaches For Exploring Measurement Invariance. Psychological Bulletin, 114(3), 552-566. Roszkowski, M. J., & Soven, M. (2010). Shifting gears: consequences of including two negatively worded items in the middle of a positively worded questionnaire. Assessment & Evaluation in Higher Education, 35(1), 117-134. 305 Salamero, M., & Marcos, T. (1992). Factor study of the Geriatric Depression Scale. Acta Psychiatrica Scandinavica, 86(4), 283-286. Schreiner, A. S., Morimoto, T., & Asano, H. (2001). Depressive symptoms among postroke patients in Japan: frequency distribution and factor structure of the GDS. International Journal of Geriatric Psychiatry, 16, 941-949. Schriesheim, C. A., Eisenbach, R. J., & Hill, K. D. (1991). The Effect of Negation and Polar Opposite Item Reversals on Questionnaire Reliability and Validity: An Experimental Investigation. Educational and Psychological Measurement, 51, 6778. Sheikh, J. I., & Yesavage, J. A. (1986). Geriatric Depression Scale (GDS): Recent evidence and development of a shorter version. Clinical Gerontologist, 5(1-2), 165-173. Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British J. of Mathematical and Stat. Psychology 27, 229-239. Sörbom, D. (1982). Structural equation models with structured means. In K. G. a. W. Jöreskog, H. (Ed.), Systems under indirect observation: Causality, structure, prediction. North-Holland, Amsterdam.: Elsevier Science. Stalh, S. M., & Hahn, A. A. (2006). The National Institute on Aging's Resource Centers for Minority Aging Research. Medical Care, 44(11 Suppl 3), S1-S2. Steele, R. G., Little, T. D., Ilardi, S. S., Rex F.R., Brody, G. H., & Hunter, H. L. ( 2006). A Confirmatory Comparison of the Factor Structure of the Children‘s Depression Inventory between European American and African American Youth. J Child Fam Stud, 15, 779-794. SteenKamp, J.-B. E. M., & Baumgartner, H. (1998). Assessing Measurement Invariance in Cross-National Consumer Research. Journal of Consumer Research 25, 78-90. Steiger, J. H. (1990). Structural Model Evaluation and Modification: An Interval Estimation Approach. Multivariate Behavioral Research, 25(April), 173-180. Steinberg, L. (2001). The Consequences of Pairing Questions: Context Effects in Personality Measurement. Journal of Personality and Social Psychology, 81(2), 332-342. Stewart, A. L., & Napoles-Springer, A. (2000). Health-related quality-of-life assessments in diverse population groups in the United States. Medical Care, 38(9), 102-124. 306 Swaminathan, H., & Rogers, H. J. (1990). Detecting Differential Item Functioning Using Logistic Regression Procedures. Journal of Educational Measurement, 27(4), 361-370. Tang, W. K., Wong, E., Chiu, H. F. K., Ungvari, G. S., & Lum, C. M. (2005). The Geriatric Depression Scale should be shortened: Results of Rasch analysis. International Journal of Geriatric Psychiatry, 20(8), 783-789. Teresi, J. A. (2006). Overview of Quantitative Measurement Methods: Equivalence, Invariance, and Differential Item Functioning in Health Applications. Medical Care. Special Issue: Measurement in a multi-ethnic society, 44(11, Suppl 3), S39S49. Teresi, J. A., Ramirez, M., Lai, J.-S., & Silver, S. (2008). Occurences and sources of Differential Item Functioning (DIF) in patient-reported outcome measures: Description of DIF methods, and review of measures of depression, quality of life and general health. Psycholoy Science Quarterly, 50(4), 538-612. Thissen, D. (2001). IRTLRDIF v2.0b Software for the Computation of the Statistics Involved in Item Response Theory Likelihood-Ratio Tests for Differential Item Functioning. Available on Dave Thissen's web page www.unc.edu/~dthissen. Thissen, D., Steinberg, L., & Kuang, D. (2002). Quick and Easy Implementation of the Benjamini-Hochberg Procedure for Controlling the False Positive Rate in Multiple Comparisons. Journal of Educational and Behavioral Statistics, 27(1), 77-83. Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. W. H. Braun (Ed.), Test Validity (pp. 147-169). Hillsdale, NJ: Erlbaum. Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Hillsdale, NJ: Lawrence Erlbaum Associates. Thissen, D., W-H., & Bock, R. D. (2003). MULTILOG User's Guide. Multiple, Categorical Item Analysis and Test Scoring Using Item Response Theory. (Version 7). Lincolnwood, IL: Scientific Software International. Thurstone, L. L. (1947). Multiple-factor analysis: a development and expansion of The Vectors of Mind. Chicago, IL: University of Chicago Press. Tomarken, A. J., & Waller, N. G. (2003). Potential problems with "well fitting" models. Journal of Abnormal Psychology, 112, 578-598. 307 Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices and recommendations for organizational research. Organizational Research Methods, 3(1), 4-69. Vandenberg, R. J., & Lance, C. E. (2000). A Review and Synthesis: of the Measurement Invariance Literature: Suggestions, Practices, and Recommendations for Organizational Research. Organizational Research Methods, 3(1), 4-70. Wrobel, N. H., & Farrag, M. F. (2006). A Preliminary Report on the Validation of the Geriatric Depression Scale in Arabic. Clinical Gerontologist, 29(4), 33-45. Yang, Y., Small, B. J., & Haley, W. E. (2001). Cross-cultural comparability of the Geriatric Depression Scale: comparison between older Koreans and old Americans. Aging & Mental Health, 5(1), 31-37. Yesavage, J. A., Brink, T. L., Rose, T. L., Lum, O., Huang, V., & Adey, M. (1983). Development and validation of a geriatric depression screening scale: a preliminary report. Journal of Psychiatry Research, 17(1), 37-49. Zunzunegui, M., Alvarado, B., Beland, F., & Vissandjee, B. (2009). Explaining health differences between men and women in later life: A cross-city comparison in Latin America and the Caribbean. Social Science & Medicine, 68, 235–242. 308