This is to certify that the dissertation entitled ATHLETES' EVALUATIONS OF THEIR HEAD COACH'S COACHING COMPETENCIES: A MULTILEVEL CONFIRMATORY FACTOR ANALYSIS presented by Nicholas Daniel Myers has been accepted towards fulfillment of the requirements for the Dual-Major degree in the Department of Kinesiology and the Department of Counseling, Educational Psychology, and Special Education. Michigan State University, East Lansing, Michigan.

ATHLETES' EVALUATIONS OF THEIR HEAD COACH'S COACHING COMPETENCIES: A MULTILEVEL CONFIRMATORY FACTOR ANALYSIS

By Nicholas Daniel Myers

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY-DUAL MAJOR, Department of Kinesiology and Department of Counseling, Educational Psychology, and Special Education, 2005.

ABSTRACT

This study (a) provided initial validity evidence for the Coaching Competency Scale (CCS), and (b) introduced multilevel confirmatory factor analysis (MCFA) as an appropriate methodology to use when data are meaningfully nested and an evaluation of the factor structure of a set of indicators is desired. Data were collected from intercollegiate men's (g = 8) and women's (g = 13) soccer teams and women's ice hockey teams (g = 11).
Results offered some support for the proposed multilevel multidimensional conceptualization of coaching competency, the internal consistency reliabilities of the coaching competency estimates, and a relationship between motivation competency and satisfaction with the coach within teams. Validity concerns were observed for the original rating scale structure and for the relationship between motivation competency and satisfaction with the coach between teams. Results were interpreted to guide future research with the CCS, to provide recommendations for revisions to the instrument, and to assist researchers in physical education and exercise science in understanding when and how to apply MCFA to their data.

DEDICATION

This work is dedicated, in part, to my partner, Ahnalee. Thank you for hanging in there over the past five years, babe. I look forward to our life together. This work is also dedicated, in part, to my family and friends. Please know that although you have often been pushed aside over the last few years, you have never been forgotten. Thank you for your love, companionship, and strength. I take you with me wherever I go.

ACKNOWLEDGEMENTS

Deb Feltz: Thank you for your mentorship, guidance, and support over the last five years. And, thank you for providing an opportunity to an applicant who had little "traditional" background five years ago. Nobody has had a larger impact on my graduate training.

Ed Wolfe: Thank you for your mentorship, excellent teaching in complex subjects, and active advisement over the last three years. Your decision to write a note on my final exam three years ago, asking me if I ever thought about pursuing a degree in MQM, has changed my career. Thank you for taking the time that such insight and curiosity require.

Kim Maier: Thank you for your excellent teaching in complex subjects, willingness to consult, and encouragement in my job search.
Although we have not known each other for long, I hope that our paths will cross at regular intervals down the road.

Mark Reckase: Thank you for your excellent teaching in complex subjects, and honesty in our interactions. And, thank you for packaging your deep knowledge and experience in prose that is available to people, like me, with considerably less knowledge and experience. You provide a model that many students in MQM aspire to emulate.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO SYMBOLS AND ABBREVIATIONS

CHAPTER 1: INTRODUCTION AND REVIEW OF COACHING EFFECTIVENESS LITERATURE
    Nature of the Problem
    Review of Coaching Effectiveness Literature
    Establishing an Initial Validity Framework for the CCS
        Interpretive framework
        Instrument development
        Instrumentation
        External model
        Internal model
    Statement of Purpose
    Research Questions

CHAPTER 2: INTRODUCTION OF MCFA
    Consequences of Ignoring Multilevel Data Structures
    Steps in a MCFA
    A Technical Synopsis
        Sample size
        Judging the fit of MCFA
    An Application of Muthén's Steps

CHAPTER 3: METHOD
    Sample
    Procedure
    Measures
        Coaching competence
        Satisfaction with the coach
    Treatment of Data
        Missing data
        Outliers
        Normality
    Analyses
        Rating scale
        Internal models
        Internal consistency reliability
        Forming measures to test questions 4 and 5
        Testing questions 4 and 5
        Model estimation and fit
        Reliability estimates

CHAPTER 4: RESULTS
    Did Athletes Employ the Rating Scale Structure in the Manner that the Authors Intended?
    To What Degree did the Proposed Internal Models Fit the Data?
        Step 1
        Step 2
        Step 3
        Step 4
    How Reliable were the Rank Orderings of Coaching Competency Estimates?
    Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Within Teams?
    Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Between Teams?

CHAPTER 5: DISCUSSION

APPENDICES

REFERENCES

FOOTNOTES

LIST OF TABLES

Table 1. Original and Post Hoc Rating Scale Structures
Table 2. Item Characteristics for the CCS
Table 3. Pooled Within-Teams Correlations and Covariances
Table 4. Scaled Between-Teams Correlations and Covariances
Table 5. Model-Data Fit Statistics
Table 6. Within-Teams and Between-Teams Estimates
Table 7. Comparisons of Factor Loadings and Correlations Among Factors by Gender
Table 8. Comparisons of Factor Loadings and Correlations Among Factors by Sport
Table 9. Comparisons of Factor Loadings and Correlations Among Factors by Year
Table 10. Correlations Between Competency Judgments and Satisfaction with the Coach at the Individual and Team Level
Table 11. Hierarchical Linear Models where Satisfaction was the Dependent Variable

LIST OF FIGURES

Figure 1. Horn's working model of coaching effectiveness
Figure 2. Multidimensional internal model of the CCS
Figure 3. Multidimensional MUML model of the CCS
Figure 4. Coherence example for a well-fitting five-category structure

KEY TO SELECT ABBREVIATIONS
α: Cronbach's coefficient alpha
AERA: American Educational Research Association
APA: American Psychological Association
CAIC: consistent Akaike information criterion
CBAS: coaching behavior assessment system
CBC: character building competence
CBQ: coaching behavior questionnaire
CCS: coaching competency scale
CEQ: coaching evaluation questionnaire
CES: coaching efficacy scale
CFA: confirmatory factor analysis
CFI: comparative fit index
δ: item difficulty
DQS: decision style questionnaire
EFA: exploratory factor analysis
FIML: full maximum likelihood
GSC: game strategy competence
ICC: intraclass correlation coefficient
LM: Lagrange multiplier test
LSS: leadership scale for sports
MC: motivation competence
MCFA: multilevel confirmatory factor analysis
MNSQ: outfit mean square fit statistic
MUML: Muthén's maximum likelihood
NASPE: National Association for Sport and Physical Education
NCME: National Council on Measurement in Education
RMSEA: root mean square error of approximation
RSM: rating scale model
S*B: scaled between-group covariance matrix
ΣB: between-group population covariance matrix
SPW: pooled within-group covariance matrix
SRMR: standardized root mean squared residual
ST: total covariance matrix
ΣW: within-group population covariance matrix
τ: threshold
TC: technique competence
θ: person ability
TLI: Tucker-Lewis index

CHAPTER 1

INTRODUCTION AND REVIEW OF COACHING EFFECTIVENESS LITERATURE

Nature of the Problem

To date, much of the research in sport leadership has been directed toward identifying particular coaching styles that elicit successful performance and/or positive psychological responses from athletes (Horn, 2002). The two most prominent models of leadership effectiveness in sport, the Multidimensional Model of Leadership (Chelladurai, 1978) and the Mediational Model of Leadership (Smoll & Smith, 1989), have served as frameworks for much of the related research.
Recently, Horn combined elements of both models to form a working model of coaching effectiveness. Horn's (2002) model of coaching effectiveness, as displayed in Figure 1, is founded on at least three assumptions. First, both antecedent factors (e.g., coach's personal characteristics, organizational climate) and personal characteristics of athletes influence a coach's behavior indirectly through a coach's expectancies, beliefs, and goals. Second, a coach's behavior directly affects athletes' perceptions and evaluations of a coach's behavior. Third, athletes' perceptions and evaluations of a coach's behavior mediate the influence that a coach's behavior has on athletes' self-perceptions (e.g., self-efficacy) and attitudes (e.g., satisfaction with a coach), which in turn directly affect athletes' motivation and performance. Because athletes' perceptions and evaluations of a coach's behavior are believed to play a critical role in coaching effectiveness, providing a tool to assess athletes' evaluations of key coaching competencies is important to the continued improvement of coaching, and to the further development of coaching effectiveness models.

[Figure 1. Horn's (2002) working model of coaching effectiveness. The path diagram, linking antecedent factors, coaches' expectancies, beliefs, and goals, coaches' behavior, athletes' perceptions and evaluations, and athletes' self-perceptions, motivation, and performance, did not survive extraction.]

[Figure 2. Multidimensional internal model of the CCS. The path diagram relates the motivation competency, game strategy competency, technique competency, and character building competency factors to their respective items, e.g., gsc8, gsc17, gsc21; tc7, tc14, tc16, tc18, tc20, tc22; cbc5, cbc13, cbc19, cbc24.]

Statement of Purpose

The purposes of this study were to (a) provide initial evidence for key aspects of the validity framework described, and (b) introduce MCFA as an appropriate methodology under the previously stated circumstances.
Research Questions

This study provided initial evidence for key aspects of the validity framework described by examining the following questions:

1. Did athletes employ the rating scale structure in the manner that the authors intended?
2. To what degree did various multilevel internal models fit the data?
3. How reliable were the rank orderings of coaching competency estimates?
4. Were coaching competency estimates positively related to satisfaction with the coach within teams?
5. Were coaching competency estimates positively related to satisfaction with the coach between teams?

Additionally, in Chapter 2, this study provides a modest primer on applying MCFA methodology when athletes are nested in teams.

CHAPTER 2

INTRODUCTION OF MCFA

This introduction should not be construed as original thinking by the author. Rather, it is based on the author's understanding of Muthén's pioneering work (1989, 1994) and previous syntheses of MCFA in the education literature (Kaplan, 2000; Hox, 2002). The rationale for providing it is to put a relatively new and complex methodology, which may have important applications in sport and exercise science, in a context that may be more familiar and accessible to this audience. A working understanding of confirmatory factor analysis (CFA) is assumed. The call to separate substantive levels of variance, such as within students and between classrooms, has been voiced for decades in education (Cronbach, 1976; Härnqvist, 1978). Empirically, nonhierarchical analysis of the total covariance matrix of meaningfully nested data violates the assumption of independence. Conceptually and practically, nonhierarchical analyses often confound within-group and between-group relations and hamper theory development at both levels.
Whether nesting is due to students clustered within classrooms or athletes within teams, if both individual attributes and group characteristics are relevant, then multilevel modeling, also known as hierarchical modeling, should be considered. Most multilevel modeling applications in social science research are multilevel extensions of the conventional multiple regression model (Hox, 2002).2 But there are models of substantive interest, MCFA in this study, that contain multiple levels and cannot be analyzed within the linear multilevel framework. Muthén (1989) and Muthén and Satorra (1989) have provided an approach to latent variable modeling when data are meaningfully nested. This section provides an example of applying this methodology when athletes are nested within teams and evaluation of the factor structure of an instrument is desired.3 However, before introducing the complex methodology, it is worthwhile to explain typical and practical consequences of ignoring multilevel data structures when evaluating the factor structure of latent variables.

Consequences of Ignoring Multilevel Data Structures

In a simulation study, Julian (2001) generated multilevel data, manipulating three design factors: (a) intraclass correlation (ICC): .05, .15, and .45; (b) group and member configuration: 100 groups with five members each (100/5), 50/10, 25/20, and 10/50; and (c) the internal model of the between-group variance components: four-factor oblique within and four-factor oblique between (4FOBW/4FOBB), 4FOBW/2FOBB, and 4FOBW/5FOBB. Data were fitted to CFA models that ignored the multilevel structure. Biases in the chi-square statistic (χ²), model parameters (i.e., factor loadings, variances, covariances, and error variances), and standard errors were examined. The design factor that exerted the most influence was the ICC. When the ICC was > .05, the χ² was inflated, model parameters were inflated, and standard errors were deflated.
Practically, CFA on the total covariance matrix (ST) when the ICCs are non-trivial (≥ .10, as a guideline; Muthén, 1997) will likely result in the model exhibiting more overall misfit (i.e., an elevated χ² test), hypothesis tests that are overly optimistic (i.e., deflated standard errors leading to increased Type I error), and inflation of the absolute value of parameter estimates (e.g., factor loadings). A correction by Satorra and Bentler (1990, 1994), which is often used to correct for such bias in the χ² statistic and its standard errors, does not adjust for bias in the absolute value of the factor loadings (Julian, 2001), and does not allow modeling at both levels.

Steps in a MCFA

Muthén (1994) advised that MCFA should generally follow four steps. First, factor analyses of ST are conducted, exploratory (EFA) and/or confirmatory depending on the adequacy of an a priori internal model. Although this solution will likely be biased if the data truly exhibit a multilevel structure (Julian, 2001; Hox & Maas, 2001), it can provide a rough estimate of model fit. Second, estimate the degree of between-group variation in the variables of interest (i.e., the ICC). The ICC for each item is estimated as

ICC = σ²B / (σ²B + σ²W),

where σ²B is the between-group variance and σ²W is the within-group variance. If the ICCs are trivial, a multilevel model may be unnecessary. Third, estimate the factor structure of the pooled within-group covariance matrix (SPW). Because SPW is an unbiased estimator of the within-group population covariance matrix (ΣW) and is not confounded by the scaled between-group covariance matrix (S*B), model fit should be at least as good as in Step 1. Fourth, estimate the between factor structure. Because S*B is not an unbiased estimator of the between-group population covariance matrix (ΣB) and includes ΣW, this is the most difficult part of the analysis and requires a technical synopsis.
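To make Step 2 concrete, the per-item ICC can be estimated with a one-way random-effects ANOVA decomposition of the nested ratings. The sketch below is illustrative rather than the dissertation's actual code; the function and variable names (`item_icc`, `scores`, `team_ids`) are hypothetical, and the group-size adjustment for unbalanced teams mirrors the c* quantity introduced in the next section.

```python
import numpy as np

def item_icc(scores, team_ids):
    """Intraclass correlation for a single item from athletes nested in teams.

    Recovers the between-team variance from the between- and within-team
    mean squares, with an adjusted average team size for unbalanced data.
    """
    y = np.asarray(scores, dtype=float)
    g = np.asarray(team_ids)
    teams = np.unique(g)
    N, G = len(y), len(teams)
    grand = y.mean()
    ss_b = ss_w = 0.0
    sizes = []
    for t in teams:
        grp = y[g == t]
        sizes.append(len(grp))
        ss_b += len(grp) * (grp.mean() - grand) ** 2
        ss_w += ((grp - grp.mean()) ** 2).sum()
    ms_b = ss_b / (G - 1)          # between-team mean square
    ms_w = ss_w / (N - G)          # within-team mean square
    # adjusted average team size for unbalanced data
    n_tilde = (N - sum(s * s for s in sizes) / N) / (G - 1)
    var_b = max((ms_b - ms_w) / n_tilde, 0.0)  # negative estimates clamped
    return var_b / (var_b + ms_w)
```

In practice this would be run once per item; items with ICCs at or above the ≥ .10 guideline would argue for retaining the multilevel model.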
A Technical Synopsis

Assume that athletes are nested within teams, and that the number of athletes within teams is unbalanced. To perform a MCFA, athletes' competency scores need to be broken down into an athlete-level component (i.e., the athlete's deviation from the team mean: Ygi − Ȳg) and a team-level component (the disaggregated team mean: Ȳg). An unbiased estimate of the population within-team covariance matrix (ΣW) is provided by SPW, where

SPW = Σg Σi (Ygi − Ȳg)(Ygi − Ȳg)′ / (N − G),

subscript g = 1...G indexes teams, and subscript i = 1...Ng indexes athletes per group, with a total of N athletes (Muthén, 1989). Because SPW is the maximum likelihood estimator of ΣW with sample size N − G, Step 3 is reasonable (Muthén, 1990). In the unbalanced case, S*B is calculated as

S*B = Σg Ng (Ȳg − Ȳ)(Ȳg − Ȳ)′ / (G − 1).

S*B estimates the composite ΣW + c*ΣB, where c* is a scaling parameter. Full maximum likelihood (FIML) estimation of ΣB is problematic because an S*B would need to be computed for each distinct group size (Muthén, 1990). Muthén suggests computing a single S*B by including

c* = (N² − Σg Ng²) / (N(G − 1)).

The value of c* is approximately equal to the average team size. This estimation procedure is referred to as Muthén's maximum likelihood (MUML). Simulation studies and comparisons of FIML and MUML estimates suggest that Muthén's methodology provides a reasonable approximation of FIML estimates in many applications (Hox, 1993; Hox & Maas, 2001; McDonald, 1994; Muthén, 1990).

Once the appropriate covariance matrices are estimated, the within and between factor structures can be evaluated simultaneously via the multi-group option found in most structural equation modeling software (Hox, 2002). However, because S*B estimates the composite ΣW + c*ΣB, two models need to be specified for S*B: one for the within-team structure and one for the between-team structure.
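The three quantities just defined, SPW, S*B, and c*, can be computed directly from a raw athletes-by-items matrix. This NumPy sketch is a minimal illustration of the formulas, not the EQS routine used in the study; the helper name `muml_matrices` and its inputs are hypothetical.

```python
import numpy as np

def muml_matrices(Y, team_ids):
    """MUML sample matrices for athletes nested in teams.

    Y: (N, p) array of athletes x items; team_ids: length-N team labels.
    Returns (S_PW, S_B_star, c_star), where
      S_PW   = sum_g sum_i (y_gi - ybar_g)(y_gi - ybar_g)' / (N - G)
      S*_B   = sum_g N_g (ybar_g - ybar)(ybar_g - ybar)' / (G - 1)
      c*     = (N^2 - sum_g N_g^2) / (N (G - 1))
    """
    Y = np.asarray(Y, dtype=float)
    ids = np.asarray(team_ids)
    teams = np.unique(ids)
    N, p = Y.shape
    G = len(teams)
    grand = Y.mean(axis=0)
    s_pw = np.zeros((p, p))
    s_b = np.zeros((p, p))
    sizes = []
    for t in teams:
        grp = Y[ids == t]
        sizes.append(len(grp))
        dev_w = grp - grp.mean(axis=0)          # athlete deviations from team mean
        s_pw += dev_w.T @ dev_w
        dev_b = (grp.mean(axis=0) - grand).reshape(-1, 1)
        s_b += len(grp) * (dev_b @ dev_b.T)     # weighted team-mean deviations
    s_pw /= (N - G)
    s_b /= (G - 1)
    c_star = (N ** 2 - sum(n ** 2 for n in sizes)) / (N * (G - 1))
    return s_pw, s_b, c_star
```

For balanced data, c* reduces to the common team size, which matches the "approximately equal to the average team size" remark above.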
Thus, the within-team structure is specified at both levels with equality restrictions between both "groups". The between-team structure is specified only in the second "group", with the square root of the scaling factor, √c*, built into the model. See Figure 3 for an example where the model specified in Figure 2 is specified at both the within-team and between-team levels.

[Figure 3. Multidimensional MUML model of the CCS. The path diagram, which specifies the Figure 2 model at both the within-team and between-team levels, did not survive extraction.]

Sample size. Hox and Maas (2001) generated multilevel data to explore the accuracy of Muthén's methodology with pseudobalanced groups and small samples at both levels. They concluded that the within-teams part of the model performs well even with small samples at both levels (i.e., n = 5 to 15 within "small" groups, and G ≤ 50, as guidelines). They also concluded that problems can arise when estimating the between-teams part of the model when the number of teams is small. In such instances inadmissible estimates, such as negative error variances, may be observed at the between-teams level. In such cases the error variance is fixed to zero. But even when an admissible solution is reached, residual error variances and standard errors may still be underestimated. However, factor loadings are generally accurate. Due to the said biases, Type I error rates may be approximately 8% when an alpha of .05 is specified. Thus, ideally, within-teams sample sizes should be ≥ 5, and the team-level sample size should be ≥ 50. In cases where resources limit the number of teams but MCFA is still deemed appropriate (i.e., modeling the between-teams variance is important), as in this study, it is necessary to qualify inferences made from the between-teams portion of the model.

Judging the fit of MCFA. As in CFA, there is no shortage of indices to judge the fit of a MCFA to the observed data.
Given common practice in sport and exercise science and guidelines by Hu and Bentler (1999) and Kline (1998), the following fit indices are suggested: the χ² test, Bentler's (1990) comparative fit index (CFI), the Tucker-Lewis index (TLI; Tucker & Lewis, 1973), the standardized root mean squared residual (SRMR), and the root mean square error of approximation (RMSEA; Browne & Cudeck, 1992). In instances where comparing the fit of competing models is desired, the likelihood ratio chi-square statistic (χ²LR) and the consistent Akaike information criterion (CAIC; Bozdogan, 1987) are also suggested. Introductions to and criterion guidelines for the suggested fit indices are offered here, guided by recommendations from Hu and Bentler (1999) and Kline (1998). The χ² test represents a test of the significance of the difference in fit between the specified model and a just-identified version of it. Because this test is sensitive to sample size, a common guideline is to divide the χ² by the degrees of freedom (df), where a ratio ≤ 3 suggests reasonable fit. The CFI and TLI are incremental fit indices which indicate the proportional improvement in the overall fit of the specified model relative to a null model. Well-fitting models have values ≥ .95, while marginally acceptable models have values ≥ .90. The SRMR denotes the average residual from fitting the correlation matrix for the specified model to the correlation matrix of the observed data. Well-fitting models have a value ≤ .08, while marginally acceptable models have a value ≤ .10. The RMSEA represents the amount of error per df in the specified model. Well-fitting models have a value ≤ .06, while marginally acceptable models have a value ≤ .10. The χ²LR statistic can evaluate the relative fit of nested models (χ²LR = D²simple − D²complex), where D² is the deviance value for the model in question.
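The cutoff guidelines just listed can be collected into a small screening helper. This is only a convenience sketch of the stated criteria (Hu & Bentler, 1999; Kline, 1998); the function name and verdict labels are invented for illustration, and such cutoffs are guidelines rather than strict decision rules.

```python
def judge_fit(chi2, df, cfi, tli, srmr, rmsea):
    """Apply the guideline cutoffs for each suggested fit index."""
    verdicts = {}
    # chi-square / df ratio: <= 3 suggests reasonable fit
    verdicts["chi2/df"] = "reasonable" if chi2 / df <= 3 else "poor"
    # incremental indices: >= .95 good, >= .90 marginal
    for name, value in (("CFI", cfi), ("TLI", tli)):
        verdicts[name] = ("good" if value >= .95
                          else "marginal" if value >= .90 else "poor")
    # residual-based indices: smaller is better
    verdicts["SRMR"] = ("good" if srmr <= .08
                        else "marginal" if srmr <= .10 else "poor")
    verdicts["RMSEA"] = ("good" if rmsea <= .06
                         else "marginal" if rmsea <= .10 else "poor")
    return verdicts
```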
The χ²LR statistic is distributed with degrees of freedom equal to the difference between the number of parameters in the nested models (McCullagh & Nelder, 1990). Because the χ²LR statistic is sensitive to sample size, the CAIC is also suggested. The CAIC depicts the fit of the model in question relative to the number of parameters estimated, where lower values indicate better fit; competing models do not need to be nested (Wicherts & Dolan, 2004).

An Application of Muthén's Steps

Assume that the internal model in Figure 3 is posited. It bears mentioning that the internal models need not be the same at both levels. Step 1 and Step 2 can be performed on the raw data matrix. Assume that a CFA on ST in Step 1 provided some support for the posited internal model in Figure 2, but with more misfit than is acceptable. Assume further that Step 2 revealed ICCs for all of the 24 items that were considerably greater than .10 and that MCFA was deemed appropriate. Practically, this decision implies that athletes' perceptions of their head coach's coaching competency were influenced by athlete-level attributes as well as shared team-level characteristics. Given the construct of interest, athletes' evaluations of their head coach's coaching competencies, strong within-teams and between-teams effects were expected. Step 3 and Step 4 are multi-staged steps. Step 3 begins with estimating both SPW and S*B from the raw data matrix, ST. All MCFA analyses that will be reviewed can, and later will be, performed in EQS 6.1 (Bentler, 2004). The covariances in S*B should be larger than the values in SPW because this matrix equals the within-team matrix plus the between-team matrix multiplied by c*. SPW is fitted to the hypothesized model, Figure 2 in this case, ignoring S*B. Justifiable modifications to the internal model should then be made, if necessary.
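Both comparison statistics reduce to simple arithmetic once the deviances and parameter counts are in hand. A minimal sketch, assuming the CAIC form −2 lnL + q(ln N + 1) from Bozdogan (1987); the function names are hypothetical, and a p-value for the likelihood-ratio statistic would still be looked up against the chi-square distribution with the returned df.

```python
import math

def caic(log_likelihood, n_params, n_obs):
    """Consistent AIC: -2 lnL + q (ln N + 1); lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * (math.log(n_obs) + 1.0)

def lr_chi2(deviance_simple, deviance_complex, params_simple, params_complex):
    """Likelihood-ratio chi-square for nested models and its degrees of freedom."""
    return (deviance_simple - deviance_complex,
            params_complex - params_simple)
```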
The final step, evaluating the fit of the within-teams and between-teams models simultaneously, also can be considered multi-staged (Hox, 2002). Using the multi-group procedure, the final within-teams model is specified in both "groups" with equality constraints across groups for all parameter estimates. This part of the model is held constant throughout the subsequent analyses. A series of between-teams models should then be specified. A null between-teams model (i.e., the lack of any between-team model) should be specified first. If this model fits, there is no between-teams model. Assuming poor fit, an independence model (i.e., only variances are estimated in the between model) should be specified. If this model fits, team-level variance exists but a team-level structural model does not. Assuming improved yet still unacceptable fit, the hypothesized between model should be specified. Justifiable modifications to the internal model should then be made, if necessary.

CHAPTER 3

METHOD

Sample

To maximize group-level sample size, teams were recruited from both lower-division intercollegiate soccer and ice hockey programs. Soccer and ice hockey were chosen because both can be considered open team sports, because head coaches tend to be involved in the coaching of most, if not all, positions on the team, and because both have a fairly large number of athletes within teams. Men's (g = 14) and women's (g = 29) soccer teams (g = 43) were recruited from the midwest of the United States, and 31 teams (12 men's and 19 women's) agreed to participate. Women's ice hockey teams (g = 28) were recruited from the northeast and midwest regions of the United States, and 16 teams agreed to participate. Despite numerous reminders from a variety of sources, only 21 soccer teams (8 men's, 13 women's) and 11 hockey teams submitted data (response rate = 68%; G = 32). Within the soccer sample, three coaches of the women's teams were women; men coached the men's teams.
Within the women's ice hockey sample, five of the coaches were women. At the athlete level, participants were 407 soccer players (165 male, 242 female) and 183 ice hockey players (M = 18.44 athletes per team). Within teams, the number of participants varied from 13 to 25 (N = 590). Across teams, most participants were Caucasian (94%) and between the ages of 18 and 23 years (99%; M = 19.53, SD = 1.34). The distribution of year on team was 43% first-year, 23% second-year, 20% third-year, and 13% fourth-year.

Procedure

All necessary permissions were obtained from the institutional review board and the 32 head coaches prior to data collection. An explanation of the study was presented to each team by the head coach. Informed consent was obtained from all athletes. Athletes were guaranteed confidentiality for their responses. Questionnaires were completed at approximately the one-half mark of the season to ensure that athletes had enough experience to make informed judgments regarding their head coach's coaching competency and their own satisfaction with the coach. On each team, an identified trainer or team manager administered the questionnaires. Completed questionnaires were returned to the trainer or team manager, who mailed the returns to the researchers. Trainers or team managers who successfully followed through were given a $60 honorarium.

Measures

Coaching competence. Coaching competence was measured at the athlete level by the CCS as described in the initial validity framework subsection of the Introduction. The CCS items are listed in Appendix A.

Satisfaction with the coach. Satisfaction with the coach was measured at the athlete level and consisted of selected items from a scale that was intended to measure, in part, attitudes toward the head coach (Smith, Smoll, & Curtis, 1978).4 Indicators used in this study were the same as those used by Feltz et al. (1999) and Myers, Vargas-Tonsing et al.
(in press) and included (a) how much do you like playing for your coach, (b) if you were able to play next year, how much would you like to have the same coach again, (c) how much does your coach like you, and (d) how much does your coach know about soccer/hockey. Ratings were made on a 7-point Likert scale ranging from 1 (very little) to 7 (a lot). Myers, Vargas-Tonsing et al. provided psychometric evidence for the unidimensionality of the scale for a similar purpose. Specifically, in their study, the principal component accounted for 70% of the total variance, all items had a high loading (≥ .70) on this component, and the set of items displayed good internal consistency, α = .85.

Treatment of Data

Missing data. Five cases were missing data (i.e., < 1% of all cases). In each case, data were missing for no more than two responses. Thus, incomplete cases had at least 93% of the responses of primary interest. These cases were retained, and the data were judged to be missing at random. Scores for missing responses were imputed from observed responses via case-wise maximum likelihood estimation using the Jamshidian-Bentler (1999) EM algorithm.

Outliers. Multivariate outliers were identified using Mahalanobis distance estimates. Five cases, or less than 1% of the data, were identified (p < .001). As suggested by Tabachnick and Fidell (2001), visual inspection of the cases and stepwise regression ensued to inform judgments regarding what caused the cases to be outliers prior to deciding their fate. In the identified cases, data were entered correctly, and subjects were within expected age ranges, represented both genders, and were nested within different teams. Empirically, six to eight competency items distinguished each outlier from the other cases. The specific items that generated outlying responses were not consistent across the outliers. The five outlying cases were determined to be random and were dropped. The final athlete-level sample size was N = 585.
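The distance-based screening described above can be sketched as follows. This is illustrative code on simulated data (the 590 × 24 shape mirrors the athlete-by-item layout, but the values are random), not the authors' procedure, which also included visual inspection and stepwise regression:

```python
# Sketch: flag multivariate outliers via squared Mahalanobis distance,
# using the chi-square cutoff at p < .001 described in the text.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(590, 24))  # simulated athletes x CCS items (not real data)

diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances

cutoff = chi2.ppf(0.999, df=X.shape[1])  # p < .001 with df = number of items
outliers = np.where(d2 > cutoff)[0]
print(len(outliers), "cases flagged for inspection")
```

Flagged cases would then be inspected individually, as the authors did, before deciding whether to drop them.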
Normality. Normality was assessed with univariate skew and univariate kurtosis estimates and Mardia’s (1970) coefficient. Due to the size of the sample, absolute values of univariate estimates, not significance tests, were examined. Guidelines used were absolute values ≥ 3 for extreme skewness and absolute values ≥ 10 for extreme kurtosis (Kline, 1998). None of the univariate estimates suggested extreme non-normality (see Table 2). Mardia’s coefficient suggested multivariate departures from the kurtosis of a normal distribution, 120.94, p < .001. However, large samples can inflate values for this coefficient (Bollen, 1989). A common adjustment for multivariate kurtosis, Satorra and Bentler’s (1994) correction, was not applied because it applies the correction to the raw data matrix, and most analyses in MUML methodology use covariance matrices. However, because nonnormal distributions can inflate test statistics (Muthén & Kaplan, 1985), subsequent model test statistics may have been artificially inflated (i.e., suggesting worse fit than was actually present).

Analyses

Rating scale. Competency data were calibrated to the Rasch Rating Scale Model (RSM; Andrich, 1978) using Winsteps (Wright & Linacre, 1998). Rasch models are a family of 1-parameter item response theory (IRT) measurement models. IRT is an alternative to true score test (TST) theory and is well suited to analyze rating scale data (Wright & Masters, 1982). In this case, the RSM described the probability that a specific athlete (n) would rate a particular item (i) using a specific rating scale category (k), conditioned on the athlete’s competency judgment (θn) and the item’s difficulty (δi). The log-odds equation for this probability, log(Pnik / Pni(k−1)) = θn − δi − τk, contains three parameters: θn, δi, and the category threshold (τk), the threshold between two adjacent rating scale categories.
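Cumulating the adjacent-category log-odds log(Pnik / Pni(k−1)) = θn − δi − τk yields the full set of category probabilities. A minimal sketch, with hypothetical values for θ, δ, and the τs (nothing here is an estimate from this study):

```python
# Sketch: category probabilities under the Rasch rating scale model,
# built from log(P_nik / P_ni(k-1)) = theta_n - delta_i - tau_k.
import numpy as np

def rsm_probs(theta, delta, taus):
    """P(rating = k), k = 0..m, given thresholds tau_1..tau_m."""
    steps = theta - delta - np.asarray(taus)
    log_num = np.concatenate(([0.0], np.cumsum(steps)))  # category 0 -> 0
    p = np.exp(log_num - log_num.max())  # subtract max for numerical stability
    return p / p.sum()

# Hypothetical athlete, item, and threshold values
p = rsm_probs(theta=1.0, delta=0.0, taus=[-1.5, -0.5, 0.5, 1.5])
print(np.round(p, 3))  # five probabilities summing to 1
```

Taking log(p[k] / p[k-1]) recovers θ − δ − τk exactly, which is a quick check on any implementation.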
Parameters of this model are estimated from the observed data via a Joint Maximum Likelihood Estimation method, which does not impose distributional assumptions on the parameter estimates (Wright & Masters, 1982). In addition to the parameter estimates, Winsteps produces standard errors of these estimates and model-to-data fit indices. In accordance with posited internal models, both an omnibus unidimensional construct (TCC) and consecutive unidimensional dimensions (MC, GSC, TC, CBC) were explored.5 The consecutive approach was simply a unidimensional approach repeated for each dimension. Following calibration of the data to the RSM, the degree to which athletes employed the rating scale structure in the manner that the authors intended was evaluated according to guidelines suggested by Linacre (2002). These guidelines can be summarized as: (a) all categories should have at least 10 observations, (b) distributions of ratings for each category should be unimodal, (c) average measures should increase with the categories, (d) unweighted mean square (UMS) fit statistics should be less than 2.0 for each threshold, (e) category thresholds should increase with the categories, (f) ratings imply measures (coherence > 39%), (g) measures imply ratings (coherence > 39%), (h) category thresholds should increase by at least 1.2 logits, and (i) category thresholds should increase by no more than 5 logits. Because criteria (d), (f), and (g) are not well defined in the sport and exercise science literature, an elaboration of these guidelines is provided. The UMS fit statistic depicts the degree to which the observed ratings are consistent with the expected values, and is sensitive to large residuals from pairings of item difficulty and ability estimates that are far apart on the underlying scale. The UMS fit statistic is reported as a chi-square divided by its degrees of freedom, resulting in an expected value of 1.00 and a range from 0.00 to ∞.
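A minimal sketch of the UMS computation just described, with made-up category probabilities standing in for the model-based values Winsteps would supply:

```python
# Sketch: unweighted mean square (UMS) = mean squared standardized residual.
import numpy as np

def ums(observed, probs):
    """observed: ratings (n,); probs: (n, m+1) model category probabilities."""
    cats = np.arange(probs.shape[1])
    expected = probs @ cats                    # E[x] per observation
    variance = probs @ cats**2 - expected**2   # Var[x] per observation
    return np.mean((observed - expected) ** 2 / variance)

probs = np.tile([0.1, 0.2, 0.4, 0.2, 0.1], (6, 1))    # hypothetical probabilities
on_target = ums(np.array([2, 2, 2, 2, 2, 2]), probs)  # ratings at E[x] = 2
erratic = ums(np.array([0, 4, 0, 4, 0, 4]), probs)    # large residuals
print(on_target, erratic)  # near 0 (overfit) vs. above the 2.0 cutoff
```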
In rating scale analyses, thresholds with UMS fit statistics < 2.00 are considered to demonstrate adequate fit (Linacre, 2002). Coherence refers to the degree to which the observed ratings match the modeled expectations for a particular rating scale category, and vice versa. Each rating can be depicted as belonging to a rating scale category. The logit scale upon which athletes are located can be demarked into measurement zones that are defined by the locations of the category thresholds. As illustrated in Figure 4, the location of each athlete can be associated with one measurement zone (X axis, one zone per rating scale category, with a zone representing the expected rating for that coach-by-item combination), and the observed ratings assigned by that athlete (X) can be placed in one of the rating scale categories (on the Y axis). Each point indicates a rating assigned by an athlete to a single item. Note that two of the three ratings in Zone 1 are in the observed rating category of 1 and one of these ratings is in the rating category of 3. Hence, 67% of the measures in Zone 1 are coherent with Category 1 observations. Similarly, three of the five Category 2 observations fall in Zone 2. Hence, 60% of Category 2 ratings are coherent with Zone 2.

Figure 4. Coherence of observed ratings (X) with measurement zones and rating scale categories. [Figure content not legible in the scanned original.]
Because a number of Linacre’s (2002) guidelines were not realized in the original rating scale structure (to be discussed in the Results section), a post hoc approach was applied to arrive at an improved rating scale structure. To determine a post hoc structure, categories were collapsed based on general principles (Linacre, 1995; Wright & Linacre, 1992) and statistical indicators (Zhu et al., 1997). General principles for collapsing categories state that collapsed categories (a) should be explainable, and (b) should balance observed frequencies as much as possible. Statistical indicators of improved fit for a post hoc structure include (a) improved model-data fit statistics, (b) category and parameter estimates that come closer to satisfying Linacre’s guidelines, and (c) separation indices that do not decrease drastically as compared to the original rating scale structure.6 Of interest in Rasch analysis are the person and item separation indices. In non-technical terms (see Linacre, 1994, for a technical introduction), these indices indicate how well the scale distinguishes individual people and items, respectively, where larger values indicate greater separation.

Internal models. A MCFA approach to model fit was used to evaluate the utility of a unidimensional model (TCC) and a multidimensional model both within teams and between teams (see Figure 3). Post hoc respecification was considered only for models that approached acceptable fit (MacCallum, Roznowski, & Necowitz, 1992). Empirically based possibilities for model respecification were guided by inspection of Lagrange multiplier tests [LM] (Silvey, 1959), Wald tests (Wald, 1943), and standardized residuals. LM values approximate the amount by which the model’s overall χ² would decrease if the identified parameter were estimated. Wald test values estimate the amount by which the model’s overall χ² would increase if the identified parameter were fixed to zero.
Standardized residuals indicate the degree of misfit between the observed correlation matrix and the predicted correlation matrix, where absolute values greater than .10 can indicate misfit (Kline, 1999). Empirically based possibilities were evaluated, in part, based on relevant coaching effectiveness literature. Disattenuated correlations among latent factors were also examined to depict the degree of redundancy in the multidimensional models. Once an internal model was determined, variability of the factor structure between sub-samples was evaluated to determine the degree to which the assumption that all of the athletes were a random sample of a single population was reasonable. Sub-samples of note were males (n = 165) and females (n = 420), soccer athletes (n = 403) and women’s hockey athletes (n = 182), and year on team, where 1st = 150, 2nd = 135, and 3rd and 4th = 190. Athletes in their third and fourth year were collapsed so that each sub-sample had at least 100 observations. Because determining the degree of factorial invariance between sub-samples was not the purpose of this study, and because of the relatively large size of the sub-samples, only the invariance of factor loadings and factor covariances was explored, and the alpha level selected for these comparisons was equal to .001.

Internal consistency reliability. The consistency of rank orderings of competence estimates across measurement contexts was examined with reliability of separation coefficients (α). The reliability of separation coefficient is analogous to Cronbach’s (1951) alpha, but it is based on estimates of true and error variance derived from IRT models. Specifically, the reliability of separation for competency estimates = [V(θ̂) − MSE(θ̂)] / V(θ̂), where V(θ̂) is the variance of the competency estimates and MSE(θ̂) is the mean error variance of the competency estimates.
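The reliability of separation formula can be evaluated directly from a vector of estimates and their standard errors; the numbers below are invented for illustration:

```python
# Sketch: reliability of separation = [V(theta) - MSE(theta)] / V(theta).
import numpy as np

def separation_reliability(estimates, standard_errors):
    v = np.var(estimates)                            # observed variance of estimates
    mse = np.mean(np.asarray(standard_errors) ** 2)  # mean error variance
    return (v - mse) / v

theta_hat = np.array([-1.2, -0.4, 0.1, 0.6, 1.4, 2.0])  # hypothetical logit estimates
se = np.array([0.30, 0.28, 0.27, 0.27, 0.29, 0.33])     # hypothetical standard errors
rel = separation_reliability(theta_hat, se)
print(round(rel, 2))  # above 0.90, "excellent" by the guidelines quoted in the text
```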
This equation is comparable to the TST theory definition of reliability as the ratio of true variance to observed variance. As suggested by Kline (1999) and in relation to the given purpose, αs greater than 0.90 were considered excellent, αs greater than 0.80 were considered very good, and αs greater than 0.70 were considered adequate.

Forming measures to test questions 4 and 5. Satisfaction with the coach data were calibrated to the RSM using Winsteps. In this case, the RSM described the probability that a specific athlete (n) would rate a particular item (i) using a specific rating scale category (k), conditioned on the athlete’s satisfaction with the coach (θn) and on the item’s difficulty (δi). The log-odds equation for this probability, log(Pnik / Pni(k−1)) = θn − δi − τk, contains three parameters: θn, δi, and the category threshold (τk). Calibration of the data to the RSM resulted in a satisfaction estimate for each athlete. Estimates were on a single linear continuum in logistic ratio units (logits). A logit is the natural logarithm of the odds of an event. Because the data in this study were polytomous, odds were defined by the likelihood of assigning a rating in one category versus the odds of assigning a rating in the next lower category. A Principal Component Analysis (PCA) of the residuals from the Rasch model was performed on the four indicators of satisfaction to judge the adequacy of the assumption of unidimensionality. The residual from the Rasch model is defined as the difference between the observed rating and the model-based expectation, where the expected score (E) for the nth athlete on the ith item was given by (Linacre, 1998): E_ni = Σ(k = 0 to m) k · P_nik, where k is the value of the rating scale category, ranging from 0 to a maximum number, m, and P_nik is the probability of observing a response in category k for athlete n on item i as defined by the Rasch model.
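The residual just defined (observed rating minus the expectation E_ni) drives the PCA check. A sketch with simulated expected scores standing in for Rasch-model values, and four items as in the satisfaction scale:

```python
# Sketch: PCA of Rasch residuals to check for structure beyond the
# Rasch component; all data here are simulated, not from this study.
import numpy as np

rng = np.random.default_rng(1)
n_athletes, n_items = 200, 4
expected = rng.uniform(2.0, 6.0, size=(n_athletes, n_items))  # stand-in E_ni
observed = np.clip(np.round(expected + rng.normal(0, 0.8, expected.shape)), 0, 6)

residuals = observed - expected
corr = np.corrcoef(residuals, rowvar=False)    # inter-item residual correlations
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigvals, 2))  # no eigenvalue well above 1 -> little residual structure
```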
Scaling data to the Rasch model is equivalent to extracting the first principal component, which is referred to as the Rasch component, from the data with the restriction that all items have equal loadings on that component (McDonald, 1985). A PCA of the residuals reveals whether systematicity (i.e., multidimensionality) exists once the variance accounted for by the Rasch component has been taken into account. That is, the results of the PCA indicate whether the assumption of unidimensionality is tenable. Competency data also were calibrated to the RSM using Winsteps, although an attempt was first made to fit these data to a multidimensional extension of this model as implemented in ConQuest (Wu, Adams, & Wilson, 1998). However, due to a non-positive definite psi matrix (i.e., high correlations among the factors, which will be presented in the Results section), convergence to a stable solution for the multidimensional model was not possible. Empirically, this means that at least some of the factors could not be distinguished by the subjects within this particular measurement model. Instead, four unidimensional models were fit to the data, to produce logit-based scores, guided by the final internal model (i.e., MC, GSC, TC, CBC). The decision to maintain separate factors was consistent with the results of the MCFA and will be defended in the Discussion section. PCAs of the residuals from these models were not performed because the dimensionality of these data was confirmed in the MCFA.

Testing questions 4 and 5. Satisfaction and coaching competency measures were dependent because athlete observations were nested within teams. Hierarchical linear modeling (HLM), as implemented in HLM5 (Raudenbush, Bryk, Cheong, & Congdon, 2000), is well suited to handle observed dependent data.
Congruent with Horn’s model of coaching effectiveness (see Figure 1), satisfaction with the coach was treated as the dependent variable, and the proposed positive relationships with coaching competency were tested within and between teams. To avoid problems caused by multicollinearity, bivariate correlations between coaching competencies and satisfaction with the coach, at the individual level and team level, were explored to determine which competency measure would be used as the independent variable. Model building consisted of at least three steps to test questions 4 and 5 (Raudenbush & Bryk, 2002). First, an unconditional model (i.e., Model 1) was imposed:

Level 1: Y_ig = β0g + r_ig, where
β0g = the mean of athlete satisfaction for team g,
r_ig = the unique effect of athlete i for team g,

Level 2: β0g = γ00 + u0g, where
γ00 = the average team mean of athlete satisfaction,
u0g = the unique effect of team g on the average team satisfaction mean.

Of particular interest in Model 1 were the variance of r_ig, or the within-team variance, σ²W, and the variance of u0g, or the between-team variance, σ²B. These variances were used to estimate the ICC of satisfaction with the coach.
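Model 1's variance components give ICC = σ²B / (σ²B + σ²W). A sketch using simple one-way ANOVA estimators on balanced simulated data (the study's teams were unbalanced, and HLM5 uses likelihood-based rather than ANOVA estimators):

```python
# Sketch: ICC from within- and between-team variance components.
import numpy as np

rng = np.random.default_rng(2)
teams, n_per = 32, 18                                  # mirrors G = 32, ~18 per team
team_effects = rng.normal(0, 1.0, teams)               # true sigma2_B = 1.0
y = team_effects[:, None] + rng.normal(0, 2.0, (teams, n_per))  # true sigma2_W = 4.0

sigma2_w = np.mean(np.var(y, axis=1, ddof=1))          # pooled within-team variance
ms_between = n_per * np.var(y.mean(axis=1), ddof=1)    # between-team mean square
sigma2_b = max((ms_between - sigma2_w) / n_per, 0.0)
icc = sigma2_b / (sigma2_b + sigma2_w)
print(round(icc, 2))  # near the true value 1.0 / (1.0 + 4.0) = 0.20
```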
Second, a random coefficient regression model (i.e., Model 2) was imposed where the coaching competency slope was estimated within teams and was free to vary between teams:

Level 1: Y_ig = β0g + β1g(athlete’s competency judgment)_ig + r_ig, where
β0g = the mean of athlete satisfaction for team g,
β1g = the expected amount that an athlete’s satisfaction score would change given a one-unit change in his/her competency judgment for team g,
r_ig = the residual of athlete i for team g,

Level 2: β0g = γ00 + u0g
β1g = γ10 + u1g, where
γ00 = the average team mean of athlete satisfaction,
γ10 = the average satisfaction-competency slope across teams,
u0g = the unique effect of team g on the average team satisfaction mean,
u1g = the unique effect of team g on the average satisfaction-competency slope.

Of particular interest in Model 2 were γ10 and the variance of u1g, σ²u1. γ10 was the average satisfaction-competency slope and addressed question 4. σ²u1 was the variance of the estimated satisfaction-competency slopes, the β1g’s, around the average satisfaction-competency slope, γ10. An alpha equal to .05 was selected for all hypothesis tests, and the magnitude of the standardized betas was interpreted according to Cohen’s (1988) guidelines for effect sizes, where 0.20, 0.50, and 0.80 indicated small, medium, and large effect sizes, respectively. Third, an intercepts-as-outcomes model (i.e., Model 3) was imposed where the team competency score was added to model between-team variance on γ00:

Level 1:
Y_ig = β0g + β1g(athlete’s competency judgment)_ig + r_ig, where
β0g = the mean of athlete satisfaction for team g,
β1g = the expected amount that an athlete’s satisfaction score would change given a one-unit change in his/her competency judgment for team g,
r_ig = the residual of athlete i for team g,

Level 2: β0g = γ00 + γ01(team’s competency judgment)_g + u0g
β1g = γ10 + u1g, where
γ00 = the average team mean of athlete satisfaction,
γ01 = the expected amount that team satisfaction would change given a one-unit change in the team competency score,
γ10 = the average satisfaction-competency slope across teams,
u0g = the residual of team g on the average team satisfaction mean,
u1g = the unique effect of team g on the average satisfaction-competency slope.

Of particular interest in Model 3 were γ01 and the variance of u0g, σ²u0. γ01 was the between-teams satisfaction-competency slope and addressed question 5. σ²u0 was the variance of the adjusted average team satisfaction scores. Sport played (0 = soccer, 1 = ice hockey) was entered as a Level-2 predictor to ensure that none of the fixed effects varied on the basis of sport played.

Model estimation and fit. Final parameters were estimated via restricted maximum likelihood, and differences in model fit were examined via FIML estimation (Raudenbush & Bryk, 2002). Relative fit of nested models was evaluated with a Δχ² statistic and by comparing CAIC values.

Reliability estimates. Point reliability estimates described how reliable, on average, the slopes were, based on computing ordinary least squares regressions separately for each team. A reliability estimate was provided by averaging the individual team estimates. Raudenbush and Bryk (2002) suggest, as a guideline, that the point reliability estimate should be greater than .05. Slopes that did not meet this heuristic were candidates to be fixed across groups.
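Model 2 can be sketched with the `MixedLM` routine in statsmodels rather than the HLM5 software the authors used; the data, variable names, and effect sizes below are invented:

```python
# Sketch: random-intercept, random-slope model of satisfaction on
# competency, estimated by REML on simulated nested data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
frames = []
for g in range(32):                                  # 32 teams, 18 athletes each
    u0, u1 = rng.normal(0, 0.5), rng.normal(0, 0.1)  # team-level deviations
    comp = rng.normal(0, 1, 18)
    sat = (1.0 + u0) + (0.4 + u1) * comp + rng.normal(0, 1, 18)
    frames.append(pd.DataFrame({"team": g, "competency": comp, "satisfaction": sat}))
df = pd.concat(frames, ignore_index=True)

# Random intercept and random competency slope across teams (Model 2)
m2 = smf.mixedlm("satisfaction ~ competency", df, groups=df["team"],
                 re_formula="~competency").fit(reml=True)
print(m2.params["competency"])  # gamma_10, the average within-team slope
```

Model 3 would add the team-mean competency score as a Level-2 predictor of the intercept, analogous to including it as a second fixed effect here.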
CHAPTER 4

RESULTS

Did Athletes Employ the Rating Scale Structure in the Manner that the Authors Intended?

Athletes did not employ the original rating scale structure, for either the omnibus unidimensional conceptualization or the consecutive unidimensional conceptualization, in the manner that the authors intended. Specific problems were observed with threshold estimates (criteria e, h, and i) and coherences (criteria f and g) (see Table 1). In the omnibus unidimensional model, all threshold estimates and most (65%) coherences failed to fully meet these criteria. In the consecutive unidimensional model, most (60%) thresholds and more than one-third (38%) of coherences failed to fully meet these criteria. In both conceptualizations, the lower portion of the scale was used less frequently than the middle and upper end of the scale, and the majority of problems were observed in the lower and middle ranges, with some problems associated with Category 9. Post hoc categorizations were evaluated in accordance with previously mentioned criteria. The best fitting, and accepted, post hoc categorization was a five-category structure where responses were collapsed into Categories 0 and 1, Categories 2 through 4, Categories 5 and 6, Category 7, and Categories 8 and 9 (see Table 1). The majority of problems, according to Linacre’s (2002) guidelines, encountered in the omnibus unidimensional conceptualization were improved in the post hoc categorization. All of the problems, according to Linacre’s guidelines, observed in the consecutive unidimensional conceptualization were improved in the post hoc categorization. Last, person and item separation statistics changed little in the new categorization, which was consistent with Zhu et al.’s (1997) guidelines for post hoc categorization.

Table 1. Category frequencies, average measures, thresholds, UMS fit statistics, coherences, and separation statistics for the original and post hoc rating scale structures. [Table content not legible in the scanned original.]
To What Degree did the Proposed Internal Models Fit the Data?

Step 1. The proposed unidimensional model and multidimensional model were imposed on S_T. The unidimensional model fit the data poorly, χ²(252) = 2565.41, p < .001, χ²/df = 10.18, CFI = .81, TLI = .79, SRMR = .07, and RMSEA = .13. The multidimensional model fit better than the unidimensional model, CAICdiff = 1316.56. However, the multidimensional model exhibited only marginally acceptable fit to the data, χ²(246) = 1204.93, p < .001, χ²/df = 4.90, CFI = .92, TLI = .91, SRMR = .05, and RMSEA = .08. In both models the χ² value was likely inflated due to the fact that the multilevel structure of the data was ignored (Julian, 2001).

Step 2. ICC values for the 24 CCS items ranged from .16 to .36 (M = .25, SD = .06; see Table 2), which made it reasonable to proceed to Step 3. In order to proceed to Step 3 and Step 4, S_PW and S*B were calculated (see Table 3 and Table 4, respectively).
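The pooled matrices in Step 2 can be sketched with Muthén-style MUML estimators; balanced teams are assumed here for simplicity (with unequal team sizes, as in this study, the scaling factor c* becomes a weighted average of team sizes):

```python
# Sketch: pooled within-teams (S_PW) and scaled between-teams (S_B)
# matrices on simulated nested data; E[S_B] = Sigma_W + c* x Sigma_B,
# the matrix modeled between teams in MUML-style multilevel CFA.
import numpy as np

rng = np.random.default_rng(4)
G, n, p = 32, 18, 4                              # teams, team size, items
team_part = rng.normal(0, 1, (G, 1, p))          # between-team component
y = team_part + rng.normal(0, 1, (G, n, p))      # athletes nested in teams

team_means = y.mean(axis=1)
grand_mean = y.reshape(-1, p).mean(axis=0)

within_dev = (y - team_means[:, None, :]).reshape(-1, p)
S_PW = within_dev.T @ within_dev / (G * n - G)   # pooled within covariance

between_dev = team_means - grand_mean
S_B = n * (between_dev.T @ between_dev) / (G - 1)  # scaled between covariance

c_star = n                                       # balanced case: c* = n
print(np.round(np.diag(S_PW), 2), np.round(np.diag(S_B), 2))
```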
Table 2
Item Characteristics for the CCS

Item    M     SD    skewness  kurtosis  ICC
MC1     3.67  1.07  -0.33     -0.69     .22
MC3     3.76  1.06  -0.36     -0.79     .24
MC6     3.42  1.12  -0.09     -0.88     .29
MC10    3.61  1.11  -0.31     -0.72     .31
MC12    3.70  1.10  -0.37     -0.71     .29
MC15    3.40  1.11  -0.10     -0.78     .25
MC23    3.71  1.12  -0.47     -0.65     .31
GSC2    4.16  0.94  -0.90      0.14     .21
GSC4    4.13  0.99  -0.86     -0.10     .24
GSC8    3.97  1.02  -0.63     -0.42     .24
GSC9    4.15  0.98  -0.92      0.06     .20
GSC11   3.82  1.06  -0.51     -0.54     .16
GSC17   3.84  1.03  -0.53     -0.54     .22
GSC21   3.92  1.05  -0.56     -0.65     .18
TC7     3.91  1.08  -0.52     -0.80     .34
TC14    3.61  1.03  -0.21     -0.68     .20
TC16    3.71  1.04  -0.32     -0.69     .20
TC18    3.87  1.12  -0.62     -0.52     .16
TC20    3.85  1.03  -0.55     -0.42     .20
TC22    4.02  1.02  -0.77     -0.16     .26
CBC5    3.94  1.12  -0.76     -0.40     .35
CBC13   3.88  1.09  -0.62     -0.51     .25
CBC19   4.09  1.06  -0.93     -0.04     .36
CBC24   3.95  1.13  -0.80     -0.29     .29

Note. ICC = intraclass correlation.

Table 3. Pooled within-teams covariances and correlations for the CCS items. [Table content not legible in the scanned original.]
Table 4. Pooled between-teams covariances and correlations for the CCS items. [Table content not legible in the scanned original.]
Step 3. Prior to imposing the internal models on S_PW, item-level correlations were examined. Correlations among items within proposed subscales appeared to be greater than correlations between items outside of respective subscales: MC items with MC items (M = .58, SD = .07), MC items with non-MC items (M = .47, SD = .06); GSC items with GSC items (M = .58, SD = .05), GSC items with non-GSC items (M = .48, SD = .06); TC items with TC items (M = .54, SD = .08), TC items with non-TC items (M = .48, SD = .06); and CBC items with CBC items (M = .59, SD = .04), CBC items with non-CBC items (M = .45, SD = .05). This provided some support for the multidimensional model. The proposed unidimensional model and multidimensional model were imposed on S_PW (see Table 5). The unidimensional model fit the data poorly, χ²(252) = 1732.15, p < .001, χ²/df = 6.87, CFI = .83, TLI = .82, SRMR = .06, and RMSEA = .10. The multidimensional model fit better than the unidimensional model, CAICdiff = 715.84. However, the multidimensional model exhibited only marginally acceptable fit to the data, χ²(246) = 972.39, p < .001, χ²/df = 3.95, CFI = .92, TLI = .91, SRMR = .05, and RMSEA = .07. As expected, both models appeared to fit the data better than the parallel models in Step 1. Because the multidimensional model marginally fit the data, model respecification was considered. None of the Wald tests were statistically significant.
LM values and standardized residuals suggested several potential modifications, three of which were considered defensible and were adopted: correlated error terms between items GSC2 (recognize opposing team’s strength during competition) and GSC9 (recognize opposing team’s weaknesses during competition), and between GSC9 and GSC8 (adapt to different game situations), and MC3 loaded on both MC and GSC. In regard to the correlated errors, both pairs of items may have measured athletes’ perceptions of their coach’s competence in recognizing and making critical decisions about the other team during competition, in addition to measuring a more general competency to lead during competition. In regard to the within-item multidimensionality for MC3 (mentally prepare athletes for game strategies), it makes sense that it may have measured both MC (athletes’ evaluations of their head coach’s ability to affect the psychological mood and skills of athletes) and GSC (athletes’ evaluations of their head coach’s ability to lead during competition). First, error terms for GSC2 and GSC9 were allowed to correlate (r = .39, p = .02); this model fit better than the previous model, Δχ²(1) = 75.55, p < .001, CAICdiff = 68.23, and negated what had been the second largest standardized residual (.15). Second, error terms for GSC9 and GSC8 were allowed to correlate (r = .27, p = .02); this model fit better than the previous model, Δχ²(1) = 42.76, p < .001, CAICdiff = 35.44, and dramatically reduced (to .02) what had been the fifth largest standardized residual (.12). Third, MC3 was allowed to load on both MC, .46, SE = .09, and GSC, .57, SE = .11; this model fit better than the previous model, Δχ²(1) = 28.67, p < .001, CAICdiff = 21.35, and dramatically reduced (to .05, .04, and .03) what were three of the eight largest standardized residuals (.13, .12, .11). Overall, this final model marginally fit the data, χ²(243) = 825.41, p < .001, χ²/df = 3.40, CFI = .93, TLI = .93, SRMR = .04, and RMSEA = .07.
Table 6 illustrates within-teams estimates for this model. Factors in the retained model were moderately to highly correlated with one another, with the strongest association occurring between GSC and TC, r = .92, and the weakest associations occurring between CBC and GSC, and CBC and TC, r = .76. Because of the high correlation between GSC and TC, another model was specified in which all of these items loaded on only one factor. This simpler model fit worse than the previous model, CAICdiff = 53.74. Thus, the previous model was retained as the final model. Reasons for retaining, and implications of, subscales with such high correlations with one another will be explicated in the Discussion section.

Table 5
Model-Data Fit for the Proposed Internal Models

Step 4. Prior to imposing the internal models, item-level correlations within S*B were examined.
Correlations between items within proposed subscales appeared to be greater than correlations between items outside of respective subscales: MC items with MC items (M = .87, SD = .06), MC items with non-MC items (M = .72, SD = .09); GSC items with GSC items (M = .91, SD = .02), GSC items with non-GSC items (M = .74, SD = .09); TC items with TC items (M = .83, SD = .09), TC items with non-TC items (M = .74, SD = .09); and CBC items with CBC items (M = .92, SD = .02), CBC items with non-CBC items (M = .72, SD = .09). This provided some support for the multidimensional model. Using the multi-group procedure, the accepted within-teams model was specified in both groups for all subsequent analyses. The scaling factor, c*, equaled 4.27. Both a between-teams null model, χ²(567) = 2087.01, p < .001, and a between-teams independence model, χ²(541) = 1751.25, p < .001, were rejected. In both models, the estimated covariances of the pairs of correlated error terms (GSC8 and GSC11 = .139, GSC10 and GSC11 = .096) were inserted as starting values due to convergence problems.7 To maintain model comparability, these starting values were inserted from this point forward. Next, the unidimensional structure was specified between teams (see Table 5). This model fit the data poorly to marginally, χ²(522) = 1539.33, p < .001, χ²/df = 2.95, CFI = .92, TLI = .91, SRMR = .18, and RMSEA = .06. Next, the original multidimensional structure (i.e., no correlated error terms or cross-loadings) was specified between teams (see Table 5). This model fit better than the previous model, CAICdiff = 86.35. However, this model exhibited only marginally acceptable fit to the data, χ²(515) = 1401.39, p < .001, χ²/df = 2.72, CFI = .93, TLI = .92, SRMR = .07, and RMSEA = .05. Because the multidimensional model fit the data only marginally, model respecification was considered.
Due to concerns associated with a modest sample size at Level 2 (e.g., stability of between-teams estimates), only one model respecification was considered defensible: MC3 loading on both MC and GSC. MC3 was allowed to load on both MC, .50, SE = .13, and GSC, .66, SE = .15; this model fit better than the previous model, χ²diff(1) = 15.50, p < .001, CAICdiff = 8.13, and dramatically reduced (range = .00 to .05) what were seven of the 11 largest standardized residuals at the between level (range = .19 to .28). Overall, this final model fit the data marginally, χ²(514) = 1385.89, p < .001, χ²/df = 2.70, CFI = .93, TLI = .93, SRMR = .06, and RMSEA = .05. Table 6 illustrates between-teams estimates for this model. Because a multidimensional model was retained at both the within-teams and between-teams levels, the remaining analyses focus on multidimensional estimates of coaching competency only. Factors in the between-teams portion of the model were also moderately to highly correlated with one another, with the strongest association occurring between GSC and TC, r = .93, and the weakest associations occurring between MC and GSC, and GSC and CBC, r = .74. Because of the high correlation between GSC and TC, another model was specified in which all of these items loaded on only one factor. This simpler model fit worse than the previous model, CAICdiff = 72.00. Thus, the previous model was retained as the final model. Reasons for retaining, and implications of, such high correlations among subscales will be explicated in the Discussion section.
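The decomposition underlying Steps 3 and 4 (a pooled-within matrix, a between matrix, and a scaling factor c) follows Muthén's (1994) MCFA estimators. A minimal sketch of those estimators, assuming a rows-by-items numpy array of responses and a vector of team labels (function and variable names are mine, not from the study's software):

```python
import numpy as np

def muml_decompose(y, groups):
    """Pooled-within and between covariance matrices plus the scaling
    factor c, per Muthen's (1994) MCFA estimators.
    y: (N, p) array of item responses; groups: length-N team labels."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    ids, counts = np.unique(groups, return_counts=True)
    N, G = len(y), len(ids)
    grand = y.mean(axis=0)
    s_pw = np.zeros((y.shape[1],) * 2)
    s_b = np.zeros_like(s_pw)
    for g, n_g in zip(ids, counts):
        yg = y[groups == g]
        dev_w = yg - yg.mean(axis=0)        # within-team deviations
        s_pw += dev_w.T @ dev_w
        dev_b = yg.mean(axis=0) - grand      # team-mean deviations
        s_b += n_g * np.outer(dev_b, dev_b)
    s_pw /= (N - G)                          # pooled-within matrix
    s_b /= (G - 1)                           # (scaled) between matrix
    c = (N**2 - (counts**2).sum()) / (N * (G - 1))  # scaling factor
    return s_pw, s_b, c
```

With balanced teams, c reduces to the common team size; with unbalanced teams, as in this study, it falls near the average team size.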
Table 6
Within-Teams and Between-Teams Estimates

Columns: loading, SE, standardized loading, standardized residual (within teams); then the same four columns between teams.

MC1     1.00a      0.00      .74        .68    1.00a      0.00      0.98       0.20
MC3     .46b/.57c  .09b/.11c .35b/.36c  .74    .50b/.66c  .13b/.15c .48b/.55c  0.26
MC6     1.08*      0.06      .79        .62    1.18*      0.11      0.97       0.25
MC10    1.05*      0.06      .79        .61    1.18*      0.11      0.97       0.25
MC12    0.95*      0.06      .71        .70    1.05*      0.14      0.89       0.45
MC15    1.15*      0.06      .83        .55    1.07*      0.11      0.96       0.29
MC23    1.11*      0.06      .83        .57    1.22*      0.11      0.99       0.17
GSC2    1.00a      0.00      .70        .71    1.00a      0.00      1.00       0.00
GSC4    1.10*      0.07      .75        .66    1.13*      0.10      0.98       0.19
GSC8    1.17*      0.07      .77        .64    1.14*      0.10      0.99       0.14
GSC9    1.05*      0.05      .71        .70    1.03*      0.08      0.99       0.14
GSC11   1.26*      0.07      .77        .64    0.99*      0.09      1.00       0.00
GSC17   1.22*      0.07      .79        .61    1.11*      0.09      0.99       0.06
GSC21   1.27*      0.07      .79        .62    0.99*      0.10      0.98       0.18
TC7     1.00a      0.00      .62        .79    1.00a      0.00      0.83       0.56
TC14    1.29*      0.09      .76        .65    0.84*      0.14      0.97       0.24
TC16    1.35*      0.09      .79        .61    0.88*      0.14      1.00       0.00
TC18    1.38*      0.09      .73        .68    0.75*      0.14      0.92       0.39
TC20    1.25*      0.09      .74        .68    0.81*      0.14      0.96       0.27
TC22    1.24*      0.08      .77        .64    0.96*      0.15      0.99       0.17
CBC5    1.00a      0.00      .78        .63    1.00a      0.00      0.99       0.15
CBC13   0.98*      0.06      .73        .68    0.78*      0.07      0.97       0.25
CBC19   0.96*      0.05      .79        .62    0.89*      0.08      0.95       0.32
CBC24   1.06*      0.06      .79        .62    0.92*      0.06      0.99       0.06

Note. a indicates that the parameter was fixed to set the metric; * indicates statistical significance at the .001 level; b indicates the loading on MC; and c indicates the loading on GSC.

Comparisons of the vast majority (98%) of factor loadings (n = 21 for each comparison) and factor covariances (n = 6 for each comparison) did not reject the hypothesis of invariant structures across sub-samples. All of the factor loadings, and most of the factor covariances, were not significantly different for males and females (see Table 7). The two factor covariances that were significantly different suggested that CBC was more strongly related to MC and TC for females, r = .84 and .77, than for males, r = .70 and .69, respectively.
None of these estimates varied by sport (see Table 8) or by year on team (see Table 9). Because 106 of the 108 estimates were not significantly different across the selected sub-samples, it was considered reasonable to assume that all of the athletes were a random sample of a single population. It is noted that the possible violations of invariance in the identified factor covariances would likely cause greater misfit than would be observed if all of the estimates were invariant.

Table 7
Comparisons of Factor Loadings and Correlations Among Factors by Gender

Factor loadings (standardized loading, unstandardized loading, standard error; female / male):
v1 mc     *    *      *     *      *    *
v2 mc    .36  .41    0.46  0.51   .08  .16
v2 gsc   .43  .36    0.59  0.53   .08  .18
v3 mc    .84  .78    1.12  1.03   .06  .09
v4 mc    .84  .80    1.13  1.02   .06  .09
v5 mc    .77  .69    1.03  0.87   .06  .09
v6 mc    .85  .87    1.10  1.17   .06  .09
v7 mc    .88  .82    1.22  1.02   .06  .09
v8 gsc    *    *      *     *      *    *
v9 gsc   .81  .80    1.07  1.11   .06  .10
v10 gsc  .86  .80    1.14  1.20   .06  .11
v11 gsc  .81  .77    1.05  1.09   .06  .11
v12 gsc  .81  .80    1.13  1.21   .06  .11
v13 gsc  .82  .82    1.13  1.14   .06  .10
v14 gsc  .83  .79    1.16  1.14   .06  .11
v15 tc    *    *      *     *      *    *
v16 tc   .81  .74    1.13  1.15   .07  .16
v17 tc   .83  .84    1.15  1.43   .07  .18
v18 tc   .79  .67    1.21  1.22   .08  .18
v19 tc   .79  .76    1.12  1.18   .07  .16
v20 tc   .83  .75    1.17  1.10   .07  .15
v21 cbc   *    *      *     *      *    *
v22 cbc  .82  .64    0.93  0.88   .05  .10
v23 cbc  .85  .78    0.97  0.96   .04  .09
v24 cbc  .86  .81    1.03  1.06   .05  .09

Correlations among factors (female / male):
mc-gsc   .75  .84
mc-tc    .81  .85
mc-cbc   .84  .70
gsc-tc   .92  .91
gsc-cbc  .76  .68
tc-cbc   .77  .69

Note.
v = variable; mc = motivation competence; gsc = game strategy competence; tc = technique competence; cbc = character building competence; * = fixed; bolded cells indicate significant (p < .001) differences between estimates.

Table 8
Comparisons of Factor Loadings and Correlations Among Factors by Sport

Factor loadings (standardized loading, unstandardized loading, standard error; soccer / hockey):
v1 mc     *    *      *     *      *    *
v2 mc    .41  .31    0.52  0.38   .08  .11
v2 gsc   .35  .53    0.54  0.65   .10  .11
v3 mc    .83  .86    1.13  1.03   .06  .07
v4 mc    .82  .88    1.09  1.08   .06  .07
v5 mc    .72  .84    0.94  1.01   .06  .07
v6 mc    .87  .85    1.17  1.00   .06  .07
v7 mc    .87  .88    1.16  1.13   .06  .07
v8 gsc    *    *      *     *      *    *
v9 gsc   .79  .85    1.14  1.02   .07  .06
v10 gsc  .80  .91    1.19  1.12   .07  .06
v11 gsc  .77  .87    1.11  1.01   .07  .06
v12 gsc  .77  .86    1.16  1.12   .07  .07
v13 gsc  .77  .89    1.12  1.15   .07  .07
v14 gsc  .77  .87    1.17  1.11   .08  .07
v15 tc    *    *      *     *      *    *
v16 tc   .78  .86    1.25  1.04   .10  .08
v17 tc   .82  .87    1.31  1.08   .10  .08
v18 tc   .69  .85    1.15  1.19   .10  .09
v19 tc   .76  .83    1.17  1.05   .10  .08
v20 tc   .78  .90    1.22  1.09   .10  .08
v21 cbc   *    *      *     *      *    *
v22 cbc  .75  .85    0.88  0.92   .05  .06
v23 cbc  .82  .88    0.92  0.96   .05  .06
v24 cbc  .84  .87    1.00  1.00   .05  .06

Correlations among factors (soccer / hockey):
mc-gsc   .74  .82
mc-tc    .81  .87
mc-cbc   .79  .85
gsc-tc   .88  .94
gsc-cbc  .67  .84
tc-cbc   .71  .84

Note.
v = variable; mc = motivation competence; gsc = game strategy competence; tc = technique competence; cbc = character building competence; * = fixed; bolded cells indicate significant (p < .001) differences between estimates.

Table 9
Comparisons of Factor Loadings and Correlations Among Factors by Year

Factor loadings (standardized loading, unstandardized loading, standard error; 1st / 2nd / 3rd year):
v1 mc     *    *    *     *     *     *      *    *    *
v2 mc    .51  .45  .25   0.67  0.55  0.30   .13  .11  .10
v2 gsc   .25  .40  .58   0.37  0.55  0.82   .14  .12  .12
v3 mc    .82  .87  .84   1.10  1.14  1.10   .08  .09  .08
v4 mc    .85  .85  .83   1.14  1.11  1.06   .08  .09  .08
v5 mc    .75  .81  .80   0.99  1.04  1.00   .08  .10  .08
v6 mc    .84  .92  .83   1.13  1.19  1.06   .08  .09  .08
v7 mc    .87  .91  .85   1.16  1.23  1.08   .08  .09  .08
v8 gsc    *    *    *     *     *     *      *    *    *
v9 gsc   .79  .84  .83   1.09  1.04  1.10   .08  .09  .09
v10 gsc  .85  .85  .83   1.24  1.06  1.12   .09  .09  .09
v11 gsc  .79  .80  .83   1.08  0.98  1.11   .08  .09  .09
v12 gsc  .81  .84  .77   1.21  1.14  1.08   .09  .09  .09
v13 gsc  .85  .87  .77   1.25  1.15  1.02   .09  .09  .09
v14 gsc  .81  .85  .78   1.19  1.21  1.06   .09  .10  .09
v15 tc    *    *    *     *     *     *      *    *    *
v16 tc   .82  .80  .78   1.20  1.22  0.98   .11  .14  .10
v17 tc   .86  .87  .79   1.28  1.35  1.00   .11  .14  .10
v18 tc   .77  .82  .71   1.22  1.34  0.98   .11  .15  .11
v19 tc   .79  .85  .79   1.18  1.19  1.03   .11  .13  .10
v20 tc   .83  .84  .80   1.23  1.24  0.98   .11  .13  .10
v21 cbc   *    *    *     *     *     *      *    *    *
v22 cbc  .76  .87  .77   0.84  1.00  0.91   .06  .07  .08
v23 cbc  .88  .88  .77   0.94  1.00  0.89   .05  .07  .07
v24 cbc  .87  .88  .86   1.03  1.04  1.05   .06  .08  .07

Correlations among factors (1st / 2nd / 3rd year):
mc-gsc   .82  .75  .73
mc-tc    .85  .87  .79
mc-cbc   .79  .85  .84
gsc-tc   .93  .91  .87
gsc-cbc  .74  .76  .74
tc-cbc   .75  .85  .72

Note.
v = variable; mc = motivation competence; gsc = game strategy competence; tc = technique competence; cbc = character building competence; * = fixed; bolded cells indicate significant (p < .001) differences between estimates.

How Reliable were the Rank Orderings of Coaching Competency Estimates?

Reliability coefficients were .90 (MC), .87 (GSC), .85 (TC), and .82 (CBC). These coefficients indicated very good to excellent levels of internal consistency for the multidimensional coaching competency estimates.

Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Within Teams?

Prior to answering this question, the assumption of unidimensionality for the set of satisfaction items was assessed. The Rasch component extracted at least one-half of the total variance in each of the items (range of extracted communalities = .49 to .86), produced an eigenvalue equal to 2.91, was reliably measured (range of loadings = .70 to .93), and accounted for 73% of the total variance. The eigenvalue for the next, unaccepted, component was equal to .49. The reliability of separation coefficient for the set of items was .75. Hence, treating these logit-based estimates as reliable measures of satisfaction was judged to be justifiable. Next, correlations between logit-based coaching competencies and satisfaction with the coach were explored at the athlete level and the team level (see Table 10). At both levels, all coaching competency subscales had at least a moderately high correlation with satisfaction with the coach (range = .61 to .85). At the athlete level, MC had the highest correlation with satisfaction, .70, with TC having the next highest correlation, .66. At the team level, MC also had the highest correlation with satisfaction, .85, with CBC having the next highest correlation, .82.
Because MC was highly correlated with the other coaching competencies at both levels (range = .79 to .91), it was selected as the sole independent variable, at both levels, to avoid problems with multicollinearity.

Table 10
Correlations Between Competency Judgments and Satisfaction with the Coach at the Individual Level and the Team Level

        SWC    MC     GSC    TC     CBC
SWC     ----   .70    .63    .66    .61
MC      .85    ----   .82    .81    .79
GSC     .77    .88    ----   .87    .72
TC      .78    .88    .94    ----   .72
CBC     .82    .91    .84    .84    ----

Note. Italicized entries in the upper diagonal are the athlete-level correlations, and the team-level correlations are in the lower diagonal. All of the correlations were statistically significant at the .0005 level. SWC = satisfaction with coach; MC = motivation competence; GSC = game strategy competence; TC = technique competence; CBC = character building competence.

The ICC for satisfaction with the coach was .26, which indicated that 74% of the variance in satisfaction with the coach was due to within-team differences. Fit indices for Model 1 were a deviance of 1552.27 and a CAIC of 1574.38. The point reliability estimate for the team-level satisfaction means was .86. The variance of the team-level means, σ²β0, around the average team mean, γ00, was statistically significant, χ²(31) = 226.17, p < .001 (see Table 11). Team-level means did not vary based on sport played. Thus, I concluded that satisfaction with the coach had a significant amount of variance both within teams and between teams, and that team-level satisfaction means were significantly different and should remain random in subsequent models. Athlete-level MC was added as a within-team predictor in Model 2 (see Table 11). This model fit better than Model 1, χ²diff = 315.83, p < .001, CAICdiff = 293.72, and explained 39% of the within-team variance in satisfaction with the coach. The average influence of MC was moderately large and positive, γ10 = .70, and statistically significant, t(31) = 19.77, p < .001.
The point reliability estimate for these slopes was .06. The variance of the team-level effects, σ²β1, around the average satisfaction-MC slope, γ10, was not statistically significant, χ²(31) = 35.33, p = .27. Thus, I concluded that the average influence of MC on satisfaction with the coach was moderately large and positive, had a somewhat low reliability, was similar within teams, and should be fixed across teams in a respecified model. The influence of MC was fixed across teams in a respecified version of Model 2 (see Table 11). Conceptually, this model assumed that the influence of MC on satisfaction with the coach was similar within teams. The respecified model fit at least as well as the previous model, χ²diff(2) = 1.43, p = .49, CAICdiff = 13.31. The influence of MC on satisfaction remained moderately large and positive, γ10 = .69, and statistically significant, t(583) = 20.83, p < .001. Thus, I concluded that MC had a moderately large and positive relationship with satisfaction with the coach across athletes.

Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Between Teams?

First, it should be noted that the variance of the team satisfaction means, σ²β0, around the average team satisfaction, γ00, shrank from .26 to .05 after controlling for the influence of MC within teams (see Table 11). Practically, this means that there was much less variance in team satisfaction to model once the influence of MC on satisfaction was controlled for within teams. Still, because a statistically significant, χ²(31) = 99.64, p < .0005, amount of variance remained among team satisfaction means after imposing Model 2, team MC was added as a Level-2 predictor, γ01. This model did not fit better than the respecified version of Model 2, χ²diff(1) = 0.85, p = .36, CAICdiff = -6.52, and did not explain additional between-team variance in team satisfaction.
Accordingly, the influence of team MC on team satisfaction was negligible, γ01 = -.05, and not statistically significant, t(30) = -0.92, p = .36. Hence, the variance of the team satisfaction estimates, σ²β0, around the average team satisfaction, γ00, remained significantly greater than zero, χ²(30) = 100.12, p < .01. Thus, I concluded that team MC was unrelated to team satisfaction after controlling for the effect of MC on satisfaction within teams.

Table 11
Hierarchical Linear Models where Satisfaction was the Dependent Variable

Estimates of fixed effects
Fixed effect                                        γ       SE     t       df    p
Model 1
  Average satisfaction mean (γ00)                  -0.02    0.10   -0.17    31   .87
Model 2
  Average satisfaction-motivation slope (γ10)       0.70    0.03   21.24    31   < .01
  Fixed satisfaction-motivation slope (γ10)         0.69    0.03   20.98   583   < .01
Model 3
  Team satisfaction-team motivation slope (γ01)    -0.05    0.05   -0.92    30   .36

Estimates of variance components
Random effect                        σ      σ²     df     χ²       p
Model 1
  Within-team residuals (rij)       0.87   0.75
  Between-team residuals (u0j)      0.51   0.26    31    226.17   < .01
Model 2
  Within-team residuals (rij)       0.68   0.46
  Between-team residuals (u0j)      0.23   0.05    31     99.63   < .01
  Between-team residuals (u1j)      0.05   0.01    31     35.33     .27
Model 3
  Within-team residuals (rij)       0.68   0.46
  Between-team residuals (u0j)      0.23   0.05    30    100.12   < .01

Note. σ = standard deviation; σ² = variance.

CHAPTER 5

DISCUSSION

This study provides initial validity evidence for the CCS and introduces MCFA as an appropriate methodology to use when data are meaningfully nested and an evaluation of the factor structure of a set of indicators is desired. Results offer some support for the proposed multilevel multidimensional conceptualization of coaching competency (see Figure 3), the internal consistency of the coaching competency estimates, and a relationship between motivation competency and satisfaction with the coach within teams.
Validity concerns are observed for the original rating scale structure and for the relationship between motivation competency and satisfaction with the coach between teams. Results are interpreted to guide future research with the CCS, to provide recommendations for revisions to the instrument, and to assist researchers in physical education and kinesiology in understanding when and how to apply MCFA to their data. Analysis of the original rating scale structure (i.e., 10 categories) indicated that athletes were being asked to distinguish between too many levels of coaching competency (see Table 1). This finding is similar to previous research on the CES (Myers, Wolfe, et al., in press) and congruent with long-standing recommendations for Likert scales (Likert, 1932). However, an interesting difference between the Myers, Wolfe, et al. findings and this study is that the previous study reported that coaches employed only four categories to judge the strength of their own coaching efficacy, whereas in this study athletes appeared to utilize five categories to judge their coach's competency. This difference was probably due to the fact that only 5% of the coaches' responses were in the 0-5 range, whereas 20% of the athletes' responses were within this range. Athletes may make finer and more critical judgments of a coach's competency than coaches make regarding their own coaching efficacy, when responding to the CCS and CES items, respectively. However, to test this hypothesis, future research should match coaches' and athletes' responses within a sample, unlike the comparison I offer across two different samples. While post-hoc analysis identified an improved 5-category rating scale structure, it is unknown whether the modified scale would prove optimal in a cross-validation sample of intercollegiate soccer and ice hockey athletes or with high school athletes.
However, there is reason for confidence in the potential utility of the modified scale, with a similar sample, as Rasch-based optimal categorizations have been confirmed in follow-up applications (Zhu, 2002). Users of the CCS are encouraged to assess the utility of the proposed 5-category structure (i.e., complete incompetence, low, moderate, high, and complete competence). Although "low" may not be selected frequently, such a category would likely attract at least the minimum number (10) of observations necessary for minimal precision of threshold estimates (Linacre, 2002). There is reason to believe that the CCS may adequately measure athletes' perceptions of their head coach's motivation, game strategy, technique, and character building competencies in lower division intercollegiate soccer and ice hockey programs (see Table 5 and Table 6). However, both within teams and between teams, there was limited discriminant validity among subscales (particularly between GSC and TC), within-item multidimensionality for MC3, and extremely high standardized factor loadings at the between-teams level. Allowing MC3 (mentally prepare athletes for game strategies) to load on both MC (athletes' evaluations of their head coach's ability to affect the psychological mood and skills of athletes) and GSC (athletes' evaluations of their head coach's ability to lead during competition) makes sense conceptually. Because I judge the content of MC3 to be sufficiently specified and relevant to both MC and GSC, revision is not suggested. Therefore, users of the CCS are encouraged to impose internal models that allow responses to MC3 to influence scores on both MC and GSC. Distinguishing between competence in leading during competition (GSC) and instructional and diagnostic competence (TC) makes sense conceptually.
Empirically, evidence exists for the differential diagnostic ability of game strategy efficacy and technique efficacy in both high school and college coaches (Feltz et al., 1999; Myers, Vargas-Tonsing, et al., in press). Thus, I propose refining the definitions of GSC and TC and modifying items to lessen the overlap among the subscales in a revised version of the CCS. For example, altering the definition of TC to focus on evaluations of one's instructional and diagnostic skills during practice may help to distinguish this competency from competency to lead during competition. However, until a revised CCS is available, the proposed internal model should be utilized to produce multidimensional measures of coaching competency from the existing items. But, because subscale scores are likely to be highly related and to cause problems associated with multicollinearity when used as a set of independent variables, theory and bivariate correlations between the dependent variable(s) and competency scores should be explored to determine which competency measure should be used. The extremely high magnitude of most of the standardized factor loadings at the between-teams level (see Table 6) was noted and was deemed to be similar to parallel loadings in previous research (Kaplan & Kreisman, 2000; Muthén, 1994). Two possible explanations for the observation of such loadings across studies were posed in personal communications to experts in MCFA methodology: (1) overparameterization of the model (i.e., a possible artifact of MCFA modeling), and (2) nearly perfect indicators of the latent variables at the between level. J. J. Hox (December 3, 2004) noted that he had also observed this trend and that he believed possible explanations could include: (1) overparameterization; (2) that error at the between level tends to be averaged out by the methodology; and (3) that because the group-level variance is smaller, a higher percentage of it is able to be explained. D.
Kaplan (December 3, 2004) noted that he too had observed this trend and that he was unsatisfied with the explanations to date. He proposed, as a remedy, modeling the factor structure of only SPW (i.e., stopping at Step 3) if one is not planning to model the between-group variation as a function of between-group variables. In this study, however, determining a group-level factor structure for coaching competency estimates was important in order to guide formation of a team-level predictor of satisfaction with the coach (i.e., to test aspects of the external model). However, confidence in the proposed between-teams factor structure should be tempered pending additional, confirmatory research. In terms of evidence relating to the external aspect of validity, I found that athletes' evaluations of their head coach's ability to affect the psychological mood and skills of athletes had a moderately large and positive relationship with satisfaction with the coach within teams. But no such relationship was observed between teams after modeling the said relationship within teams (see Table 11). It is interesting to note that if this relationship were modeled between teams without modeling the said relationship within teams, team MC would be a significant predictor of team satisfaction. But, because the majority of the variance (74%) in satisfaction was within teams, and because Raudenbush and Bryk (2002) advocate settling on a Level-1 model before specifying a Level-2 model (i.e., model building from the bottom up), I conclude that team MC is not related to team satisfaction between teams. Methodologically, this finding demonstrates the need to examine multiple levels of variance simultaneously in order to reach more accurate conclusions and improve theory development (Silverman, 2004). Methodological considerations aside, it appears that athletes' satisfaction with the coach may frequently be linked to the coach's ability to attend to the athletes' psychological needs.
Unfortunately, although most coaches are well equipped in the technical aspects of their sport, they rarely receive formal training in creating a healthy psychological environment (Smoll & Smith, 2001). Because the usefulness of the CCS is at least partially dependent on the ability of the resultant measures to demonstrate relationships specified in models of coaching effectiveness (Horn, 2002), more research is needed in this area. Specifically, studies that investigate how athletes' personal characteristics influence athletes' perceptions of their coach's competency, how a coach's behavior influences athletes' perceptions of their coach's competency, and/or how athletes' competency judgments of their coach affect athletes' self-perceptions and beliefs could advance understanding of coaching effectiveness and extend validity evidence for the CCS. Also, the ability of coaching education programs to alter athletes' perceptions of their coach's competency would be a very practical way to assess an important aspect of coaching education programs. This latter area, assessing coaching education programs, is becoming increasingly important in the current evidence-based educational system (Smoll & Smith, 2001). Because the effects that a coach's behaviors exert on athletes are likely mediated by the meaning that athletes attribute to them (Horn, 2002; Smoll & Smith, 1989), the CCS has the potential to contribute to the improvement of coaching and the further development of coaching effectiveness models. Although the CCS has this potential, it should not be viewed as a competitor to instruments that assess other aspects of coaching. Rather, the CCS should be viewed as an additional tool that measures key constructs that are not fully covered by existing instruments.
For example, if one were interested in examining relationships between observed coaching behavior and athletes' perceptions of their coach's negative activation, then using the CBAS and the CBQ would be appropriate. However, if this person were also interested in how negative activation related to athletes' perceptions of their coach's motivation competency, then administering the CCS would also be appropriate. In short, a single instrument cannot fully measure the wide range of constructs involved in effective coaching. Because effective coaching is complex, a variety of well-defined instruments is necessary to the scientific investigation of sufficiently targeted inquiries. An introduction to MCFA was detailed earlier, and encouragement toward appropriate future implementations is provided here. MCFA should be considered when subjects are meaningfully nested within groups and an evaluation of the factor structure of a set of indicators is desired (Muthén, 1994). It is not uncommon in physical education, sport, and exercise contexts to collect data from subjects who are meaningfully nested within groups and to be interested in determining the factor structure of a set of indicators. A few examples include when data are collected within classes, teams, or exercise groups, and the indicators are intended to measure subjects' perceptions and/or performance (e.g., academic performance, collective efficacy, exercise behavior). In such cases, researchers should determine whether MCFA is appropriate as detailed in the Introduction to MCFA section. If MCFA is appropriate and is applied, improved model-data fit will likely be observed, as compared to model-data fit on the total variance-covariance matrix, because levels of variance will be separated (Julian, 2001). Separating substantive levels of variance contributes to developing more accurate practical recommendations and to improved theory development (Silverman, 2004; Silverman & Solomon, 1998).
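A concrete first check on whether nesting is substantial enough to warrant MCFA is to estimate each indicator's intraclass correlation from a one-way ANOVA decomposition, along with the associated design effect. The sketch below is illustrative: the function name is mine, and the usual rules of thumb (ICC above roughly .05, design effect above 2) are common conventions rather than prescriptions from this study:

```python
import numpy as np

def icc1(x, groups):
    """ANOVA-based ICC(1) for one indicator, plus its design effect."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    ids, counts = np.unique(groups, return_counts=True)
    N, G = len(x), len(ids)
    grand = x.mean()
    ssb = sum(n * (x[groups == g].mean() - grand) ** 2
              for g, n in zip(ids, counts))                 # between groups
    ssw = sum(((x[groups == g] - x[groups == g].mean()) ** 2).sum()
              for g in ids)                                 # within groups
    msb, msw = ssb / (G - 1), ssw / (N - G)
    n_tilde = (N - (counts**2).sum() / N) / (G - 1)         # effective group size
    icc = (msb - msw) / (msb + (n_tilde - 1) * msw)
    deff = 1 + (n_tilde - 1) * icc                          # design effect
    return icc, deff
```

Indicators with near-zero ICCs contribute little between-group variance to model, whereas pervasive, nontrivial ICCs across the indicator set are the signal to proceed with the multilevel steps described in the Introduction to MCFA section.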
I hope that the introduction will assist researchers in physical education and kinesiology in understanding when and how to apply MCFA to their data.

APPENDICES

APPENDIX A

How competent is your head coach in his or her ability to:
1. help athletes maintain confidence in themselves? (MC1)
2. recognize opposing team's strengths during competition? (GSC2)
3. mentally prepare his/her athletes for game strategies? (MC3)
4. understand competitive strategies? (GSC4)
5. instill an attitude of good moral character? (CBC5)
6. build the self-esteem of his/her athletes? (MC6)
7. demonstrate the skills of your sport? (TC7)
8. adapt to different game situations? (GSC8)
9. recognize opposing team's weaknesses during competition? (GSC9)
10. motivate his/her athletes? (MC10)
11. make critical decisions during competition? (GSC11)
12. build team cohesion? (MC12)
13. instill an attitude of fair play among his/her athletes? (CBC13)
14. coach individual athletes on technique? (TC14)
15. build the self-confidence of his/her athletes? (MC15)
16. develop athletes' abilities? (TC16)
17. maximize his/her team's strengths during competition? (GSC17)
18. recognize talent in athletes? (TC18)
19. promote good sportsmanship? (CBC19)
20. detect skill errors? (TC20)
21. adjust his/her game strategy to fit his/her team's talent? (GSC21)
22. teach the skills of his/her sport? (TC22)
23. build team confidence? (MC23)
24. instill an attitude of respect for others? (CBC24)

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (2004).
EQS (Version 6.1) [computer program]. Encino, CA: Multivariate Software, Inc.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.

Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21, 230-258.

Chelladurai, P. (1978). A contingency model of leadership in athletics. Unpublished doctoral dissertation, University of Waterloo, Canada.

Chelladurai, P., & Arnott, M. (1985). Decision styles in coaching: Preferences of basketball players. Research Quarterly for Exercise and Sport, 56, 15-24.

Chelladurai, P., & Riemer, H. A. (1997). A classification of facets of athlete satisfaction. Journal of Sport Management, 11, 133-159.

Chelladurai, P., & Saleh, S. D. (1978). Preferred leadership in sport. Canadian Journal of Applied Sport Sciences, 3, 85-92.

Chelladurai, P., & Saleh, S. D. (1980). Dimensions of leader behavior in sports: Development of a leadership scale. Journal of Sport Psychology, 2, 34-45.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Cronbach, L. J. (1976). Research in classrooms and schools: Formulations of questions, designs, and analysis. Occasional paper, Stanford Evaluation Consortium.

Feltz, D. L., Chase, M. A., Moritz, S. E., & Sullivan, P. J. (1999). A conceptual model of coaching efficacy: Preliminary investigation and instrument development. Journal of Educational Psychology, 91, 765-776.

Gould, D. (1987). Your role as a youth sport coach. In V. Seefeldt (Ed.), Handbook for youth sport coaches (pp. 17-32). Reston, VA: American Alliance for Health, Physical Education, Recreation, and Dance.

Härnqvist, K. (1978).
Primary mental abilities at collective and individual levels. Journal of Educational Psychology, 70, 706-716.

Horn, T. S. (2002). Coaching effectiveness in the sports domain. In T. S. Horn (Ed.), Advances in sport psychology (pp. 309-354). Champaign, IL: Human Kinetics.

Hox, J. J. (1993). Factor analysis of multilevel data: Gauging the Muthén model. In J. H. L. Oud & R. A. W. van Blokland-Vogelesang (Eds.), Advances in longitudinal and multivariate analysis in the behavioral sciences (pp. 141-156). Nijmegen, Netherlands: ITS.

Hox, J. J. (2002). Multilevel factor models. In G. A. Marcoulides (Ed.), Multilevel analysis (pp. 225-250). Mahwah, NJ: Lawrence Erlbaum.

Hox, J. J., & Maas, C. J. M. (2001). The accuracy of multilevel structural equation modeling with pseudobalanced groups and small samples. Structural Equation Modeling, 8, 157-174.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.

Jamshidian, M., & Bentler, P. M. (1999). ML estimation of mean and covariance structures with missing data using complete data routines. Journal of Educational and Behavioral Statistics, 24, 21-41.

Julian, M. W. (2001). The consequences of ignoring multilevel data structures in nonhierarchical covariance modeling. Structural Equation Modeling, 8, 325-352.

Kaplan, D. (2000). Multilevel structural equation modeling. In Structural equation modeling: Foundations and extensions (pp. 130-148). Thousand Oaks, CA: Sage.

Kaplan, D., & Kreisman, M. (2000). On the validation of indicators of mathematics education using TIMSS: An application of multilevel covariance structure modeling. International Journal of Educational Policy, Research and Practice, 1, 217-242.

Kenow, L. J., & Williams, J. M. (1992). Relationship between anxiety, self-confidence, and evaluation of coaching behaviors. The Sport Psychologist, 6, 344-357.

Kline, R. B. (1998).
Principles and practice of structural equation modeling. New York: The Guilford Press.

Lee, K. S., Malete, L., & Feltz, D. L. (2002). The effect of a coaching education program on coaching efficacy. International Journal of Applied Sport Sciences, 14, 55-67.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.

Linacre, J. M. (1995). Categorical misfit statistics. Rasch Measurement Transactions, 9, 450-451.

Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3, 85-106.

MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490-504.

Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.

McCullagh, P., & Nelder, J. A. (1990). Generalized linear models (2nd ed.). Boca Raton, FL: CRC Press.

McDonald, R. P. (1985). Factor analysis and related methods. Mahwah, NJ: Lawrence Erlbaum Associates.

McDonald, R. P. (1994). The bilevel reticular action model for path analysis with latent variables. Sociological Methods & Research, 22, 399-413.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations: Presidential address to the Psychometric Society. Psychometrika, 54, 557-585.

Muthén, B. O. (1990). Mean and covariance structure analysis of hierarchical data. Paper presented at the annual meeting of the Psychometric Society, Princeton, NJ.

Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376-398.

Muthén, B. O. (1997). Latent variable modeling of longitudinal and multilevel data. In A. E. Raftery (Ed.), Sociological methodology 1997 (pp. 453-481). Washington, DC: American Sociological Association.

Muthén, B.
O., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of nonnormal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

Muthén, B. O., & Satorra, A. (1989). Multilevel analysis of varying parameters in structural models. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 87-99). San Diego, CA: Academic Press.

Myers, N. D., Vargas-Tonsing, T. M., & Feltz, D. L. (in press). Coaching efficacy in intercollegiate coaches: Sources, coaching behavior, and team variables. Psychology of Sport & Exercise.

Myers, N. D., Wolfe, E. W., & Feltz, D. L. (in press). An evaluation of the psychometric properties of the coaching efficacy scale for American coaches. Measurement in Physical Education and Exercise Science.

National Association for Sport and Physical Education. (1995). Quality coaches, quality sports: National standards for athletic coaches. Dubuque, IA: Kendall/Hunt.

Park, J. K. (1992). Construction of the Coaching Confidence Scale. Unpublished doctoral dissertation, Michigan State University, East Lansing.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2000). HLM5: Hierarchical and linear modeling. Homewood, IL: Scientific Software International.

Rushall, B. S., & Wiznuk, K. (1985). Athletes' assessment of the coach: The coach evaluation questionnaire. Canadian Journal of Applied Sport Sciences, 10, 157-161.

Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235-249.

Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variable analysis (pp. 399-419). Thousand Oaks, CA: Sage.

Silverman, S. (2004).
Analyzing data from field research: The unit of analysis issue. Research Quarterly for Exercise and Sport, 75, iii-iv.

Silverman, S., & Solmon, M. (1998). The unit of analysis in field research: Issues and approaches to design and data analysis. Journal of Teaching in Physical Education, 17, 270-284.

Silvey, S. D. (1959). The Lagrangian multiplier test. Annals of Mathematical Statistics, 30, 389-407.

Smith, R. E., Smoll, F. L., & Curtis, B. (1978). Coaching behaviors in Little League Baseball. In F. L. Smoll & R. E. Smith (Eds.), Psychological perspectives in youth sports (pp. 173-201). Washington, DC: Hemisphere.

Smith, R. E., Smoll, F. L., & Hunt, E. B. (1977). A system for the behavioral assessment of athletic coaches. Research Quarterly, 48, 401-407.

Smoll, F. L., & Smith, R. E. (1989). Leadership behaviors in sport: A theoretical model and research paradigm. Journal of Applied Social Psychology, 19, 1522-1551.

Smoll, F. L., & Smith, R. E. (2001). Conducting sport psychology training programs for coaches: Cognitive-behavioral principles and techniques. In J. M. Williams (Ed.), Applied sport psychology (pp. 378-400).

Sullivan, P. J., & Kent, A. (2003). Coaching efficacy as a predictor of leadership style in intercollegiate athletics. Journal of Applied Sport Psychology, 15, 1-11.

Tabachnick, B. G., & Fidell, L. S. (2001). Cleaning up your act. In Using multivariate statistics (4th ed., pp. 56-110). Boston: Allyn & Bacon.

Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.

Vargas-Tonsing, T. M., Warners, A. L., & Feltz, D. L. (2003). The predictability of coaching efficacy on team efficacy and player efficacy in volleyball. Journal of Sport Behavior, 26, 396-407.

Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54, 426-482.

Wicherts, J. M., & Dolan, C. V. (2004).
A cautionary note on the use of information fit indices in covariance structure modeling with means. Structural Equation Modeling, 11, 45-50.

Williams, J. M., Jerome, G. J., Kenow, L. J., Rogers, T., Sartain, T. A., & Darland, G. (2003). Factor structure of the coaching behavior questionnaire and its relationship to athlete variables. The Sport Psychologist, 17, 16-34.

Wright, B. D., & Linacre, J. M. (1992). Combining and splitting categories. Rasch Measurement Transactions, 6, 233-235.

Wright, B. D., & Linacre, J. M. (1998). Winsteps: A Rasch model computer program. Chicago: MESA Press.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.

Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalized item response modeling software (Version 1.0) [computer program]. Melbourne, Victoria, Australia: Australian Council for Educational Research.

Zhu, W. (2002). A confirmatory study of Rasch-based optimal categorization of a rating scale. Journal of Applied Measurement, 3(1), 1-15.

Zhu, W., Updyke, W. F., & Lewandowski, C. (1997). Post-hoc Rasch analysis of optimal categorization of an ordered-response scale. Journal of Outcome Measurement, 1, 286-304.

FOOTNOTES

1. The term coach's behavior is used to be consistent with Horn's (2002) model. It is noted that no instrument can fully and completely represent the wide range of behaviors involved in effective coaching.

2. Analyses for questions 4 and 5 in this study provide examples of multilevel extensions of the conventional multiple regression model.

3. In this study, only two levels of variance, within teams and between teams, were considered.

4. It is noted that multidimensional measures of athlete satisfaction have been suggested (Chelladurai & Riemer, 1997). Such measures are suggested to be used as indicators of overall organizational effectiveness.
In this study, because satisfaction with the coach, not with the entire organization, was of interest, our measure was considered appropriately specific.

5. To the authors' knowledge, IRT software programs, including Winsteps, are currently limited to evaluating rating scale structures for unidimensional models.

6. Person and item separation indices are expected to decrease somewhat when categories are collapsed because, in general, the more categories there are, the better the discrimination (Zhu et al., 1997).

7. Due to the complexity of MUML estimation in unbalanced groups, it is not uncommon for programs to require starting values to converge to a stable solution (Hox, 2002).