This is to certify that the dissertation entitled ATHLETES' EVALUATIONS OF THEIR HEAD COACH'S COACHING COMPETENCIES: A MULTILEVEL CONFIRMATORY FACTOR ANALYSIS presented by Nicholas Daniel Myers has been accepted towards fulfillment of the requirements for the Dual-Major degree in the Department of Kinesiology and the Department of Counseling, Educational Psychology, and Special Education. Michigan State University, East Lansing, Michigan.

ATHLETES' EVALUATIONS OF THEIR HEAD COACH'S COACHING COMPETENCIES: A MULTILEVEL CONFIRMATORY FACTOR ANALYSIS

By Nicholas Daniel Myers

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY-DUAL MAJOR, Department of Kinesiology and Department of Counseling, Educational Psychology, and Special Education, 2005.

ABSTRACT

This study (a) provided initial validity evidence for the Coaching Competency Scale (CCS), and (b) introduced multilevel confirmatory factor analysis (MCFA) as an appropriate methodology to use when data are meaningfully nested and an evaluation of the factor structure of a set of indicators is desired. Data were collected from intercollegiate men's (g = 8) and women's (g = 13) soccer teams and women's ice hockey teams (g = 11).
Results offered some support for the proposed multilevel multidimensional conceptualization of coaching competency, the internal consistency reliabilities of the coaching competency estimates, and a relationship between motivation competency and satisfaction with the coach within teams. Validity concerns were observed for the original rating scale structure and for the relationship between motivation competency and satisfaction with the coach between teams. Results were interpreted to guide future research with the CCS, to provide recommendations for revisions to the instrument, and to assist researchers in physical education and exercise science in understanding when and how to apply MCFA to their data.

DEDICATION

This work is dedicated, in part, to my partner, Ahnalee. Thank you for hanging in there over the past five years, babe. I look forward to our life together. This work is also dedicated, in part, to my family and friends. Please know that although you have often been pushed aside over the last few years, you have never been forgotten. Thank you for your love, companionship, and strength. I take you with me wherever I go.

ACKNOWLEDGEMENTS

Deb Feltz: Thank you for your mentorship, guidance, and support over the last five years. And, thank you for providing an opportunity to an applicant who had little "traditional" background five years ago. Nobody has had a larger impact on my graduate training.

Ed Wolfe: Thank you for your mentorship, excellent teaching in complex subjects, and active advisement over the last three years. Your decision to write a note on my final exam three years ago, asking me if I ever thought about pursuing a degree in MQM, has changed my career. Thank you for taking the time that such insight and curiosity require.

Kim Maier: Thank you for your excellent teaching in complex subjects, willingness to consult, and encouragement in my job search.
Although we have not known each other for long, I hope that our paths will cross at regular intervals down the road.

Mark Reckase: Thank you for your excellent teaching in complex subjects, and honesty in our interactions. And, thank you for packaging your deep knowledge and experience in prose that is available to people, like me, with considerably less knowledge and experience. You provide a model that many students in MQM aspire to emulate.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO SYMBOLS AND ABBREVIATIONS

CHAPTER 1: INTRODUCTION AND REVIEW OF COACHING EFFECTIVENESS LITERATURE
    Nature of the Problem
    Review of Coaching Effectiveness Literature
    Establishing an Initial Validity Framework for the CCS
        Interpretive framework
        Instrument development
        Instrumentation
        External model
        Internal model
    Statement of Purpose
    Research Questions

CHAPTER 2: INTRODUCTION OF MCFA
    Consequences of Ignoring Multilevel Data Structures
    Steps in a MCFA
    A Technical Synopsis
        Sample size
        Judging the fit of MCFA
    An Application of Muthén's Steps

CHAPTER 3: METHOD
    Sample
    Procedure
    Measures
        Coaching competence
        Satisfaction with the coach
    Treatment of Data
        Missing data
        Outliers
        Normality
    Analyses
        Rating scale
        Internal models
        Internal consistency reliability
        Forming measures to test questions 4 and 5
        Testing questions 4 and 5
        Model estimation and fit
        Reliability estimates

CHAPTER 4: RESULTS
    Did Athletes Employ the Rating Scale Structure in the Manner that the Authors Intended?
    To What Degree did the Proposed Internal Models Fit the Data?
        Step 1
        Step 2
        Step 3
        Step 4
    How Reliable were the Rank Orderings of Coaching Competency Estimates?
    Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Within Teams?
    Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Between Teams?

CHAPTER 5: DISCUSSION

APPENDICES

REFERENCES

FOOTNOTES

LIST OF TABLES

Table 1. Original and Post Hoc Rating Scale Structures
Table 2. Item Characteristics for the CCS
Table 3. Pooled Within-Teams Correlations and Covariances
Table 4. Scaled Between-Teams Correlations and Covariances
Table 5. Model-Data Fit Statistics
Table 6. Within-Teams and Between-Teams Estimates
Table 7. Comparisons of Factor Loadings and Correlations Among Factors by Gender
Table 8. Comparisons of Factor Loadings and Correlations Among Factors by Sport
Table 9. Comparisons of Factor Loadings and Correlations Among Factors by Year
Table 10. Correlations Between Competency Judgments and Satisfaction with the Coach at the Individual and Team Level
Table 11. Hierarchical Linear Models where Satisfaction was the Dependent Variable

LIST OF FIGURES

Figure 1. Horn's working model of coaching effectiveness
Figure 2. Multidimensional internal model of the CCS
Figure 3. Multidimensional MUML model of the CCS
Figure 4. Coherence example for a well-fitting five-category structure

KEY TO SELECT ABBREVIATIONS
α: Cronbach's coefficient alpha
AERA: American Educational Research Association
APA: American Psychological Association
CAIC: consistent Akaike information criterion
CBAS: coaching behavior assessment system
CBC: character building competence
CBQ: coaching behavior questionnaire
CCS: coaching competency scale
CEQ: coaching evaluation questionnaire
CES: coaching efficacy scale
CFA: confirmatory factor analysis
CFI: comparative fit index
δ: item difficulty
DQS: decision style questionnaire
EFA: exploratory factor analysis
FIML: full maximum likelihood
GSC: game strategy competence
ICC: intraclass correlation coefficient
LM: Lagrange multiplier test
LSS: leadership scale for sports
MC: motivation competence
MCFA: multilevel confirmatory factor analysis
MNSQ: outfit mean square fit statistic
MUML: Muthén's maximum likelihood
NASPE: National Association for Sport and Physical Education
NCME: National Council on Measurement in Education
RMSEA: root mean square error of approximation
RSM: rating scale model
S*B: scaled between-group covariance matrix
ΣB: between-group population covariance matrix
SPW: pooled within-group covariance matrix
SRMR: standardized root mean squared residual
ST: total covariance matrix
ΣW: within-group population covariance matrix
τ: threshold
TC: technique competence
θ: person ability
TLI: Tucker-Lewis index

CHAPTER 1

INTRODUCTION AND REVIEW OF COACHING EFFECTIVENESS LITERATURE

Nature of the Problem

To date, much of the research in sport leadership has been directed toward identifying particular coaching styles that elicit successful performance and/or positive psychological responses from athletes (Horn, 2002). The two most prominent models of leadership effectiveness in sport, the Multidimensional Model of Leadership (Chelladurai, 1978) and the Mediational Model of Leadership (Smoll & Smith, 1989), have served as frameworks for much of the related research.
Recently, Horn combined elements of both models to form a working model of coaching effectiveness. Horn's (2002) model of coaching effectiveness, as displayed in Figure 1, is founded on at least three assumptions. First, both antecedent factors (e.g., coach's personal characteristics, organizational climate) and personal characteristics of athletes influence a coach's behavior indirectly through a coach's expectancies, beliefs, and goals. Second, a coach's behavior directly affects athletes' perceptions and evaluations of a coach's behavior. Third, athletes' perceptions and evaluations of a coach's behavior mediate the influence that a coach's behavior has on athletes' self-perceptions (e.g., self-efficacy) and attitudes (e.g., satisfaction with a coach), which in turn directly affect athletes' motivation and performance. Because athletes' perceptions and evaluations of a coach's behavior are believed to play a critical role in coaching effectiveness, providing a tool to assess athletes' evaluations of key coaching competencies is important to the continued improvement of coaching, and to the further development of coaching effectiveness models.

[Figure 1. Horn's (2002) working model of coaching effectiveness. The path diagram, linking antecedent factors, coaches' expectancies, beliefs, and goals, coaches' behavior, athletes' perceptions and evaluations, and athletes' self-perceptions, motivation, and performance, did not survive extraction.]

[Figure 2. Multidimensional internal model of the CCS. The path diagram relates the motivation competency, game strategy competency, technique competency, and character building competency factors to their respective items, e.g., gsc8, gsc17, gsc21; tc7, tc14, tc16, tc18, tc20, tc22; cbc5, cbc13, cbc19, cbc24.]

Statement of Purpose

The purposes of this study were to (a) provide initial evidence for key aspects of the validity framework described, and (b) introduce MCFA as an appropriate methodology under the previously stated circumstances.
Research Questions

This study provided initial evidence for key aspects of the validity framework described by examining the following questions:

1. Did athletes employ the rating scale structure in the manner that the authors intended?
2. To what degree did various multilevel internal models fit the data?
3. How reliable were the rank orderings of coaching competency estimates?
4. Were coaching competency estimates positively related to satisfaction with the coach within teams?
5. Were coaching competency estimates positively related to satisfaction with the coach between teams?

Additionally, in Chapter 2, this study provides a modest primer on applying MCFA methodology when athletes are nested in teams.

CHAPTER 2

INTRODUCTION OF MCFA

This introduction should not be construed as original thinking by the author. Rather, it is based on the author's understanding of Muthén's pioneering work (1989, 1994) and previous syntheses of MCFA in the education literature (Kaplan, 2000; Hox, 2002). The rationale for providing it is to put a relatively new and complex methodology, which may have important applications in sport and exercise science, in a context that may be more familiar and accessible to this audience. A working understanding of confirmatory factor analysis (CFA) is assumed. The call to separate substantive levels of variance, such as within students and between classrooms, has been voiced for decades in education (Cronbach, 1976; Härnqvist, 1978). Empirically, nonhierarchical analysis of the total covariance matrix of meaningfully nested data violates the assumption of independence. Conceptually and practically, nonhierarchical analyses often confound within-group and between-group relations and hamper theory development at both levels.
Whether nesting is due to students clustered within classrooms or athletes within teams, if both individual attributes and group characteristics are relevant, then multilevel modeling, also known as hierarchical modeling, should be considered. Most multilevel modeling applications in social science research are multilevel extensions of the conventional multiple regression model (Hox, 2002).2 But there are models of substantive interest, MCFA in this study, that contain multiple levels and cannot be analyzed within the linear multilevel framework. Muthén (1989) and Muthén and Satorra (1989) have provided an approach to latent variable modeling when data are meaningfully nested. This section provides an example of applying this methodology when athletes are nested within teams and evaluation of the factor structure of an instrument is desired.3 However, before introducing the complex methodology, it is worthwhile to explain typical and practical consequences of ignoring multilevel data structures when evaluating the factor structure of latent variables.

Consequences of Ignoring Multilevel Data Structures

In a simulation study, Julian (2001) generated multilevel data, manipulating three design factors: (a) intraclass correlation (ICC): .05, .15, and .45; (b) group and member configuration: 100 groups with five members each (100/5), 50/10, 25/20, and 10/50; and (c) the internal model of the between-group variance components: four-factor oblique within and four-factor oblique between (4FOBW/4FOBB), 4FOBW/2FOBB, and 4FOBW/5FOBB. Data were fitted to CFA models that ignored the multilevel structure. Biases in the chi-square statistic (χ²), model parameters (i.e., factor loadings, variances, covariances, and error variances), and standard errors were examined. The design factor that exerted the most influence was the ICC. When the ICC was > .05, the χ² was inflated, model parameters were inflated, and standard errors were deflated.
Practically, CFA on the total covariance matrix (ST) when the ICCs are non-trivial (≥ .10, as a guideline; Muthén, 1997) will likely result in the model exhibiting more overall misfit (i.e., an elevated χ² test), hypothesis tests that are overly optimistic (i.e., deflated standard errors leading to increased Type I error), and inflation of the absolute value of parameter estimates (e.g., factor loadings). A correction by Satorra and Bentler (1990, 1994), which is often used to correct for such bias in the χ² statistic and its standard errors, does not adjust for bias in the absolute value of the factor loadings (Julian, 2001), and does not allow modeling at both levels.

Steps in a MCFA

Muthén (1994) advised that MCFA should generally follow four steps. First, factor analyses of ST are conducted, exploratory (EFA) and/or confirmatory depending on the adequacy of an a priori internal model. Although this solution will likely be biased if the data truly exhibit a multilevel structure (Julian, 2001; Hox & Maas, 2001), it can provide a rough estimate of model fit. Second, estimate the degree of between-group variation in the variables of interest (i.e., the ICC). The ICC for each item is estimated as

ICC = σ²B / (σ²B + σ²W),

where σ²B is the between-group variance and σ²W is the within-group variance. If the ICCs are trivial, a multilevel model may be unnecessary. Third, estimate the factor structure of the pooled within-group covariance matrix (SPW). Because SPW is an unbiased estimator of the within-group population covariance matrix (ΣW) and is not confounded by the scaled between-group covariance matrix (S*B), model fit should be at least as good as in Step 1. Fourth, estimate the between factor structure. Because S*B is not an unbiased estimator of the between-group population covariance matrix (ΣB) and includes ΣW, this is the most difficult part of the analysis and requires a technical synopsis.
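To make Step 2 concrete, the per-item ICC can be estimated with a one-way random-effects ANOVA decomposition of the nested ratings. The sketch below is illustrative rather than the dissertation's actual code; the function and variable names (`item_icc`, `scores`, `team_ids`) are hypothetical, and the group-size adjustment for unbalanced teams mirrors the c* quantity introduced in the next section.

```python
import numpy as np

def item_icc(scores, team_ids):
    """Intraclass correlation for a single item from athletes nested in teams.

    Recovers the between-team variance from the between- and within-team
    mean squares, with an adjusted average team size for unbalanced data.
    """
    y = np.asarray(scores, dtype=float)
    g = np.asarray(team_ids)
    teams = np.unique(g)
    N, G = len(y), len(teams)
    grand = y.mean()
    ss_b = ss_w = 0.0
    sizes = []
    for t in teams:
        grp = y[g == t]
        sizes.append(len(grp))
        ss_b += len(grp) * (grp.mean() - grand) ** 2
        ss_w += ((grp - grp.mean()) ** 2).sum()
    ms_b = ss_b / (G - 1)          # between-team mean square
    ms_w = ss_w / (N - G)          # within-team mean square
    # adjusted average team size for unbalanced data
    n_tilde = (N - sum(s * s for s in sizes) / N) / (G - 1)
    var_b = max((ms_b - ms_w) / n_tilde, 0.0)  # negative estimates clamped
    return var_b / (var_b + ms_w)
```

In practice this would be run once per item; items with ICCs at or above the ≥ .10 guideline would argue for retaining the multilevel model.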
A Technical Synopsis

Assume that athletes are nested within teams, and that the number of athletes within teams is unbalanced. To perform a MCFA, athletes' competency scores need to be broken down into an athlete-level component (i.e., the athlete's deviation from the team mean: Ygi − Ȳg) and a team-level component (the disaggregated team mean: Ȳg). An unbiased estimate of the population within-team covariance matrix (ΣW) is provided by SPW, where

SPW = Σg Σi (Ygi − Ȳg)(Ygi − Ȳg)′ / (N − G),

subscript g = 1...G indexes teams, and subscript i = 1...Ng indexes athletes per group, with a total of N athletes (Muthén, 1989). Because SPW is the maximum likelihood estimator of ΣW with sample size N − G, Step 3 is reasonable (Muthén, 1990). In the unbalanced case, S*B is calculated as

S*B = Σg Ng (Ȳg − Ȳ)(Ȳg − Ȳ)′ / (G − 1).

S*B estimates the composite ΣW + c*ΣB, where c* is a scaling parameter. Full maximum likelihood (FIML) estimation of ΣB is problematic because an S*B would need to be computed for each distinct group size (Muthén, 1990). Muthén suggests computing a single S*B by including

c* = (N² − Σg Ng²) / (N(G − 1)).

The value of c* is approximately equal to the average team size. This estimation procedure is referred to as Muthén's maximum likelihood (MUML). Simulation studies and comparisons of FIML and MUML estimates suggest that Muthén's methodology provides a reasonable approximation of FIML estimates in many applications (Hox, 1993; Hox & Maas, 2001; McDonald, 1994; Muthén, 1990).

Once the appropriate covariance matrices are estimated, the within and between factor structures can be evaluated simultaneously via the multi-group option found in most structural equation modeling software (Hox, 2002). However, because S*B estimates the composite ΣW + c*ΣB, two models need to be specified for S*B: one for the within-team structure and one for the between-team structure.
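The three quantities just defined, SPW, S*B, and c*, can be computed directly from a raw athletes-by-items matrix. This NumPy sketch is a minimal illustration of the formulas, not the EQS routine used in the study; the helper name `muml_matrices` and its inputs are hypothetical.

```python
import numpy as np

def muml_matrices(Y, team_ids):
    """MUML sample matrices for athletes nested in teams.

    Y: (N, p) array of athletes x items; team_ids: length-N team labels.
    Returns (S_PW, S_B_star, c_star), where
      S_PW   = sum_g sum_i (y_gi - ybar_g)(y_gi - ybar_g)' / (N - G)
      S*_B   = sum_g N_g (ybar_g - ybar)(ybar_g - ybar)' / (G - 1)
      c*     = (N^2 - sum_g N_g^2) / (N (G - 1))
    """
    Y = np.asarray(Y, dtype=float)
    ids = np.asarray(team_ids)
    teams = np.unique(ids)
    N, p = Y.shape
    G = len(teams)
    grand = Y.mean(axis=0)
    s_pw = np.zeros((p, p))
    s_b = np.zeros((p, p))
    sizes = []
    for t in teams:
        grp = Y[ids == t]
        sizes.append(len(grp))
        dev_w = grp - grp.mean(axis=0)          # athlete deviations from team mean
        s_pw += dev_w.T @ dev_w
        dev_b = (grp.mean(axis=0) - grand).reshape(-1, 1)
        s_b += len(grp) * (dev_b @ dev_b.T)     # weighted team-mean deviations
    s_pw /= (N - G)
    s_b /= (G - 1)
    c_star = (N ** 2 - sum(n ** 2 for n in sizes)) / (N * (G - 1))
    return s_pw, s_b, c_star
```

For balanced data, c* reduces to the common team size, which matches the "approximately equal to the average team size" remark above.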
Thus, the within-team structure is specified at both levels with equality restrictions between both "groups". The between-team structure is specified only in the second "group", with the square root of the scaling factor, √c*, built into the model. See Figure 3 for an example where the model specified in Figure 2 is specified at both the within-team and between-team levels.

[Figure 3. Multidimensional MUML model of the CCS. The path diagram, which specifies the Figure 2 model at both the within-team and between-team levels, did not survive extraction.]

Sample size. Hox and Maas (2001) generated multilevel data to explore the accuracy of Muthén's methodology with pseudobalanced groups and small samples at both levels. They concluded that the within-teams part of the model performs well even with small samples at both levels (i.e., n = 5 to 15 within "small" groups, and G ≤ 50, as guidelines). They also concluded that problems can arise when estimating the between-teams part of the model when the number of teams is small. In such instances inadmissible estimates, such as negative error variances, may be observed at the between-teams level. In such cases the error variance is fixed to zero. But even when an admissible solution is reached, residual error variances and standard errors may still be underestimated. However, factor loadings are generally accurate. Due to the said biases, Type I error rates may be approximately 8% when an alpha of .05 is specified. Thus, ideally, within-teams sample sizes should be ≥ 5, and the team-level sample size should be ≥ 50. In cases where resources limit the number of teams but MCFA is still deemed appropriate (i.e., modeling the between-teams variance is important), as in this study, it is necessary to qualify inferences made from the between-teams portion of the model.

Judging the fit of MCFA. As in CFA, there is no shortage of indices to judge the fit of a MCFA to the observed data.
Given common practice in sport and exercise science and guidelines by Hu and Bentler (1999) and Kline (1998), the following fit indices are suggested: the χ² test, Bentler's (1990) comparative fit index (CFI), the Tucker-Lewis index (TLI; Tucker & Lewis, 1973), the standardized root mean squared residual (SRMR), and the root mean square error of approximation (RMSEA; Browne & Cudeck, 1992). In instances where comparing the fit of competing models is desired, the likelihood ratio chi-square statistic (χ²LR) and the consistent Akaike information criterion (CAIC; Bozdogan, 1987) are also suggested. Introductions to and criterion guidelines for the suggested fit indices are offered here, guided by recommendations from Hu and Bentler (1999) and Kline (1998). The χ² test represents a test of the significance of the difference in fit between the specified model and a just-identified version of it. Because this test is sensitive to sample size, a common guideline is to divide the χ² by the degrees of freedom (df), where a ratio ≤ 3 suggests reasonable fit. The CFI and TLI are incremental fit indices which indicate the proportional improvement in the overall fit of the specified model relative to a null model. Well-fitting models have values ≥ .95, while marginally acceptable models have values ≥ .90. The SRMR denotes the average residual from fitting the correlation matrix for the specified model to the correlation matrix of the observed data. Well-fitting models have a value ≤ .08, while marginally acceptable models have a value ≤ .10. The RMSEA represents the amount of error per df in the specified model. Well-fitting models have a value ≤ .06, while marginally acceptable models have a value ≤ .10. The χ²LR statistic can evaluate the relative fit of nested models (χ²LR = D²simple − D²complex), where D² is the deviance value for the model in question.
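The cutoff guidelines just listed can be collected into a small screening helper. This is only a convenience sketch of the stated criteria (Hu & Bentler, 1999; Kline, 1998); the function name and verdict labels are invented for illustration, and such cutoffs are guidelines rather than strict decision rules.

```python
def judge_fit(chi2, df, cfi, tli, srmr, rmsea):
    """Apply the guideline cutoffs for each suggested fit index."""
    verdicts = {}
    # chi-square / df ratio: <= 3 suggests reasonable fit
    verdicts["chi2/df"] = "reasonable" if chi2 / df <= 3 else "poor"
    # incremental indices: >= .95 good, >= .90 marginal
    for name, value in (("CFI", cfi), ("TLI", tli)):
        verdicts[name] = ("good" if value >= .95
                          else "marginal" if value >= .90 else "poor")
    # residual-based indices: smaller is better
    verdicts["SRMR"] = ("good" if srmr <= .08
                        else "marginal" if srmr <= .10 else "poor")
    verdicts["RMSEA"] = ("good" if rmsea <= .06
                         else "marginal" if rmsea <= .10 else "poor")
    return verdicts
```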
The χ²LR statistic is distributed with degrees of freedom equal to the difference between the number of parameters in the nested models (McCullagh & Nelder, 1990). Because the χ²LR statistic is sensitive to sample size, the CAIC is also suggested. The CAIC depicts the fit of the model in question relative to the number of parameters estimated, where lower values indicate better fit; competing models do not need to be nested (Wicherts & Dolan, 2004).

An Application of Muthén's Steps

Assume that the internal model in Figure 3 is posited. It bears mentioning that the internal models need not be the same at both levels. Step 1 and Step 2 can be performed on the raw data matrix. Assume that a CFA on ST in Step 1 provided some support for the posited internal model in Figure 2, but with more misfit than is acceptable. Assume further that Step 2 revealed ICCs for all of the 24 items that were considerably greater than .10 and that MCFA was deemed appropriate. Practically, this decision implies that athletes' perceptions of their head coach's coaching competency were influenced by athlete-level attributes as well as shared team-level characteristics. Given the construct of interest, athletes' evaluations of their head coach's coaching competencies, strong within-teams and between-teams effects were expected. Step 3 and Step 4 are multi-staged steps. Step 3 begins with estimating both SPW and S*B from the raw data matrix, ST. All MCFA analyses that will be reviewed can, and later will be, performed in EQS 6.1 (Bentler, 2004). The covariances in S*B should be larger than the values in SPW because this matrix equals the within-team matrix plus the between-team matrix multiplied by c*. SPW is fitted to the hypothesized model, Figure 2 in this case, ignoring S*B. Justifiable modifications to the internal model should then be made, if necessary.
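Both comparison statistics reduce to simple arithmetic once the deviances and parameter counts are in hand. A minimal sketch, assuming the CAIC form −2 lnL + q(ln N + 1) from Bozdogan (1987); the function names are hypothetical, and a p-value for the likelihood-ratio statistic would still be looked up against the chi-square distribution with the returned df.

```python
import math

def caic(log_likelihood, n_params, n_obs):
    """Consistent AIC: -2 lnL + q (ln N + 1); lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * (math.log(n_obs) + 1.0)

def lr_chi2(deviance_simple, deviance_complex, params_simple, params_complex):
    """Likelihood-ratio chi-square for nested models and its degrees of freedom."""
    return (deviance_simple - deviance_complex,
            params_complex - params_simple)
```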
The final step, evaluating the fit of the within-teams and between-teams models simultaneously, also can be considered multi-staged (Hox, 2002). Using the multi-group procedure, the final within-teams model is specified in both "groups" with equality constraints across groups for all parameter estimates. This part of the model is held constant throughout the subsequent analyses. A series of between-teams models should then be specified. A null between-teams model (i.e., the lack of any between-team model) should be specified first. If this model fits, there is no between-teams model. Assuming poor fit, an independence model (i.e., only variances are estimated in the between model) should be specified. If this model fits, team-level variance exists but a team-level structural model does not. Assuming improved yet still unacceptable fit, the hypothesized between model should be specified. Justifiable modifications to the internal model should then be made, if necessary.

CHAPTER 3

METHOD

Sample

To maximize group-level sample size, teams were recruited from both lower-division intercollegiate soccer and ice hockey programs. Soccer and ice hockey were chosen because both can be considered open team sports, because head coaches tend to be involved in the coaching of most, if not all, positions on the team, and because both have a fairly large number of athletes within teams. Men's (g = 14) and women's (g = 29) soccer teams (g = 43) were recruited from the midwest of the United States, and 31 teams (12 men's and 19 women's) agreed to participate. Women's ice hockey teams (g = 28) were recruited from the northeast and midwest regions of the United States, and 16 teams agreed to participate. Despite numerous reminders from a variety of sources, only 21 soccer teams (8 men's, 13 women's) and 11 hockey teams submitted data (response rate = 68%; G = 32). Within the soccer sample, three coaches of the women's teams were women; men coached the men's teams.
Within the women's ice hockey sample, five of the coaches were women. At the athlete level, participants were 407 soccer players (165 male, 242 female) and 183 ice hockey players (M = 18.44 athletes per team). Within teams, the number of participants varied from 13 to 25 (N = 590). Across teams, most participants were Caucasian (94%) and between the ages of 18 and 23 years (99%; M = 19.53, SD = 1.34). The distribution of year on team was 43% first-year, 23% second-year, 20% third-year, and 13% fourth-year.

Procedure

All necessary permissions were obtained from the institutional review board and the 32 head coaches prior to data collection. An explanation of the study was presented to each team by the head coach. Informed consent was obtained from all athletes. Athletes were guaranteed confidentiality for their responses. Questionnaires were completed at approximately the one-half mark of the season to ensure that athletes had enough experience to make informed judgments regarding their head coach's coaching competency and their own satisfaction with the coach. On each team, an identified trainer or team manager administered the questionnaires. Completed questionnaires were returned to the trainer or team manager, who mailed the returns to the researchers. Trainers or team managers who successfully followed through were given a $60 honorarium.

Measures

Coaching competence. Coaching competence was measured at the athlete level by the CCS as described in the initial validity framework subsection of the Introduction. The CCS items are listed in Appendix A.

Satisfaction with the coach. Satisfaction with the coach was measured at the athlete level and consisted of selected items from a scale that was intended to measure, in part, attitudes toward the head coach (Smith, Smoll, & Curtis, 1978).4 Indicators used in this study were the same as those used by Feltz et al. (1999) and Myers, Vargas-Tonsing et al.
(in press) and included (a) how much do you like playing for your coach, (b) if you were able to play next year, how much would you like to have the same coach again, (c) how much does your coach like you, and (d) how much does your coach know about soccer/hockey. Ratings were made on a 7-point Likert scale ranging from 1 (very little) to 7 (a lot). Myers, Vargas-Tonsing et al. provided psychometric evidence for the unidimensionality of the scale for a similar purpose. Specifically, in their study, the principal component accounted for 70% of the total variance, all items had a high loading (≥ .70) on this component, and the set of items displayed good internal consistency, α = .85.

Treatment of Data

Missing data. Five cases were missing data (i.e., < 1% of all cases). In each case, data were missing for no more than two responses. Thus, incomplete cases had at least 93% of the responses of primary interest. These cases were retained, and the data were judged to be missing at random. Scores for missing responses were imputed from observed responses via case-wise maximum likelihood estimation using the Jamshidian-Bentler (1999) EM algorithm.

Outliers. Multivariate outliers were identified using Mahalanobis distance estimates. Five cases, or less than 1% of the data, were identified (p < .001). As suggested by Tabachnick and Fidell (2001), visual inspection of the cases and stepwise regression ensued to inform judgments regarding what caused the cases to be outliers prior to deciding their fate. In the identified cases, data were entered correctly, and subjects were within expected age ranges, represented both genders, and were nested within different teams. Empirically, six to eight competency items distinguished each outlier from the other cases. The specific items that generated outlying responses were not consistent across the outliers. The five outlying cases were determined to be random and were dropped. The final athlete-level sample size was N = 585.
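The distance-based screening described above can be sketched as follows. This is illustrative code on simulated data (the 590 × 24 shape mirrors the athlete-by-item layout, but the values are random), not the authors' procedure, which also included visual inspection and stepwise regression:

```python
# Sketch: flag multivariate outliers via squared Mahalanobis distance,
# using the chi-square cutoff at p < .001 described in the text.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(590, 24))  # simulated athletes x CCS items (not real data)

diff = X - X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances

cutoff = chi2.ppf(0.999, df=X.shape[1])  # p < .001 with df = number of items
outliers = np.where(d2 > cutoff)[0]
print(len(outliers), "cases flagged for inspection")
```

Flagged cases would then be inspected individually, as the authors did, before deciding whether to drop them.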
Normality. Normality was assessed with univariate skew and univariate kurtosis estimates and Mardia’s (1970) coefficient. Due to the size of the sample, absolute values of univariate estimates, not significance tests, were examined. Guidelines used were absolute values ≥ 3 for extreme skewness and absolute values ≥ 10 for extreme kurtosis (Kline, 1998). None of the univariate estimates suggested extreme non-normality (see Table 2). Mardia’s coefficient suggested multivariate departures from the kurtosis of a normal distribution, 120.94, p < .001. However, large samples can inflate values for this coefficient (Bollen, 1989). A common adjustment for multivariate kurtosis, Satorra and Bentler’s (1994) correction, was not applied because it applies the correction to the raw data matrix, and most analyses in MUML methodology use covariance matrices. However, because nonnormal distributions can inflate test statistics (Muthén & Kaplan, 1985), subsequent model test statistics may have been artificially inflated (i.e., suggesting worse fit than was actually present).

Analyses

Rating scale. Competency data were calibrated to the Rasch Rating Scale Model (RSM; Andrich, 1978) using Winsteps (Wright & Linacre, 1998). Rasch models are a family of 1-parameter item response theory (IRT) measurement models. IRT is an alternative to true score test (TST) theory and is well suited to analyze rating scale data (Wright & Masters, 1982). In this case, the RSM described the probability that a specific athlete (n) would rate a particular item (i) using a specific rating scale category (k), conditioned on the athlete’s competency judgment (θn) and the item’s difficulty (δi). The log-odds equation for this probability, log(Pnik / Pni(k−1)) = θn − δi − τk, contains three parameters: θn, δi, and the category threshold (τk), the threshold between two adjacent rating scale categories.
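Cumulating the adjacent-category log-odds log(Pnik / Pni(k−1)) = θn − δi − τk yields the full set of category probabilities. A minimal sketch, with hypothetical values for θ, δ, and the τs (nothing here is an estimate from this study):

```python
# Sketch: category probabilities under the Rasch rating scale model,
# built from log(P_nik / P_ni(k-1)) = theta_n - delta_i - tau_k.
import numpy as np

def rsm_probs(theta, delta, taus):
    """P(rating = k), k = 0..m, given thresholds tau_1..tau_m."""
    steps = theta - delta - np.asarray(taus)
    log_num = np.concatenate(([0.0], np.cumsum(steps)))  # category 0 -> 0
    p = np.exp(log_num - log_num.max())  # subtract max for numerical stability
    return p / p.sum()

# Hypothetical athlete, item, and threshold values
p = rsm_probs(theta=1.0, delta=0.0, taus=[-1.5, -0.5, 0.5, 1.5])
print(np.round(p, 3))  # five probabilities summing to 1
```

Taking log(p[k] / p[k-1]) recovers θ − δ − τk exactly, which is a quick check on any implementation.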
Parameters of this model are estimated from the observed data via a Joint Maximum Likelihood Estimation method, which does not impose distributional assumptions on the parameter estimates (Wright & Masters, 1982). In addition to the parameter estimates, Winsteps produces standard errors of these estimates and model-to-data fit indices. In accordance with posited internal models, both an omnibus unidimensional construct (TCC) and consecutive unidimensional dimensions (MC, GSC, TC, CBC) were explored.5 The consecutive approach was simply a unidimensional approach repeated for each dimension. Following calibration of the data to the RSM, the degree to which athletes employed the rating scale structure in the manner that the authors intended was evaluated according to guidelines suggested by Linacre (2002). These guidelines can be summarized as: (a) all categories should have at least 10 observations, (b) distributions of ratings for each category should be unimodal, (c) average measures should increase with the categories, (d) unweighted mean square (UMS) fit statistics should be less than 2.0 for each threshold, (e) category thresholds should increase with the categories, (f) ratings imply measures (coherence > 39%), (g) measures imply ratings (coherence > 39%), (h) category thresholds should increase by at least 1.2 logits, and (i) category thresholds should increase by no more than 5 logits. Because criteria (d), (f), and (g) are not well defined in the sport and exercise science literature, an elaboration of these guidelines is provided. The UMS fit statistic depicts the degree to which the observed ratings are consistent with the expected values, and is sensitive to large residuals from pairings of item difficulty and ability estimates that are far apart on the underlying scale. The UMS fit statistic is reported as a chi-square divided by its degrees of freedom, resulting in an expected value of 1.00 and a range from 0.00 to ∞.
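A minimal sketch of the UMS computation just described, with made-up category probabilities standing in for the model-based values Winsteps would supply:

```python
# Sketch: unweighted mean square (UMS) = mean squared standardized residual.
import numpy as np

def ums(observed, probs):
    """observed: ratings (n,); probs: (n, m+1) model category probabilities."""
    cats = np.arange(probs.shape[1])
    expected = probs @ cats                    # E[x] per observation
    variance = probs @ cats**2 - expected**2   # Var[x] per observation
    return np.mean((observed - expected) ** 2 / variance)

probs = np.tile([0.1, 0.2, 0.4, 0.2, 0.1], (6, 1))    # hypothetical probabilities
on_target = ums(np.array([2, 2, 2, 2, 2, 2]), probs)  # ratings at E[x] = 2
erratic = ums(np.array([0, 4, 0, 4, 0, 4]), probs)    # large residuals
print(on_target, erratic)  # near 0 (overfit) vs. above the 2.0 cutoff
```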
In rating scale analyses, thresholds with UMS fit statistics < 2.00 are considered to demonstrate adequate fit (Linacre, 2002). Coherence refers to the degree to which the observed ratings match the modeled expectations for a particular rating scale category, and vice versa. Each rating can be depicted as belonging to a rating scale category. The logit scale upon which athletes are located can be demarked into measurement zones that are defined by the locations of the category thresholds. As illustrated in Figure 4, the location of each athlete can be associated with one measurement zone (X axis, one zone per rating scale category, with a zone representing the expected rating for that coach-by-item combination), and the observed ratings assigned by that athlete (X) can be placed in one of the rating scale categories (on the Y axis). Each point indicates a rating assigned by an athlete to a single item. Note that two of the three ratings in Zone 1 are in the observed rating category of 1 and one of these ratings is in the rating category of 3. Hence, 67% of the measures in Zone 1 are coherent with Category 1 observations. Similarly, three of the five Category 2 observations fall in Zone 2. Hence, 60% of Category 2 ratings are coherent with Zone 2.

Figure 4. Coherence of observed ratings (X) with measurement zones and rating scale categories. [Figure content not legible in the scanned original.]
Because a number of Linacre’s (2002) guidelines were not realized in the original rating scale structure (to be discussed in the Results section), a post hoc approach was applied to arrive at an improved rating scale structure. To determine a post hoc structure, categories were collapsed based on general principles (Linacre, 1995; Wright & Linacre, 1992) and statistical indicators (Zhu et al., 1997). General principles for collapsing categories state that collapsed categories (a) should be explainable, and (b) should balance observed frequencies as much as possible. Statistical indicators of improved fit for a post hoc structure include (a) improved model-data fit statistics, (b) category and parameter estimates that come closer to satisfying Linacre’s guidelines, and (c) separation indices that do not decrease drastically as compared to the original rating scale structure.6 Of interest in Rasch analysis are the person and item separation indices. In non-technical terms (see Linacre, 1994, for a technical introduction), these indices indicate how well the scale distinguishes individual people and items, respectively, where larger values indicate greater separation.

Internal models. A MCFA approach to model fit was used to evaluate the utility of a unidimensional model (TCC) and a multidimensional model both within teams and between teams (see Figure 3). Post hoc respecification was considered only for models that approached acceptable fit (MacCallum, Roznowski, & Necowitz, 1992). Empirically based possibilities for model respecification were guided by inspection of Lagrange multiplier tests [LM] (Silvey, 1959), Wald tests (Wald, 1943), and standardized residuals. LM values approximate the amount by which the model’s overall χ² would decrease if the identified parameter were estimated. Wald test values estimate the amount by which the model’s overall χ² would increase if the identified parameter were fixed to zero.
Standardized residuals indicate the degree of misfit between the observed correlation matrix and the predicted correlation matrix, where absolute values greater than .10 can indicate misfit (Kline, 1999). Empirically based possibilities were evaluated, in part, based on relevant coaching effectiveness literature. Disattenuated correlations among latent factors were also examined to depict the degree of redundancy in the multidimensional models. Once an internal model was determined, variability of the factor structure between sub-samples was evaluated to determine the degree to which the assumption that all of the athletes were a random sample of a single population was reasonable. Sub-samples of note were males (n = 165) and females (n = 420), soccer athletes (n = 403) and women’s hockey athletes (n = 182), and year on team, where 1st = 150, 2nd = 135, and 3rd and 4th = 190. Athletes in their third and fourth year were collapsed so that each sub-sample had at least 100 observations. Because determining the degree of factorial invariance between sub-samples was not the purpose of this study, and because of the relatively large size of the sub-samples, only the invariance of factor loadings and factor covariances was explored, and the alpha level selected for these comparisons was equal to .001.

Internal consistency reliability. The consistency of rank orderings of competence estimates across measurement contexts was examined with reliability of separation coefficients (α). The reliability of separation coefficient is analogous to Cronbach’s (1951) alpha, but it is based on estimates of true and error variance derived from IRT models. Specifically, the reliability of separation for competency estimates = [V(θ̂) − MSE(θ̂)] / V(θ̂), where V(θ̂) is the variance of the competency estimates and MSE(θ̂) is the mean error variance of the competency estimates.
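The reliability of separation formula can be evaluated directly from a vector of estimates and their standard errors; the numbers below are invented for illustration:

```python
# Sketch: reliability of separation = [V(theta) - MSE(theta)] / V(theta).
import numpy as np

def separation_reliability(estimates, standard_errors):
    v = np.var(estimates)                            # observed variance of estimates
    mse = np.mean(np.asarray(standard_errors) ** 2)  # mean error variance
    return (v - mse) / v

theta_hat = np.array([-1.2, -0.4, 0.1, 0.6, 1.4, 2.0])  # hypothetical logit estimates
se = np.array([0.30, 0.28, 0.27, 0.27, 0.29, 0.33])     # hypothetical standard errors
rel = separation_reliability(theta_hat, se)
print(round(rel, 2))  # above 0.90, "excellent" by the guidelines quoted in the text
```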
This equation is comparable to the TST theory definition of reliability as the ratio of true variance to observed variance. As suggested by Kline (1999) and in relation to the given purpose, αs greater than 0.90 were considered excellent, αs greater than 0.80 were considered very good, and αs greater than 0.70 were considered adequate.

Forming measures to test questions 4 and 5. Satisfaction with the coach data were calibrated to the RSM using Winsteps. In this case, the RSM described the probability that a specific athlete (n) would rate a particular item (i) using a specific rating scale category (k), conditioned on the athlete’s satisfaction with the coach (θn) and on the item’s difficulty (δi). The log-odds equation for this probability, log(Pnik / Pni(k−1)) = θn − δi − τk, contains three parameters: θn, δi, and the category threshold (τk). Calibration of the data to the RSM resulted in a satisfaction estimate for each athlete. Estimates were on a single linear continuum in logistic ratio units (logits). A logit is the natural logarithm of the odds of an event. Because the data in this study were polytomous, odds were defined by the likelihood of assigning a rating in one category versus the odds of assigning a rating in the next lower category. A Principal Component Analysis (PCA) of the residuals from the Rasch model was performed on the four indicators of satisfaction to judge the adequacy of the assumption of unidimensionality. The residual from the Rasch model is defined as the difference between the observed rating and the model-based expectation, where the expected score (E) for the nth athlete on the ith item was given by (Linacre, 1998): E_ni = Σ(k = 0 to m) k · P_nik, where k is the value of the rating scale category, ranging from 0 to a maximum number, m, and P_nik is the probability of observing a response in category k for athlete n on item i as defined by the Rasch model.
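The residual just defined (observed rating minus the expectation E_ni) drives the PCA check. A sketch with simulated expected scores standing in for Rasch-model values, and four items as in the satisfaction scale:

```python
# Sketch: PCA of Rasch residuals to check for structure beyond the
# Rasch component; all data here are simulated, not from this study.
import numpy as np

rng = np.random.default_rng(1)
n_athletes, n_items = 200, 4
expected = rng.uniform(2.0, 6.0, size=(n_athletes, n_items))  # stand-in E_ni
observed = np.clip(np.round(expected + rng.normal(0, 0.8, expected.shape)), 0, 6)

residuals = observed - expected
corr = np.corrcoef(residuals, rowvar=False)    # inter-item residual correlations
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(np.round(eigvals, 2))  # no eigenvalue well above 1 -> little residual structure
```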
Scaling data to the Rasch model is equivalent to extracting the first principal component, which is referred to as the Rasch component, from the data with the restriction that all items have equal loadings on that component (McDonald, 1985). A PCA of the residuals reveals whether systematicity (i.e., multidimensionality) exists once the variance accounted for by the Rasch component has been taken into account. That is, the results of the PCA indicate whether the assumption of unidimensionality is tenable. Competency data also were calibrated to the RSM using Winsteps, although an attempt was first made to fit these data to a multidimensional extension of this model as implemented in ConQuest (Wu, Adams, & Wilson, 1998). However, due to a non-positive definite psi matrix (i.e., high correlations among the factors, which will be presented in the Results section), convergence to a stable solution for the multidimensional model was not possible. Empirically, this means that at least some of the factors could not be distinguished by the subjects within this particular measurement model. Instead, four unidimensional models were fit to the data, to produce logit-based scores, guided by the final internal model (i.e., MC, GSC, TC, CBC). The decision to maintain separate factors was consistent with the results of the MCFA and will be defended in the Discussion section. PCAs of the residuals from these models were not performed because the dimensionality of these data was confirmed in the MCFA.

Testing questions 4 and 5. Satisfaction and coaching competency measures were dependent because athlete observations were nested within teams. Hierarchical linear modeling (HLM), as implemented in HLM5 (Raudenbush, Bryk, Cheong, & Congdon, 2000), is well suited to handle observed dependent data.
Congruent with Horn’s model of coaching effectiveness (see Figure 1), satisfaction with the coach was treated as the dependent variable, and the proposed positive relationships with coaching competency were tested within and between teams. To avoid problems caused by multicollinearity, bivariate correlations between coaching competencies and satisfaction with the coach, at the individual level and team level, were explored to determine which competency measure would be used as the independent variable. Model building consisted of at least three steps to test questions 4 and 5 (Raudenbush & Bryk, 2002). First, an unconditional model (i.e., Model 1) was imposed:

Level 1: Y_ig = β0g + r_ig, where
β0g = the mean of athlete satisfaction for team g,
r_ig = the unique effect of athlete i for team g,

Level 2: β0g = γ00 + u0g, where
γ00 = the average team mean of athlete satisfaction,
u0g = the unique effect of team g on the average team satisfaction mean.

Of particular interest in Model 1 were the variance of r_ig, or the within-team variance, σ²W, and the variance of u0g, or the between-team variance, σ²B. These variances were used to estimate the ICC of satisfaction with the coach.
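Model 1's variance components give ICC = σ²B / (σ²B + σ²W). A sketch using simple one-way ANOVA estimators on balanced simulated data (the study's teams were unbalanced, and HLM5 uses likelihood-based rather than ANOVA estimators):

```python
# Sketch: ICC from within- and between-team variance components.
import numpy as np

rng = np.random.default_rng(2)
teams, n_per = 32, 18                                  # mirrors G = 32, ~18 per team
team_effects = rng.normal(0, 1.0, teams)               # true sigma2_B = 1.0
y = team_effects[:, None] + rng.normal(0, 2.0, (teams, n_per))  # true sigma2_W = 4.0

sigma2_w = np.mean(np.var(y, axis=1, ddof=1))          # pooled within-team variance
ms_between = n_per * np.var(y.mean(axis=1), ddof=1)    # between-team mean square
sigma2_b = max((ms_between - sigma2_w) / n_per, 0.0)
icc = sigma2_b / (sigma2_b + sigma2_w)
print(round(icc, 2))  # near the true value 1.0 / (1.0 + 4.0) = 0.20
```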
Second, a random coefficient regression model (i.e., Model 2) was imposed where the coaching competency slope was estimated within teams and was free to vary between teams:

Level 1: Y_ig = β0g + β1g(athlete’s competency judgment)_ig + r_ig, where
β0g = the mean of athlete satisfaction for team g,
β1g = the expected amount that an athlete’s satisfaction score would change given a one-unit change in his/her competency judgment for team g,
r_ig = the residual of athlete i for team g,

Level 2: β0g = γ00 + u0g
β1g = γ10 + u1g, where
γ00 = the average team mean of athlete satisfaction,
γ10 = the average satisfaction-competency slope across teams,
u0g = the unique effect of team g on the average team satisfaction mean,
u1g = the unique effect of team g on the average satisfaction-competency slope.

Of particular interest in Model 2 were γ10 and the variance of u1g, σ²u1. γ10 was the average satisfaction-competency slope and addressed question 4. σ²u1 was the variance of the estimated satisfaction-competency slopes, the β1g’s, around the average satisfaction-competency slope, γ10. An alpha equal to .05 was selected for all hypothesis tests, and the magnitude of the standardized betas was interpreted according to Cohen’s (1988) guidelines for effect sizes, where 0.20, 0.50, and 0.80 indicated small, medium, and large effect sizes, respectively. Third, an intercepts-as-outcomes model (i.e., Model 3) was imposed where the team competency score was added to model between-team variance on γ00:

Level 1:
Y_ig = β0g + β1g(athlete’s competency judgment)_ig + r_ig, where
β0g = the mean of athlete satisfaction for team g,
β1g = the expected amount that an athlete’s satisfaction score would change given a one-unit change in his/her competency judgment for team g,
r_ig = the residual of athlete i for team g,

Level 2: β0g = γ00 + γ01(team’s competency judgment)_g + u0g
β1g = γ10 + u1g, where
γ00 = the average team mean of athlete satisfaction,
γ01 = the expected amount that team satisfaction would change given a one-unit change in the team competency score,
γ10 = the average satisfaction-competency slope across teams,
u0g = the residual of team g on the average team satisfaction mean,
u1g = the unique effect of team g on the average satisfaction-competency slope.

Of particular interest in Model 3 were γ01 and the variance of u0g, σ²u0. γ01 was the between-teams satisfaction-competency slope and addressed question 5. σ²u0 was the variance of the adjusted average team satisfaction scores. Sport played (0 = soccer, 1 = ice hockey) was entered as a Level-2 predictor to ensure that none of the fixed effects varied on the basis of sport played.

Model estimation and fit. Final parameters were estimated via restricted maximum likelihood, and differences in model fit were examined via FIML estimation (Raudenbush & Bryk, 2002). Relative fit of nested models was evaluated with a Δχ² statistic and by comparing CAIC values.

Reliability estimates. Point reliability estimates described how reliable, on average, the slopes were, based on computing ordinary least squares regressions separately for each team. A reliability estimate was provided by averaging the individual team estimates. Raudenbush and Bryk (2002) suggest, as a guideline, that the point reliability estimate should be greater than .05. Slopes that did not meet this heuristic were candidates to be fixed across groups.
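Model 2 can be sketched with the `MixedLM` routine in statsmodels rather than the HLM5 software the authors used; the data, variable names, and effect sizes below are invented:

```python
# Sketch: random-intercept, random-slope model of satisfaction on
# competency, estimated by REML on simulated nested data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
frames = []
for g in range(32):                                  # 32 teams, 18 athletes each
    u0, u1 = rng.normal(0, 0.5), rng.normal(0, 0.1)  # team-level deviations
    comp = rng.normal(0, 1, 18)
    sat = (1.0 + u0) + (0.4 + u1) * comp + rng.normal(0, 1, 18)
    frames.append(pd.DataFrame({"team": g, "competency": comp, "satisfaction": sat}))
df = pd.concat(frames, ignore_index=True)

# Random intercept and random competency slope across teams (Model 2)
m2 = smf.mixedlm("satisfaction ~ competency", df, groups=df["team"],
                 re_formula="~competency").fit(reml=True)
print(m2.params["competency"])  # gamma_10, the average within-team slope
```

Model 3 would add the team-mean competency score as a Level-2 predictor of the intercept, analogous to including it as a second fixed effect here.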
CHAPTER 4

RESULTS

Did Athletes Employ the Rating Scale Structure in the Manner that the Authors Intended?

Athletes did not employ the original rating scale structure, for either the omnibus unidimensional conceptualization or the consecutive unidimensional conceptualization, in the manner that the authors intended. Specific problems were observed with threshold estimates (criteria e, h, and i) and coherences (criteria f and g) (see Table 1). In the omnibus unidimensional model, all threshold estimates and most (65%) coherences failed to fully meet these criteria. In the consecutive unidimensional model, most (60%) thresholds and more than one-third (38%) of coherences failed to fully meet these criteria. In both conceptualizations, the lower portion of the scale was used less frequently than the middle and upper end of the scale, and the majority of problems were observed in the lower and middle ranges, with some problems associated with Category 9. Post hoc categorizations were evaluated in accordance with previously mentioned criteria. The best fitting, and accepted, post hoc categorization was a five-category structure where responses were collapsed into Categories 0 and 1, Categories 2 through 4, Categories 5 and 6, Category 7, and Categories 8 and 9 (see Table 1). The majority of problems, according to Linacre’s (2002) guidelines, encountered in the omnibus unidimensional conceptualization were improved in the post hoc categorization. All of the problems, according to Linacre’s guidelines, observed in the consecutive unidimensional conceptualization were improved in the post hoc categorization. Last, person and item separation statistics changed little in the new categorization, which was consistent with Zhu et al.’s (1997) guidelines for post hoc categorization.

Table 1. Category frequencies, average measures, thresholds, UMS fit statistics, coherences, and separation statistics for the original and post hoc rating scale structures. [Table content not legible in the scanned original.]
To What Degree did the Proposed Internal Models Fit the Data?

Step 1. The proposed unidimensional model and multidimensional model were imposed on S_T. The unidimensional model fit the data poorly, χ²(252) = 2565.41, p < .001, χ²/df = 10.18, CFI = .81, TLI = .79, SRMR = .07, and RMSEA = .13. The multidimensional model fit better than the unidimensional model, CAICdiff = 1316.56. However, the multidimensional model exhibited only marginally acceptable fit to the data, χ²(246) = 1204.93, p < .001, χ²/df = 4.90, CFI = .92, TLI = .91, SRMR = .05, and RMSEA = .08. In both models the χ² value was likely inflated due to the fact that the multilevel structure of the data was ignored (Julian, 2001).

Step 2. ICC values for the 24 CCS items ranged from .16 to .36 (M = .25, SD = .06; see Table 2), which made it reasonable to proceed to Step 3. In order to proceed to Step 3 and Step 4, S_PW and S*B were calculated (see Table 3 and Table 4, respectively).
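The pooled matrices in Step 2 can be sketched with Muthén-style MUML estimators; balanced teams are assumed here for simplicity (with unequal team sizes, as in this study, the scaling factor c* becomes a weighted average of team sizes):

```python
# Sketch: pooled within-teams (S_PW) and scaled between-teams (S_B)
# matrices on simulated nested data; E[S_B] = Sigma_W + c* x Sigma_B,
# the matrix modeled between teams in MUML-style multilevel CFA.
import numpy as np

rng = np.random.default_rng(4)
G, n, p = 32, 18, 4                              # teams, team size, items
team_part = rng.normal(0, 1, (G, 1, p))          # between-team component
y = team_part + rng.normal(0, 1, (G, n, p))      # athletes nested in teams

team_means = y.mean(axis=1)
grand_mean = y.reshape(-1, p).mean(axis=0)

within_dev = (y - team_means[:, None, :]).reshape(-1, p)
S_PW = within_dev.T @ within_dev / (G * n - G)   # pooled within covariance

between_dev = team_means - grand_mean
S_B = n * (between_dev.T @ between_dev) / (G - 1)  # scaled between covariance

c_star = n                                       # balanced case: c* = n
print(np.round(np.diag(S_PW), 2), np.round(np.diag(S_B), 2))
```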
Table 2
Item Characteristics for the CCS

Item    M     SD    skewness  kurtosis  ICC
MC1     3.67  1.07  -0.33     -0.69     .22
MC3     3.76  1.06  -0.36     -0.79     .24
MC6     3.42  1.12  -0.09     -0.88     .29
MC10    3.61  1.11  -0.31     -0.72     .31
MC12    3.70  1.10  -0.37     -0.71     .29
MC15    3.40  1.11  -0.10     -0.78     .25
MC23    3.71  1.12  -0.47     -0.65     .31
GSC2    4.16  0.94  -0.90      0.14     .21
GSC4    4.13  0.99  -0.86     -0.10     .24
GSC8    3.97  1.02  -0.63     -0.42     .24
GSC9    4.15  0.98  -0.92      0.06     .20
GSC11   3.82  1.06  -0.51     -0.54     .16
GSC17   3.84  1.03  -0.53     -0.54     .22
GSC21   3.92  1.05  -0.56     -0.65     .18
TC7     3.91  1.08  -0.52     -0.80     .34
TC14    3.61  1.03  -0.21     -0.68     .20
TC16    3.71  1.04  -0.32     -0.69     .20
TC18    3.87  1.12  -0.62     -0.52     .16
TC20    3.85  1.03  -0.55     -0.42     .20
TC22    4.02  1.02  -0.77     -0.16     .26
CBC5    3.94  1.12  -0.76     -0.40     .35
CBC13   3.88  1.09  -0.62     -0.51     .25
CBC19   4.09  1.06  -0.93     -0.04     .36
CBC24   3.95  1.13  -0.80     -0.29     .29

Note. ICC = intraclass correlation.

Table 3. Pooled within-teams covariances and correlations for the CCS items. [Table content not legible in the scanned original.]
Table 4. Pooled between-teams covariances and correlations for the CCS items. [Table content not legible in the scanned original.]
Step 3. Prior to imposing the internal models on S_PW, item-level correlations were examined. Correlations among items within proposed subscales appeared to be greater than correlations between items outside of respective subscales: MC items with MC items (M = .58, SD = .07), MC items with non-MC items (M = .47, SD = .06); GSC items with GSC items (M = .58, SD = .05), GSC items with non-GSC items (M = .48, SD = .06); TC items with TC items (M = .54, SD = .08), TC items with non-TC items (M = .48, SD = .06); and CBC items with CBC items (M = .59, SD = .04), CBC items with non-CBC items (M = .45, SD = .05). This provided some support for the multidimensional model. The proposed unidimensional model and multidimensional model were imposed on S_PW (see Table 5). The unidimensional model fit the data poorly, χ²(252) = 1732.15, p < .001, χ²/df = 6.87, CFI = .83, TLI = .82, SRMR = .06, and RMSEA = .10. The multidimensional model fit better than the unidimensional model, CAICdiff = 715.84. However, the multidimensional model exhibited only marginally acceptable fit to the data, χ²(246) = 972.39, p < .001, χ²/df = 3.95, CFI = .92, TLI = .91, SRMR = .05, and RMSEA = .07. As expected, both models appeared to fit the data better than the parallel models in Step 1. Because the multidimensional model marginally fit the data, model respecification was considered. None of the Wald tests were statistically significant.
LM values and standardized residuals suggested several potential modifications, three of which were considered defensible and were adopted: correlated error terms between items GSC2 (recognize opposing team’s strength during competition) and GSC9 (recognize opposing team’s weaknesses during competition), and between GSC9 and GSC8 (adapt to different game situations), and MC3 loaded on both MC and GSC. In regard to the correlated errors, both pairs of items may have measured athletes’ perceptions of their coach’s competence in recognizing and making critical decisions about the other team during competition, in addition to measuring a more general competency to lead during competition. In regard to the within-item multidimensionality for MC3 (mentally prepare athletes for game strategies), it makes sense that it may have measured both MC (athletes’ evaluations of their head coach’s ability to affect the psychological mood and skills of athletes) and GSC (athletes’ evaluations of their head coach’s ability to lead during competition). First, error terms for GSC2 and GSC9 were allowed to correlate (r = .39, p = .02); this model fit better than the previous model, Δχ²(1) = 75.55, p < .001, CAICdiff = 68.23, and negated what had been the second largest standardized residual (.15). Second, error terms for GSC9 and GSC8 were allowed to correlate (r = .27, p = .02); this model fit better than the previous model, Δχ²(1) = 42.76, p < .001, CAICdiff = 35.44, and dramatically reduced (to .02) what had been the fifth largest standardized residual (.12). Third, MC3 was allowed to load on both MC, .46, SE = .09, and GSC, .57, SE = .11; this model fit better than the previous model, Δχ²(1) = 28.67, p < .001, CAICdiff = 21.35, and dramatically reduced (to .05, .04, and .03) what were three of the eight largest standardized residuals (.13, .12, .11). Overall, this final model marginally fit the data, χ²(243) = 825.41, p < .001, χ²/df = 3.40, CFI = .93, TLI = .93, SRMR = .04, and RMSEA = .07.
Table 6 illustrates within-teams estimates for this model. Factors in the retained model were moderately to highly correlated with one another, with the strongest association occurring between GSC and TC, r = .92, and the weakest associations occurring between CBC and GSC, and CBC and TC, r = .76. Because of the high correlation between GSC and TC, another model was specified in which all of these items loaded on only one factor. This simpler model fit worse than the previous model, CAICdiff = 53.74. Thus, the previous model was retained as the final model. Reasons for retaining, and implications of, subscales with such high correlations with one another will be explicated in the Discussion section.

Table 5
Model-Data Fit for the Proposed Internal Models

Step 4. Prior to imposing the internal models, item-level correlations within S*B were examined.
Correlations between items within proposed subscales appeared to be greater than correlations between items outside of respective subscales: MC items with MC items (M = .87, SD = .06), MC items with non-MC items (M = .72, SD = .09); GSC items with GSC items (M = .91, SD = .02), GSC items with non-GSC items (M = .74, SD = .09); TC items with TC items (M = .83, SD = .09), TC items with non-TC items (M = .74, SD = .09); and CBC items with CBC items (M = .92, SD = .02), CBC items with non-CBC items (M = .72, SD = .09). This provided some support for the multidimensional model. Using the multi-group procedure, the accepted within-teams model was specified in both groups for all subsequent analyses. The scaling factor, c*, equaled 4.27. Both a between-teams null model, χ²(567) = 2087.01, p < .001, and a between-teams independence model, χ²(541) = 1751.25, p < .001, were rejected. In both models, the estimated covariances of the pairs of correlated error terms (GSC8 and GSC11 = .139, GSC10 and GSC11 = .096) were inserted as starting values due to convergence problems.7 To maintain model comparability, these starting values were inserted from this point forward. Next, the unidimensional structure was specified between teams (see Table 5). This model fit the data poorly to marginally, χ²(522) = 1539.33, p < .001, χ²/df = 2.95, CFI = .92, TLI = .91, SRMR = .18, and RMSEA = .06. Next, the original multidimensional structure (i.e., no correlated error terms or cross-loadings) was specified between teams (see Table 5). This model fit better than the previous model, CAICdiff = 86.35. However, this model exhibited only marginally acceptable fit to the data, χ²(515) = 1401.39, p < .001, χ²/df = 2.72, CFI = .93, TLI = .92, SRMR = .07, and RMSEA = .05. Because the multidimensional model fit the data only marginally, model respecification was considered.
Due to concerns associated with a modest sample size at Level 2 (e.g., stability of between-teams estimates), only one model respecification was considered defensible: MC3 loading on both MC and GSC. MC3 was allowed to load on both MC, .50, SE = .13, and GSC, .66, SE = .15; this model fit better than the previous model, χ²diff(1) = 15.50, p < .001, CAICdiff = 8.13, and dramatically reduced (range = .00 to .05) what were seven of the 11 largest standardized residuals at the between level (range = .19 to .28). Overall, this final model fit the data marginally, χ²(514) = 1385.89, p < .001, χ²/df = 2.70, CFI = .93, TLI = .93, SRMR = .06, and RMSEA = .05. Table 6 illustrates between-teams estimates for this model. Because a multidimensional model was retained at both the within-teams and between-teams levels, the remaining analyses focus on multidimensional estimates of coaching competency only. Factors in the between-teams portion of the model were also moderately to highly correlated with one another, with the strongest association occurring between GSC and TC, r = .93, and the weakest associations occurring between MC and GSC, and GSC and CBC, r = .74. Because of the high correlation between GSC and TC, another model was specified in which all of these items loaded on only one factor. This simpler model fit worse than the previous model, CAICdiff = 72.00. Thus, the previous model was retained as the final model. Reasons for retaining, and implications of, such high correlations among subscales will be explicated in the Discussion section.
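The decomposition underlying Steps 3 and 4 (a pooled-within matrix, a between matrix, and a scaling factor c) follows Muthén's (1994) MCFA estimators. A minimal sketch of those estimators, assuming a rows-by-items numpy array of responses and a vector of team labels (function and variable names are mine, not from the study's software):

```python
import numpy as np

def muml_decompose(y, groups):
    """Pooled-within and between covariance matrices plus the scaling
    factor c, per Muthen's (1994) MCFA estimators.
    y: (N, p) array of item responses; groups: length-N team labels."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    ids, counts = np.unique(groups, return_counts=True)
    N, G = len(y), len(ids)
    grand = y.mean(axis=0)
    s_pw = np.zeros((y.shape[1],) * 2)
    s_b = np.zeros_like(s_pw)
    for g, n_g in zip(ids, counts):
        yg = y[groups == g]
        dev_w = yg - yg.mean(axis=0)        # within-team deviations
        s_pw += dev_w.T @ dev_w
        dev_b = yg.mean(axis=0) - grand      # team-mean deviations
        s_b += n_g * np.outer(dev_b, dev_b)
    s_pw /= (N - G)                          # pooled-within matrix
    s_b /= (G - 1)                           # (scaled) between matrix
    c = (N**2 - (counts**2).sum()) / (N * (G - 1))  # scaling factor
    return s_pw, s_b, c
```

With balanced teams, c reduces to the common team size; with unbalanced teams, as in this study, it falls near the average team size.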
Table 6
Within-Teams and Between-Teams Estimates

Columns: loading, SE, standardized loading, standardized residual (within teams); then the same four columns between teams.

MC1     1.00a      0.00      .74        .68    1.00a      0.00      0.98       0.20
MC3     .46b/.57c  .09b/.11c .35b/.36c  .74    .50b/.66c  .13b/.15c .48b/.55c  0.26
MC6     1.08*      0.06      .79        .62    1.18*      0.11      0.97       0.25
MC10    1.05*      0.06      .79        .61    1.18*      0.11      0.97       0.25
MC12    0.95*      0.06      .71        .70    1.05*      0.14      0.89       0.45
MC15    1.15*      0.06      .83        .55    1.07*      0.11      0.96       0.29
MC23    1.11*      0.06      .83        .57    1.22*      0.11      0.99       0.17
GSC2    1.00a      0.00      .70        .71    1.00a      0.00      1.00       0.00
GSC4    1.10*      0.07      .75        .66    1.13*      0.10      0.98       0.19
GSC8    1.17*      0.07      .77        .64    1.14*      0.10      0.99       0.14
GSC9    1.05*      0.05      .71        .70    1.03*      0.08      0.99       0.14
GSC11   1.26*      0.07      .77        .64    0.99*      0.09      1.00       0.00
GSC17   1.22*      0.07      .79        .61    1.11*      0.09      0.99       0.06
GSC21   1.27*      0.07      .79        .62    0.99*      0.10      0.98       0.18
TC7     1.00a      0.00      .62        .79    1.00a      0.00      0.83       0.56
TC14    1.29*      0.09      .76        .65    0.84*      0.14      0.97       0.24
TC16    1.35*      0.09      .79        .61    0.88*      0.14      1.00       0.00
TC18    1.38*      0.09      .73        .68    0.75*      0.14      0.92       0.39
TC20    1.25*      0.09      .74        .68    0.81*      0.14      0.96       0.27
TC22    1.24*      0.08      .77        .64    0.96*      0.15      0.99       0.17
CBC5    1.00a      0.00      .78        .63    1.00a      0.00      0.99       0.15
CBC13   0.98*      0.06      .73        .68    0.78*      0.07      0.97       0.25
CBC19   0.96*      0.05      .79        .62    0.89*      0.08      0.95       0.32
CBC24   1.06*      0.06      .79        .62    0.92*      0.06      0.99       0.06

Note. a indicates that the parameter was fixed to set the metric; * indicates statistical significance at the .001 level; b indicates the loading on MC; and c indicates the loading on GSC.

Comparisons of the vast majority (98%) of factor loadings (n = 21 for each comparison) and factor covariances (n = 6 for each comparison) did not reject the hypothesis of invariant structures across sub-samples. All of the factor loadings, and most of the factor covariances, were not significantly different for males and females (see Table 7). The two factor covariances that were significantly different suggested that CBC was more strongly related to MC and TC for females, r = .84 and .77, than for males, r = .70 and .69, respectively.
None of these estimates varied by sport (see Table 8) or by year on team (see Table 9). Because 106 of the 108 estimates were not significantly different across the selected sub-samples, it was considered reasonable to assume that all of the athletes were a random sample of a single population. It is noted that the possible violations of invariance in the identified factor covariances would likely cause greater misfit than would be observed if all of the estimates were invariant.

Table 7
Comparisons of Factor Loadings and Correlations Among Factors by Gender

Factor loadings (standardized loading, unstandardized loading, standard error; female / male):
v1 mc     *    *      *     *      *    *
v2 mc    .36  .41    0.46  0.51   .08  .16
v2 gsc   .43  .36    0.59  0.53   .08  .18
v3 mc    .84  .78    1.12  1.03   .06  .09
v4 mc    .84  .80    1.13  1.02   .06  .09
v5 mc    .77  .69    1.03  0.87   .06  .09
v6 mc    .85  .87    1.10  1.17   .06  .09
v7 mc    .88  .82    1.22  1.02   .06  .09
v8 gsc    *    *      *     *      *    *
v9 gsc   .81  .80    1.07  1.11   .06  .10
v10 gsc  .86  .80    1.14  1.20   .06  .11
v11 gsc  .81  .77    1.05  1.09   .06  .11
v12 gsc  .81  .80    1.13  1.21   .06  .11
v13 gsc  .82  .82    1.13  1.14   .06  .10
v14 gsc  .83  .79    1.16  1.14   .06  .11
v15 tc    *    *      *     *      *    *
v16 tc   .81  .74    1.13  1.15   .07  .16
v17 tc   .83  .84    1.15  1.43   .07  .18
v18 tc   .79  .67    1.21  1.22   .08  .18
v19 tc   .79  .76    1.12  1.18   .07  .16
v20 tc   .83  .75    1.17  1.10   .07  .15
v21 cbc   *    *      *     *      *    *
v22 cbc  .82  .64    0.93  0.88   .05  .10
v23 cbc  .85  .78    0.97  0.96   .04  .09
v24 cbc  .86  .81    1.03  1.06   .05  .09

Correlations among factors (female / male):
mc-gsc   .75  .84
mc-tc    .81  .85
mc-cbc   .84  .70
gsc-tc   .92  .91
gsc-cbc  .76  .68
tc-cbc   .77  .69

Note.
v = variable; mc = motivation competence; gsc = game strategy competence; tc = technique competence; cbc = character building competence; * = fixed; bolded cells indicate significant (p < .001) differences between estimates.

Table 8
Comparisons of Factor Loadings and Correlations Among Factors by Sport

Factor loadings (standardized loading, unstandardized loading, standard error; soccer / hockey):
v1 mc     *    *      *     *      *    *
v2 mc    .41  .31    0.52  0.38   .08  .11
v2 gsc   .35  .53    0.54  0.65   .10  .11
v3 mc    .83  .86    1.13  1.03   .06  .07
v4 mc    .82  .88    1.09  1.08   .06  .07
v5 mc    .72  .84    0.94  1.01   .06  .07
v6 mc    .87  .85    1.17  1.00   .06  .07
v7 mc    .87  .88    1.16  1.13   .06  .07
v8 gsc    *    *      *     *      *    *
v9 gsc   .79  .85    1.14  1.02   .07  .06
v10 gsc  .80  .91    1.19  1.12   .07  .06
v11 gsc  .77  .87    1.11  1.01   .07  .06
v12 gsc  .77  .86    1.16  1.12   .07  .07
v13 gsc  .77  .89    1.12  1.15   .07  .07
v14 gsc  .77  .87    1.17  1.11   .08  .07
v15 tc    *    *      *     *      *    *
v16 tc   .78  .86    1.25  1.04   .10  .08
v17 tc   .82  .87    1.31  1.08   .10  .08
v18 tc   .69  .85    1.15  1.19   .10  .09
v19 tc   .76  .83    1.17  1.05   .10  .08
v20 tc   .78  .90    1.22  1.09   .10  .08
v21 cbc   *    *      *     *      *    *
v22 cbc  .75  .85    0.88  0.92   .05  .06
v23 cbc  .82  .88    0.92  0.96   .05  .06
v24 cbc  .84  .87    1.00  1.00   .05  .06

Correlations among factors (soccer / hockey):
mc-gsc   .74  .82
mc-tc    .81  .87
mc-cbc   .79  .85
gsc-tc   .88  .94
gsc-cbc  .67  .84
tc-cbc   .71  .84

Note.
v = variable; mc = motivation competence; gsc = game strategy competence; tc = technique competence; cbc = character building competence; * = fixed; bolded cells indicate significant (p < .001) differences between estimates.

Table 9
Comparisons of Factor Loadings and Correlations Among Factors by Year

Factor loadings (standardized loading, unstandardized loading, standard error; 1st / 2nd / 3rd year):
v1 mc     *    *    *     *     *     *      *    *    *
v2 mc    .51  .45  .25   0.67  0.55  0.30   .13  .11  .10
v2 gsc   .25  .40  .58   0.37  0.55  0.82   .14  .12  .12
v3 mc    .82  .87  .84   1.10  1.14  1.10   .08  .09  .08
v4 mc    .85  .85  .83   1.14  1.11  1.06   .08  .09  .08
v5 mc    .75  .81  .80   0.99  1.04  1.00   .08  .10  .08
v6 mc    .84  .92  .83   1.13  1.19  1.06   .08  .09  .08
v7 mc    .87  .91  .85   1.16  1.23  1.08   .08  .09  .08
v8 gsc    *    *    *     *     *     *      *    *    *
v9 gsc   .79  .84  .83   1.09  1.04  1.10   .08  .09  .09
v10 gsc  .85  .85  .83   1.24  1.06  1.12   .09  .09  .09
v11 gsc  .79  .80  .83   1.08  0.98  1.11   .08  .09  .09
v12 gsc  .81  .84  .77   1.21  1.14  1.08   .09  .09  .09
v13 gsc  .85  .87  .77   1.25  1.15  1.02   .09  .09  .09
v14 gsc  .81  .85  .78   1.19  1.21  1.06   .09  .10  .09
v15 tc    *    *    *     *     *     *      *    *    *
v16 tc   .82  .80  .78   1.20  1.22  0.98   .11  .14  .10
v17 tc   .86  .87  .79   1.28  1.35  1.00   .11  .14  .10
v18 tc   .77  .82  .71   1.22  1.34  0.98   .11  .15  .11
v19 tc   .79  .85  .79   1.18  1.19  1.03   .11  .13  .10
v20 tc   .83  .84  .80   1.23  1.24  0.98   .11  .13  .10
v21 cbc   *    *    *     *     *     *      *    *    *
v22 cbc  .76  .87  .77   0.84  1.00  0.91   .06  .07  .08
v23 cbc  .88  .88  .77   0.94  1.00  0.89   .05  .07  .07
v24 cbc  .87  .88  .86   1.03  1.04  1.05   .06  .08  .07

Correlations among factors (1st / 2nd / 3rd year):
mc-gsc   .82  .75  .73
mc-tc    .85  .87  .79
mc-cbc   .79  .85  .84
gsc-tc   .93  .91  .87
gsc-cbc  .74  .76  .74
tc-cbc   .75  .85  .72

Note.
v = variable; mc = motivation competence; gsc = game strategy competence; tc = technique competence; cbc = character building competence; * = fixed; bolded cells indicate significant (p < .001) differences between estimates.

How Reliable were the Rank Orderings of Coaching Competency Estimates?

Reliability coefficients were .90 (MC), .87 (GSC), .85 (TC), and .82 (CBC). These coefficients indicated very good to excellent levels of internal consistency for the multidimensional coaching competency estimates.

Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Within Teams?

Prior to answering this question, the assumption of unidimensionality for the set of satisfaction items was assessed. The Rasch component extracted at least one-half of the total variance in each of the items (range of extracted communalities = .49 to .86), produced an eigenvalue equal to 2.91, was reliably measured (range of loadings = .70 to .93), and accounted for 73% of the total variance. The eigenvalue for the next, unaccepted, component was equal to .49. The reliability of separation coefficient for the set of items was .75. Hence, treating these logit-based estimates as reliable measures of satisfaction was judged to be justifiable. Next, correlations between logit-based coaching competencies and satisfaction with the coach were explored at the athlete level and the team level (see Table 10). At both levels, all coaching competency subscales had at least a moderately high correlation with satisfaction with the coach (range = .61 to .85). At the athlete level, MC had the highest correlation with satisfaction, .70, with TC having the next highest correlation, .66. At the team level, MC also had the highest correlation with satisfaction, .85, with CBC having the next highest correlation, .82.
Because MC was highly correlated with the other coaching competencies at both levels (range = .79 to .91), it was selected as the sole independent variable, at both levels, to avoid problems with multicollinearity.

Table 10
Correlations Between Competency Judgments and Satisfaction with the Coach at the Individual Level and the Team Level

        SWC    MC     GSC    TC     CBC
SWC     ----   .70    .63    .66    .61
MC      .85    ----   .82    .81    .79
GSC     .77    .88    ----   .87    .72
TC      .78    .88    .94    ----   .72
CBC     .82    .91    .84    .84    ----

Note. Italicized entries in the upper diagonal are the athlete-level correlations, and the team-level correlations are in the lower diagonal. All of the correlations were statistically significant at the .0005 level. SWC = satisfaction with coach; MC = motivation competence; GSC = game strategy competence; TC = technique competence; CBC = character building competence.

The ICC for satisfaction with the coach was .26, which indicated that 74% of the variance in satisfaction with the coach was due to within-team differences. Fit indices for Model 1 were a deviance of 1552.27 and a CAIC of 1574.38. The point reliability estimate for the team-level satisfaction means was .86. The variance of the team-level means, σ²β0, around the average team mean, γ00, was statistically significant, χ²(31) = 226.17, p < .001 (see Table 11). Team-level means did not vary based on sport played. Thus, I concluded that satisfaction with the coach had a significant amount of variance both within teams and between teams, and that team-level satisfaction means were significantly different and should remain random in subsequent models. Athlete-level MC was added as a within-team predictor in Model 2 (see Table 11). This model fit better than Model 1, χ²diff = 315.83, p < .001, CAICdiff = 293.72, and explained 39% of the within-team variance in satisfaction with the coach. The average influence of MC was moderately large and positive, γ10 = .70, and statistically significant, t(31) = 19.77, p < .001.
The point reliability estimate for these slopes was .06. The variance of the team-level effects, σ²β1, around the average satisfaction-MC slope, γ10, was not statistically significant, χ²(31) = 35.33, p = .27. Thus, I concluded that the average influence of MC on satisfaction with the coach was moderately large and positive, had a somewhat low reliability, was similar within teams, and should be fixed across teams in a respecified model. The influence of MC was fixed across teams in a respecified version of Model 2 (see Table 11). Conceptually, this model assumed that the influence of MC on satisfaction with the coach was similar within teams. The respecified model fit at least as well as the previous model, χ²diff(2) = 1.43, p = .49, CAICdiff = 13.31. The influence of MC on satisfaction remained moderately large and positive, γ10 = .69, and statistically significant, t(583) = 20.83, p < .001. Thus, I concluded that MC had a moderately large and positive relationship with satisfaction with the coach across athletes.

Were Coaching Competency Estimates Positively Related to Satisfaction with the Coach Between Teams?

First, it should be noted that the variance of the team satisfaction means, σ²β0, around the average team satisfaction, γ00, shrank from .26 to .05 after controlling for the influence of MC within teams (see Table 11). Practically, this means that there was much less variance in team satisfaction to model once the influence of MC on satisfaction was controlled for within teams. Still, because a statistically significant, χ²(31) = 99.64, p < .0005, amount of variance remained among team satisfaction means after imposing Model 2, team MC was added as a Level-2 predictor, γ01. This model did not fit better than the respecified version of Model 2, χ²diff(1) = 0.85, p = .36, CAICdiff = -6.52, and did not explain additional between-team variance in team satisfaction.
Accordingly, the influence of team MC on team satisfaction was negligible, γ01 = -.05, and not statistically significant, t(30) = -0.92, p = .36. Hence, the variance of the team satisfaction estimates, σ²β0, around the average team satisfaction, γ00, remained significantly greater than zero, χ²(30) = 100.12, p < .01. Thus, I concluded that team MC was unrelated to team satisfaction after controlling for the effect of MC on satisfaction within teams.

Table 11
Hierarchical Linear Models where Satisfaction was the Dependent Variable

Estimates of fixed effects
Fixed effect                                        γ       SE     t       df    p
Model 1
  Average satisfaction mean (γ00)                  -0.02    0.10   -0.17    31   .87
Model 2
  Average satisfaction-motivation slope (γ10)       0.70    0.03   21.24    31   < .01
  Fixed satisfaction-motivation slope (γ10)         0.69    0.03   20.98   583   < .01
Model 3
  Team satisfaction-team motivation slope (γ01)    -0.05    0.05   -0.92    30   .36

Estimates of variance components
Random effect                        σ      σ²     df     χ²       p
Model 1
  Within-team residuals (rij)       0.87   0.75
  Between-team residuals (u0j)      0.51   0.26    31    226.17   < .01
Model 2
  Within-team residuals (rij)       0.68   0.46
  Between-team residuals (u0j)      0.23   0.05    31     99.63   < .01
  Between-team residuals (u1j)      0.05   0.01    31     35.33     .27
Model 3
  Within-team residuals (rij)       0.68   0.46
  Between-team residuals (u0j)      0.23   0.05    30    100.12   < .01

Note. σ = standard deviation; σ² = variance.

CHAPTER 5

DISCUSSION

This study provides initial validity evidence for the CCS and introduces MCFA as an appropriate methodology to use when data are meaningfully nested and an evaluation of the factor structure of a set of indicators is desired. Results offer some support for the proposed multilevel multidimensional conceptualization of coaching competency (see Figure 3), the internal consistency of the coaching competency estimates, and a relationship between motivation competency and satisfaction with the coach within teams.
Validity concerns are observed for the original rating scale structure and for the relationship between motivation competency and satisfaction with the coach between teams. Results are interpreted to guide future research with the CCS, to provide recommendations for revisions to the instrument, and to assist researchers in physical education and kinesiology in understanding when and how to apply MCFA to their data. Analysis of the original rating scale structure (i.e., 10 categories) indicated that athletes were being asked to distinguish between too many levels of coaching competency (see Table 1). This finding is similar to previous research on the CES (Myers, Wolfe, et al., in press) and congruent with long-standing recommendations for Likert scales (Likert, 1932). However, an interesting difference between the Myers, Wolfe, et al. findings and this study is that the previous study reported that coaches employed only four categories to judge the strength of their own coaching efficacy, whereas in this study athletes appeared to utilize five categories to judge their coach's competency. This difference was probably due to the fact that only 5% of the coaches' responses were in the 0-5 range, whereas 20% of the athletes' responses were within this range. Athletes may make finer and more critical judgments of a coach's competency than coaches make regarding their own coaching efficacy, when responding to the CCS and CES items, respectively. However, to test this hypothesis, future research should match coaches' and athletes' responses within a sample, unlike the comparison I offer across two different samples. While post-hoc analysis identified an improved 5-category rating scale structure, it is unknown whether the modified scale would prove optimal in a cross-validation sample of intercollegiate soccer and ice hockey athletes or with high school athletes.
However, there is reason for confidence in the potential utility of the modified scale, with a similar sample, as Rasch-based optimal categorizations have been confirmed in follow-up applications (Zhu, 2002). Users of the CCS are encouraged to assess the utility of the proposed 5-category structure (i.e., complete incompetence, low, moderate, high, and complete competence). Although "low" may not be selected frequently, such a category would likely attract at least the minimum number (10) of observations necessary for minimal precision of threshold estimates (Linacre, 2002). There is reason to believe that the CCS may adequately measure athletes' perceptions of their head coach's motivation, game strategy, technique, and character building competencies in lower division intercollegiate soccer and ice hockey programs (see Table 5 and Table 6). However, both within teams and between teams, there was limited discriminant validity among subscales (particularly between GSC and TC), within-item multidimensionality for MC3, and extremely high standardized factor loadings at the between-teams level. Allowing MC3 (mentally prepare athletes for game strategies) to load on both MC (athletes' evaluations of their head coach's ability to affect the psychological mood and skills of athletes) and GSC (athletes' evaluations of their head coach's ability to lead during competition) makes sense conceptually. Because I judge the content of MC3 to be sufficiently specified and relevant to both MC and GSC, revision is not suggested. Therefore, users of the CCS are encouraged to impose internal models that allow responses to MC3 to influence scores on both MC and GSC. Distinguishing between competence in leading during competition (GSC) and instructional and diagnostic competence (TC) makes sense conceptually.
Empirically, evidence exists for the differential diagnostic ability of game strategy efficacy and technique efficacy in both high school and college coaches (Feltz et al., 1999; Myers, Vargas-Tonsing, et al., in press). Thus, I propose refining the definitions of GSC and TC and modifying items to lessen the overlap among the subscales in a revised version of the CCS. For example, altering the definition of TC to focus on evaluations of one's instructional and diagnostic skills during practice may help to distinguish this competency from competency to lead during competition. However, until a revised CCS is available, the proposed internal model should be utilized to produce multidimensional measures of coaching competency from the existing items. But, because subscale scores are likely to be highly related and to cause problems associated with multicollinearity when used as a set of independent variables, theory and bivariate correlations between the dependent variable(s) and competency scores should be explored to determine which competency measure should be used. The extremely high magnitude of most of the standardized factor loadings at the between-teams level (see Table 6) was noted and was deemed to be similar to parallel loadings in previous research (Kaplan & Kreisman, 2000; Muthén, 1994). Two possible explanations for the observation of such loadings across studies were posed in personal communications to experts in MCFA methodology: (1) overparameterization of the model (i.e., a possible artifact of MCFA modeling), and (2) nearly perfect indicators of the latent variables at the between level. J. J. Hox (December 3, 2004) noted that he had also observed this trend and that he believed possible explanations could include: (1) overparameterization; (2) that error at the between level tends to be averaged out by the methodology; and (3) that because the group-level variance is smaller, a higher percentage of it is able to be explained. D.
Kaplan (December 3, 2004) noted that he too had observed this trend and that he was unsatisfied with the explanations to date. He proposed, as a remedy, modeling the factor structure of only SPW (i.e., stopping at Step 3) if one is not planning to model the between-group variation as a function of between-group variables. In this study, however, determining a group-level factor structure for coaching competency estimates was important in order to guide formation of a team-level predictor of satisfaction with the coach (i.e., to test aspects of the external model). However, confidence in the proposed between-teams factor structure should be tempered pending additional, confirmatory research. In terms of evidence relating to the external aspect of validity, I found that athletes' evaluations of their head coach's ability to affect the psychological mood and skills of athletes had a moderately large and positive relationship with satisfaction with the coach within teams. But no such relationship was observed between teams after modeling the said relationship within teams (see Table 11). It is interesting to note that if this relationship were modeled between teams without modeling the said relationship within teams, team MC would be a significant predictor of team satisfaction. But, because the majority of the variance (74%) in satisfaction was within teams, and because Raudenbush and Bryk (2002) advocate settling on a Level-1 model before specifying a Level-2 model (i.e., model building from the bottom up), I conclude that team MC is not related to team satisfaction between teams. Methodologically, this finding demonstrates the need to examine multiple levels of variance simultaneously in order to reach more accurate conclusions and improve theory development (Silverman, 2004). Methodological considerations aside, it appears that athletes' satisfaction with the coach may frequently be linked to the coach's ability to attend to the athletes' psychological needs.
Unfortunately, although most coaches are well equipped in the technical aspects of their sport, they rarely receive formal training in creating a healthy psychological environment (Smoll & Smith, 2001). Because the usefulness of the CCS is at least partially dependent on the ability of the resultant measures to demonstrate relationships specified in models of coaching effectiveness (Horn, 2002), more research is needed in this area. Specifically, studies that investigate how athletes' personal characteristics influence athletes' perceptions of their coach's competency, how a coach's behavior influences athletes' perceptions of their coach's competency, and/or how athletes' competency judgments of their coach affect athletes' self-perceptions and beliefs could advance understanding of coaching effectiveness and extend validity evidence for the CCS. Also, the ability of coaching education programs to alter athletes' perceptions of their coach's competency would be a very practical way to assess an important aspect of coaching education programs. This latter area, assessing coaching education programs, is becoming increasingly important in the current evidence-based educational system (Smoll & Smith, 2001). Because the effects that a coach's behaviors exert on athletes are likely mediated by the meaning that athletes attribute to them (Horn, 2002; Smoll & Smith, 1989), the CCS has the potential to contribute to the improvement of coaching and the further development of coaching effectiveness models. Although the CCS has this potential, it should not be viewed as a competitor to instruments that assess other aspects of coaching. Rather, the CCS should be viewed as an additional tool that measures key constructs that are not fully covered by existing instruments.
For example, if one were interested in examining relationships between observed coaching behavior and athletes' perceptions of their coach's negative activation, then using the CBAS and the CBQ would be appropriate. However, if this person were also interested in how negative activation related to athletes' perceptions of their coach's motivation competency, then administering the CCS would also be appropriate. In short, a single instrument cannot fully measure the wide range of constructs involved in effective coaching. Because effective coaching is complex, a variety of well-defined instruments is necessary to the scientific investigation of sufficiently targeted inquiries. An introduction to MCFA was detailed earlier, and encouragement toward appropriate future implementations is provided here. MCFA should be considered when subjects are meaningfully nested within groups and an evaluation of the factor structure of a set of indicators is desired (Muthén, 1994). It is not uncommon in physical education, sport, and exercise contexts to collect data from subjects who are meaningfully nested within groups and to be interested in determining the factor structure of a set of indicators. A few examples include when data are collected within classes, teams, or exercise groups, and the indicators are intended to measure subjects' perceptions and/or performance (e.g., academic performance, collective efficacy, exercise behavior). In such cases, researchers should determine whether MCFA is appropriate as detailed in the Introduction to MCFA section. If MCFA is appropriate and is applied, improved model-data fit will likely be observed, as compared to model-data fit on the total variance-covariance matrix, because levels of variance will be separated (Julian, 2001). Separating substantive levels of variance contributes to developing more accurate practical recommendations and to improved theory development (Silverman, 2004; Silverman & Solomon, 1998).
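A concrete first check on whether nesting is substantial enough to warrant MCFA is to estimate each indicator's intraclass correlation from a one-way ANOVA decomposition, along with the associated design effect. The sketch below is illustrative: the function name is mine, and the usual rules of thumb (ICC above roughly .05, design effect above 2) are common conventions rather than prescriptions from this study:

```python
import numpy as np

def icc1(x, groups):
    """ANOVA-based ICC(1) for one indicator, plus its design effect."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    ids, counts = np.unique(groups, return_counts=True)
    N, G = len(x), len(ids)
    grand = x.mean()
    ssb = sum(n * (x[groups == g].mean() - grand) ** 2
              for g, n in zip(ids, counts))                 # between groups
    ssw = sum(((x[groups == g] - x[groups == g].mean()) ** 2).sum()
              for g in ids)                                 # within groups
    msb, msw = ssb / (G - 1), ssw / (N - G)
    n_tilde = (N - (counts**2).sum() / N) / (G - 1)         # effective group size
    icc = (msb - msw) / (msb + (n_tilde - 1) * msw)
    deff = 1 + (n_tilde - 1) * icc                          # design effect
    return icc, deff
```

Indicators with near-zero ICCs contribute little between-group variance to model, whereas pervasive, nontrivial ICCs across the indicator set are the signal to proceed with the multilevel steps described in the Introduction to MCFA section.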
I hope that the introduction will assist researchers in physical education and kinesiology in understanding when and how to apply MCFA to their data.

APPENDICES

APPENDIX A

How competent is your head coach in his or her ability to:
1. help athletes maintain confidence in themselves? (MC1)
2. recognize opposing team's strengths during competition? (GSC2)
3. mentally prepare his/her athletes for game strategies? (MC3)
4. understand competitive strategies? (GSC4)
5. instill an attitude of good moral character? (CBC5)
6. build the self-esteem of his/her athletes? (MC6)
7. demonstrate the skills of your sport? (TC7)
8. adapt to different game situations? (GSC8)
9. recognize opposing team's weaknesses during competition? (GSC9)
10. motivate his/her athletes? (MC10)
11. make critical decisions during competition? (GSC11)
12. build team cohesion? (MC12)
13. instill an attitude of fair play among his/her athletes? (CBC13)
14. coach individual athletes on technique? (TC14)
15. build the self-confidence of his/her athletes? (MC15)
16. develop athletes' abilities? (TC16)
17. maximize his/her team's strengths during competition? (GSC17)
18. recognize talent in athletes? (TC18)
19. promote good sportsmanship? (CBC19)
20. detect skill errors? (TC20)
21. adjust his/her game strategy to fit his/her team's talent? (GSC21)
22. teach the skills of his/her sport? (TC22)
23. build team confidence? (MC23)
24. instill an attitude of respect for others? (CBC24)

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (2004).
EQS (Version 6.1) [computer program]. Encino, CA: Multivariate Software, Inc.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.

Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21, 230-258.

Chelladurai, P. (1978). A contingency model of leadership in athletics. Unpublished doctoral dissertation, University of Waterloo, Canada.

Chelladurai, P., & Arnott, M. (1985). Decision styles in coaching: Preferences of basketball players. Research Quarterly for Exercise and Sport, 56, 15-24.

Chelladurai, P., & Riemer, H. A. (1997). A classification of facets of athlete satisfaction. Journal of Sport Management, 11, 133-159.

Chelladurai, P., & Saleh, S. D. (1978). Preferred leadership in sport. Canadian Journal of Applied Sport Sciences, 3, 85-92.

Chelladurai, P., & Saleh, S. D. (1980). Dimensions of leader behavior in sports: Development of a leadership scale. Journal of Sport Psychology, 2, 34-45.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Cronbach, L. J. (1976). Research in classrooms and schools: Formulations of questions, designs, and analysis. Occasional paper, Stanford Evaluation Consortium.

Feltz, D. L., Chase, M. A., Moritz, S. E., & Sullivan, P. J. (1999). A conceptual model of coaching efficacy: Preliminary investigation and instrument development. Journal of Educational Psychology, 91, 765-776.

Gould, D. (1987). Your role as a youth sport coach. In V. Seefeldt (Ed.), Handbook for youth sport coaches (pp. 17-32). Reston, VA: American Alliance for Health, Physical Education, Recreation, and Dance.

Härnqvist, K. (1978).
Primary mental abilities at collective and individual levels. Journal of Educational Psychology, 70, 706-716.

Horn, T. S. (2002). Coaching effectiveness in the sports domain. In T. S. Horn (Ed.), Advances in sport psychology (pp. 309-354). Champaign, IL: Human Kinetics.

Hox, J. J. (1993). Factor analysis of multilevel data: Gauging the Muthén model. In J. H. L. Oud & R. A. W. van Blokland-Vogelesang (Eds.), Advances in longitudinal and multivariate analysis in the behavioral sciences (pp. 141-156). Nijmegen, Netherlands: ITS.

Hox, J. J. (2002). Multilevel factor models. In G. A. Marcoulides (Ed.), Multilevel analysis (pp. 225-250). Mahwah, NJ: Lawrence Erlbaum.

Hox, J. J., & Maas, C. J. M. (2001). The accuracy of multilevel structural equation modeling with pseudobalanced groups and small samples. Structural Equation Modeling, 8, 157-174.

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.

Jamshidian, M., & Bentler, P. M. (1999). ML estimation of mean and covariance structures with missing data using complete data routines. Journal of Educational and Behavioral Statistics, 24, 21-41.

Julian, M. W. (2001). The consequences of ignoring multilevel data structures in nonhierarchical covariance modeling. Structural Equation Modeling, 8, 325-352.

Kaplan, D. (2000). Multilevel structural equation modeling. In Structural equation modeling: Foundations and extensions (pp. 130-148). Thousand Oaks, CA: Sage.

Kaplan, D., & Kreisman, M. (2000). On the validation of indicators of mathematics education using TIMSS: An application of multilevel covariance structure modeling. International Journal of Educational Policy, Research and Practice, 1, 217-242.

Kenow, L. J., & Williams, J. M. (1992). Relationship between anxiety, self-confidence, and evaluation of coaching behaviors. The Sport Psychologist, 6, 344-357.

Kline, R. B. (1998).
Principles and practice of structural equation modeling. New York: The Guilford Press.

Lee, K. S., Malete, L., & Feltz, D. L. (2002). The effect of a coaching education program on coaching efficacy. International Journal of Applied Sport Sciences, 14, 55-67.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.

Linacre, J. M. (1995). Categorical misfit statistics. Rasch Measurement Transactions, 9, 450-451.

Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3, 85-106.

MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490-504.

Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.

McCullagh, P., & Nelder, J. A. (1990). Generalized linear models (2nd ed.). Boca Raton, FL: CRC Press.

McDonald, R. P. (1985). Factor analysis and related methods. Mahwah, NJ: Lawrence Erlbaum Associates.

McDonald, R. P. (1994). The bilevel reticular action model for path analysis with latent variables. Sociological Methods & Research, 22, 399-413.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations: Presidential address to the Psychometric Society. Psychometrika, 54, 557-585.

Muthén, B. O. (1990). Mean and covariance structure analysis of hierarchical data. Paper presented at the annual meeting of the Psychometric Society, Princeton, NJ.

Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376-398.

Muthén, B. O. (1997). Latent variable modeling of longitudinal and multilevel data. In A. E. Raftery (Ed.), Sociological methodology 1997 (pp. 453-481). Washington, DC: American Sociological Association.

Muthén, B.
O., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of nonnormal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.

Muthén, B. O., & Satorra, A. (1989). Multilevel analysis of varying parameters in structural models. In R. D. Bock (Ed.), Multilevel analysis of educational data (pp. 87-99). San Diego, CA: Academic Press.

Myers, N. D., Vargas-Tonsing, T. M., & Feltz, D. L. (in press). Coaching efficacy in intercollegiate coaches: Sources, coaching behavior, and team variables. Psychology of Sport & Exercise.

Myers, N. D., Wolfe, E. W., & Feltz, D. L. (in press). An evaluation of the psychometric properties of the coaching efficacy scale for American coaches. Measurement in Physical Education and Exercise Science.

National Association for Sport and Physical Education. (1995). Quality coaches, quality sports: National standards for athletic coaches. Dubuque, IA: Kendall/Hunt.

Park, J. K. (1992). Construction of the Coaching Confidence Scale. Unpublished doctoral dissertation, Michigan State University, East Lansing.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2000). HLM5: Hierarchical and linear modeling. Homewood, IL: Scientific Software International.

Rushall, B. S., & Wiznuk, K. (1985). Athletes' assessment of the coach: The coach evaluation questionnaire. Canadian Journal of Applied Sport Sciences, 10, 157-161.

Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235-249.

Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variable analysis (pp. 399-419). Thousand Oaks, CA: Sage.

Silverman, S. (2004).
Analyzing data from field research: The unit of analysis issue. Research Quarterly for Exercise and Sport, 75, iii-iv.

Silverman, S., & Solmon, M. (1998). The unit of analysis in field research: Issues and approaches to design and data analysis. Journal of Teaching in Physical Education, 17, 270-284.

Silvey, S. D. (1959). The Lagrangian multiplier test. Annals of Mathematical Statistics, 30, 389-407.

Smith, R. E., Smoll, F. L., & Curtis, B. (1978). Coaching behaviors in Little League Baseball. In F. L. Smoll & R. E. Smith (Eds.), Psychological perspectives in youth sports (pp. 173-201). Washington, DC: Hemisphere.

Smith, R. E., Smoll, F. L., & Hunt, E. B. (1977). A system for the behavioral assessment of athletic coaches. Research Quarterly, 48, 401-407.

Smoll, F. L., & Smith, R. E. (1989). Leadership behaviors in sport: A theoretical model and research paradigm. Journal of Applied Social Psychology, 19, 1522-1551.

Smoll, F. L., & Smith, R. E. (2001). Conducting sport psychology training programs for coaches: Cognitive-behavioral principles and techniques. In J. M. Williams (Ed.), Applied sport psychology (pp. 378-400).

Sullivan, P. J., & Kent, A. (2003). Coaching efficacy as a predictor of leadership style in intercollegiate athletics. Journal of Applied Sport Psychology, 15, 1-11.

Tabachnick, B. G., & Fidell, L. S. (2001). Cleaning up your act. In Using multivariate statistics (4th ed., pp. 56-110). Boston: Allyn & Bacon.

Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.

Vargas-Tonsing, T. M., Warners, A. L., & Feltz, D. L. (2003). The predictability of coaching efficacy on team efficacy and player efficacy in volleyball. Journal of Sport Behavior, 26, 396-407.

Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54, 426-482.

Wicherts, J. M., & Dolan, C. V. (2004).
A cautionary note on the use of information fit indices in covariance structure modeling with means. Structural Equation Modeling, 11, 45-50.

Williams, J. M., Jerome, G. J., Kenow, L. J., Rogers, T., Sartain, T. A., & Darland, G. (2003). Factor structure of the coaching behavior questionnaire and its relationship to athlete variables. The Sport Psychologist, 17, 16-34.

Wright, B. D., & Linacre, J. M. (1992). Combining and splitting categories. Rasch Measurement Transactions, 6, 233-235.

Wright, B. D., & Linacre, J. M. (1998). Winsteps: A Rasch model computer program. Chicago: MESA Press.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.

Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalized item response modeling software (Version 1.0) [computer program]. Melbourne, Victoria, Australia: Australian Council for Educational Research.

Zhu, W. (2002). A confirmatory study of Rasch-based optimal categorization of a rating scale. Journal of Applied Measurement, 3(1), 1-15.

Zhu, W., Updyke, W. F., & Lewandowski, C. (1997). Post-hoc Rasch analysis of optimal categorization of an ordered-response scale. Journal of Outcome Measurement, 1, 286-304.

FOOTNOTES

1. The term coach's behavior is used to be consistent with Horn's (2002) model. It is noted that no instrument can fully and completely represent the wide range of behaviors involved in effective coaching.

2. Analyses for questions 4 and 5 in this study provide examples of multilevel extensions of the conventional multiple regression model.

3. In this study, only two levels of variance, within teams and between teams, were considered.

4. It is noted that multidimensional measures of athlete satisfaction have been suggested (Chelladurai & Riemer, 1997). Such measures are suggested to be used as indicators of overall organizational effectiveness.
In this study, because satisfaction with the coach, not with the entire organization, was of interest, our measure was considered appropriately specific.

5. To the authors' knowledge, IRT software programs, including Winsteps, are currently limited to evaluating rating scale structures for unidimensional models.

6. Person and item separation indices are expected to decrease somewhat when categories are collapsed because, in general, the more categories there are, the better the discrimination (Zhu et al., 1997).

7. Due to the complexity of MUML estimation in unbalanced groups, it is not uncommon for programs to require starting values to converge to a stable solution (Hox, 2002).