(‘1‘

"3::

If:

 

 

' 232‘-
in;

. --.

 

r kn '

.
ﬁ.
.. .
m- ‘

"‘58" lllllllllllllllllllllllllllHHllllHlHllllllllllllllll

.7 31293 01565 5024

LIBRARY

Michigan State
University

 

 

 

 

This is to certify that the

dissertation entitled

Evaluatinq Content Validity

in Cross-National Achievement Tests
presented by

Pamela Marie Jakwerth

has been accepted towards fulﬁllment
of the requirements for

Measurement.& Quantitative

Ph D -
' d
' 63m m Methads

 

950

Major professor

 

Date /{ ﬁgléﬁ/ Cf”

MS U is an Aﬂ'trmatt'w Action/Equal Opportunity Institution 0-12771

PLACE N RETURN BOX to remove thle checkout from your record.
TO AVOID FINES return on or before date due. ‘

DATE DUE DATE DUE DATE DUE

    

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

MSU le An Afﬁrmative Action/Equal Opportunlty lnetltulon
Want

EVALUATING CONTENT VALIDITY IN CROSS-NATIONAL
ACHIEVEMENT TESTS

By

Pamela M. J akwerth

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

1996

ABSTRACT

The main purpose of this study was to use the results of an extensive
multi-national curriculum analysis to analyze the content of a cross-national
mathematics achievement test. A second purpose was to determine the impact on
national scores and ranks that would result from altering test content to improve
curricular match. The ultimate goal was to use this information to enhance the
validity of cross-national comparisons of student achievement.

I compared data on the mathematics curriculum of 17 nations to the
content of the TIMSS mathematics ﬁeld trial instrument for 13 year old students.
Three different data sources from the curriculum analysis component of the Third
International Mathematics and Science Study (TIMSS) were used to describe the
intended mathematics curriculum of each country. I also used the curriculum data
to develop several sets of test speciﬁcations based on different methods of
summarizing the mathematics curricula in the 17 countries. Using the country
performance data from the ﬁeld trial, I calculated country mean scores on each of
the speciﬁed “tests.” I then ranked each country on each test and compared the
country scores and ranks across the different tests

The content of the mathematics curriculum varied across and within the 17
nations involved in this study. Consequently, the content of the ﬁeld-trial
instrument matched the content of the curriculum of some of the countries better

than others. This variation in curriculum and differential match has implications

for the validity of inferences made from the test, but a ﬁnal conclusion of test
validity will depend on the purpose for which the test will be used.

Variation in county scores and ranks on the different tests I developed was
minimal; however, some isolated variations did exist. Patterns suggest that, at the
total score level, the impact of test-curriculum mis—match is likely to be minimal.
However, the presence of variation in performance across topics and performance
expectations indicate that total scores may be reﬂecting a general math ability,
rather than achievement of a particular curriculmn. The implication is that the
concept of test-curriculum match is more complex than merely matching on topic

coverage.

for my mother.

iv

ACKNOWLEDGMENTS

I can’t believe I am ﬁnally writing this. So many people have helped to make this
accomplishment possible. First, I need to thank my advisor Dr. Betsy Becker and my
dissertation supervisor Dr. William Schmidt. Dr. Becker has worked with me for over
eight years. She has read, edited, and re-read countless pieces of work, and is responsible
for many of my skills as a writer and researcher. Dr. Schmidt employed me - twice - and
supplied the topic of this dissertation. He was the person that ﬁnally pushed me enough
to get this thing ﬁnished. I appreciate all the input and time from the rest of my
committee, Dr. Bill Mehrens, Dr. Richard Houang, and Dr. Sandra Wilcox. Each has
contributed a different perspective to this work, and I thank them for enlightening me. I
also have to thank Bill Frey for refusing to let me give up. He has served many roles in
my life over the past eight years - from surrogate parent to mentor - and has taught me so
much. Thanks to my family - who can ﬁnally stop asking when I will ﬁnish school, my
friends - who have been absolutely encouraging, and my coworkers past and present -
who bore the brunt of a lot of my stress. I am so grateful you all have stood by me.
Finally, I would not be here today if it were not for my mother. Over eight years ago, she
challenged me into pursuing this degree. She was my sounding-board for many years. I
only can hope that she somehow knows what I have accomplished and is proud of what I

have become. I never imagined what a struggle it would be to get here. Thank you all.

TABLE OF CONTENTS

LIST OF TABLES ............................................................................................................ vii
LIST OF FIGURES .......................................................................................................... viii
CHAPTER I: Introduction and Study Focus
Introduction .......................................................................................................................... 1
Statement of the Problem ..................................................................................................... 4
Difficulties in Domain Identiﬁcation and Speciﬁcation ................................................. 4
Constraints on Item Development ................................................................................... 6
Purpose ................................................................................................................................. 7
CHAPTER II: Review of Related Literature
Comparative-Achievement Studies - Growth, Rationale, and Impact ............................... 10
The Growth of Comparative-Achievement Studies ...................................................... 10
The Rationale for Comparative-Achievement Studies ................................................. 12
The Impact of Comparative-Achievement Studies ....................................................... 13
Validity and Comparative-Achievement Studies ............................................................... l4
Accusations of Invalidity .............................................................................................. 14
A Deﬁnition of Validity ................................................................................................ 16
Domain Speciﬁcation in Comparative-Achievement Tests .......................................... 19
Evaluating Content Validity .......................................................................................... 26
The Impact of Low Content Validity ............................................................................ 29
Recent Advances ................................................................................................................ 31

CHAPTER 111: Study Design and Procedures

Purpose and Questions ....................................................................................................... 32
The Third International Mathematics and Science Study .................................................. 33
Study Population ................................................................................................................ 34
Instrumentation .................................................................................................................. 37
Curriculum Frameworks for Mathematics .................................................................... 37
Field-Trial Instrument ................................................................................................... 38
Data Sources ...................................................................................................................... 41
Expert Topic Mapping .................................................................................................. 41
Curriculum-Guide Analyses ......................................................................................... 42
Textbook Analyses ........................................................................................................ 43
Data Analyses .................................................................................................................... 44
Compare Curriculum Sources and Compare Match to Field-Trial Instrument ............. 46
Write Test Blueprints and Calculate Match between Blueprints and Field-Trial
Instrument ..................................................................................................................... 48
Write Test Blueprints to Improve Match with Field-Trial Instrument and Calculate
Match to Curricula ........................................................................................................ 48

vi

Evaluate Country Performance across the New Tests .................................................. 49

CHAPTER IV: Results

Curriculum Comparisons ................................................................................................... 51
Description of the Mathematics Curriculum ................................................................. 51
Analyses of Match between the Field-Trial Instrument and the Curricula ................... 66

Development of Test Blueprints ........................................................................................ 86
Determine the Purpose of the Test ................................................................................ 86
Determine Topic Inclusion ............................................................................................ 87
Determine Topic Emphasis ........................................................................................... 89
Comparisons between Field-Trial Instrument and Test Blueprints .............................. 91

Re-Speciﬁcation of Test Blueprints ................................................................................. 100

Comparison of the Curriculum to Unique Specially-Constructed-Test Blueprints ....102
Comparisons of the Curriculum to Inclusive Specially-Constructed-Test

Blueprints .................................................................................................................... 113
Variations in Performance across Specially-Constructed Tests ....................................... 126
Scores and Ranks ........................................................................................................ 126
Performance Differences ............................................................................................. 133
Variations in Topic Performance ................................................................................ 140
Performance Expectations .......................................................................................... 147

CHAPTER V: Discussion, Summary, and Recommendations

How Much Variation Exists in Curricular Content? ....................................................... 159
Variation in Coverage of Topics within Each Data Source ........................................ 160
Variation in Coverage for Countries within Each Data Source .................................. 161
Potential Explanations of Variation ............................................................................ 162

How Well Does the Content of the Field-Trial Instrument Match the Content of the

Curriculum-Data Sources? ............................................................................................... 164
Topics .......................................................................................................................... 164
Data Sources ............................................................................................................... 165
Countries ..................................................................................................................... 167
Conclusions about Test-to-Curriculum Match ............................................................ 167

How Does the Content of the Test Blueprints Compare with the Content of the Field-

Trial Instrument? .............................................................................................................. 168
Focus of the Test Blueprints ....................................................................................... 168
Variation in Correlations between the Test Blueprints and the Field-Trial
Instrument ................................................................................................................... 169

How Well Does the Content of the Specially-Constructed-Test Blueprints Match the

Content of the Curriculum-Data Sources ......................................................................... 171

How Does Country Performance Vary? ........................................................................... 174
Differences in Total Scores and Ranks ....................................................................... 175
Differences in Topic Scores and Ranks ...................................................................... 175
Differences in Performance-Expectation Results ....................................................... 176
Within-Country Variation ........................................................................................... 177

Summary .......................................................................................................................... 182

vii

Limitations ....................................................................................................................... 1 85

Recommendations and Conclusion .................................................................................. 186
Appendix A: Mathematics-Curriculum-Framework Categories ..................................... 189
Appendix B: TIMSS Field-Trial Instrument Content Coverage ..................................... 192
Appendix C: Curriculum Data for Each Country and Bach Data Source ....................... 194
Appendix D: Scores and Ranks of Specially-Constructed Tests .................................... 204
References ........................................................................................................................ 2 l 0

viii

LIST OF TABLES

Table 1: Country Sample Sizes for the Combined Upper and Lower Grades of Each Country .. 36
Table 2: Summary of Expert-Topic-Mapping Proportions for Each Math Topic Across all 17
Countries ........................................................................................................................................ 52
Table 3: Summary of Expert-Topic-Mapping Proportions for Each Country across Topics
....................................................................................................................................................... 55
Table 4: Summary of Curriculum-Guide-Topic Proportions for Each Topic across Countries... 57
Table 5: Summary of Curriculum-Guide-Topic Proportions for Each Country across Topics... 58

Table 6: Summary of Textbook Proportions for Each Topic across Countries ............................ 60
Table 7: Summary of Textbook Proportions for Each Country across Topics ............................. 62
Table 8: Agreement of Topic Inclusion across Expert-Mapping, Curriculum-Guide-, and
Textbook-Data Sources Presented for Topics Across Countries ................................................... 64
Table 9: Agreement of Topic Inclusion across Expert-Mapping, Curriculum-Guide-, and
Textbook-Data Sources Presented for Countries Across Topics ................................................... 67
Table 10: Document and Field-Trial Proportion Comparisons .................................................... 69
Table 1 l: Proportions of Items in the F ield-Trial Instrument that are in Each Country’s Curricula
and Proportions of Each Country’s Curricula Tested on the Field-Trial Instrument .................... 71
Table 12: Differences in Topic Inclusion between the F ield-Trial Instrument and Bach
Curriculum Source for Each Topic ................................................................................................ 73
Table 13: Differences in Topic Inclusion between the Field-Trial Instrument and Bach
Curriculum Source for Each Country ............................................................................................ 76
Table 14: Differences in Topic Emphasis between the Field-Trial Instrument and Each
Curriculum Source for Each Topic ................................................................................................ 78
Table 15: Differences in Topic Emphasis between the Field-Trial Instrument and Bach
Curriculum Source for Each Country ............................................................................................ 81

Table 16: Correlations between the Proportions-of-Topic-Emphasis Proﬁles for Each Country in
Each Curriculum-Data Source and the Topic-Weight Proﬁle for the Field-Trial Instrument ....... 83
Table 17: Euclidean Distances between the Proportions-of-Topic-Emphasis Proﬁles for Each
Country in Each Curriculum-Data Source and the Topic-Weight Proﬁle for the F ield-Trial

Instrument ...................................................................................................................................... 84
Table 18: Items Included on Test Blueprints ................................................................................ 88
Table 19: Topic Weights on Test Blueprints ................................................................................ 90
Table 20: Test-Blueprint Codes .................................................................................................... 92
Table 21: Proportions of Field-Trial Items on Each Test Blueprint and Proportions of Items on
Each Test Blueprint Tested on F ield-Trial Instrument .................................................................. 94
Table 22: Differences in Topic Inclusion between the F ield-Trial Instrument and Bach Test
Blueprint ........................................................................................................................................ 95
Table 23: Differences in Topic Emphasis between the Field-Trial Instrument and Bach Test
Blueprint ........................................................................................................................................ 97
Table 24: Correlations and Euclidean Distances between the Topic-Weight Proﬁles for Each
Test Blueprint and the Topic-Weight Proﬁle for the Field-Trial Instrument ................................ 99
Table 25: Topic Weights on Specially-Constructed-Test Blueprints ......................................... 101
Table 26: Test-Blueprint Codes for Specially-Constructed-Test Blueprints .............................. 103
Table 27: Numbers and Proportions of Countries Including Topics in Curriculum Sources that
are not on Corresponding Unique-Test Blueprints ...................................................................... 105

ix

Table 28: Numbers and Proportions of Topics in Curriculum Sources that are Included on

Corresponding Unique-Test Blueprints ....................................................................................... 106
Table 29: Differences in Topic Emphasis for Each Topic across Countries on Unique-Test
Blueprints and Corresponding Curriculum Sources .................................................................... 108
Table 30: Differences in Topic Emphasis for Each Country across Topics on Unique—Test
Blueprints and Corresponding Curriculum Sources .................................................................... 1 10

Table 31: Correlations and Euclidean Distances between the Proportions-of-Topic-Emphasis
Proﬁles for Each Country in Each Curriculum-Data Source and the Topic-Weight Proﬁles for

Each Corresponding Unique-Test Blueprint ............................................................................... 1 12
Table 32: Proportions of items on Inclusive-Test Blueprints in Each Corresponding Curriculum
Source .......................................................................................................................................... 114
Table 33: Proportions of Each Country’s Curriculum Tested on Corresponding Inclusive-Test
Blueprint ...................................................................................................................................... 115
Table 34: Differences in Topic Inclusion between Each Inclusive-Test Blueprint and Bach
Corresponding Curriculum Source for Each Topic ..................................................................... 1 17
Table 35: Differences in Topic Inclusion between Each Inclusive-Test Blueprint and Each
Corresponding Curriculum Source for Each Country ................................................................. l 18
Table 36: Differences in Topic Emphasis between Each Inclusive-Test Blueprint and Each
Corresponding Curriculum Source for Each Topic ..................................................................... 120
Table 37: Differences in Topic Emphasis between Each Inclusive-Test Blueprint and Each
Corresponding Curriculum Source for Each Country ................................................................. 123

Table 38: Correlations between the Proportions-of—Topic-Emphasis Proﬁles for Each Country in
Each Curriculum-Data Source and the Topic—Weight Proﬁles for Each Corresponding Inclusive-
Test Blueprint .............................................................................................................................. 125
Table 39: Euclidean Distances between the Proportions-of-Topic-Emphasis Proﬁles for Each
Country in Each Curriculum-Data Source and the Topic-Weight Proﬁles for Each Corresponding
Inclusive-Test Blueprint .............................................................................................................. 127
Table 40: Differences in Euclidean Distances between the Proportions-of-Topic-Emphasis
Proﬁles for Each Country in Each Curriculum-Data Source and the Topic-Weight Proﬁles for

Each Corresponding Inclusive-Test Blueprint ............................................................................. 128
Table 41: Summary of Country Scores on Field-Trial Instrument and across Specially-
Constructed Tests ......................................................................................................................... 130
Table 42: Summary of Country Ranks on F ield-Trial Instrument and across Specially-
Constructed Tests ......................................................................................................................... 132
Table 43: Correlations between Country Scores on the Field-Trial Instrument and Scores on
Each Specially-Constructed Test ................................................................................................. 134
Table 44: Correlations between Country Ranks on the Field-Trial Instrument and Ranks on Each
Specially-Constructed Test .......................................................................................................... 135
Table 45: Summary of Differences in Scores on the F ieId-Trial Instrument and Scores on Each
Specially-Constructed Test ......................................................................................................... 136
Table 46: Summary of Differences in Scores on the Field-Trial Instrument and Scores on Each
Specially-Constructed Test for Each Country ............................................................................. 137
Table 47: Summary of Differences in Ranks on the F ield-Trial Instrument and Ranks on Each
Specially-Constructed Test for Each Test ................................................................................... 139
Table 48: Summary of Differences in Ranks on the Field-Trial Instrument and Ranks on Each
Specially-Constructed Test for Each Country ............................................................................. 141
Table 49: Topic Scores for Each Country .................................................................................. 142
Table 50: Country Ranks on Each Topic .................................................................................... 144
Table 51: Summary of Differences in Scores on the Field-Trial Instrument and Scores on Each
Topic for Each Topic ................................................................................................................... 145

Table 52: Summary of Differences in Scores on the Field-Trial Instrument and Scores on Each

Topic for Each Country ............................................................................................................... 146
Table 53: Summary of Differences in Ranks on the F ield-Trial Instrument and Ranks on Each
Topic for Each Topic ................................................................................................................... 148
Table 54: Summary of Differences in Ranks on the Field-Trial Instrument and Ranks on Each
Topic for Each Country ............................................................................................................... 149
Table 55: Within-Country Ranks of Topic Scores ..................................................................... 150
Table 56: Proportions of Textbook Blocks Allocated to Each Performance Expectation by Each
Country ........................................................................................................................................ 151
Table 57: Performance-Expectation Scores ................................................................................ 153
Table 58: Performance-Expectation Ranks ................................................................................ 154
Table 59: Performance-Expectation Category Scores ................................................................ 155
Table 60: Performance-Expectation Category Ranks ................................................................. 156
Table 61: Country Performance on Unique Tests based on Performance Expectations and Topics
crossed with Performance Expectations ...................................................................................... 158
Table 62: Estimated Reliabilities and Standard Errors ............................................................... 181
Table B1: Topic Coverage on the TIMSS Mathematics Item F ield-Trial Instrument for
Population 2 ................................................................................................................................. 192
Table C1: Expert-Topic-Mapping-Topic Proportions for 13-Year-Old Students ...................... 194
Table C2: Curriculum-Guide-Topic Coverage Data .................................................................... 196
Table C3: Proportion of Blocks Devoted to Topics in Each Country’s Textbook(s): ................ 198
Table C4: Number of Data Sources in which Topics Appear within a Country ........................ 200
Table C5: Average Emphasis Devoted to Topics across Expert Topic Mapping, Curriculum
Guides, and Textbooks ................................................................................................................ 202
Table D1: Unweighted Specially-Constructed Test Scores ......................................................... 204
Table D2: Weighted Specially-Constructed Test Scores ............................................................ 205
Table D3: Unique Specially-Constructed Test Scores ................................................................ 206
Table D4: Ranks on Unweighted Specially-Constructed Tests .................................................. 207
Table D5: Ranks on Weighted Specially-Constructed Tests ...................................................... 208
Table D6: Ranks on Unique-Specially-Constructed Tests ......................................................... 209

xi

LIST OF FIGURES

Figure 1: Example of content and performance-expectation curriculum framework codes
for mathematics .................................................................................................................. 38

Figure 2: Boxplots of scores on all specially-constructed tests, TIMSS sub-scales, topics,
performance expectation, items for topic 1.6.2, and items for topic 1.7.2 ....................... 178

xii

CHAPTER I

Introduction and Study Focus

Interest has never been higher in comparable information about education
internationally, both for noble and ignoble reasons. In certain hands, such
information opens a window to a whole new world, one becoming
increasingly smaller in this information-technology-driven age of global
communication and conversation. In other hands, the same information
can serve as a sword to slay imagined enemies and vanquish challengers to
the power and status of nations.

We cannot escape the ideological use and misuse of cross-national
data for political purposes. We can only hope to overwhelm the most base
misrepresentations with the wealth of knowledge and understanding
international studies can provide. These are the motivations that
historically have led scholars world-wide to engage in cross-national
studies through IEA and that have convinced enlightened government and
non-governmental ofﬁcials to support these efforts. (Burstein, 1993, p.
xxxi)

Introduction

Studies comparing the structure of educational systems and the performance of
students in nations across the world have been a reality for over 30 years. Educators,
policy-makers, and researchers maintain that comparative cross-national studies provide
nations with a broad perspective for ascertaining the effectiveness of their educational
systems (Linn & Baker, 1995; Mislevy, 1995; Porter, 1990; Robitaille, McKnight,
Schmidt, Britton, Raizen & Nicol, 1993; Schmidt & Valverde, 1995). Information from
these studies can be used as input for policy decisions aimed at educational improvement.
Comparative studies also are conducted within nations to monitor educational
effectiveness. Within the United States, for example, such studies may use results of

student achievement testing to compare states (e.g., National Assessment of Educational

2
Progress - NAEP), districts (e.g., Michigan Educational Assessment Program, MEAP;

California Learning Assessment System, CLAS; Kentucky Instructional Results
Information System, KIRIS), or programs within districts (LaPointe, 1991).

Researchers conducting comparative-education studies typically collect a wide
array of information from participating educational systems. In addition to collecting
student performance data, comparative researchers may collect descriptive information
related to the structure and processes of each educational system or attitudinal
information from stakeholders such as students, teachers, or administrators. Despite the
availability of descriptive information, however, the public, educators, and policy-makers
focus much of their attention on student performance results, and, often, these results
receive the primary emphasis in reporting and analysis (Husen, 1987; Linn, 1988).

One popular approach for reporting student performance results in cross-national
studies is to rank countries using total scores, or selected sub-scores, on tests presumed to
measure student achievement in various subject areas. The common interpretation of
these rankings is that students in nations ranking at or near the “top” are achieving, or
have learned, more than students in nations ranking lower. The implication is that the
nations at the top have more effective educational systems, at least in particular subject
areas, than do the nations at the bottom. The accuracy and meaningfulness of these
interpretations, however, depend on the ability of the test that was used to obtain the
rankings to measure what it was intended to measure (i.e., the validity of the test). At
issue, though, when evaluating the validity of achievement tests used in comparative

studies of educational systems, is determining exactly what a particular study was

3

intending to compare before evaluating how well a particular test measures the variables
(e. g., skills, knowledge) needed to make the comparisons.

Generally, the primary goal of researchers who conduct comparative-education
studies is not to highlight differences in student performance in and of themselves
(Burstein, 1992; Husen, 1982; Schmidt & McKnight, 1995). Rather, the goal is to do so
in a way that accounts for the differences in educational contexts, inputs, and processes
across and within nations (McDonnell, 1995; Robitaille et al., 1993; Schmidt &
McKnight, 1995). Simply ﬁnding out that the students of one nation perform better on a
set of items than do students of another nation is not meaningful to educational
improvement if student performance cannot be not linked to some characteristic of a
particular educational system. Therefore, the value of many comparative achievement
studies depends upon the extent to which student test performance reﬂects achievement
that can be attributed to the student’s educational experiences (Airasian & Madaus, 1983;
Linn, 1987; Mislevy, 1995; Nitko, 1989; Schmidt & McKnight, 1995). According to
Airasian and Madaus (1983),

When a standardized achievement test is used to compare achievement

differences among schools or programs, the presumption is that the test

taps characteristics speciﬁc to the schools or programs....If we want to

make inferences about differential school, program or instructional

effectiveness, then the processes underlying performance on the

achievement measures need to be closely linked to instruction....lf the

issue is how effective are schools in developing general, transferable

skills, traditional achievement tests may be ﬁne. But if we are interested

in whether schools develop the speciﬁc skills and knowledge they set out

to develop, then such general tests are not valid. (p. 106)

The “speciﬁc skills and knowledge” educational systems “set out to develop” are

articulated in the curriculum of that nation. Therefore, many comparative studies focus

4

on the success with which educational systems impart to their students a certain deﬁned
curriculum. The tests developed for these studies are designed to measure student
attainment of this curriculum. A key component to evaluating the validity of these tests is
determining how representative the test content is of the corresponding curriculum.
Often, measurement specialists refer to this particular component of validity as content
validity. Schmidt (1983) refers to the lack of content validity as content bias and

considers this to be one cause of test invalidity.

Statement of the Problem

Difﬁculties in Domain Identification and Specification

Domain identiﬁcation. The content validity of a test is evaluated in relation to the
speciﬁc domain (in this case, a speciﬁc curriculum) about which test scores are used to
make inferences (Crocker, Miller, & Franks, 1989; Fitzpatrick, 1983; Messick, 1989).
The more representative the items are of the domain of interest, the greater is the chance
that student performance on the sample of items will mirror their performance within the
entire domain (Messick, 1989). A test may have high (content) validity in relation to one
domain but low (content) validity in relation to another, and all persons who use the
results of a particular test, however, may not be interested in the same domain, and.

Different curricula (i.e., domains), or components of a curriculum, may be of
interest to educators and researchers who conduct cross—national studies (Schmidt &
McKnight, 1995). For example, aside from the particular subject matter of interest,

researchers may be interested in the curriculum as laid out in ofﬁcial documents (e.g.,

5

curriculum guides, national goals statements) or as laid out in textbooks and other
instructional materials. Additionally, some researchers may be interested in the
curriculum that is actually delivered by teachers. A crucial, and often ignored, issue in
the development of cross-national achievement tests is determining what speciﬁc
component of a curriculum (i.e., domain) is of particular interest (Airasian & Madaus,
1983; Mislevy, 1995) and, therefore, whether achievement results should reﬂect what
students are intended to learn, what is in text books, what is delivered in the classroom,
what the students of most nations achieve, or something else (Airasian & Madaus, 1983).

Domain speciﬁcation. Even when a speciﬁc domain is identiﬁed, cross-national
researchers still face challenges in writing test speciﬁcations for that domain. For
example, a test could consist of only those topics that all countries include in their
curriculum, topics that most countries include in their curriculum, or all topics included in
the curriculum of any country (Linn, 1988; Linn & Baker, 1995; Porter, 1990).
Generally, however, cross-national achievement tests are comprised of items that
represent an internationally negotiated set of content (Linn & Baker, 1995). Critics of
cross-national achievement studies often argue that the tests used in these studies provide,
at best, an abstract deﬁnition of achievement in a particular subject area and may not
adequately represent the curriculum of any participating nation (Linn & Baker, 1995;
Mislevy, 1995; Porter, 1990; Westbury, 1992, 1993).

The accuracy and meaningfulness of interpretations of cross-national achievement
results are impacted by the degree to which the test used in a particular cross-national
study, reﬂects the curriculum of each country in the study (Guiton & Oakes, 1995; Linn

& Baker, 1995; McDonnell, 1995; Romberg & Wilson, 1992). Performance results on a

6

test that is not based on a clearly deﬁned domain provides little more than the knowledge
of who outperforms who on a speciﬁc set of items (Airasian & Madaus, 1983; Robitaille
et al., 1993). Interpretations of educational effectiveness or explanations of cross-
national differences that are based on such results are questionable, if not invalid
(Airasian & Madaus, 1983; Berliner, 1993; Guiton & Oakes, 1995; Guskey & Kifer,
1990; McDonnell, 1995; Stedman, 1994; Westbury, 1992, 1993). Therefore, in order to
validly interpret comparative-cmss-national-achievement data, it is important to
understand the relationship of the test items used to obtain these data to the curricula of
each participating nation (Airasian & Madaus, 1983; Linn & Baker, 1995; Schmidt &

McKnight, 1995; Schmidt, McKnight, Valverde, Houang, & Wiley, 1996).

Constraints in Item Development

Two prevailing constraints on cross-national achievement-test construction exist.
One of these constraints stems from the politics of item negotiation. Decisions about the
speciﬁc content of cross-national achievement tests evolve through years of negotiation.
Reaching even a minimal level of consensus from participating nations demands
sensitivity to the unique concerns and political realities of each nation. Often, reaching
consensus entails cutting corners in test development and adding or deleting certain items
or topics despite speciﬁcations to the contrary.

A second constraint on cross-national achievement test construction relates to the
adequacy of the item pool available to test developers. Item writing is an arduous and
costly process. It is even more difﬁcult in the cross-national arena as it involves

developing items that transcend cultures and translations. Often, researchers will draw

7

from existing item pools when constructing large-scale achievement tests (Garden &
Orpwood, 1996; Husen, 1983). However, the existing item pool may not always
adequately represent the range of topics and behaviors included in the curricula of all
nations. Items, especially those measuring higher-order thinking or complex reasoning,
may be sparse, and resources may prohibit the development of enough items to overcome
the deﬁcits.

The reality of these constraints may mean that cross-national tests will never allow
for a perfect match to all potential curricula. Therefore, researchers must continue to
explore ways to use the information available on cross-national curricular differences to
aid in the interpretation of cross-national-achievement results (Linn & Baker, 1995;
Porter, 1990). A key question remaining to be answered is: what methods for selecting
test content and strategies for analyzing and presenting test results provide the most valid

basis for comparing student achievement across nations?

Purpose

The purpose of this study was to use the results of an extensive multi-national
curriculum analysis to analyze the content of a cross-national mathematics achievement
test in relation to the curriculum of nations administering the test. A second purpose was
to determine if altering the content of the test to better match the countries’ mathematics
curricula has an impact on national performance and to evaluate the subsequent
consequences of such content alterations on test validity. The ultimate goal was to use

this information to enhance the validity of cross-national comparisons of student

8

achievement. My primary focus was on the relationship between test items and
curriculum as a key element of test validity.

This study is one of the ﬁrst applications of the results from an extensive multi-
national curriculum analysis undertaken as a part of the Third International Mathematics
and Science Study (TIMSS). Preliminary results of the curriculum analysis and TIMSS
achievement testing are due for release in late fall of 1996. The curriculum analysis
entailed an exhaustive review of curricular intentions for math and science in 50 countries
(Robitaille et al., 1993; Schmidt & McKnight, 1995; Schmidt et al., 1996). It
necessitated the development of a curriculum framework describing subject-area content,
performance expectations, and perspectives (i.e., attitudes; Robitaille et al., 1993). The
framework was subsequently used to guide the construction of student achievement tests.
The data provide the opportunity for using a common framework to link student
attainment with the results of curricular intentions across nations. Additionally,
information on curricular intentions obtained using the framework can be used to guide
the development of future cross-national achievement tests.

The results of this study may be applicable to intra-national comparative
achievement studies in addition to cross-national studies. As mentioned earlier, interest
in the ability to compare student achievement across states, districts, and schools
continues to grow in the United States (Linn & Baker, 1995; Mislevy, 1995; Porter,
1990). Calls continue for a national system of assessments that recognizes the
individuality of states while measuring progress toward common standards. The diversity

of the American educational system introduces many of the same problems encountered

9

when conducting cross-national studies (Linn, 1988). The results of the present study

will apply to these situations as well.

CHAPTER II

Review of Related Literature
Comparative-Achievement Studies - Growth, Rationale, and Impact

Comparisons are fascinating and they make juicy items of gossip, but they

do not necessarily lead to improvement. The penchant for comparing is

taken for granted with little thought as to what is gained by such

comparisons. (Maeroff, 1991, p. 92)
The Growth of Comparative-Achievement Studies

Many nations have demonstrated a long-term interest in comparing the
achievement of their students with that of the students in other nations (Linn & Baker,
1995; Pelgrum, 1989; Porter, 1991). The concept for “a study of cognitive competence in
children belonging to different national systems of education” (Husen, 1982, p. 6) was
being discussed as early as 1958 at a meeting of the UNESCO Institute for Education in
Hamburg. It was not until 1961, however, that researchers established an organization
aimed at achieving this goal. The International Association for the Evaluation of
Educational Achievement (IEA) was founded “to promote research aimed at examining
educational problems common to many countries and thereby devise evaluative
procedures which can provide facts which can be useful in the ultimate improvement of
educational systems” (Husen, 1987, p. 30). The IEA completed a preliminary study of 12
countries in 1961. The First lntemational Math Study (FIMS) took place between 1962

and 1965 (Husen, 1987).

10

11

Today the interest in cross-national achievement studies continues. The IEA
studies have expanded from their original focus on mathematics to include studies in
science, reading, literature, writing, civics, French and English as foreign languages,
computers, and preprimary education (Linn & Baker, 1995; Pelgrum, 1989). Among
recent comparative studies are the 1990-91 IEA study in reading literacy and the 1991
International Assessment of Educational Progress (IAEP) studies in math and science.
Additionally, the IEA, together with other researchers around the world, is preparing the
results of the Third lntemational Math and Science Study (TIMSS) for release in late
1996 through 1997. This most recent cross-national study involved testing three
populations of students from approximately 50 nations.

Additionally, within the United States current educational reform movements
highlight the need for comparative data to insure that American students remain
competitive with other major industrialized nations. Supporters of these reforms
encourage policy-makers to develop high standards for education that are “benchmarked”
against the achievement of students in other nations (Linn & Baker, 1995; Resnick, Nolan
& Resnick, 1995). Groups like the National Education Goals Panel (N EGP), the National
Council on Education Standards and Testing (NCEST), and the National Academy of
Education Panel on the Evaluation of the NAEP Trial State Assessments have begun
work in this area (Linn, 1988; Linn & Baker, 1995; Schmidt & Valverde, 1995). These
and other groups, such as the Council of Chief State School Ofﬁcers (CCSSO), also
advocate state-by-state comparisons of educational achievement (Bracey, 1995; LaPointe,

1991; Linn, 1987, 1988; Porter, 1991; Postlethwaite, 1987).

12

The Rationale for Comparative-Achievement Studies

Husen (1987) stated that the ﬁrst international studies “were inspired by
expanding international communication, trade, and military competition” (p. 43). In a
world of increasing global competition and interdependence, many nations still have a
desire (or need) to know where they stand in comparison to other nations (Berliner, 1993;
Guthrie, 1986; Mislevy, 1995). This desire to know who is “ﬁrst” or “best” is sometimes
referred to as the “cognitive Olympics” or “international horse race” (Husen, 1987;
Schmidt & Valverde, 1995). An example of this competition is evident in the Goals
2000, Educate America Act (HR. 1804 and SR. 846, 1993). One goal in this policy
statement declared that “US. students will be ﬁrst in the world in science and
mathematics achievement” by the year 2000. As a result, U.S. educators and the public
are eagerly awaiting the results of TIMSS to determine the progress they are making
toward this goal and to compare the ranking of American students with their poor
standing in past studies.

Another reason for conducting comparative-achievement studies is to determine
priorities for expenditures and resource allocation within educational systems (Guthrie,
1986; Mislevy, 1995). Budget cuts and shortages of resources such as computers,
textbooks, and qualiﬁed teaching staff in some educational systems have led to a need to
closely examine educational priorities. Comparative achievement studies can help
educators determine areas of strength and weakness in their educational system in relation
to other educational systems. This can result in more informed decision making

regarding budgeting and resource allocation. Additionally, comparative studies can

13

increase public awareness of the standing of their own educational system in relation to
others which, in turn, may lead to support for increases in funding or a re-focusing of
systemic priorities (Cohen, 1988).

Another, perhaps most important, reason for conducting comparative-achievement
studies is school improvement (Guthrie, 1986; Mislevy, 1995). Comparative studies
provide researchers and policy-makers with information than cannot be gained from
single-system studies (Postlethwaite, 1987; Robitaille et al., 1993; Schmidt & Valverde,
1995). Comparisons with systems both different from and similar to ones own broadens
knowledge about what is and is not possible. These comparisons also can provide greater
opportunities for reviewing the impact of educational interventions.

By looking at the educational systems of the world we challenge our own

conceptions, gain new and objective insights into education in our own

country, and are thus empowered with fresh vision with which to

formulate effective educational policy and new tools to monitor the effects
of these new policies. (Schmidt & Valverde, 1995, p. 7)

The Impact of C omparative-A chievement Studies

Comparative studies have had a signiﬁcant impact on the US. educational system.
Results of past international studies have led to the ruin of “new math” (Husen, 1987;
Schmidt & Valverde, 1995) and have led to questions about classroom-grouping and
school-tracking policies (Husen, 1987; Schmidt & Valverde, 1995). Such studies have
highlighted the inadequacy of the American curriculum in math and science and have
resulted in nationwide curricular reform (McKnight, Crosswhite, Dossey, Kifer,

Swafford, Travers, & Cooney, 1987). Furthermore, one of the most signiﬁcant inﬂuences

14

on the school-reform movement of the past decade, A Nation at Risk (National
Commission on Excellence in Education, 1983), was written partly in response to poor
U.S. performance on cross-national studies (Kaestle, 1985).

Results of achievement testing within nations also signiﬁcantly impact
educational systems. In the US, for example, funding, endorsements, or program
continuation may be tied to comparisons of student performance results. Performance
rankings factor into real-estate prices and the attractiveness of certain districts or states.
Furthermore, U.S. educational systems, programs, and teachers have received substantial
criticism from the media, public, and researchers as a result of performance in national

and cross-national comparative studies (Bracey, 1995).

Validity and C omparative-A chievement Studies

Accusations of Invalidity

Controversy over the use of country ranks. Cross-national studies provide
stakeholders and consumers with a variety of results. However, the results that
historically have received the most attention are rankings of countries on national mean-
achievement scores (Husen, 1987; Linn, 1988; Schmidt & McKnight, 1995). Policy-
makers encourage such rankings because they provide a simple yardstick for gauging
educational health (Postlethwaite, 1987). Many researchers, on the other hand,
discourage such rankings because of problems reaching valid interpretations for all
countries (Berliner, 1993; Husen, 1987; Linn & Baker, 1995; Mislevy, 1995; Porter,

1990; Postlethwaite, 1987; Stedman, 1994; Westbury, 1992, 1993).

15

Criticisms of the validity of cross—national achievement results come from many
parties. First, some critics maintain that cross-national-achievement results have
historically been based on poor sampling methodology (Bracey, 1995; Linn & Baker,
1995; Porter, 1991). Some of the countries involved in past cross-national studies tested
populations that were not representative of the entire population to which the results were
intended to generalize. For example, some countries tested only higher achieving
students or native-language speakers. Stedman (1994), however, maintained that these
problems are becoming fewer and more isolated. Furthermore, countries that do not
employ adequate sampling procedures are being identiﬁed in the reporting of TIMSS
results.

Differences inherent in the test populations of each nation are sometimes cited as
reasons for invalidity (Berliner, 1993; Linn & Baker, 1995). For instance, test
populations may differ in the total years of schooling students have received prior to the
testing age. Students in some countries begin school at earlier ages than students in
others. Additionally, critics also point out that differences in tracking practices across
nations sometimes result in comparisons of elite populations of students with more
comprehensive populations. These differences, too, are less extreme today than they were
in the past (Linn & Baker, 1995).

Another concern about using ranks relates to cross-national differences in student
motivation to do well (Berliner, 1993; Porter, 1991; Stedman, 1994). One often cited
example is that of Korean students being applauded by their classmates as they leave the
classroom to take an achievement test for a cross-national study (Berliner, 1993; Mislevy,

1995).

l6

Differences in the focus and priorities of comparative education studies. Some of
the most serious criticisms about the validity of cross-national achievement testing relate
to the differing curricula of the nations involved in the studies and the problems that arise
in test development and reporting as a result of these differences (Berliner, 1993; Linn &
Baker, 1995; Stedman, 1994; Westbury, 1992, 1993). According to Husen (1983),
“comparing the outcomes of learning in different countries is in several respects an
exercise in comparing the incomparable” (p. 455). The difﬁculty stems from the fact that
educational systems are unique to the culture of each country (Passow, 1984; Purves,
1987). They are based upon differing views of development and childhood (Berliner,
1993). They have differing goals which reﬂect differing social, political, economic, and
resource needs and priorities (Schmidt & McKnight, 1995; Schmidt & Valverde, 1995).
The time available for formal education is limited, making it impossible to teach
everything. It is highly unlikely that different nations will choose to ﬁll this limited time
in exactly the same ways (Schmidt & McKnight, 1995). Therefore, the degree of
variability in curricular goals and offerings across differing educational systems has a
direct impact on the interpretation of results from comparative studies of these systems
(Berliner, 1993; Linn & Baker, 1995; Mislevy 1995; Stedman, 1994; Westbury, 1992,

1993).

A Definition of Validity
Categories of test validity. The 1985 Standards for Educational and
Psychological Testing opens with the following:

Validity is the most important consideration in test evaluation. The
concept refers to the appropriateness, meaningfulness, and usefulness of

17

the speciﬁc inferences made from test scores. Test validation is the

process of accumulating evidence to support such inferences. (AERA,

APA, & NCME, 1985, p. 9)

In describing validity, Cronbach (1971) refers to “accuracy,” Messick (1989)
refers to “adequacy and appropriateness,” and Mehrens and Lehmann (1991) refer to
“truthfulness.” Test validation is the process of evaluating the accuracy, adequacy,
appropriateness, truthfulness, or usefulness of inferences made from test results, as
opposed to evaluating the test itself. A test is never valid in and of itself; however, it may
be valid for a certain purpose (Cronbach, 1971; Mehrens & Lehmann, 1991; Messick
1989).

Historically, three categories of validity evidence have been described: construct-
related, content-related, and criterion-related (AERA, APA, & NCME, 1985; Cronbach,
1971; Mehrens & Lehmann, 1991; Messick, 1989). Some measurement specialists
consider all validity evidence to be construct-related (Messick, 1989); others have
challenged the notion of or the usefulness of content validity (Fitzpatrick, 1983; Guion,
1978; Messick, 1989); still others have discussed additional categories of validity such as
consequential (Messick, 1989, 1994; Moss, 1992) and systemic (Frederikson & Collins,
1989). Linn, Baker, and Dunbar (1991) presented alternative criteria for evaluating the
validity of assessments that are more performance oriented. These criteria are
consequences, fairness, transfer and generalizability, cognitive complexity, content
quality, content coverage, meaningfulness, and cost and efficiency.

Content validity. Test validation is a process in which evidence is gathered about

the accuracy of test inferences. The process is never complete, as one can continually

collect evidence which supports or disputes the validity of test inferences for different

l8

purposes. The validity evidence most often sought when evaluating content-oriented
achievement tests, including those used for comparative purposes, is evidence of content
validity (Cronbach, 1971; Mehrens & Lehmann, 1991).

Content validity is particularly important for achievement tests. Typically,

we wish to make an inference about a student’s degree of attainment of the

universe of situations and/or subject-matter domain. The test behavior

serves as a sample, and the important question is whether the test items do,
in fact, constitute a representative sample of behavioral stimuli. (Mehrens

& Lehmann, 1991, p. 267)

Content validity is the extent to which test items constitute an adequate sample of
the content domain about which inferences are intended (AERA, APA, & NCME, 1985;
Anastasi, 1982; Cronbach, 1971; Mehrens & Lehmann, 1991; Messick, 1989).
Evaluations of content validity typically rely on judgments about test content as opposed
to empirical analyses of test results (Messick, 1989). As such, Messsick (1989) prefers to
speak of content relevance (i.e., the degree to which each item reﬂects the content
domain) and content representation (i.e., the degree to which all items adequately
represent the domain and any sub-domains) as opposed to content validity. Some
authors, though, (e.g., Airasian & Madaus, 1983; Schmidt, Porter, Schwille, Floden, &
Freeman, 1983) further sub-divide content validity into curricular and instructional
validity depending upon the speciﬁc domain of reference. Messick (1989), however,
referred to these two concepts as curricular relevance and representation and instructional
relevance and representation. In any sense, “content validity” is one necessary but not

sufﬁcient condition for test validity (Guion, 1978; Messick, 1989; Schmidt, 1983).

19
The content validity of any test is evaluated in light of the speciﬁc purposes for

which the test is to be used and the speciﬁc domain(s) it is intended to represent
(Messick, 1989). An evaluation of content validity ﬁrst requires a clear and operational
deﬁnition of the content domain (Cronbach, 1971; Haertel & Calfee, 1983; Mehrens &
Lehmann, 1991; Messick, 1989; Millman & Greene, 1989).

The nature of the behavioral domain about which inferences are to be

drawn or predictions made becomes especially important at two points in

the measurement process: ﬁrst, at the stage of test construction, where

domain speciﬁcations serve as a blueprint or guide for what kinds of items

should be constructed or selected for inclusion in the test; second, at the

stage of test use, where the relevance and coverage of the constructed test

must be evaluated for applicability to a speciﬁc, possibly different applied

domain. The central problem at either stage, of course, is determining

how to conceptualize the domain. (Messick, 1989, p. 37)
Domain Specification in Comparative-Achievement Tests

The range of domain possibilities. Domain speciﬁcation is often one of the most
difﬁcult aspects of test development (Cronbach, 1971; Messick, 1989). A testing domain
deﬁnes the parameters from which the content of test items can be drawn and sets limits
to the inferences that can be made from test results. If not clearly deﬁned and articulated
by the test developer, the testing domain can be deﬁned only in terms of the set of items
that comprise the test. Too often, the testing domain in comparative studies of
educational achievement is not always clear, and, in actuality, several different domains
may be of interest to researchers conducting, or others using the results of, such studies.

Three categories of testing domains are relevant as sources of content for the

achievement tests used in cross-national-education studies. The ﬁrst of these testing

domains relates to the a priori (Schmidt, 1983) or intentional achievement goals of a

20
nation, referred to as the intended curriculum in IEA studies (Sclunidt & McKnight,

1995). The content of such domains is speciﬁed in formal statements of educational
goals and curricular objectives. The second category of testing domains is deﬁned in the
content of curricular or instructional materials. This domain is sometimes considered the
curricular domain (Schmidt, 1983). The third category of testing domain is based on the
content of the actual instruction delivered by teachers. This instructional domain
(Schmidt, 1983) corresponds to what IEA studies term the implemented curriculum. In
addition, Schmidt et a1. (1996) considered textbooks to be a bridge between the intended
and implemented curriculum, that is, an articulation of the potentially implemented
curriculum of a nation.

Researchers conducting comparative-achievement studies need to determine
precisely in which curriculum domain they are interested before selecting or developing
the tests they will use. They must determine if they are interested in student achievement
of what the students’ educational systems intended they learn, of what is contained in
actual instructional materials, or of what they were taught in the classroom. Schmidt
(1983) stated that researchers sometimes use tests that have been developed in reference
to one domain and make inferences about student achievement in reference to another
domain. He considered this to be one source of the content bias of a test.

Mislevy (1989) further identiﬁed an additional distinction that must be made in
domain speciﬁcation. This is the distinction between the concept of immediate
curriculum (or instruction) and ultimate curriculum (or instruction). The immediate
curriculum (instruction) relates to what was actually included in a speciﬁc curriculum or

actually addressed in the classroom during a particular period of time (e.g., a school year).

21

Ultimate curriculum (instruction) relates to the ﬁnal objectives of a curriculum or
instruction, or those objectives generally desired for similar groups of students. Some
researchers (Cronbach, 1971, Mehrens & Lehmann, 1991; Millman & Greene, 1989)
have considered achievement results based on a general or ultimate curricular
(instructional) domain to be more meaningful for comparative purposes than those based
on the immediate domain.

A particular difﬁculty, then, in domain speciﬁcation is determining exactly which
domain is of primary interest. The resolution of this difﬁculty lies in the purpose of the
test. If the purpose is to evaluate student achievement of subject-matter knowledge and
skills most members of society deem important, one would be interested in a domain
reﬂecting ultimate implicit educational goals. If the purpose is to evaluate student
achievement of what was presented in textbooks, one would be interested in the
immediate curricular domain. If the purpose of a test is to evaluate student achievement
of what they were taught, one would be interested in the immediate instructional domain.
Sometimes more than one domain may be of interest.

The purpose of cross-national achievement studies. Often conﬂicting purposes
for conducting cross-national comparative-achievement studies exist. Policy makers may
be interested only in student achievement comparisons in and of themselves. However,
most cross-national studies typically have a purpose beyond merely ranking countries on
student-test performance (Burstein, 1992; Husen, 1983; Postlethwaite, 1987). Some
researchers (Bracey, 1991; Burstein, 1992; Linn & Baker, 1995) ﬁnd it valuable to know
how students within a nation perform on test content that is unique to their particular

educational system and to compare this performance to the performance of students in

22

other nations on content that is unique to their system. These researchers are less
interested in performance differences due to “student attributes” (Burstein, 1991, p. 50) or
ability than they are interested in detecting differences due to schooling and determining
how and why these differences arise (Burstein, 1991; Husen, 1983). Burstein (1993), in
the prologue to his edited volume on SIMS results, recounts the historical purpose behind
IEA testing. In it, he quotes from Husen’s preface to the 1967 volume on the First
International Mathematics Study:

...the overall aim is, with the aid of psychometric techniques, to compare
outcomes in different educational systems. The fact that these
comparisons are cross-national should not be taken as an indication that
the primary interest was, for instance, national means and dispersions in
school achievements at certain age or school levels.

...the main objective of the study is to investigate the “outcomes” of
various school systems by relating as many as possible of the relevant
input variables (to the extent that they could be assessed) to the output
assessed by international test instruments. . .In discussions at an early stage
in the project, education was considered as a part of a larger social-
political-philosophical system. In most countries, rapid changes are
occurring...Any fruitful comparison must take account of how education
responded to changes in the society. One aim of this project is to study
how mathematics teaching and learning have been inﬂuenced by such
deve10pment.(p. 30)

...The IEA study was not designed to compare countries; needless to say,
it is not to be conceived of as an “international contest” ...its main
objective is to test hypotheses which have been advanced within a
framework of comparative thinking in education. Many of the hypotheses
cannot be tested unless one takes into consideration cross-national
differences related to the various school systems operating within the
countries participating in this investigation. (in Burstein, 1993, p. xxxii)

Complexities in study purposes and conﬂicting priorities only add to the difﬁculty
of domain speciﬁcation. However, the important point in domain speciﬁcation is that it
cannot begin without ﬁrst clearly deﬁning ones purpose(s) for the testing program

(Millman & Greene, 1989) even if the purposes are many.

23

Determining test content. Once a particular domain is chosen as the focus of a
test, the test developer must determine the exact content that will be included on the test
and the proportion of items that will be allocated to each content area or topic (Messick,
1989; Millman & Greene, 1989; Postlethwaite, 1987). Much debate exists over the exact
method of specifying the desired domain. For example, suppose one were to develop a
test to measure student achievement of the curriculum included in math textbooks.
Cross-national studies of textbook content (Schmidt et al., 1996) have found considerable
variability in the content of these textbooks across nations. Different methods exist for
determining the exact topics to include on such tests and the proportions of items to
allocate to each topic.

For any given target population, no two countries have exactly the same

curriculum. Is it then possible to make valid comparisons of student

achievement? The way in which international tests are currently
constructed consists of ﬁrst undertaking in each nation a content analysis

of what is meant to have been learned by the end of a given period of time

by a target population...It would seem reasonable to make comparisons

about mathematics achievement in general if 80 percent or more of the

content is the same between countries (and if the target populations are

very similar and if the standard errors of sampling are small). What about

79 percent? What is a reasonable cutoff point? (Postlethwaite, 1987, p.

153)

Linn (1988) described three methods of domain speciﬁcation ﬁrst proposed by
Seldon (in Linn, 1988). These are identifying a “least—common denominator” of content
(e.g., content that is common to all textbooks across the nations), an “optimal” set of

content (e.g., content found in a large number of textbooks), or an “inclusive” set of

content (e.g., content that is found in any of the textbooks). Linn believed that the least

24

common denominator may appear most fair, but would tend to favor those systems that
are narrow in their curriculum.

Linn and Baker (1995) and Porter (1991) stressed the need for a more inclusive
approach to test development to ensure that cross-national achievement studies provide
U.S. educators with adequate data on how well their students perform on educational
goals speciﬁc to the US. Linn and Baker proposed that the tests be developed in such a
way that a subset of content could be “mapped” onto speciﬁc national standards. This
would entail developing a comprehensive assessment that “assesses the union rather than
only the intersection of content standards of participating countries.” Linn and Baker and
Porter acknowledged, however, the potential political difﬁculties inherent in negotiating
test content. Garden and Orpwood (1996) detailed these difﬁculties in their technical
report on the development of the TIMSS achievement tests. Additionally, an inclusive
approach to test development would demand large amounts of testing time from students
unless complex matrix sampling designs were employed. However, the results of cross-
national achievement studies may be limited without the ability to match national goals or
practices to test results (Linn, 1987; Linn & Baker, 1995; Porter, 1991).

Other issues related to test content. Other difﬁculties in developing tests with
high content validity relate to issues such as the balance of breadth versus depth in
content coverage (Burstein, 1986). Should a limited number of items be used to measure
a large number of topics superﬁcially or should some topics be measured in depth at the
expense of others? Another issue relates to the adequacy of the item pool for test

development. Items that measure integrated topics and higher-order thinking processes

25

are in limited supply, are difﬁcult to write, and encounter more resistance in country
negotiations (Garden & Orpwood, 1996; Linn, 1987).

Additionally, the increasing complexity of subject matter calls into question the
unidimensionality of test domains. Lack of unidimensionality raises questions about the
meaning of total scores used in country ranks and subsequent analyses (Airasian &
Madaus, 1983; Maeroff, 1983). Researchers (Burstein, 1991; Kupermintz, Ennis,
Hamilton, Talbert, & Snow, 1995; Maeroff, 1983; Muthen et al., 1995) have suggested
that mathematics scores aggregated over different topics represent general-math ability
rather than math achievement that can be linked to curriculum or instruction. Student
performance varies, sometimes signiﬁcantly, across sub-topics (Ariasian & Madaus,
1983). This general-math factor may be so strong that it masks any correlation between
curriculum and achievement (Burstein, 1991). Better linkage between tests and
curriculum is obtained at the sub-topic level (Airasian & Madaus, 1983; Burstein 1991;
Mislevy, 1995); although, some researchers suggest that the most useful performance
results are at the item level (Guskey & Kifer, 1990; Mislevy, 1995). As Mislevy (1995)
has stated, “The outcome for every individual task in an international assessment tells a
story in its own right. Assessments with hundreds of tasks, like those of IEA and IAEP,
tell hundreds of stories” (p. 426).

Additionally, domain speciﬁcation also must consider what students are expected
to do with test content (Airasian & Madaus, 1983; Linn, 1983; Mislevy, 1995; Snow &
Lohmann, 1989; Walker & Schafarzick, 1974). New cognitive theories have resulted in

increased attention to expectations for student performance. Often, these expectations

26

vary within and across educational systems; always, they add to the complexity of the
domain.

Finally, the level of speciﬁcity of domain deﬁnition needs to be determined.
Burstein (1986) found differing levels of domain speciﬁcation across different tests and
curricular documents. Mehrens and Phillips (1987) have shown that the level of
speciﬁcity of domain deﬁnition has an impact on the degree of test to domain match.
According to Schmidt et al. (1981), “The domain should be at a ﬁne enough level to
make important distinctions but not such a ﬁne level of detail so as to classify everything

within the subject matter as being important” (p. 136).

Evaluating Content Validity

Two primary approaches exist for evaluating test-content validity (Airasian &
Madaus, 1983; Leinhardt & Seewald, 1981). The ﬁrst approach uses test results to
compare the performance of individuals who have been exposed to curricular content
with the performance of those who have not. The intent is either to determine if test
scores discriminate between these two groups or to ﬁnd items that do (Airasian &
Madaus, 1983; Burstein, 1991; Muthen et al. 995). This approach includes the use of
IRT, intra-class correlations, factor analysis, and generalizability theory. The
methodology is used post hoc and does not directly evaluate the content being measured
by test items (Airasian & Madaus, 1983).

The second approach to evaluating test to curriculum match relies on a judgment
of the overlap between a test and a domain (Airasian & Madaus, 1983; Crocker et al.,

1989; Leinhardt, 1983; Leinhardt & Seewald, 1981; Messick, 1989). Generally, a

27

taxonomy to which the domain and test are matched is developed (e.g., Burstein, 1986;
Gamoran, Porter, Smithson & White, 1996; Schmidt et al., 1983). This taxonomy may
include only topics or a matrix of topics and cognitive processes. In some cases (e.g.,
Leinhardt, 1983; Leinhardt & Seewald, 1981; Schmidt & McKnight, 1995), actual test
items are matched to textbooks or teacher coverage.

Several methods have been used to quantify overlap and results often depend
upon the speciﬁc method used. Crocker et a1. (1989) reviewed a series of methods for
evaluating the overall ﬁt between items and a content domain. Many of the procedures
involved using judges to rate the proportion of items that assess what is in a curriculum or
what is deemed to be an important learning objective. Judges will typically rate the
relevance or value of items and these ratings are averaged across judges.

Concepts in proﬁle analysis also may be useful to consider when evaluating
content validity, especially for cross-national purposes. A proﬁle is a vector of k
elements where each element could correspond to the proportion of test items in a given
topic area, the proportion of time spent teaching a topic, the proportion of a textbook
allocated to a topic, or the weight a topic is given in curricular intentions. Proﬁles of
topic areas on a test could be compared to the curriculum proﬁles to determine the degree
of similarity between the two.

A proﬁle has three main properties: shape, elevation, and scatter (Cattell, 1949).
Elevation is the mean of all the proﬁle elements; scatter (dispersion) is the standard
deviation of all proﬁle elements from the mean (elevation); shape (conﬁguration) is the
relative highs and lows (or rank correlation) of proﬁle elements. Differences of opinion

exist as to which elements should be considered when assessing proﬁle similarity.

28

Indices based on correlation look almost entirely at shape without regard to the other two
properties. Euclidean distance measures (D) utilize all three factors of proﬁle similarity
(Skinner, 1978). D2 is the sum of the squared distances between corresponding elements
in two proﬁles; D is the square root of this measure. Cronbach and Gleser (1953)
recommend the use of D as opposed to D2, as it tends to exaggerate large differences.

Schmidt (1983) uses a similar concept to deﬁne the content bias of a test as the
following:

Total bias = 2(WJ-T - WJ-D),
where WJ-T is the weight for a topic, often deﬁned by a proportion of items, for the test
and WJ-D is the weight for the topic in the domain of interest (e.g., proportion of a
textbook or proportion of instructional time devoted to a topic). Schmidt’s formula is
similar to the Euclidean distance formulas used in proﬁle analysis (Cronbach & Gleser,
1953). Gamoran et al. (1996) in a recent study also drew upon proﬁle analysis to
measure content coverage. They developed an indicator that combined the “proportion of
instructional time spent covering tested material (level of coverage), and the match of
relative emphases of types of content between instruction and the test (conﬁguration of
coverage)” (p. 12). The formula for the conﬁguration of coverage was
1-(21w,T-w,-D|/2)

where WjT is the proportion of items in each tested area and WJ-D is the proportion of
instructional time spent on each tested area. The ﬁnal index was the product of the level

of coverage and the conﬁguration of coverage.

29
The Impact of Low Content Validity

Considerable disagreement exists as to the impact of the lack of ﬁt between a test
and a domain. One impact of the lack of ﬁt is the perceived importance of the test to
stakeholders. Linn (1987) stated, “If a test does not measure the outcomes that
correspond to important program goals, the evaluation will surely be considered unfair”
(p. 6), especially if it better measures the goals of another program in the study.

Studies have shown that results on tests not well-matched to a domain can be
misleading (Berliner, 1993; Linn, 1988; Stedman, 1994; Westbury, 1992, 1993). Others
have found that ranks on total scores are unstable, may result in unfair comparisons
(Guskey & Kifer, 1991; Linn, 1987; Mislevy, 1995), and are dependent on the relative
weighting of sub-topic areas (Cronbach, 1971). IEA studies introduced the notion of
opportunity to learn (OTL) as a means of ensuring the technical validity of their ﬁndings
(McDonnel, 1995). Researchers have shown that opportunity-to-learn the skills being
tested is a signiﬁcant explanatory variable of student performance (Berliner, 1993;
Burstein, 1992; Burstein et al., 1990; Husen, 1983; Kuperrnintz et al., 1995; McDonnell,
1995; Muthen, Huang, Jo, Khoo, Goff, Novak, & Shi, 1995; Purves, 1987; Walker &
Schaffarzick, 1974).

Additionally, Westbury (1993) found that differences between the scores of
American and Japanese students on SIMS decreased when controlling for curriculum.
Studies by Raizen and Jones (1985) found a correlation between mathematics
achievement and the number of math courses students take. One particular critic of cross-

national studies has stated

30

We make curricular decisions different from those that other countries

make. Thus differences in achievement are most parsimoniously

explained as differences in national curricula, rather than differences in the

efﬁciency or effectiveness of a particular national system of education.

(Berliner , 1993, p. ),

Differing opinions about the impact of curriculum on student achievement also
exist. In a reanalysis of the Westbury data, Baker (1993) still found large differences
between American and Japanese scores even when accounting for opportunity to learn.
Furthermore, although he did ﬁnd some curricular impact on test results, Stedman (1994)
found that curriculum was just one of many variables having an impact. Phillips and
Mehrens (1988) maintained that studies comparing test-to-curriculum match “have not
provided any evidence regarding the impact of the mismatch” (p. 34). Mehrens (1984),
Mehrens and Phillips (1987), and Phillips and Mehrens (1988) felt that impact of
mismatch on achievement would be minimal in norm-referenced testing situations where
the curriculum is basically homogenous. However, they surmised that the results could
be quite different if comparing “two totally different curricula” (Mehrens & Phillips,
1987, p. 368) or when comparing “countries in which textbooks are not as homogeneous
as those in the United States” (Phillips & Mehrens, 1988, p.50).

It is reasonable to assume that the more different the curricula, the more

likely those differences will have an impact on the test scores. Thus if

differences in curricula between, for example, the United States and Japan

are great, those differences may indeed impact scores on a common test.

Examining score differences across countries, we could make incorrect

inferences about the quality of the instruction or the quality of the students

rather than making correct inferences about the impact of curricular
differences on test scores. (Mehrens & Phillips, 1987, p. 358)

31

Recent Advances

In TIMSS, the IEA has collected information that may provide a means to
overcome some of the difﬁculties in the domain speciﬁcation of cross-national
achievement tests (Schmidt et al., 1996). Two methodological innovations in particular
relate to domain speciﬁcation. The ﬁrst of these was the development of a detailed
curriculum framework used to code all the content of materials and instruments in the
study (Robitaille et al., 1993). The second was an exhaustive analysis of the content of
the intended curricula of participating nations (Schmidt et al., 1996). In addition, in order
to obtain measures of the implemented curriculum within each nation, the IEA revised
questionnaires used in previous studies (IEA, 1994a). These questionnaires asked
teachers to identify from a list of mathematics topics those that they taught during the
school year and the amount of time allocated to each.

The TIMSS curriculum frameworks, document analyses, and teacher
questionnaires provide educators and researchers with the tools for reducing the content
bias of cross-national achievement tests and increasing test validity. Information from
these materials provides a window into the unique educational experiences confronted by
students across the world and provides a framework for domain speciﬁcation. However,
many issues still remain to be resolved. For example, researchers will still need to
determine which types of domains are of particular interest. They then must determine

how information across countries will be combined in domain speciﬁcation.

CHAPTER III

Study Design and Procedures

Purpose and Questions

One purpose of this study was to use data on curricular intentions from a review
of mathematics curriculum in 17 countries to evaluate the content of a mathematics
assessment being developed for cross-national comparisons. A second purpose was to
explore ways of using the curriculum data to improve test-to-curriculum match. A ﬁnal
purpose was to investigate the relationship between student performance results and test-
to-curriculum mis-match, and the subsequent implications for test validity.

I compared the mathematics-curriculum data collected through the TIMSS
document analyses to the content of the TIMSS mathematics ﬁeld-trial instrument for 13-
year-old students. 1 also developed several sets of test speciﬁcations based on different
methods of summarizing the curriculum data. Using the country-performance data from
the ﬁeld trial, I calculated country-mean scores on each of the speciﬁed “tests.” I then
ranked each country on each test and compared the country scores and ranks across the
different tests. The questions I attempted to answer were
1. How much variation in content exists across the 17 nations in the mathematics

curricula for 13-year-old students? How well does the content of the TIMSS ﬁeld-
trial instrument match these curricula?
2. What test speciﬁcations provide a good curricular match across countries? How well

does the content of the TIMSS ﬁeld-trial instrument match these test speciﬁcations?

32

 

33
3. What test speciﬁcations would improve the content match between the TIMSS ﬁeld-

trial instrument and the countries’ math curricula? How well do these speciﬁcations
match the curricula?

4. How stable are country scores and ranks across tests developed using the new test
speciﬁcations when compared to the total scores and ranks on the ﬁeld-trial
instrument?

A brief description of TIMSS as well as information on the study population, data

sources, and methods for answering these questions follow.

The Third International Mathematics and Science Study
The TIMSS is the largest cross-national study of educational systems ever

attempted (Robitaille, et al., 1993, Schmidt & McKnight, 1995). Approximately 50

nations have been involved in some aspect of the study. The primary objective of the

study was to “contribute to improvements in the teaching and learning of mathematics
and science” (Robitaille, et al., 1993, p. 35). The study revolves around three
components: A study of the intended curriculum, the implemented curriculum, and the
attained curriculum of the nations involved. Data on the intended curriculum were
collected through expert questionnaires and document analyses in each country. Data on
the implemented curriculum were collected through school, teacher, and student
questionnaires. Data on the attained curriculum were collected through student-
achievement testing and student questionnaires.

The TIMSS study population included students in the two grades in which most

students were 8 years old, students in the two grades in which most students were 13

rvr_;

 

 

 

34

years old, and students in their ﬁnal year of schooling. Additionally, a sub-population of
students in their ﬁnal year of schooling specializing in calculus or physics was also tested.
Data collection on curricular intentions began in 1991 and was completed in 1995.
Data were cleaned and initial analyses released in 1996. Achievement testing was
completed in 1995 and school, teacher, and student questionnaires were completed at the
same time. The ﬁeld-trial data used in this study were collected in May, 1994.
The research questions (Robitaille & Garden, 1996) for TIMSS were:
1. How do countries vary in the intended learning goals for mathematics
and science; and what characteristics of educational systems, schools,
and students inﬂuence the development of these goals? (p. 38)
2. What opportunities are provided for students to learn mathematics and
science; how do instructional practices in mathematics vary among
nations; and what factors inﬂuence these variations? (p. 40)
3. What mathematics and science concepts, processes, and attitudes have
students learned; and what factors are linked to students’ opportunity
to learn? (p. 40)
4. How are the intended, the implemented, and the attained cuniculum
related with respect to the contexts of education, the arrangements for
teaching and learning, and the outcomes of the educational process? (p.
42)
Study Population
I used data from 17 countries to conduct my analyses, with the unit of analysis
being the country. The countries included in the study were those that participated in the
TIMSS mathematics ﬁeld trial for 13-year-old students and for which information was
available from the TIMSS document analyses. Seven of the original 25 countries
participating in the ﬁeld trial were dropped from planned analyses because they either did

not participate in or did not have complete data from the document analyses; one country

was dropped because it had incomplete data on the ﬁeld trial. The study countries

35

consisted of 2 Asian nations, 2 Eastern European nations, 10 Western European nations,
2 North American nations, and 1 South Paciﬁc nation.

National Research Coordinators (NRCs) in each country were asked to select a
“judgment sample” of students for participation in the ﬁeld trial (IEA, 1994c). NRCs
ﬁrst identiﬁed (at least) 12 schools having classes that ﬁt within the speciﬁed target
populations (i.e., the two adjacent grades that contained the largest proportion of 13-year-
old students) in each of their countries. Next, they selected one or more classes within
these schools for testing. The sample sizes in each country were to be at least 100
students for each of four test booklets administered; at least 60 of those students were to
have been in the upper target grade. The minimum sample size, then, was 400 students
for each country (240 at the upper grade and 160 at the lower grade). Each student was
given one test booklet, with all four booklets being used within each classroom. The IEA
instructed NRCs to use the “best evidence” available to select as wide a range of student
ability and educational and socioeconomic settings as possible. Some lack of geographic
representation was tolerated for ease of data collection and quick tum-around. Because
most of the curriculum information collected for TIMSS corresponds to grades (primarily
the upper-target grades) instead of ages, I used data from l3-year-old students in only the
upper grade of each country for this study.

Country-sample sizes are reported in Table 1. The IEA provided sample statistics
only for the combined-grade samples (i.e., upper and lower grades combined) of each
country, so I was unable to determine the sample sizes of the upper grades. Most
countries met the requirement of 100 students per test booklet. Of those which did not,

all but one had between 90 and 99 students per booklet. Country P had only 86 students

36

for booklet 8. Distributions of students-to-booklets were fairly uniform within a country.
All but four of the countries exceeded the minimum total sample size of 400 students,
several by over 100 students. The sample sizes of the remaining countries ranged

between 374 and 396 students.

Table 1
Country Sample Size for the Combined Upper and Lower Grades of Each Country

 

Test Booklet

 

 

 

Country 3 5 6 8 Total
A 107 106 115 108 436
B 126 123 129 119 497
C 111 107 102 98 418
D 134 135 143 135 547
E 104 97 94 92 387
F 133 142 143 140 558
G 96 101 99 100 396
H 108 105 106 108 427
I 95 99 96 90 380
J 103 104 107 105 419
K 119 113 107 114 453
L 122 136 133 133 524
M 122 116 116 115 469
N 126 l26 122 127 501
O 104 99 99 104 406
P 96 97 95 86 374
Q 178 180 183 183 724
Total 1987 1991 1995 1965 7916

 

Note. Booklets 1, 2, 4, and 7 contained items for the science-
assessment ﬁeld trial.

37

Instrumentation

Curriculum Frameworks for Mathematics

All data sets and test items were linked through codes from the mathematics
curriculum framework developed for the TIMSS study (Robitaille et al., 1993; see
Appendix A). The framework speciﬁes three types of codes relating to three “aspects” of
curriculum: content (i.e., topic area), performance expectations (i.e., math—related
behaviors), and perspectives (i.e., attitudes or values). I used only the content (here after
referred to as “topic”) and performance-expectation codes for this study.

At the most general level, the mathematics framework has 10 main-topic
categories (e.g., numbers, proportionality) and 5 main-performance-expectation
categories (e.g., knowing, communicating). All topic categories have one or two levels of
sub-categories (for a total of 44 individual sub-categories at the lowest level), and all
performance expectations have one level of sub-categories (see Figure l) The framework
covers most mathematics topics relevant to “K-12” education across nations, and it
reﬂects recent reforms and trends in mathematics education. It is meant to provide
researchers with a meaningful description of mathematics content to be used throughout

and beyond the duration of this study.

38

 

Sample Content Category with Sub-Categories
1.1 Numbers
1.1.1 Whole numbers
1.1.1.1 Meaning
1.1.1.2 Operations
1.1.1.3 Properties of operations
Sample Performance Expectation Category with Sub-Categories
2.1 Knowing
2.1.1 Representing
2.1.2 Recognizing equivalents

2.1.3 Recalling mathematical objects and properties

Figure 1. Example of content and performance—expectation curriculum framework codes
for mathematics.

 

F ield-T rial Instrument

The TIMSS mathematics-achievement-item ﬁeld-trial instrument for 13-
year-old students consisted of four booklets containing a series of multiple choice,
short-answer, and extended-response items. The test was developed by a multi-
national team of national research coordinators, subject-matter specialists, and
measurement specialists. Items from past IEA studies and other large-scale
achievement studies comprised the initial item pool. Additional items were
provided by countries or developed as needed. Test blueprints were not

completed prior to the initial stages of item development but were completed prior

39
to the ﬁeld trial (see Garden & Orpwood, 1996). The blueprints were based on

preliminary data from the document analyses. They also reﬂected the desire to
evaluate “in-depth” performance on a sub-set of topics (Garden & Orpwood;
1996)

The ﬁeld-trial instrument consisted in approximately twice as many test
items as were desired for the ﬁnal achievement test. Extended-response items
were particularly more predominant than indicated by the blueprints due to a
greater need for information on the properties of these items. Limitations in items
and testing time made it impossible to obtain enough items to report performance
on every topic and to include items measuring all topic by performance-
expectation intersections. Therefore, topics were limited to six reporting
categories (fractions and number sense; geometry; algebra; data representation,
analysis, and probability; measurement; proportionality).

All test items were coded with topic and performance-expectation codes
from the mathematics framework. Items could receive up to four codes each (two
topic and two performance expectation codes). I reviewed the codes of all items
prior to undertaking my analyses. 1 disagreed with the original item coding for
some items. As a result, I re-coded these items. I discussed the item codes on a
sample of approximately 25 items with a math-content specialist and a senior
researcher in the TIMSS study. Additionally, this senior researcher independently
re-coded items from the ﬁnal mathematics-achievement test for TIMSS. I
compared my re—coding on items that appeared in both the ﬁeld-trial instrument

and the ﬁnal TIMSS test with the researcher’s codes, and we discussed any

40

disagreements. We re-coded approximately 40% of the items. Re-coding entailed
changing either the topic, performance-expectation, or both codes; adding either
topic, performance-expectation, or both codes; or a combination of the two. The
topic codes of 6% of the items were changed, and additional topic codes were
added to 19% of the items.

Appendix B contains information on each of the four test booklets in the TIMSS
mathematics ﬁeld-trial instrument for 13-year-old students. The full test consisted of 241
unique items (197 multiple choice, 25 short answer, and 19 extended response) dispersed
throughout 4 test booklets, two of which had 63 items and two of which had 74. Thirty-
three “linked” items appeared in two different test booklets. Two of the linked items
were short answer; one was extended response; the remaining were multiple choice.

Fifteen of the 44 framework sub-categories were not represented on the test, and
three of the 10 main categories, 1.8 Elementary Analysis, 1.9 Validation and Structure,
and 1.10 Other Content were not represented at all on the test. Sixteen of the topics were
represented in all booklets; seven topics were represented in three booklets each; four
topics were represented in two booklets each; two topics were represented in only one
booklet each. Extended-response and short-answer items were evenly distributed across
the booklets. However, they were not evenly distributed across topics.

The IEA provided item statistics for all test items on the ﬁeld-trial instrument. It
also provided information on the percent of students at the lower, upper, and combined
grades who passed each item in each country. Data on linked items were summarized
across the two sub-groups responding to the items and were presented only for both

groups as a whole. Although extended response items were scored using a multi-level-

41

scoring rubric, students were not given partial credit in the international scoring. They
either passed or did not pass each item. Percentages of students receiving each rubric

point were provided.

Data Sources

I used three data sources from the TIMSS curriculum analysis to identify and
describe the curriculum of the 17 study countries. Each source contained data for each
country from analyses involving either expert topic mapping, curriculum-guide coding, or
textbook coding. The expert-topic-mapping data source described each country’s
intended coverage of each of the 44 topics on the mathematics framework. The
curriculum-guide-data source contained data on topic, performance-expectation, and
perspectives coverage in a selection of curriculum guides for each country. The textbook-
data source provided the same information for a selection of textbooks in each country.
The curriculum-data sources are described in detail below. For a more detailed

explanation, refer to Schmidt, et a1. (1996).

Expert Topic Mapping

A panel of subject-matter experts familiar with the mathematics curriculum in
each country identiﬁed the ages in which each topic on the mathematics framework was
intended to be introduced, was intended to be taught, and was intended to receive focus
(i.e., receive special emphasis or attention in the curriculum relative to other years). The
data sets for expert topic mapping consisted of matrices of Os, ls, and 23 for each age.
Zeros represented topics not intended in the curriculum of a country at a particular age; 15

represented topics that were intended in the curriculum of a country at a particular age,

42

but were not focused; 2s represented focus topics. The expert topic mapping contained
data on only the topics aspect of the mathematics framework and not on the performance
expectations or perspectives aspects of the framework. I used only the data for age 13 in

this study.

C urriculum-Guide Analyses

Curriculum guides were collected within each country at the national level if they
existed or at regional levels if necessary. The collection was to include those curriculum
guides pertaining to at least half of the students at the TIMSS testing grades. The
collections of curriculum guides were to have represented any major school types or
geographic regions. Subject-matter experts within each country participated in a
standardized training session on coding the document sample (i.e., curriculum guides and
textbooks) from their countries. Once trained, they coded the untranslated versions of all
documents from their countries. The curriculum guide coding entailed dividing the
documents into conceptual units representing the “smallest functional segment” (e.g.,
introduction, objectives, pedagogy; Schmidt et al., 1996, p.191) of each guide. Each unit
was coded with the appropriate topic, performance-expectation, and/or perspective codes
from the mathematics framework.

1 used only the data on the topics aspect of the mathematics framework for the
grade corresponding to the “upper grade” of age 13 (8‘h grade in the US). Because
curriculum guides did not constitute a random sample, it was difﬁcult to determine
exactly what proportion of each country’s school population each guide represented.

Therefore, the collection of guides within each country was taken as a whole to represent

43

students in the corresponding grade within a country. Additionally, curriculum guides
varied drastically in their unit structure and meaning. For example, some guides included
pages of detailed objectives; others contained only a simple list of objectives. Therefore,
it was difﬁcult to determine what it meant if a topic was more prevalent in one curriculum
guide versus another. As a result, the curriculum guide data consisted of ls and Os in
each cell of the countries by topics matrix. Ones indicated that a particular topic was
included in any of the curriculum guides collected for a country; Os indicated that it was
not included. Most of the 17 countries collected only one curriculum guide. One country

collected 5, one collected 6, and one collected 15.

Textbook Analyses

In each country, math and science textbooks corresponding to the same target
grades described for curriculum guides were collected. Each country was to collect
textbooks used by at least 50% of the students in the country within each target grade.
Many countries needed to collect a series of textbooks to meet this 50% criterion while
others needed only one. Additionally, some countries found it difﬁcult just to meet the
50% criterion, while others could collect the speciﬁc book(s) used by 100% of their
students. Seven countries in this study had one textbook in their textbook sample; seven
countries had two textbooks; one country had three textbooks; two countries had four
textbooks.

Coders divided the textbooks into units representing one to three days of
instruction which were further sub-divided into blocks. The content within each block

was coded with all topic, performance-expectation, and/or perspectives codes that

44

applied.l Again, country-level aggregates of data were developed for each country.
These data indicated the average proportion of blocks across all sampled textbooks that
were devoted to a particular topic, performance expectation, or perspective. I used the
data on only the topics aspect of the mathematics framework for most of my analyses. I
used data on the performance expectations and the content by performance-expectation

intersections for selected analyses.

Data Analyses
My data analyses consisted of four primary steps. These were

1. Describe and compare the content of the three curriculum sources, and compare this
content to the content of the TIMSS ﬁeld-trial instrument.

2. Develop 12 test blueprints using 3 methods of summarizing, for each of the 3
curriculum sources taken individually plus 1 overall aggregate (incorporating all three
data sources), then calculate the match between the content of the TIMSS ﬁeld-trial
instrument and each of the 12 test blueprints.

3. Identify those topics from the TIMSS ﬁeld-trial instrument included in each of the 12
test blueprints, and re-write the blueprints using only these topics creating 12 new sets
of “inclusive” test blueprints (i.e., the same test blueprint for all countries); write 4
sets of “unique” test blueprints for each country based on the four data sources (3

individual and 1 aggregate) using only the topics included in the ﬁeld-trial instrument

 

' To evaluate the reliability of the document coding, the following process was used (1) two units were
randomly sampled from textbooks selected from different countries, were translated, and were coded by an
expert coder; (2) an iterative process was used to match blocks and the coding sequence of the country
coders to the standard produced by the expert coder. Forty-ﬁve documents from 12 countries were used in
the reliability study. The estimated reliability was .80 (see, Schmidt et al., 1995).

45

(total: 17x4=68); calculate the match between each of the 12 sets of inclusive-test
blueprints and each country’s corresponding curricula as represented by the three data
sources individually plus the aggregate as well as the match between each country’s 4
unique-test blueprints and each country’s corresponding curricula.

4. Use country-level performance on the items that measured the topics included on each
of the test speciﬁcations developed in step 3 to compute 32 sets of scores for each
country (12 sets of “weighted” scores on inclusive tests, weighting each topic to
match curriculum emphasis; 12 sets of “unweighted” scores on inclusive tests; 4 sets
of “weighted” scores on unique tests; 4 sets of “unweighted” scores on unique tests).
Compare country level results on the TIMSS ﬁeld trial with results on the new sets of

tests (24 inclusive tests and 8 unique tests for each country).

I used the three curriculum sources as different representations of the mathematics
curriculum of each country. The expert topic mapping and curriculum guide analyses
provided two representations of the curriculum that each country intended to be taught by
teachers (i.e., attained by students). The textbook analyses provided a representation of
the curriculum that was potentially implemented (Schmidt et al., 1996) by teachers.
Teacher data on the implemented curriculum are not yet available internationally;
therefore, textbooks provide the best indication of what may have actually been taught in
the classroom. Additionally, the textbook data are much more detailed than the data in
the other two curricultun sources and may better represent how topics are treated in the
classroom. However, because teachers do not always teach all topics included in their

textbooks, I also combined data across the three cuniculum sources to obtain a second

46

estimate of the potentially implemented curriculum of each nation. I averaged across only
those topics contained in all three data sources within each country. Although the
presence of a topic in all three sources of curriculum data does not guarantee the topic
will be taught, the potential for a topic to be taught should increase over the potential for
those topics included in fewer of the data sources. The aggregate of the data sources,
then, should represent a lower bound of the topics taught in the classroom.

I re-scaled the numbers in the cells of the expert-topic-mapping and curriculum-
guide data sets so that they summed to one across all topics within each country by
summing over all elements (i.e., the 44 topics) in each country vector and dividing each
element by this sum. These numbers were estimates of the relative proportion of
emphasis for each topic within a country. Countries that included fewer topics in these
data sources received higher proportions of emphasis for each tOpic included than did
countries that included more topics. To construct the aggregate-data set I averaged over
proportions of emphasis on only those topics that were included in all three data sources

for a country.

Compare Curriculum Sources and Compare Match to F ield- Trial Instrument

I reviewed the content of each curriculum source and summarized it across
countries and across topics. I compared topic inclusion and coverage both across and
within countries.

I then evaluated test-curriculum match using several methods. For most analyses,
1 treated each set of topic proportions (i.e., the proportions of emphasis computed for the

expert-topic-mapping, curriculum-guide, and aggregate-data sources and the proportion

47

of textbook blocks in the textbook-data source) for each country as a different “proﬁle” of
the mathematics curriculum for the country. Likewise, topic weights (i.e., proportions of
items allocated to each topic) on the ﬁeld-trial instrument provided a “proﬁle” of test
emphasis. Thus, I sought to compare the similarity of the four curriculum proﬁles for
each country to or dissimilarity from the test proﬁle.

I looked at the match between the curriculum proﬁles and the test proﬁle
separately for each country. I conducted six different analyses to estimate test-curriculum
match. First, I calculated the proportion of items on the mathematics ﬁeld-trial
instrument that measured topics appearing in each of the four curriculum proﬁles.
Second, I calculated the proportion of each curriculum proﬁle that was tested on the ﬁeld-
trial instrument. Third, I calculated differences between measures of topic inclusion (i.e.,
presence) on the ﬁeld-trial instrument and topic inclusion in each of the four curriculum
proﬁles. Forum, I calculated differences between topic weights (i.e., the proportion of
items for each topic) on the ﬁeld-trial instrument and topic emphasis proportions in each

curriculum proﬁle. Finally, I computed correlations and Euclidean-distance measures,

 

44
J2 (WIT — ij )2 - where W,T is the weight of topic j on the ﬁeld-trial instrument and

j=I
W; is the weight of topic j in the curriculum of country i, between the topic weights on

the ﬁeld-trial instrument and topic-emphasis proportions in each of the four curriculum

proﬁles.

48
Write Test Blueprints and Calculate Match between Blueprints and F ield-T rial

Instrument

I wrote test blueprints for three “inclusive” tests (i.e., the same test for each
country, combining curriculum information across countries) for each curriculum-data
source and the aggregate-data source (for a total of 12 blueprints) using the following
methods:

1. a strict intersection (SI) method that included only the topics in all countries’

curriculum proﬁles within each of the four data sources,

2. a 70% intersection (71) method that included only the topics common to at
least 70% of the countries’ curriculum proﬁles within each of the four data
sources, and

3. a union (UN) method that included all topics in any of the countries’
curriculum proﬁles within each of the four data sources.

I averaged across each country’s proportion of emphasis for each topic included in
each blueprint to obtain weights for each topic on each of the 12 sets of test blueprints.
Each set of weights was re-scaled to sum to 1 across all topics. I then repeated the same
analyses described in step 1 comparing the “proﬁle” of topic weights for each of the 12

sets of test speciﬁcations with the ﬁeld-trial instrument’s “proﬁle” of topic weights.

Write Test Blueprints to Improve Match with F ield-T rial Instrument and Calculate Match
to Curricula
My intention was to use the test blueprints and ﬁeld-trial data to compute country

scores on each of the 12 new “tests” and compare these scores to country performance on

49

the total ﬁeld-trial instrument. However, the ﬁeld-trial instrument did not cover all topics
in the mathematics framework, so I re-wrote the 12 test blueprints developed in step 2
using only the topics for which items were included on the ﬁeld-trial instrument. I used
the same aggregate methods described earlier (i.e., strict intersection, 70% intersection,
union). In addition, I also wrote four sets of speciﬁcations for “unique” tests for each
country, using only the topics that appeared in both the country proﬁles for each
cuniculum-data source and the ﬁeld-trial instrument. The topic weights on the test
blueprints were scaled to sum to one across topics.

I then conducted the same comparisons outlined in step 1 to evaluate the match
between each set of the 12 inclusive-test speciﬁcations and each country’s corresponding
curriculum proﬁle as well as each country’s 4 unique-test speciﬁcations and the country’s
corresponding curriculum proﬁle (i.e., the test speciﬁcations using the expert mapping
data were compared to each country’s proﬁle of expert mapping data, etc.). The
comparisons between the unique-test speciﬁcations and the curriculum proﬁles provided
an estimate of “best possible match” between the curriculum proﬁles and any test

developed using the ﬁeld-trial topics.

Evaluate Country Performance across the New Tests

I calculated scores for each country using the topics on each set of speciﬁcations I
developed. I calculated both weighted and unweighted scores. To calculate unweighted
scores, I computed the average percent of students passing items with a particular topic
code and averaged across all topics included on each “test.” To calculate weighted

scores, I multiplied the average percent of students passing items within a topic by the

50

corresponding weight on a particular test speciﬁcation. 1 then summed these numbers
over topics. I ranked each country on each measure.

I compared all country scores and ranks with their scores and ranks on the ﬁeld
trial. First, I compared an average of scores and ranks across all new tests with the ﬁeld-
trial scores and ranks. I also looked at country variability across all scores and ranks.
Next, I computed differences between the ﬁeld-trial total scores and ranks and each new
score and rank.

Finally, I looked at between-countries variation in scores and ranks on each topic.
Then I calculated country-level scores using performance-expectation codes and
compared country results across these measures.

The results of all analyses follow.

CHAPTER IV
Results
Curriculum Comparisons

Description of the Mathematics Curricula

Expert topic mapping. Summary statistics for the expert topic mapping are
contained in Tables 2 (for topics) and 3 (for countries). Table 01 in Appendix C
presents the full set of data.

The columns in Table 2 represent (1) the average across countries of the
proportions of emphasis2 for each topic, (2) the standard deviations of the proportions of
emphasis, (3) the median across countries of the proportions of emphasis for each topic,
(4) the maximum proportions of emphasis across countries for each t0pic (minimum
proportions were 0 for all but topic 1.3.2 Basic 2D Geometry with a minimum of .024),
(5) the number of countries in which the topic was intended for coverage in the
curriculum (whether focused or unfocused), and (6) the number of countries in which the
intended topic was a focus topic.

Table 2 reveals that three topics (1.1.4.3 Complex Numbers, 1.8.1 Inﬁnite
Processes, 1.8.2 Change) were not intended for coverage in the curriculum of any

country; one topic (1.3.5 Vectors) was intended for coverage by only one country; and

 

2 “Emphasis” in the expert-topic-mapping data source was calculated by adding up all Os, ls, and 25 for
each topic for a country and dividing each number by the total; emphasis in the curriculum-guide-data
source was calculated by adding up all Os and ls for each topic for a country and dividing each number by
the total; emphasis in the textbook data source corresponds to the proportion of textbook blocks associated
with each topic for a country.

51

52

 

 

 

Table 2
Summary of Expert- Topic-Mapping Proportions for Each Math Topic across all I 7
Countries
Num. of Num. of

Ave. Median Max. Countries Countries
Topic Prop. of Prop. of Prop.of Including That
Code Topic Emphasis SD Emphasis Emphasis Topic Focus
1.1.1.1 Wh.Num-Meaning 0.012 0.013 0 0.034 8 1
1.1.1.2 Wh.Num.-Oper. 0.017 0.020 0.020 0.071 9 3
1.1.1.3 Prop. of Oper. 0.017 0.022 0 0.071 8 4
1.1.2.1 Common Fractions 0.032 0.025 0.027 0.105 14 6
1.1.2.2 Decimal Fractions 0.028 0.023 0.027 0.105 14 4
1.1.2.3 Relat. of Fractions 0.025 0.024 0.025 0.105 12 4
1.1.2.4 Percentages 0.025 0.019 0.027 0.054 12 6
1.1.2.5 Prop. of Frac. 0.025 0.026 0.024 0.105 12 3
1.1.3.1 Negative Numbers 0.032 0.019 0.028 0.071 14 8
1.1.3.2 Rational Numbers 0.030 0.025 0.025 0.105 14 5
1.1.3.3 Real Numbers 0.021 0.018 0.020 0.054 1 1 3
1.1.4.1 Binary Arithmetic 0.003 0.008 0 0.032 2 0
1.1.4.2 Exponents 0.036 0.018 0.039 0.069 15 9
1.1.4.3 Complex Numbers 0 0 O 0 0 0
1.1.4.4 Number Theory 0.024 0.016 0.027 0.056 13 4
1.1.4.5 Counting 0.004 0.011 0 0.041 2 1
1.1.5.1 Estim. Quant.& Size 0.014 0.013 0.020 0.036 9 1
1.1.5.2 Rounding 0.030 0.018 0.027 0.065 14 7
1.1.5.3 Estim. Comput. 0.023 0.018 0.025 0.054 12 5
1.1.5.4 Exponents&Mag. 0.027 0.021 0.027 0.069 12 7
1.2.1 Measurement Unit 0.029 0.015 0.027 0.054 15 5
1.2.2 Per.,Area,Volume 0.029 0.014 0.027 0.054 15 5
1.2.3 Estim. Errors 0.018 0.016 0.020 0.065 1 1 2
1.3.1 2D Geo:Coordinate 0.029 0.018 0.027 0.065 14 5
1.3.2 2D GeozBasics 0.039 0.014 0.034 0.071 17 8
1.3.3 2D Geo: Polygons 0.034 0.016 0.030 0.065 16 7
1.3.4 3D Geo 0.034 0.017 0.030 0.069 15 7
1.3.5 Vectors 0.002 0.007 0 0.028 1 0
1.4.1 Geo. Transform. 0.033 0.019 0.027 0.069 15 7
1.4.2 Cong. & Sim. 0.031 0.021 0.027 0.069 14 7
1.4.3 Constructions 0.024 0.018 0.024 0.061 13 3
1.5.1 Proport. Concepts 0.030 0.015 0.028 0.054 15 6
1.5.2 Proport. Prob. 0.041 0.019 0.041 0.071 16 1 1
1.5.3 Slope & Trig. 0.021 0.022 0.020 0.065 10 4
1.5.4 Lin. Interp. 0.011 0.017 0 0.061 6 2
1.6.1 Pat, Rel., Func. 0.032 0.018 0.027 0.065 15 6
1.6.2 Equat. & Formulas 0.041 0.017 0.041 0.069 16 1 1
1.7.1 Data Rep. & Anal. 0.039 0.018 0.039 0.069 16 9
1.7.2 Uncer. & Prob. 0.015 0.016 0.014 0.053 9 1
1.8.1 Inﬁnite Process. 0 0 0 0 0 0
1.8.2 Change 0 0 0 0 0 0
1.9.1 Val. & Just. 0.013 0.018 0 0.065 7 2
1.9.2 Struc. & Abs. 0.012 0.014 0 0.041 8 2
1.10.1 Other 0.017 0.015 0.020 0.048 10 2
AveraE 0.023 0.016 0.020 0.060 11 4

 

53

one topic (1.1.4.1 Binary Arithmetic) was intended for coverage by two countries. Only
topic 1.3.2 Basic 2D Geometry was intended for coverage by all countries. The average
number of countries intending a topic be included in the curriculum at age 13 was 11
(65%). The number of countries that intended that a topic be a focus topic at age 13
ranged from 0 to 11. Eleven of the 17 countries intended focus on topics 1.5.2
Proportionality Problems and 1.6.2 Equations and Formulas. Thirty-nine of the 44 tOpics
were intended as focus topics at age 13 by at least one country. The average number of
countries that intended focus on any given topic was 4 (24%).

Of those topics being intended for coverage in the curriculum of at least one of the
countries, the average proportion of emphasis ranged from .002 (1.3.5. Vectors) to .041
(1.5.2 Proportionality Problems, 1.6.2 Equations and Formulas). Lower average
proportions of emphasis mean that (1) few countries intended coverage of the topic, (2)
few or no countries intended focus on the topic, or (3) the topic was intended for
instruction in countries that intended a large number of topics (therefore, each topic
would receive a lower proportion in those countries). Topics intended and/or focused on
by a large number of countries and intended by countries with a narrow curriculum would
receive higher proportions of emphasis. To better interpret the proportions of emphasis,
on can treat them as the percent of mathematics class periods allocated to particular topics
over the course of a school year. Out of a school year with 180 mathematics periods, for
example, a proportion of .002 would represent less than one class period and .041 would
represent seven class periods. Most standard deviations of the proportions were between

.01 and .02 (two to four class periods). Medians were generally within three hundredths

 

54

of the means (ﬁve class periods). Maximum proportions ranged from around .03 (ﬁve
class periods) to .10 (1 8 class periods).

Table 3 summarizes the expert-topic-mapping data for each country. The second
column indicates the average proportion of topic emphasis for each country across all
topics with non-zero proportions (i.e., all topics of which the country intends coverage at
age 13), the next column indicates standard deviations of topic proportions of emphasis
across all topics (including those with Os), and the ﬁnal two columns indicate the number
of topics intended for coverage at age 13 as well as the number of intended topics that
were also intended for focus in the country at that age.

Table 3 shows variation in intended topic coverage across the 17 countries. The
column of average proportions was calculated across topics only with non-zero
proportions (i.e., only across topics that were intended in the curriculum of a country).
Averaging across all topics would have generated identical values for all countries
because the proportions of emphasis sum to one within all countries. Similarly, the
averages of non-zero proportions within a country were simply a factor of the number of
topics intended in the curriculum of that country. Countries that intended the same
number of topics in their curriculum had the same average proportion, regardless of the
ratio of focused to intended topics. The numbers are presented in the table as indications
of the magnitude of topic intention differences.

Average proportions of topic emphasis ranged from .026 to .071 (5 class periods
to 13 class periods). Standard deviations of proportions ranged for most countries from
.01 to .02. Country N had the smallest standard deviation (.009), and country G had the

largest (.036). The numbers of topics intended for coverage at age 13 ranged from 14 to

 

 

 

55
Table 3

Summary of Expert-Topic-Mapping Proportions
for Each Country across Topics

 

 

 

Number of
Topics Number
Ave.a Intended of Topics
Prop. of to be Intended to
Country Emphasis SD b Included be Focused
A 0.030 0.016 33 7
B 0.032 0.018 31 11
C 0.031 0.016 32 5
D 0.027 0.013 37 14
E 0.042 0.023 24 9
F 0.036 0.020 28 9
G 0.071 0.036 14 5
H 0.037 0.020 27 14
1 0.032 0.017 31 18
J 0.053 0.028 19 12
K 0.037 0.021 27 10
L 0.029 0.014 35 6
M 0.037 0.021 27 9
N 0.026 0.009 39 35
O 0.048 0.026 21 8
P 0.048 0.026 21 7
Q 0.029 0.015 35 14
Average 0.038 0.020 28 1 l

 

b
aAverage of non-zero numbers. SD of non-zero

numbers.

 

56

39 topics, and countries intended focus on 5 to 18 of the intended topics. The proportions
of intended topics within a country that were also intended for focus ranged from .16 to
.90 with an average of .40 and a standard deviation of .17.

Curriculum-guide analyses. Tables 4 and 5 present the curriculum-guide data,
and Table C-2 in Appendix C presents the full data set.

The data in Table 4 are by topic, as are the data in Table 2 for the expert topic
mapping. The only difference is the absence of a count of focused topics. All topics
were included in the curriculum guides of at least one country. Two topics (1.8.1 Inﬁnite
Processes and 1.1.4.3 Complex Numbers) were included in the guides of only two
countries. The average proportions of emphasis for these two topics were .004 and .003
respectively. Two topics (1.3.4 3D Geometry, 1.6.2 Equations and Formulas) were
included in the curriculum guides of all countries. The average proportions of emphasis
for these two topics were .04.

Most standard deviations of the proportions of emphasis were around .01 to .02
and most medians were within a few hundredths of the mean. Maximum proportions
ranged from .023 to .091, with a mean of .06. If these were thought of as proportions of a
180 period school year, this range would be about 4 to 16 class periods, with a mean of
11 class periods.

Table 5 presents the curriculum-guide data summarized for each country. It
reveals that the number of topics included in a country’s curriculum guide ranged from 11
to 44, with an average of 27 topics. Average proportions in this table were merely a
function of the number of topics included in a country’s curriculum guide(s): Countries

with the same number of topics had the same average proportion, and average proportions

5 7
Table 4

Summary of C urriculum-Guide- Topic Proportions for Each Topic across Countries

 

 

 

Ave. # of

Prop.of Median Max. Countries

Topic Topic Prop. of Prop.of Including
Code Topic Emphasis SD Emphasis Emphasis Topic
1.1.1.1 Wh.Num.-Meaning 0.023 0.025 0.026 0.091 10
1.1.1.2 Wh.Num.-Oper. 0.027 0.023 0.029 0.091 12
1.1.1.3 Prop. of Oper. 0.020 0.024 0.023 0.091 9
1.1.2.1 Common Fractions 0.017 0.016 0.023 0.045 9
1.1.2.2 Decimal Fractions 0.020 0.018 0.026 0.056 10
1.1.2.3 Relat. of Fractions 0.021 0.020 0.026 0.059 10
1.1.2.4 Percentages 0.022 0.017 0.028 0.048 1 1
1.1.2.5 Prop. of Frac. 0.008 0.013 0 0.034 5
1.1.3.1 Negative Numbers 0.033 0.016 0.033 0.059 15
1.1.3.2 Rational Numbers 0.028 0.018 0.032 0.059 13
1.1.3.3 Real Numbers 0.028 0.023 0.029 0.091 12
1.1.4.1 Binary Arithmetic 0.009 0.015 0 0.040 5
1.1.4.2 Exponents 0.023 0.019 0.028 0.059 1 1
1.1.4.3 Complex Numbers 0.003 0.008 0 0.026 2
1.1.4.4 Number Theory 0.026 0.019 0.029 0.059 12
1.1.4.5 Counting 0.007 0.012 0 0.034 4
1.1.5.1 Estim. Quant.& Size 0.011 0.016 0 0.042 6
1.1.5.2 Rounding 0.020 0.017 0.026 0.048 10
1.1.5.3 Estim. Comput. 0.015 0.016 0 0.042 8
1.1.5.4 Exponents&Mag. 0.016 0.017 0 0.042 8
1.2.1 Measurement Unit 0.027 0.017 0.032 0.056 13
1.2.2 Per.,Area,Volume 0.031 0.017 0.032 0.059 14
1.2.3 Estim. Errors 0.023 0.019 0.028 0.056 11
1.3.] 2D GeozCoordinate 0.024 0.017 0.029 0.048 12
1.3.2 2D Geo:Basics 0.030 0.017 0.032 0.059 14
1.3.3 2D Geo: Polygons 0.038 0.019 0.034 0.091 16
1.3.4 30 Geo 0.041 0.016 0.034 0.091 17
1.3.5 Vectors 0.018 0.017 0.023 0.048 9
1.4.l Geo. Transform. 0.035 0.020 0.033 0.091 15
1.4.2 Cong. & Sim. 0.032 0.018 0.033 0.059 14
1.4.3 Constructions 0.024 0.017 0.029 0.050 12
1.5.1 Proport. Concepts 0.030 0.017 0.032 0.056 14
1.5.2 Proport. Prob. 0.032 0.015 0.033 0.056 15
1.5.3 Slope & Trig. 0.021 0.027 0 0.091 8
1.5.4 Lin. Interp. 0.013 0.018 0 0.050 6
1.6.1 Pat, Re1., Func. 0.038 0.019 0.034 0.091 16
1.6.2 Equat. & Formulas 0.041 0.016 0.034 0.091 17
1.7.1 Data Rep. & Anal. 0.035 0.020 0.033 0.091 15
1.7.2 Uncer. & Prob. 0.025 0.018 0.029 0.056 12
1.8.1 Inﬁnite Process. 0.004 0.012 0 0.045 2
1.8.2 Change 0.005 0.011 0 0.034 3
1.9.1 Val. & Just. 0.008 0.016 0 0.059 4
1.9.2 Struc. & Abs. 0.015 0.020 0 0.059 7
1.10.1 Other 0.031 0.017 0.033 0.059 14
Averaﬂ 0.023 0.017 0.021 0.060 1 l

 

58
Table 5

Summary of C urriculum-Guide- Topic
Proportions for Each Country across Topics

 

 

 

Ave.a # of

Prop.of Topics

Topic Included

Country Emphasis SD b in Guide
A 0.034 0.016 29
B 0.042 0.021 24
C 0.026 0.008 39
D 0.029 0.012 35
E 0.045 0.023 22
P 0.028 0.01 l 36
G 0.056 0.027 18
H 0.091 0.039 1 1
I 0.032 0.015 31
J 0.040 0.020 25
K 0.050 0.025 20
L 0.033 0.016 30
M 0.032 0.015 31
N 0.059 0.029 17
O 0.048 0.024 21
P 0.034 0.016 29
Q 0.023 0.000 44
Average 0.041 0.019 27

 

b
“Average of non-zero numbers. SD of non

zero numbers.

59

were larger as fewer t0pics were included in a curriculum guide. Again, these proportions
indicate the magnitude of differences in inclusion of topics. The average of the average
proportions of emphasis was .04 (7 class periods). The range of proportions of emphasis
was .023 (about 4 class periods) to .091 (over 3 weeks of classes).

Textbook analyses. Tables 6 and 7 present summaries of the textbook-data
sources. Table C-3 presents the full data set. These analyses were conducted using only
the topic codes, even though textbooks were also coded with performance-expectation
codes. Other analyses will make use of performance-expectation codes.

Table 6 presents textbook summaries over topics. Only one topic (1.8.2 Change)
did not appear in the textbook-data source of any country. Three topics (1.3.2 Basic 2D
Geometry; 1.6.1 Patterns, Relations, and Functions; 1.6.2 Equations and Formulas)
appeared in the textbook data sources of all countries. Overall, the highest proportion of
textbook blocks was devoted to topic 1.6.2. This topic, on average, appeared in 21% of
textbook blocks. The next most emphasized topic, 1.3.3 Polygons and Circles, appeared
in an average of 10% of textbook blocks.

Standard deviations were larger than in the expert topic mapping and curriculum-
guide-data sources suggesting greater variation of topic coverage patterns. Two topics
(1.3.4 3D Geometry, 1.6.2 Equations and Formulas) had standard deviations of over .10
(10% of text blocks). For some topics, the medians were quite different from the means
indicating skewed distributions. Some topics had proportions at or near 0 in many
countries, but may also have had a few large proportions. Such distributions impact the

mean more than the median, making the median a better measure of central tendency.

Table 6

Summary of Textbook Proportions for Each Topic across Countries

60

 

 

 

Ave. Median Max. # of

Prop. of Prop. of Prop. of Countries

Topic Text Text Text Including
Code Topic Blocks SD Blocks Blocks Topic
1.1.1.1 Wh.Num.-Meaning 0.015 0.026 0.004 0.106 11
1.1.1.2 Wh.Num.-Oper. 0.040 0.049 0.010 0.184 15
1.1.1.3 Prop. of Oper. 0.021 0.023 0.009 0.069 15
1.1.2.1 Common Fractions 0.041 0.034 0.036 0.126 16
1.1.2.2 Decimal Fractions 0.024 0.024 0.014 0.065 15
1.1.2.3 Relat. of Fractions 0.013 0.010 0.010 0.031 15
1.1.2.4 Percentages 0.035 0.034 0.035 0.129 14
1.1.2.5 Prop. of Frac. 0.006 0.010 0.001 0.042 1 1
1.1.3.1 Negative Numbers 0.041 0.036 0.040 0.1 10 15
1.1.3.2 Rational Numbers 0.028 0.071 0.010 0.306 12
1.1.3.3 Real Numbers 0.026 0.064 0.002 0.278 1 1
1.1.4.1 Binary Arithmetic 0.001 0.003 0 0.012 4
1.1.4.2 Exponents 0.041 0.038 0.034 0.117 14
1.1.4.3 Complex Numbers 0.000 0.001 0 0.002 3
1.1.4.4 Number Theory 0.016 0.022 0.007 0.072 1 1
1.1.4.5 Counting 0.002 0.006 0 0.025 7
1.1.5.1 Estim. Quant.& Size 0.002 0.003 0.001 0.011 9
1.1.5.2 Rounding 0.007 0.008 0.007 0.028 10
1.1.5.3 Estim. Comput. 0.008 0.009 0.006 0.032 12
1.1.5.4 Exponents&Mag. 0.007 0.015 0.000 0.062 7
1.2.1 Measurement Unit 0.040 0.042 0.031 0.167 15
1.2.2 Per.,Area,Volume 0.071 0.057 0.075 0.164 13
1.2.3 Estim. Errors 0.002 0.003 0.000 0.009 7
1.3.1 2D GeozCoordinate 0.034 0.032 0.032 0.112 14
1.3.2 2D Geo:Basics 0.055 0.042 0.043 0.142 17
1.3.3 2D Geo: Polygons 0.098 0.054 0.093 0.202 16
1.3.4 3D Geo 0.068 0.121 0.019 0.469 13
1.3.5 Vectors 0.005 0.013 0 0.053 7
1.4.1 Geo. Transform. 0.056 0.064 0.052 0.243 13
1.4.2 Cong. & Sim. 0.040 0.060 0.012 0.231 11
1.4.3 Constructions 0.008 0.012 0.002 0.035 9
1.5.1 Proport. Concepts 0.008 0.010 0.004 0.028 10
1.5.2 Proport. Prob. 0.020 0.023 0.017 0.095 12
1.5.3 Slope & Trig. 0.014 0.025 0.000 0.083 6
1.5.4 Lin. Interp. 0.002 0.004 0 0.014 4
1.6.1 Pat., Re1., Func. 0.060 0.054 0.049 0.208 17
1.6.2 Equat. & Formulas 0.205 0.118 0.174 0.388 17
1.7.1 Data Rep. & Anal. 0.048 0.032 0.057 0.099 14
1.7.2 Uncer. & Prob. 0.003 0.008 0.000 0.034 6
1.8.1 Inﬁnite Process. 0.001 0.001 0 0.004 4
1.8.2 Change 0 0 0 0 0
1.9.1 Val. & Just. 0.022 0.072 0.002 0.309 10
1.9.2 Struc. & Abs. 0.021 0.034 0.007 0.117 9
1.10.1 Other 0.036 0.062 0.006 0.223 11
Average 0.029 0.032 0.020 0.1 19 1 1

 

61

The maximum proportions of textbook blocks for many of the topics were around
.10 or more. Some of the topics with larger maximum proportions were 1.3.4 3D
Geometry (.47), 1.6.2 Equations and Formulas (.39), 1.9.1 Validation and Justification
(.31), and 1.1.3.2 Rational Numbers (.31). Topics with some of the lowest maximum
proportions were 1.1.4.1 Binary Arithmetic, 1.1.4.3 Complex Numbers, 1.2.3
Measurement Estimation and Error, 1.5.4 Linear Interpolation, and 1.8.1 Inﬁnite
Processes (all around .01 or less).

Table 7 contains the summary of textbook data for each country. It shows (1) the
average of the proportions of textbook blocks devoted to a topic across all 44 topics for
each country, (2) the standard deviation of proportions across all topics, (3) the average
proportion of textbook blocks across only topics included in the textbook(s) of each
country, (4) the maximum proportion of textbook blocks devoted to each topic, (5) the
number of topics included in each country’s textbook(s), and (6) the number of topics
within a country’s textbook(s) that appeared in at least 10% of the textbook blocks.

The numbers of topics included in country textbooks varied. The average number
of topics included in a textbook was 28. One country included only 11 topics while
another included 40 topics. Less variation existed in the average proportion of blocks
devoted to any topic (average of .03) most likely due to the fact that most proportions
summed to around 1 (proportions could sum to more than one due to the potential for the
presence of multiple-topic codes within each block). One country (N) did, however, have
an average proportion of .06 with a standard deviation of .10. The average of the

standard deviations was .05 (5% of text blocks).

62
Table 7

Summary of Textbook Proportions for Each Country across Topics

 

 

 

# of # of

Ave. Ave. Max. Topics Topics

Prop. of Prop. for Prop.of Included with

Text Included Text by Prop.

Country Blocks SD Topics Blocks Country >. 1
A 0.024 0.035 0.035 0.145 30 2
B 0.023 0.057 0.039 0.282 26 3
C 0.033 0.037 0.038 0.163 39 3
D 0.028 0.027 0.034 0.087 36 0
E 0.025 0.053 0.067 0.243 16 3
F 0.022 0.037 0.047 0.148 21 3
G 0.032 0.075 0.056 0.374 25 4
H 0.026 0.039 0.039 0.141 30 2
I 0.033 0.037 0.044 0.123 33 4
J 0.023 0.070 0.091 0.388 11 4
K 0.028 0.063 0.045 0.356 28 3
L 0.025 0.036 0.034 0.174 32 2
M 0.035 0.063 0.045 0.323 35 5
N 0.061 0.107 0.084 0.469 32 8
O 0.029 0.058 0.059 0.296 22 5
P 0.022 0.041 0.038 0.184 26 2
Q 0.029 0.039 0.032 0.236 40 1
Average 0.029 0.05 l 0.048 0.243 28 3

 

63

When averaging only proportions of topics included in a country’s textbook(s), an
increase of approximately 2% of text blocks was seen (average .048). These proportions
ranged from .032 to .091. Also, the range of maximum proportions and the number of
topics with proportions over .10 showed that some textbooks devoted a lot of space to a
few t0pics while others spread their space over many topics. The maximum amount of
textbook space devoted to a single topic ranged from 9% of a textbook to almost half of a
textbook. The data indicated that in one country no topic received over 10% of the space,
while in another county eight topics received over 10% of the space. In most countries,
however, between two and four topics received over 10% of the textbook space.

Aggregate-data source. The results of the three curriculum-data sources were
combined to obtain a composite picture of mathematics curriculum in each country.
Table 8 presents data on the agreement of topic inclusion across the three data sources.
Table C-4 includes the full set of data. Table 8 shows the average number of each
countries’ three data sources in which each topic was included, the number of countries in
which the topic appeared in all three of the data sources, and the number of countries in
which the topic appeared in none of the data sources. Additionally, I calculated the
proportion of countries that had agreement of topic inclusion across the three data sources
(i.e., the proportion of countries in which the topic either appeared in all three data
sources or none of the data sources). Table 8 also presents summaries of proportions of
emphasis. Within each country, proportions of topic emphasis were averaged for only
those topics appearing in all three data sources for that country. Other topics were given
Os. These averages were scaled to sum to 1.00 across the included topics within a

country. Table 8 presents the average of these proportions for each topic.

64
Table 8

Agreement of Topic Inclusion across Expert-Mapping, C urriculum-Guide—, and Textbook-Data Sources
Presented for Topics across Countries

 

 

 

it of # of Ave.“ Median‘ Max.‘
Topic AVC- # 0f Cntrys: 3 Cntrys: 0 Prop.of Prop. of Prop. of
Code Topic Sources' Sourcesb Sourcesc Agreementd Emphasis SD Emphasis Emphasis
1.1.1.1 Wh.Num-Meaning 1.7 3 1 0.24 0.006 0.014 0 0.053
1.1.1.2 Wh.Num.-Oper. 2.1 8 1 0.53 0.024 0.032 0 0.121
1.1.1.3 Prop. ofOper. 1.9 6 2 0.47 0.016 0.023 0 0.070
1.1.2.1 Common Fractions 2.3 8 1 0.53 0.025 0.027 0 0.072
1.1.2.2 Decimal Fractions 2.3 9 1 0.59 0.025 0.031 0.023 0.122
1.1.2.3 Relat. ofFractions 2.2 9 1 0.59 0.017 0.017 0.018 0.048
1.1.2.4 Percentages 2.2 8 2 0.59 0.024 0.030 0 0.107
1.1.2.5 Prop. ofFrac. 1.6 4 3 0.41 0.007 0.014 0 0.046
1.1.3.1 Negative Numbers 2.6 12 0 0.71 0.031 0.027 0.028 0.083
1.1.3.2 Rational Numbers 2.3 8 0 0.47 0.032 0.076 0 0.327
1.1.3.3 Real Numbers 2.0 6 0 0.35 0.017 0.028 0 0.093
1.1.4.1 Binary Arithmetic 0.6 0 9 0.53 0 0 0 0
1.1.4.2 Exponents 2.4 10 2 0.71 0.026 0.026 0.030 0.089
1.1.4.3 Complex Numbers 0.3 0 14 0.82 0 0 0 0
1.1.4.4 Number Theory 2.1 8 2 0.59 0.017 0.019 0 0.050
1.1.4.5 Counting 0.8 1 9 0.59 0.002 0.007 0 0.031
1.1.5.1 Estim. Quant.& Size 1.4 4 4 0.47 0.006 0.010 0 0.031
1.1.5.2 Rounding 2.0 7 2 0.53 0.011 0.014 0 0.037
1.1.5.3 Estim. Comput. 1.9 7 4 0.65 0.011 0.014 0 0.040
1.1.5.4 Exponents&Mag. 1.6 3 l 0.24 0.005 0.011 0 0.032
1.2.1 Measurement Unit 2.5 1 l 0 0.65 0.032 0.029 0.029 0.099
12.2 Per.,Area,Volume 2.5 12 0 0.71 0.045 0.041 0.043 0.146
1.2.3 Estim. Errors 1.7 4 2 0.35 0.005 0.010 0 0.028
1.3.1 2D Geo:Coordinate 2.4 8 0 0.47 0.022 0.027 0 0.087
1.3.2 2D Geo:Basics 2.8 14 0 0.82 0.043 0.028 0.041 0114
1.3.3 20 Geo: Polygons 2.8 15 0 0.88 0.065 0.037 0.063 0.131
1.3.4 30 Geo 2.6 11 0 0.65 0.047 0.053 0.035 0.161
1.3.5 Vectors 1.0 1 6 0.41 0.002 0.009 0 0.037
1.4.1 Geo. Transform. 2.5 1 1 O 0.65 0.049 0.061 0.036 0.242
1.4.2 Cong. & Sim. 2.3 9 0 0.53 0.037 0.053 0.020 0.183
1.4.3 Constructions 2.0 5 0 0.29 0.008 0.013 0.000 0.034
1.5.1 Proport. Concepts 2.3 9 0 0.53 0.017 0.017 0.021 0.051
1.5.2 Proport. Prob. 2.5 10 0 0.59 0.027 0.029 0.028 0095
1.5.3 Slope & Trig. 1.4 3 4 0.41 0.010 0.028 0 0.116
1.5.4 Lin. Interp. 0.9 0 7 0.41 0 0 0 0
1.6.1 Pat, Rel, Func. 2.8 15 0 0.88 0.059 0.039 0.053 0.135
1.6.2 Equat. & Formulas 2.9 16 0 0.94 0.133 0.087 0.106 0.338
1.7.1 Data Rep. & Anal. 2.6 13 0 0.76 0.051 0.038 0.051 0.133
1.7.2 Uncer. & Prob. 1.6 4 4 0.47 0.006 0.012 0 0.035
1.8.1 Inﬁnite Process. 0.4 0 12 0.71 0 0 0 0
1.8.2 Change 0.2 0 14 0.82 0 0 0 0
1.9.1 Val. & Just. 1.2 3 4 0.41 0.008 0.024 0 0.101
1.9.2 Struc. & Abs. 1.4 5 5 0.59 0.011 0.019 0 0.063
1.10.1 Other 2.1 8 2 0.59 0.022 0.032 0 0.129
Average 1.9 7 2.7 0.57 0.02 0.03 0.01 0.09

 

”The average number of data sources (out of 3) in a country in which the topic appears. Ir’The number of countries in which
the topic appears in all 3 data sources. ‘The number of countries in which the topic appears in no data sources. dThe

proportion of countries in which the topic appears in all 3 or none of the data sources. eWithin each country, the average.
median, or maximum proportions for topics included in all 3 data sources.

65

Seven topics (1.1.3.1 Negative Numbers and Integers; 1.2.2 Perimeter, Area,
Volume; 1.3.2 Basic 2D Geometry; 1.3.3 Polygons and Circles; 1.6.1 Patterns, Relations,
Functions; 1.6.2 Equations and Formulas; 1.7.1 Data Representation and Analysis)
appeared in all three curricular data sources of at least 70% of the countries. Topics
1.1.4.1 Binary Arithmetic, 1.1.4.3 Complex Numbers, 1.5.4 Linear Interpolation, 1.8.1
Inﬁnite Processes, and 1.8.2 Change did not appear in all three data sources for any
country. Topics 1.1.4.3, 1.8.1, and 1.8.2 also appeared in none of the data sources for at
least 70% of the countries. The average proportion of agreement across the data sources
(i.e., topics either appeared in all three or none of the data sources within a country) was
almost 60%. Topic 1.6.2 Equations and Formulas had agreement across all three data
sources in 94% of the countries while tOpics 1.1.1.1 Whole Number Meanings and 1.1.5.4
Exponents had agreement across the data sources in less than 25% of the countries.

The aggregate of the proportions of emphasis in each data source (i.e., an average
of the proportions of emphasis across all three data sources for topics that appeared only
in all three sources in a country) ranged from .002 (1.1.4.5 Systematic Counting and 1.3.5
Vectors) to .133 (1.6.2 Equations and Formulas; this was .07 more emphasis than the
next closest topic had). Most standard deviations of these proportions were around .03;
although, a few were larger (1.1.3.1 Negative Numbers and Integers, .076; 1.4.1
Transformations, .061; 1.6.2 Equations and Formulas, .087). Medians of the aggregate
proportions of emphasis for many topics differed from the means due to the high
proportions of zeros in the data source (i.e., any topic not appearing in all three data
sources for a country received 0 as the proportion of emphasis in the aggregate-data

source). Maximums of the aggregate proportions for each topic averaged .09. Several of

66

the maximum proportions were quite small (e.g., 1.1.4.5 Systematic Counting, 1.1.5.1
Estimating Quantity and Size, 1.2.3 Measurement Estimation and Errors). The largest
were around .30 (1.1.3.2 Real Numbers, 1.6.2 Equations and Formulas), indicating topics
that received approximately 1/3 of the emphasis within a country.

Table 9 shows agreement and proportion summaries for each country. It indicates
the number of topics within a country that either appeared in all or none of the data
sources and the proportion of the 44 topics this represents. None of the countries had
100% agreement across data sources. The lowest amount of agreement was 34% (country
G), and the highest was 80% (country D), with an average of 57%. Within a country, the
numbers of topics appearing in all three data sources ranged from 5 to 32 with an average
of 18. Five countries had at least 10 topics appearing in none of the data sources. For
country Q, all topics appeared in at least one of the data sources.

Average emphasis across topics ranged from .03 (countries C, D, Q) to .20
(country G). Standard deviations of the proportions averaged .04 but ranged from .02 to
.73 (country G). Country G had the highest average emphasis but the largest standard
deviation because so few topics appeared in all three data sources. Most of the maximum
proportions were at least .10. Differences between means and medians reﬂected the

number of Os in the data set.

Analyses of Match between the F ield-Trial Instrument and the Curricula
I evaluated the match between the ﬁeld-trial instrument and the curriculum-data
sources in several ways. First, I compared the number of countries including each ﬁeld-

trial topic in each data source, the number of countries including topics not on the ﬁeld-

67
Table 9

Agreement of Topic Inclusion across Expert-Mapping-, Curriculum-Guide-, and T extbook-
Data Sources Presented for Countries across Topics

 

 

 

Ave. # Prop.

Sources #Topics #Topics Topics in Ave.a Mediana Max.a

Topics in All 3 in 0 3 or 0 Prop. of Prop. of Prop. of
Country Appear Sources Sources Sources Emphasis SD Emphasis Emphasis
A 2.1 19 5 0.55 0.053 0.028 0.000 0.094
B 1.8 19 9 0.64 0.053 0.035 0.000 0.154
C 2.5 30 3 0.75 0.033 0.019 0.024 0.070
D 2.5 32 3 0.80 0.031 0.016 0.025 0.049
E 1.4 7 10 0.39 0.143 0.057 0.000 0.242
F 1.9 17 6 0.52 0.059 0.032 0.000 0.107
G 1.3 5 10 0.34 0.200 0.073 0.000 0.338
H 1.5 8 9 0.39 0.125 0.049 0.000 0.161
I 2.2 25 6 0.70 0.040 0.022 0.027 0.065
J 1.3 9 15 0.55 0.111 0.055 0.000 0.269
K 1.7 1 1 8 0.43 0.091 0.049 0.000 0.251
L 2.2 25 5 0.68 0.040 0.023 0.024 0.088
M 2.1 23 4 0.61 0.043 0.030 0.021 0.136
N 2.0 17 4 0.48 0.059 0.036 0.000 0.142
O 1.5 11 11 0.50 0.091 0.044 0.000 0.186
P 1.7 18 11 0.66 0.056 0.031 0.000 0.121
Q 2.7 32 0 0.73 0.031 0.020 0.022 0.106
Average 1.9 18.1 7.0 0.57 0.07 0.04 0.01 0.15

 

aWithin each country, the average, median, or maximum proportions for topics included in all 3
data sources. Average shows the average of non-zero numbers.

68

trial instrument in each data source, and the average proportions of emphasis for each
topic within each curriculum source (including the aggregate of the curriculum sources)
with the proportion of items for each topic (i.e., topic weight) in the ﬁeld-trial instrument.
Second, I calculated the proportion of items on the ﬁeld-trial instrument that measured
topics included in each of the curriculum-data sources for each country and the proportion
of each country’s curricula (according to the four data sources) that was tested by the
ﬁeld-trial instrument. Third, I calculated differences between topic inclusion on the ﬁeld-
trial instrument and topic inclusion in each of the four data sources for each country, and I
did the same for a comparison of topic weight (i.e., proportion of items) on the ﬁeld-trial
instrument and proportion of emphasis for each topic in each of the curriculum-data
sources. Fourth, I computed correlations and Euclidean-distance measures between the
ﬁeld-trial instrument topic “proﬁles” (i.e., patterns of topic weights) and the four
cuniculum “proﬁles” (i.e., patterns of proportions of topic emphasis) for each country.
The results of each of these analyses are presented below.

Summary comparison. Table 10 provides a summary of (l) the numbers of
countries that included each topic within each data source, (2) the average proportions of
emphasis for topics across all countries for all data sources, and (3) the numbers and
proportions of items for each topic on the ﬁeld-trial instrument. The proportions of items
on the ﬁeld-trial instrument sum to more than one because many items had more than one
content code.

The higher frequencies of items on the ﬁeld-trial instrument were for topics
1.1 .2.1 Common Fractions, 1.5.2 Proportionality Problems, 1.6.2 Equations and

Formulas, and 1.7.1 Data Representation and Analysis. Most of the topics with higher

69
Table 10

Document and F ield- Trial Proportion Comparisons

 

 

 

 

Number of Countries Ave. Prop. of Emphasis Field Trial
Topic Expert Curr. Aggre- Expert Curr. Aggre- Prop.
Code Topic Map. Guide Text gate Map. Guide Text gate # Items Items
1.1.1.1 Wh.Num.-Meaning 8 10 11 3 0.012 0.023 0.015 0.006 4 0.017
1.1.1.2 Wh.Num.-Oper. 9 12 15 8 0.017 0.027 0.040 0.024 14 0.058
1.1.1.3 Prop. of Oper. 8 9 15 6 0.017 0.020 0.021 0.016 2 0.008
1.1.2.1 Common Fractions 14 9 16 8 0.032 0.017 0.041 0.025 34 0.141
1.1.2.2 Decimal Fractions 14 10 15 9 0.028 0.020 0.024 0.025 17 0.071
1.1.2.3 Relat. of Fractions 12 10 15 9 0.025 0.021 0.013 0.017 11 0.046
1.1.2.4 Percentages 12 11 14 8 0.025 0.022 0.035 0.024 7 0.029
1.1.2.5 Prop. of Frac. 12 5 1 1 4 0.025 0.008 0.006 0.007 0 0
1.1.3.1 NegativeNumbers 14 15 15 12 0.032 0.033 0.041 0.031 3 0.012
1.1.3.2 Rational Numbers 14 13 12 8 0.030 0.028 0.028 0.032 0 0
1.1.3.3 Real Numbers 1 1 12 1 1 6 0.021 0.028 0.026 0.017 0 0
1.1.4.1 Binary Arithmetic 2 5 4 0 0.003 0.009 0.001 0 0 0
1.1.4.2 Exponents 15 11 14 10 0.036 0.023 0.041 0.026 3 0.012
1.1.4.3 Complex Numbers 0 2 3 0 0 0.003 0 0 0 0
1.1.4.4 NumberTheory 13 12 11 8 0.024 0.026 0.016 0.017 1 0.004
1.1.4.5 Counting 2 4 7 1 0.004 0.007 0.002 0.002 0 0
1.1.5.1 Estim.Quant.& Size 9 6 9 4 0.014 0.011 0.002 0.006 9 0.037
1.1.5.2 Rounding 14 10 10 7 0.030 0.020 0.007 0.011 8 0.033
1.1.5.3 Estim.Comput. 12 8 12 7 0.023 0.015 0.008 0.011 7 0.029
1.1.5.4 Exponents&Mag. 12 8 7 3 0.027 0.016 0.007 0.005 1 0.004
1.2.1 Measurement Unit 15 13 15 11 0.029 0.027 0.040 0.032 18 0.075
1.2.2 Per.,Area,Volume 15 14 13 12 0.029 0.031 0.071 0.045 16 0.066
1.2.3 Estim. Errors 11 ll 7 4 0.018 0.023 0.002 0.005 3 0.012
1.3.1 2D Geo:Coordinate 14 12 14 8 0.029 0.024 0.034 0.022 6 0.025
1.3.2 2D Geo:Basics 17 14 17 14 0.039 0.030 0.055 0.043 7 0.029
1.3.3 2D Geo: Polygons 16 16 16 15 0.034 0.038 0.098 0.065 8 0.033
1.3.4 3D Geo 15 17 13 11 0.034 0.041 0.068 0.047 4 0.017
1.3.5 Vectors 1 9 7 1 0.002 0.018 0.005 0.002 0 0
1.4.1 Geo. Transform. 15 15 13 11 0.033 0.035 0.056 0.049 10 0.041
1.4.2 Cong.&Sim. 14 14 11 9 0.031 0.032 0.040 0.037 14 0.058
1.4.3 Constructions 13 12 9 5 0.024 0.024 0.008 0.008 0 0
1.5.1 Proport. Concepts 15 14 10 9 0.030 0.030 0.008 0.017 8 0.033
1.5.2 Proport. Prob. 16 l5 12 10 0.041 0.032 0.020 0.027 23 0.095
1.5.3 Slope & Trig. 10 8 6 3 0.021 0.021 0.014 0.010 0 0
1.5.4 Lin. Interp. 6 6 4 0 0.011 0.013 0.002 0 0 0
1.6.1 Pat.,Rel.,Func. 15 16 17 15 0.032 0.038 0.060 0.059 12 0.050
1.6.2 Equat.&Formu|as l6 17 17 16 0.041 0.041 0.205 0.133 33 0.137
1.7.1 DataRep.&Anal. 16 15 14 13 0.039 0.035 0.048 0.051 27 0.112
1.7.2 Uncer.&Prob. 9 12 6 4 0.015 0.025 0.003 0.006 11 0.046
1.8.1 Inﬁnite Process. 0 2 4 0 0 0.004 0.001 0 0 0
1.8.2 Change 0 3 0 0 0 0.005 0 0 0 0
1.9.1 Val. & Just. 7 4 10 3 0.013 0.008 0.022 0.008 0 0
1.9.2 Struc. & Abs. 8 7 9 5 0.012 0.015 0.021 0.011 0 0
1.10.1 Other 10 14 11 8 0.017 0.031 0.036 0.022 0 0
Average 11 11 11 7 0.023 0.023 0.029 0.023 7 0.030

 

70

numbers of items also had high rates of inclusion and emphasis in all data sources. The
exceptions were 1.1.5.1 Estimating Quantity and Size and 1.7.2 Uncertainty and
Probability. Nine items measured 1.1.5.1 and 11 measured 1.7.2. However, both topics
were included in each of the curriculum sources of less than 70% of the countries and
topic 1.1.5.1 had an average of only .002 blocks across all country textbooks while topic
1.7.2 had an average of .003. On the other hand, topics 1.1.3.1 Negative Numbers,
1.1.4.2 Exponents, and 1.1.4.4 Number Theory were measured by three or fewer items.
However, topic 1.1.3.1 was included in each of the curriculum-data sources of at least
70% of the countries and topics 1.1.4.2 and 1.1.4.4 were included in two of the
curriculum sources in at least 70% of the countries. Additionally, average proportions of
emphasis for these topics were similar to those proportions for many other topics. No
items measured topic 1.1.3.2 Rational Numbers which appeared in the three main data
sources of at least 70% of the countries or topics 1.1.3.3 Real Numbers and 1.4.3
Geometric Constructions which also appeared in the data sources of many countries.
However, the proportions of textbook blocks and proportions of emphasis in the
aggregate of the data sources were low for topic 1.4.3.

Proportions of items/curricula covered. The top half of Table 11 shows the
proportions of items on the ﬁeld-trial instrument that measured topics included in each of
the curriculum-data sources for each country (hereafter referred to as covered items).
Across data sources and countries, these proportions ranged from .18 to 1.00 (l8-100%)
with an average of .75. Only one of the averages of the proportions of covered items for
each country-data source was below .75. This exception was for the average of the

proportions of items measuring topics included in the aggregate of each of the country’s

71

 

 

 

 

 

66.. 666 6.6 66.6 666 66.6 66.. F6 .66 66.6 .66 66.6 66.6 ~66 66.6 66.6 66.. 66.6 66.6 66.6 ..66 v8.2
.66 $6 ..66 66.6 66.6 666 $66 $6 3.6 66.6 2.6 ~66 :6 .66 $6 666 66.6 666 666 66.6 m66 ES.
«.66 N66 .66 V6.6 666 66.6 66.6 m66 666 $66 66.6 6.6 V66 .66 66.6 ...6 ...6 66.6 ..66 ~66 w66 Gm
666 66.6 66.6 .66 66.6 66.6 666 666 66.6 66.6 66.6 666 V66 V6.6 666 66.6 66.6 666 66.6 66.6 .66 owﬂo><
66.. $6 6.6 66.6 666 m66 66.. $6 66.6 «66 66.6 666 66.6 .66 $6 666 66.. 66.6 66.6 66.6 66.6 camouwﬂ
66.6 66.6 66.6 66.6 666 66.6 .66 6B6 .66 666 «.66 66.6 #66 N66 666 .66 .66 66.6 66.6 2.6 666 6.02.36...
.66 $6 66.6 666 $66 666 66.6 $6 :6 66.6 :6 N66 26 N66 66.6 6$6 .66 666 60.6 6.26 666 6220.56
m66 666 666 N66 666 666 666 E26 666 66.6 66.6 3.6 66.6 66.6 2.6 n66 666 .66 666 66.6 6.6 65663). 566....
683... 823.650 60 5:835
66.. $6 «6.6 36 66.6 66.6 66.6 66.. n66 66.. N66 :6 .66 666 66.6 666 .66 66.. n66 n66 666 Va.2
66.6 6.6 v.6 36 66.6 $6 666 3.6 666 N66 66.6 .66 ...6 6N6 6.6 666 .66 66.6 N66 36 666 E3.
.66 ...6 66.6 N.6 66.6 6.6 666 66.6 6.6 666 NN6 66.6 66.6 666 .66 36 ...6 .66 n66 0.6 6.6 06
66.6 666 6.6 n66 666 $6 $6 3.6 :6 N66 66.6 6.6 66.6 666 .m6 666 666 66.6 66.6 66.6 .66 668914.
66.6 6.6 666 6.6.6 666 $6 66.6 R6 606 N66 666 66.6 .66 6N6 6.6 .66 .v6 66.6 66.6 3.6 666 262%...
66.. $6 36 >66 .66 66.6 66.6 66.. 666 66.. N66 56 .66 66.6 666 .66 .66 66.. N66 n66 66.6 6.009on
66.6 66.6 6.6 :6 66.6 m66 ...6 N66 n66 66.6 66.6 666 m66 66.6 $6 V6.6 3.6 66.6 m66 .66 666 6650620
66.. 3.6 6.6 6B6 66.6 66.6 .n6 66.. ...6 66.6 66.6 3.6 66.6 66.6 .66 .66 2.6 66.. N66 N66 ..66 656662.63”.
E=_=o.t:U E 62630 was: 638% ..o :oEomok.
X5). 7:2 mm m>< O 6 0 Z .2 A v. a . I O n. m. n. U m .<. 8:56 San.

 

DEED

 

E25865 BIN 63$ 5 6683.6 8:62.30 6.5.589 666.6% EDEQQEK use Szotaab 6.66589 56% 2.. 6.8 83 3252.63 32.6.3636 S 6.86:6 Easement

 

. . 036...

 

72

data sources (.55). Within-country averages of the proportions of covered items were
more variable and ranged from .48 to 1.00. For most countries, the highest proportions of
covered items were those measuring topics included in the textbooks. Additionally, for
most countries, the lowest proportions of covered items were those measuring topics
included in the aggregate of the data sources - the most restricted data source. Standard
deviations of proportions of coverage within countries (across data sources) ranged from
.01 to .30. When looking across countries (within data sources), the least variability was
for the proportion of items measuring topics included in the textbook data sources, and
the greatest variability was for the proportion of items measuring topics included in the
aggregate of the data sources.

Much less variability existed in the proportions of the cuniculum tested within
each country. These proportions ranged from .65 to 1.00. Averages for the four data
sources (across countries) were around .80 to .90. Most averages across data sources
within countries were in that same range with the exception of one average proportion of
.70 (country N). On average, 30% of this country’s curriculum was not tested. The
highest proportions of tested curricula within each country varied across the data sources,
with the majority of countries having more of the curriculum as deﬁned by the
textbook(s) tested that the curriculum as deﬁned by any of the other data sources. The
lowest proportions of tested curriculum were associated with the curriculum guides. All
standard deviations were .12 or less.

Differences in topic inclusion and emphasis. Differences for each topic between
topic inclusion in each curriculum-data source and topic inclusion in the ﬁeld-trial

instrument are presented in Table 12. The second column of the table indicates which

73
Table 12

Drﬁiarences in Topic Inclusion between the F ield- Trial Instrument and Each Curriculum Source for Each Topic

 

Expert
Mapping Curr. Guide Textbook Aggpegate

 

Ave. # Ave.
Topic Test # Mis- Prop. #Mis- Prop. # Mis- Prop. # Mis- Prop. Mis- Prop.
Code Items Match Match Match Match Match Match Match Match Match Match

 

1.1.1.1 J 9 0.47 7 0.59 6 0.65 14 0.18 9.0 0.47
1.1.1.2 J 8 0.53 5 0.71 2 0.88 9 0.47 6.0 0.65
1.1.1.3 J 9 0.47 8 0.53 2 0.88 11 0.35 7.5 0.56
1.1.2.1 J 3 0.82 8 0.53 1 0.94 9 0.47 5.3 0.69
1.1.2.2 J 3 0.82 7 0.59 2 0.88 8 0.53 5.0 0.71
1.1.2.3 J 5 0.71 7 0.59 2 0.88 8 0.53 5.5 0.68
1.1.2.4 J 5 0.71 6 0.65 3 0.82 9 0.47 5.8 0.66
1.1.2.5 12 0.29 5 0.71 11 0.35 4 0.76 8.0 0.53
1.1.3.1 J 3 0.82 2 0.88 2 0.88 5 0.71 3.0 0.82
1.1.3.2 14 0.18 13 0.24 12 0.29 8 0.53 11.8 0.31
1.1.3.3 11 0.35 12 0.29 11 0.35 6 0.65 10.0 0.41
1.1.4.1 2 0.88 5 0.71 4 0.76 0 1 2.8 0.84
1.1.4.2 J 2 0.88 6 0.65 3 0.82 7 0.59 4.5 0.74
1.1.4.3 0 1.00 2 0.88 3 0.82 0 l 1.3 0.93
1.1.4.4 J 4 0.76 5 0.71 6 0.65 9 0.47 6.0 0.65
1.1.4.5 2 0.88 4 0.76 7 0.59 1 0.94 3.5 0.79
1.1.5.1 J 8 0.53 11 0.35 8 0.53 13 0.24 10.0 0.41
1.1.5.2 J 3 0.82 7 0.59 7 0.59 10 0.41 6.8 0.60
1.1.5.3 J 5 0.71 9 0.47 5 0.71 10 0.41 7.3 0.57
1.1.5.4 J 5 0.71 9 0.47 10 0.41 14 0.18 9.5 0.44
1.2.1 J 2 0.88 4 0.76 2 0.88 6 0.65 3.5 0.79
1.2.2 J 2 0.88 3 0.82 4 0.76 5 0.71 3.5 0.79
1.2.3 J 6 0.65 6 0.65 10 0.41 13 0.24 8.8 0.49
1.3.1 J 3 0.82 5 0.71 3 0.82 9 0.47 5.0 0.71
1.3.2 J 0 1.00 3 0.82 0 1.00 3 0.82 1.5 0.91
1.3.3 J 1 0.94 l 0.94 l 0.94 2 0.88 1.3 0.93
1.3.4 J 2 0.88 0 1.00 4 0.76 6 0.65 3.0 0.82
1.3.5 1 0.94 9 0.47 7 0.59 l 0.94 4.5 0.74
1.4.1 J 2 0.88 2 0.88 4 0.76 6 0.65 3.5 0.79
1.4.2 J 3 0.82 3 0.82 6 0.65 8 0.53 5.0 0.71
1.4.3 13 0.24 12 0.29 9 0.47 5 0.71 9.8 0.43
1.5.1 J 2 0.88 3 0.82 7 0.59 8 0.53 5.0 0.71
1.5.2 J 1 0.94 2 0.88 5 0.71 7 0.59 3.8 0.78
1.5.3 10 0.41 8 0.53 6 0.65 3 0.82 6.8 0.60
1.5.4 6 0.65 6 0.65 4 0.76 0 1 4.0 0.76
1.6.1 J 2 0.88 1 0.94 0 1.00 2 0.88 1.3 0.93
1.6.2 J l 0.94 0 1.00 0 1.00 l 0.94 0.5 0.97
1.7.1 J 1 0.94 2 0.88 3 0.82 4 0.76 2.5 0.85
1.7.2 J 8 0.53 5 0.71 11 0.35 13 0.24 9.3 0.46
1.8.1 0 1.00 2 0.88 4 0.76 0 1 1.5 0.91
1.8.2 0 1.00 3 0.82 0 1.00 0 l 0.8 0.96
1.9.1 7 0.59 4 0.76 10 0.41 3 0.82 6.0 0.65
1.9.2 8 0.53 7 0.59 9 0.47 5 0.71 7.3 0.57
1.10.1 10 0.41 14 0.18 11 0.35 8 0.53 10.8 0.37

 

.0‘
N
o

Average 4.64 0.73 5.52 0.68 5.16 0.70 0.64 5.38 0.68

74

topics had items included on the ﬁeld-trial instrument. The “Mis-Match” column under
each curriculum source represents either (1) the number of countries including topics that
are not on the ﬁeld-trial instrument in a particular data source or (2) the number of
countries not including a topic that was on the ﬁeld-trial instrument in a particular data
source. The columns labeled “Prop. Match” indicate the proportion of countries in which
a match occurred (i.e., the topic appeared in both the ﬁeld-trial instrument and curriculum
source or did not appear in either).

The average of the proportions of “match” between the ﬁeld-trial instrument and
the curriculum-data sources in topic inclusion ranged from .64 to .73 for data sources and
from .37 to .97 for topics. The topics with the lowest rates of match in topic inclusion
were 1.1.3.2 Rational Numbers (an average of 5 countries corresponding in topic
inclusion in their data source with topic inclusion on the ﬁeld-trial instrument across data
sources), 1.1.3.3 Real Numbers (an average of 7 countries corresponding), 1.1.5.1
Estimating Quantity & Size (an average of 7 countries corresponding), and 1.10 Other
Content (an average of 6 countries corresponding). Those with the highest match with
the ﬁeld-trial instrument in topic inclusion were 1.1.4.3 Complex Numbers, 1.3.3
Polygons and Circles, 1.6.1. Patterns, Relations, and Functions, 1.6.2 Equations and
Formulas, and 1.8.2 Change.

As expected, the lowest average proportion of match with the ﬁeld-trial
instrument on topic inclusion was between the ﬁeld-trial instrument and the aggregate of
the three curriculum-data sources for each country. The highest rate of topic-inclusion
correspondence was between the ﬁeld-trial instrument and the expert mapping. On

average, though, across topics 60% to 70% of the countries either included topics in the

75

curriculum sources that were included on the ﬁeld-trial instrument or did not include
topics in the curriculum sources that were not included on the ﬁeld-trial instrument.

Table 13 shows the summary information on the match of topic inclusion between
the ﬁeld-trial instrument and data sources on topic inclusion across countries. “Prop.
Match” is the proportion of topics for which the topic inclusion (or lack of) in a country’s
data source corresponds with the topic inclusion (or lack of) on the ﬁeld-trial instrument.
“In Curr.” is the number of topics included in a particular curriculum for each country
that is not included on the ﬁeld-trial instrument. “Not in Curr.” is the number of topics
included on the ﬁeld-trial instrument that is not included in a particular curriculum-data
source. Both inclusion and exclusion in the curriculum are important enough to consider
separately. High numbers on “In Curr.” indicated that countries intend that students be
taught more topics than those being tested. High numbers on “Not in Curr.” indicated
that students were being tested on more topics than those they were intended to be taught.

Average proportions of match with the ﬁeld-trial instrument on topic inclusion
between the curriculum-data sources and the ﬁeld-trial instrument ranged within
countries from .52 to .84. Averages within data sources were the same as those in Table
12. The average numbers of topics included in a country’s curriculum but not on the
ﬁeld-trial instrument was ﬁve, and the average number of topics included on the ﬁeld-
trial instrument but not in the curriculum was nine. The highest numbers of non-tested
topics included in a curriculum source were for topics included in the textbooks and
curriculum guides. The highest numbers of tested topics not in a curriculum source were
for topics not included in the aggregate-data source. The best correspondence overall was

for the expert mapping. Within countries, average rates of non-tested topics included in

76

 

 

 

 

 

 

6.6 N.6 66.6 6.6. 6.N V6.6 6.6 6.6 66.6 ..6 N6 66.6 6.6 6.6 66.6 668626
6.N 6.. . 66.6 6 6 66.6 . N. 66.6 6 6. 66.6 v 6. 66.6 0
6.6 6.N 66.6 N. . 66.6 6 6 66.6 6 6 66.6 6 . 66.6 6
6.N. 6.N 66.6 6. 6 66.6 6 . 66.6 6. 6 66.6 N. v V66 0
6.6. 6.6 66.6 6. 6 666 6 6 66.6 6. 6 6.4.6 6 6. 66.6 2
6.6 6.6 66.6 6. v 66.6 . 6 N66 6 6 66.6 6 6 66.6 S.
6.6 6..N N66 6 N N66 6 6 66.6 6 v 66.6 . 6 N66 1.
6... 6.6 $6 6. . 66.6 6 6 66.6 v. 6 66.6 6 6 N66 v.
6.6. 6.6 N66 .N . 66.6 6N N 66.6 .. 6 66.6 6. 6 666 ..
6.6 6.6 66.6 6 v 66.6 v 6 66.6 6 6 66.6 6 6 N66 .
6.3 6.6 66.6 6N N 6.46 6 6 66.6 6N N 66.6 6 6 66.6 .6
6.6. 6.6 66.6 6N . .66 6. 6 666 6. N 66.6 6. 6 N66 0
6.6 6.6 66.6 6. . 66.6 6. 6 66.6 v .. 66.6 6 N 66.6 6
6.6. 6.6 66.6 NN 6 66.6 6. 6 66.6 6 N 66.6 N. 6 66.6 m.
6.6 6.6 66.6 . v 66.6 . 6 66.6 6 6 66.6 6 6 N66 0
6.6 6.6 66.6 6 6 66.6 N N. 66.6 N N. 66.6 6 6 66.6 0
6.6 6.6 N66 N. N 66.6 6 v 66.6 6. 6 66.6 v 6 66.6 6.
6.6 6.6 66.6 .. . 66.6 6 6 66.6 v v N66 N 6 N66 <
8.50 F50 6863. .850 8.50 686.). 8.50 .850 686.). 8.50 8.50 686.). .850 .850 686.). 6.6550
a. 82 a. .686 a. a. .686 a. a. .686 a. a. .686 a. a. .686

.c>< .o>< .o>< “oz 57. 607. 87.

866866.». 6.866606. 6650 8250.850 .1656662 6.8666.
3506 860

 

6.6580 9.66% .546 8.566. 5338.230 686% .38 3625.55 66.26.36: 63 56356 28.86626 8666 2.. 86:6ng

6. 0.66...

77

curriculum sources ranged from 2.5 to 11, and overall average rates of tested topics not in
curriculum sources ranged from approximately 1 to 17.

Tables 14 and 15 show the differences between proportions of emphasis for topics
in each of the curriculum-data sources and the topic weight (i.e., number of items for each
topic) on the ﬁeld-trial instrument. Table 14 highlights differences across topics, and
Table 15 highlights differences across countries. Positive differences occur when topics
receive a higher emphasis in a curriculum-data source than on the ﬁeld-trial instrument,
and negative differences occur when topics receive a higher emphasis on the ﬁeld-trial
instrument than in a curriculum-data source. Again, both indices are important. The
tables show standard deviations of differences in topic emphasis for each topic or country
within each curriculum-data source as well as the averages within data sources of the
positive and negative differences in topic emphasis for each topic or each country. The
tables also show averages of these numbers across data sources.

Across data sources, the topic with the largest negative average difference
between ﬁeld-trial weight and curriculum emphasis (Table 14) was topic 1.1.2.1 Common
Fractions. This topic, on average, was emphasized more on the ﬁeld-trial instrument that
in the curriculum-data sources. The topics with the largest positive average differences
varied. Some of the larger differences were for topics 1.3.4 3D Geometry, 1.4.2
Congruence and Similarity, and 1.6.2 Equations and Formulas. The largest differences in
emphasis were between the ﬁeld-trial instrument and the textbooks.

On average across all cuniculum sources, three topics (1.1.2.1 Common
Fractions, 1.5.2 Proportionality Problems, 1.7.1 Data Representation and Analysis) had

0 as an average positive difference meaning that the topics were not emphasized more in

Table 14

78

Diﬂerence in Topic Emphasis between the F ield- Trial Instrument and Each C urriculum-
Source for Each Topic

 

 

 

 

 

Expert MappinL Curriculum Guide Textbook

Ave. Ave. Ave. Ave. Ave. Ave.
Topic Prop. SD of P05. Neg. SD of P05. Neg. SD of P05. Neg.
Code Items all Dif. Dif. Dif. all Dif. Dif. Dif. all Dif. Dif. Dif.
1.1.1.1 0.017 0.013 0.009 -0.017 0.025 0.023 -0.017 0.026 0.035 -0.013
1.1.1.2 0.058 0.020 0.013 -0.044 0.023 0.033 -0.035 0.049 0.047 -0.046
1.1.1.3 0.008 0.022 0.028 -0.008 0.024 0.030 -0.008 0.023 0.029 -0.006
1.1.2.1 0.141 0.025 0 -0.109 0.016 0 -0.124 0.034 0 -0.100
1.1.2.2 0.071 0.023 0.035 -0.047 0.018 0 -0.051 0.024 0 -0.047
1.1.2.3 0.046 0.024 0.060 -0.026 0.020 0.009 -0.029 0.010 0 -0.032
1.1.2.4 0.029 0.019 0.016 -0.017 0.017 0.009 0.019 0.034 0.032 -0.023
1.1.2.5 0 0.026 0.035 0 0.013 0.028 0 0.010 0.009 0
1.1.3.1 0.012 0.019 0.026 -0.012 0.016 0.025 -0.012 0.036 0.045 -0.010
1.1.3.2 0 0.025 0.037 0 0.018 0.037 0 0.071 0.040 0
1.1.3.3 0 0.018 0.032 0 0.023 0.039 0 0.064 0.039 0
1.1.4.1 0 0.008 0.023 0 0.015 0.031 0 0.003 0.004 0
1.1.4.2 0.012 0.018 0.029 -0.012 0.019 0.023 -0.012 0.038 0.045 -0.012
1.1.4.3 0 0 0 0 0.008 0.024 0 0.001 0.001 0
1.1.4.4 0.004 0.016 0.028 -0.004 0.019 0.032 -0.004 0.022 0.026 -0.004
1.1.4.5 0 0.011 0.034 0 0.012 0.028 0 0.006 0.006 0
1.1.5.1 0.037 0.013 0 -0.024 0.016 0.004 -0.028 0.003 0 -0.035
1.1.5.2 0.033 0.018 0.015 -0.016 0.017 0.005 -0.021 0.008 0 -0.026
1.1.5.3 0.029 0.018 0.017 -0.015 0.016 0.006 -0.023 0.009 0.003 -0.023
1.1.5.4 0.004 0.021 0.035 -0.004 0.017 0.029 -0.004 0.015 0.018 -0.004
1.2.1 0.071 0.015 0 -0.046 0.017 0 -0.047 0.042 0.034 -0.050
1.2.2 0.062 0.014 0 -0.038 0.017 0 -0.036 0.057 0.051 -0.048
1.2.3 0.012 0.016 0.015 -0.012 0.019 0.024 -0.012 0.003 0 -0.011
1.3.1 0.025 0.018 0.015 -0.012 0.017 0.010 -0.021 0.032 0.032 -0.018
1.3.2 0.029 0.014 0.019 -0.003 0.017 0.012 -0.014 0.042 0.052 -0.013
1.3.3 0.033 0.016 0.014 -0.01 1 0.019 0.015 -0.009 0.054 0.071 -0.033
1.3.4 0.017 0.017 0.021 -0.017 0.016 0.025 0 0.121 0.109 -0.013
1.3.5 0 0.007 0.028 0 0.017 0.033 0 0.013 0.013 0
1.4.1 0.037 0.019 0.016 -0.019 0.020 0.014 -0.016 0.064 0.058 -0.034
1.4.2 0.062 0.021 0.007 -0.034 0.018 0 -0.028 0.060 0.077 -0.047
1.4.3 0 0.018 0.031 0 0.017 0.035 0 0.012 0.016 0
1.5.1 0.033 0.015 0.010 -0.012 0.017 0.010 -0.014 0.010 0 -0.025
1.5.2 0.095 0.019 0 -0.054 0.015 0 -0.063 0.023 0 -0.075
1.5.3 0 0.022 0.035 0 0.027 0.045 0 0.025 0.041 0
1.5.4 0 0.017 0.031 0 0.018 0.036 0 0.004 0.007 0
1.6.1 0.050 0.018 0.007 -0.026 0.019 0.014 -0.019 0.054 0.048 0024
1.6.2 0.129 0.017 0 -0.096 0.016 0 -0.096 0.1 18 0.164 -0.041
1.7.1 0.1 12 0.018 0 -0.073 0.020 0 -0.077 0.032 0 -0.064
1.7.2 0.046 0.016 0.007 -0.033 0.018 0.007 -0.024 0.008 0 -0.042
1.8.1 0 0 0 0 0.012 0.034 0 0.001 0.003 0
1.8.2 0 0 0 0 0.01 1 0.028 0 0.000 0 0
1.9.1 0 0.018 0.031 0 0.016 0.034 0 0.072 0.037 0
1.9.2 0 0.014 0.027 0 0.020 0.036 0 0.034 0.040 0
1.10.1 0 0.015 0.028 0 0.017 0.038 0 0.062 0.056 0
Averpge 0.030 0.016 0.018 -0.019 0.017 0.020 -0.020 0.032 0.029 -0.021

 

79
Table 14 (Contd.)

 

Aggregate

 

 

 

Ave. Ave. SD of SD of

Topic Prop. SD of P03. Neg. Ave. of Ave. of P03. Neg. SD of
Code Items all Dif. Dif. Dif. Pos. Dif. NeiDiﬂ Dif. Dif. A11 Dif.
1.1.1.1 0.017 0.014 0.018 -0.017 0.021 -0.016 0.009 0.002 0.020
1.1.1.2 0.058 0.032 0.063 -0.040 0.039 -0.041 0.018 0.004 0.042
1.1.1.3 0.008 0.023 0.036 -0.008 0.031 -0.008 0.003 0.001 0.019
1.1.2.1 0.141 0.027 0 -0.1 17 0 -0.1 13 0.000 0.009 0.057
1.1.2.2 0.071 0.031 0 -0.051 0.022 -0.049 0.022 0.002 0.039
1.1.2.3 0.046 0.017 0 -0.031 0.018 -0.029 0.024 0.002 0.029
1.1.2.4 0.029 0.030 0.022 -0.029 0.020 -0.022 0.008 0.005 0.022
1.1.2.5 0 0.014 0.031 0 0.026 0 0.010 0.000 0.015
1.1.3.1 0.012 0.027 0.032 -0.012 0.032 -0.012 0.008 0.001 0.023
1.1.3.2 0 0.076 0.068 0 0.045 0 0.013 0.000 0.025
1.1.3.3 0 0.028 0.048 0 0.040 0 0.006 0.000 0.020
1.1.4.1 0 0 0 0 0.014 0 0.013 0.000 0.012
1.1.4.2 0.012 0.026 0.032 -0.012 0.032 -0.012 0.008 0.000 0.023
1.1.4.3 0 0 0 0 0.006 0 0.010 0.000 0.008
1.1.4.4 0.004 0.019 0.031 -0.004 0.029 -0.004 0.002 0.000 0.017
1.1.4.5 0 0.007 0.031 0 0.025 0 0.01 1 0.000 0.015
1.1.5.1 0.037 0.010 0 -0.032 0.001 -0.030 0.002 0.004 0.016
1.1.5.2 0.033 0.014 0 -0.028 0.006 -0.023 0.006 0.005 0.015
1.1.5.3 0.029 0.014 0.007 -0.021 0.008 -0.020 0.005 0.003 0.015
1.1.5.4 0.004 0.01 1 0.025 -0.004 0.027 -0.004 0.006 0.000 0.016
1.2.1 0.071 0.029 0.013 -0.050 0.012 -0.048 0.014 0.002 0.032
1.2.2 0.062 0.041 0.051 -0.037 0.025 -0.040 0.025 0.005 0.037
1.2.3 0.012 0.010 0.010 -0.012 0.012 -0.012 0.009 0.001 0.014
1.3.1 0.025 0.027 0.024 -0.022 0.020 -0.018 0.008 0.004 0.020
1.3.2 0.029 0.028 0.025 -0.022 0.027 -0.013 0.015 0.007 0.023
1.3.3 0.033 0.037 0.044 —0.022 0.036 -0.019 0.023 0.010 0.033
1.3.4 0.017 0.053 0.056 -0.017 0.053 -0.012 0.035 0.007 0.041

1.3.5 0 0.009 0.037 0 0.028 0 0.009 0.000 0.015
1.4.1 0.037 0.063 0.070 -0.025 0.039 -0.024 0.025 0.007 0.036
1.4.2 0.062 0.053 0.083 -0.044 0.042 -0.038 0.038 0.008 0.049
1.4.3 0 0.013 0.027 0 0.027 0 0.007 0.000 0.014
1.5.1 0.033 0.017 0 -0.021 0.009 -0.018 0.005 0.005 0.014
1.5.2 0.095 0.029 0 —0.068 0 -0.065 0.000 0.008 0.033
1.5.3 0 0.028 0.054 0 0.044 0 0.007 0.000 0.022
1.5.4 0 0 0 0 0.018 0 0.015 0.000 0.014
1.6.1 0.050 0.039 0.039 -0.025 0.027 -0.023 0.017 0.003 0.028
1.6.2 0.129 0.087 0.1 15 -0.053 0.070 -0.071 0.072 0.025 0.089
1.7.1 0.1 12 0.038 0 -0.067 0 -0.070 0.009 0.005 0.038
1.7.2 0.046 0.012 0 -0.039 0.004 -0.035 0.004 0.007 0.020
1.8.1 0 0 O 0 0.009 0 0.014 0.000 0.01 1

1.8.2 0 0 0 0 0.007 0 0.012 0.000 0.009
1.9.1 0 0.024 0.047 0 0.037 0 0.006 0.000 0.019
1.9.2 0 0.019 0.037 0 0.035 0 0.005 0.000 0.018
1.10.1 0 0.032 0.046 0 0.042 0 0.010 0.000 0.022
Avergge 0.030 0.025 0.030 -0.021 0.024 -0.020 0.013 0.003 0.025

80

any curriculum source than on the ﬁeld-trial instrument. These three topics had among
the highest average negative differences. On average, topic 1.1.2.1 had over 10% more
emphasis on the ﬁeld-trial instrument than in the curriculum sources; topics 1.5.2 and
1.7.1 Uncertainty and Probability had around 7% more emphasis. Topic 1.6.2 Equations
and Formulas had the highest positive difference (approximately 7% more emphasis on
average in the curriculum than on the ﬁeld-trial instrument), it also had a high negative
difference (approximately 7% more emphasis on average on the ﬁeld-trial instrument
than in the curriculum). Looking at data sources shows that the topic has a higher
negative than positive difference in topic emphasis between the expert-mapping data and
the ﬁeld-trial instrument and also between the curriculum-guide data and the ﬁeld-trial
instrument. It has a higher positive than negative difference for the other two data
sources. The only topics with 0 as an average negative difference in emphasis were those
averages for topics not included on the ﬁeld-trial instrument. The average of the positive
averages was .024 while the average of the negative averages was -.020.

Table 15 shows more variability in topic-emphasis differences between countries
than existed between topics. Again, standard deviations and means of differences in topic
emphasis within data sources were highest for the textbooks and the aggregate-data
source. Textbooks had a larger average standard deviation of differences within countries
than other sources had, and one country’s (N) standard deviation of differences across
topics was .109. Average positive differences in topic emphasis within countries across
topics and data sources ranged from .019 to .071 with an average of around .037.

Average negative differences in topic emphasis ranged from -.032 to -.043, with an

81

 

 

 

 

666.6 666.6 V.66 666.6- 666.6 6666- 6V66 .V66 666.6- 6V66 666.6 666.6- 6N66 666.6 6666- 6N66 666.6 6686><
6N66 V666 N666 N666- .N66 .666- 6N66 N666 6N66- VN66 V666 6666- 6.6.6 666.6 666.6- NN66 6666 0
666.6 666.6 666.6 6666.. 6N66 666.6- 6N66 666.6 N666. 6N66 666.6 V666- .N66 N666 666.6- 6N66 666.6 6
6V66 666.6 6N66 6V66- 6V66 666.6- 666.6 NV66 666.6- 666.6 N666 .V66- 666.6 666.6 6V66- 6N66 666.6 0
.666 666.6 6N66 6V66- .666 6V66- 6V66 666.6 6V66- 666.6 66.6 6V66- .V66 6V66 666.6- 6.6.6 666.6 2
666.6 666.6 6.6.6 666.6- 666.6 N666- 666.6 666.6 V666- 666.6 666.6 6666- 6N66 666.6 V666- 666.6 666.6 3.
6N66 666.6 666.6 666.6- 6N66 666.6- 6N66 6N66 .666- 6.6.6 666.6 666.6- 6.6.6 N666 666.6- 6N66 666.6 ..
666.6 .666 N.6.6 666.6- .V66 V666- 666.6 :66 666.6- N666 666.6 666.6- 666.6 666.6 666.6- 6N66 666.6 0.
6V66 .666 6N66 .V66- N666 6V66- 666.6 666.6 .V66- 666.6 .666 6V66- 6N66 6V66 NV66- V666 .V66 ..
.666 666.6 666.6 666.6- 6N66 666.6- VN66 666.6 .666- 666.6 6.66 666.6- 6N66 666.6 6666.. 6.6.6 666.6 .
6V66 666.6 VN66 666.6- V666 6666- 666.6 6666 666.6- 666.6 6V66 NV66- 666.6 6V66 666.6- 6N66 V666 ...
666.6 .666 6V66 .V6.6. .666 NV66- 6V..6 N666 NV66- .666 .666 666.6- V666 666.6 666.6- 6V66 NV66 0
666.6 N666 666.6 V666- 666.6 .666- 666.6 666.6 666.6- NV66 VV66 666.6- NN66 666.6 666.6- .N66 6N66 6
666.6 V666 N666 666.6- .666 666.6- 66.6 .666 6V66- 6V66 V666 6666.. 6N66 666.6 N666- 6N66 V666 m
6N66 V666 N666 V666- 6.6.6 666.6- 6.6.6 666.6 666.6- 6.6.6 .666 N666- .N66 666.6 6V66- 6.6.6 N666 Q
6N66 666.6 666.6 6666.. VN66 666.6- .N66 V666 .666- 6N66 666.6 666.6- 6N66 666.6 666.6- 6N66 666.6 0
VVo6 V666 6N66 6V66- 6V66 666.6- N666 6V66 6V66- N666 666.6 6V66- 6N66 666.6 666.6- 6N66 666.6 6
666.6 666.6 666.6 6666- 6N66 666.6- 6N66 666.6 666.6- VN66 V666 .V66- 6.6.6 V666 666.6- 6N66 V666 <
6.6. .65 6.6. 6.6. 6.6. 6.6. .65 6.6. =6 .65 .65 6.6. =6 6.6. .65 6.6. .6W .65 6.6. .65 ...W .9500
..< .662 .86 .667. .86 .662 .86 60 Q6. .662 .86 60 Q6. .662 .86 60 Q6. .607. .86 we Q6.

«0 Q6. 66 Q6. 60 Q6. woo>< moo>< .o>< .c>< .o>< .o>< .o>< .o>< .o>< .o><

Jaa‘ 6.8668... 6650 523.850 66.663). 6.8666

 

6:580 «cum 8K8§em 5:38.230 .6er .35 38.3.6626 68.66.3666 65 56.566 6.866566 866.6 8 8.88.5ka

6. 6.66...

82

average of -.037. Smaller overall positive and negative differences were noted for
countries D and Q, and larger differences were noted for countries G, J, and N.

Correlations and Euclidean-Distance measures. Correlations between the
proportions of emphasis patterns for topics in each country’s curriculum-data source and
topic weights on the ﬁeld-trial instrument are in Table 16. The correlations ranged from -
.064 to .66 with an overall average of .36. The average of the correlations of t0pic-
emphasis patterns with the ﬁeld-trial instrument was highest for the aggregate of the data
sources. However, only ﬁve countries had their highest correlations for that data source.
The lowest average correlation was for the curriculum guides. Average correlations
within countries across data sources varied considerably. These ranged from a low of
.111 (country N) to a high of .524 (country D). Thus, some countries had curriculum-
topic-emphasis “proﬁles” that were all, or mostly all, uncorrelated with the ﬁeld-trial
instrument topic-weight proﬁle while others had curriculum-topic-emphasis proﬁles that
were almost all moderately correlated with the ﬁeld-trial instrument topic—weight proﬁle.
Standard deviations of correlations within countries and across data sources varied from
.04 (country G) to .23 (country Q).

Euclidean distances are shown in Table 17. These numbers represent the square
roots of the sums of squared differences between the proportions of emphasis patterns for
topics in a particular curriculum-data source for each country and the proportion of items
for each topic on the ﬁeld-trial instrument. These distances can be used to determine the
extent of dissimilarity between each of the country curriculum-topic-emphasis “proﬁles’
and the ﬁeld-trial instrument topic-weight proﬁle. The larger numbers indicate greater

dissimilarity, and, for all practical purposes, the numbers are relative. However, some

83
Table 16

Correlations between the Proportions of T opic-Emphasis-Proﬁles for Each
Country in Each Curriculum-Data Source and the T opic- Weight Proﬁle for

the F ield- Trial Instrument

 

 

 

Expert Curr.

Country Map Guide Textbook Aggregate Average SD

A 0.359* 0.404" 0.556" 0.591 ** 0.477 0.098
B 0.282 0.150 0.073 0.157 0.165 0.075
C 0.146 0.215 0.434M 0.394M 0.297 0.120
D 0.513" 0.419" 0.552" 0.612M 0.524 0.070
E 0.428" 0.456" 0324* 0.506“ 0.428 0.066
F 0.636” 0.223 0.270 0.548M 0.419 0.176
G 0331* 0.265 0.361* 0.284 0.310 0.038
H 0.387" 0.201 0358* 0.270 0.304 0.074
I 0.432" 0.287 0.406M 0.436M 0.390 0.061
J 0.210 0.118 0.487M 0.470" 0.321 0.161
K 0333* 0.316* 0.493" 0.563" 0.426 0.105
L 0.195 0.498“ 0.663" 0.632" 0.497 0.185
M 0312* 0.073 0.459M 0.412" 0.314 0.149
N 0352* -0.064 0.104 0.054 0.111 0.152
O 0.300* 0.215 0.480“ 0.471" 0.367 0.113
P 0374* 0.475" 0.566M 0.525” 0.485 0.072
Q 0.244 0.144 0.587“ 0.484M 0.329 0.227
Ave 0.343 0.258 0.422 0.436 0.363 0.114
SD 0.116 0.151 0.157 0.157 0.109 0.051

 

*p <.05. *p <.01.

Table 17

Euclidean Distances between the Proportions-of-Topic-Emphasis Proﬁles in
Each Curriculum-Data Source and the the T opic- Weight Proﬁle for the F ield-

 

 

 

Trial Instrument
Expert Curr.

Country Map Guide Textbook Aggregate Average SD
A 0.233 0.228 0.228 0.207 0.224 0.010
B 0.243 0.266 0.435 0.312 0.314 0.074
C 0.256 0.243 0.260 0.231 0.247 0.012
D 0.218 0.228 0.208 0.203 0.214 0.009
E 0.231 0.226 0.360 0.335 0.288 0.060
F 0.195 0.243 0.299 0.223 0.240 0.03 8
G 0.284 0.267 0.468 0.481 0.375 0.100
H 0.233 0.323 0.285 0.354 0.299 0.045
I 0.225 0.240 0.267 0.228 0.240 0.017
J 0.278 0.267 0.407 0.334 0.321 0.056
K 0.241 0.252 0.366 0.278 0.284 0.049
L 0.248 0.217 0.201 0.195 0.215 0.020
M 0.244 0.260 0.377 0.246 0.282 0.05 5
N 0.234 0.322 0.752 0.333 0.410 0.201
O 0.258 0.264 0.345 0.281 0.287 0.034
P 0.246 0.220 0.249 0.227 0.235 0.012
Q 0.244 0.248 0.227 0.219 0.234 0.012
Ave 0.24 0.25 0.34 0.28 0.28 0.04
SD 0.020 0.030 0.130 0.073 0.063 0.043

 

85

benchmarks can be identiﬁed. For example, if the proportions of emphasis of all 44
topics in the curriculum differed from the corresponding topic weight on the ﬁeld-trial
instrument by .01 (and the proportions summed to 1 within the curriculum and the test),
the Euclidean distance would be .07; if all topics differed by .10, the Euclidean distance
would be .66; if 1/4 of the topics differed by .10 and the other 3/4 were the same, the
Euclidean distance would be .33; if 1/2 of the topics differed by .10, the Euclidean
distance would be .47. Finally, the ﬁeld-trial tested only topics not included in the
curriculum, the Euclidean distance would be 1.4. Thus, the smallest possible Euclidean
distance was 0 and the largest (if both sets of proportions summed to 1) was 1.4.

The Euclidean distances ranged from .195 to .752, with an overall average of .28.
The largest average distance was found between the ﬁeld-trial instrument topic-weight
proﬁle and the textbook-topic-emphasis proﬁles. The smallest was between the ﬁeld-trial
instrument proﬁles and the expert mapping proﬁles. For most countries, the largest
distance was between the ﬁeld-trial instrument proﬁles and the textbook proﬁles. The
exceptions were for countries D, H, L, and Q.

The smallest average Euclidean distance was for country D (.214). The largest
was for country N (.410). Average standard deviations across countries within
curriculum sources were between .24 and .34. Average standard deviations within
countries across curriculum sources ranged from .01 to .20. Country N had by far the

most variability in Euclidean distances across curriculum-data sources.

86
Development of Test Blueprints

For the next sets of analyses, I used the curriculum information from each country
to design test blueprints with optimal content match to the curriculum depending upon the
test purpose. I focused on three questions when developing the blueprints: What was the
purpose of the test (i.e., what inferences will be made), what topics should be included in

the test; What proportion of items should be allocated to each included topic?

Determine the Purpose of the Test

I assumed that, at the most general level, the purpose of all tests would be to
compare cross-national student achievement of the content included in the mathematics
curriculum for 13-year-old students. As seen earlier, the mathematics curriculum for 13-
year-old students varied within and across nations. Therefore, I was interested in
specifying the exact nature of the curriculum on which students would demonstrate
achievement. The speciﬁc purpose of the test, therefore, had implications for the topics
that would be included on the test.

I focused on two speciﬁc purposes for test development. The first purpose was to
compare student achievement of the content of the intended mathematics curriculum
cross-nationally. The expert-mapping and curriculum-guide analyses served as data
sources for these blueprints. The second purpose was to evaluate student achievement of
the content of the mathematics curriculum to which the students were likely to have been
exposed (i.e., the potentially implemented mathematics curriculum). The results of the
textbook analyses served as one the data source for these blueprints; the aggregate of the

data sources served as another source for these test blueprints.

 

 

 

87

Determine Topic Inclusion

Topic inclusion in each test blueprint was the next issue to confront. If the
curriculum sources were used to develop separate test blueprints for each country, any
topic that appeared in the relevant curriculum source for a country would be included in
the blueprint. However, when looking cross-nationally, the decision was not as simple.
Topic inclusion varied across the countries, and, although many commonalties existed,
many differences existed also.

I used four methods for determining topic inclusion in each test blueprint. The
ﬁrst method was the development of a unique-test blueprint for each country, only
including those topics that appeared in a particular data source for each country. The
other three methods were three different ways of combining each country’s curriculum
data to develop inclusive-test blueprints (i.e., one test for all countries). The ﬁrst of the
inclusive methods was to include on the test a union of all topics that any country
included in each data source. The second inclusive method was a 70% intersection
method, that is the inclusion of topics that appeared in a particular data source for at least
70% of the countries. Finally, the third inclusive method for determining topic inclusion
was to develop a test using a strict intersection of topics appearing in the relevant data
source of all countries. Table 18 presents information on topic inclusion for the union,
70%-intersection, and strict-intersection methods. One check in a column indicates that
the corresponding topic would appear in the union-test blueprint only, two checks
indicate that a topic would also appear in the 70%-intersection-test blueprint, and three
checks indicate that a topic would appear in the union-, 70%-intersection-, and strict-

intersection-test blueprints.

Table 18

Items Included on Test Blueprints

88

 

Curriculum Source

 

 

 

Topic Expert

Code Topic Mapping Curr. Guide Textbook Aggregate
1.1.1.1 Wh.Num.-Meaning J J J J
1.1.1.2 Wh.Num.-Oper. J JJ JJ J
1.1.1.3 Prop. ofOper. J J JJ J
1.1.2.1 Common Fractions J J J J J J
1.1.2.2 Decimal Fractions J J J J J J
1.1.2.3 Relat. of Fractions J J J J J J
1.1.2.4 Percentages J J J J J J
1.1.2.5 Prop. of Frac. J J J J J
1.1.3.1 Negative Numbers JJ JJ JJ JJ
1.1.3.2 Rational Numbers JJ JJ JJ J
1.1.3.3 Real Numbers J JJ J J
1.1.4.1 Binary Arithmetic J J J

1.1.4.2 Exponents JJ J JJ J
1.1.4.3 Complex Numbers J J

1.1.4.4 Number Theory JJ JJ J J
1.1.4.5 Counting J J J J
1.1.5.1 Estim. Quant.& Size J J J J
1.1.5.2 Rounding JJ J J J
1.1.5.3 Estim. Comput. J J J J J J
1.1.5.4 Exponents&Mag. J J J J J
1.2.1 Measurement Unit J J J J J J J
1.2.2 Per.,Area,Volume JJ JJ JJ JJ
1.2.3 Estim. Errors J J J J
1.3.] 2D Geo:Coordinate JJ JJ JJ J
1.3.2 2D Geo:Basics JJJ JJ JJJ JJ
1.3.3 2D Geo: Polygons JJ JJ JJ JJ
1.3.4 3D Geo JJ JJJ JJ J
1.3.5 Vectors J J J J
1.4.1 Geo. Transform. J J J J J J J
1.4.2 Cong. & Sim. JJ JJ J J
1.4.3 Constructions J J J J J J
1.5.1 Proport. Concepts J J J J J J
1.5.2 Proport. Prob. J J J J J J J
1.5.3 Slope & Trig. J J J J
1.5.4 Lin. Interp. J J J

1.6.] Pat, Rel., Func. JJ JJ JJJ JJ
1.6.2 Equat. & Formulas J J J J J J J J J J
1.7.1 Data Rep. & Anal. JJ JJ JJ JJ
1.7.2 Uncer. & Prob. J J J J J
1.8.1 Inﬁnite Process. J J

1.8.2 Change J

1.9.1 Val. & Just. J J J J
1.9.2 Struc. & Abs. J J J J
1. 10.1 Other J J J J J
Union 41 44 43 39
70% Int. 26 21 21 7
Int. 1 2 3 0

 

89

According to Table 18, union-test blueprints would contain between 39 and 44
topics. The numbers of topics on the 70%-intersection-test blueprints ranged from 7 to
26 topics. The strict—intersection—test blueprints would include only from 0 to 3 topics.
All but ﬁve topics would be included in the union-test blueprints for all data sources, and
all topics would be included in the union-test blueprints for at least one of the data
sources. Thirty-one topics would be included in at least one of the 70%-intersection-test
blueprints, and four topics would be included in at least one of the strict-intersection-test
blueprints. Only seven topics would appear in the 70%-intersection blueprints developed
for the aggregate of the data sources, and no topics would appear in the strict-intersection

blueprints developed for the aggregate of the data sources.

Determine Topic Emphasis

Next, the relative emphasis (i.e., weight) that topics would receive on each test
had to be determined. For the inclusive-test blueprints (i.e., union, 70% intersection,
strict intersection), 1 weighted topics according to the average of the proportions of
emphasis allocated to each topic for each country within each of the data sources. For the
unique tests, I weighted topics differently for each country according to the proportion of
emphasis to each topic in the relevant data source for each country.

The topic weights for the two types of intersection-test blueprints are presented in
Table 19. The weights for the union tests have been presented earlier in this paper (Table
10), and the weights for the unique tests are presented in Appendix C. The weights for

topics on the 70%-intersection blueprints based on the expert mapping and the curriculum

 

90
Table 19

Topic Weights on Test Blueprints

 

 

Topic Expert Expert Curr. Gd. Curr. Gd. Text Text Aggre.

Code Topic 70% Strict 70% Strict 70% Strict 70% AVE
1.1.1.1 Wh.Num-Meaning 0 0 0 0 0 0 0 0.000
1.1.1.2 Wh.Num.-Oper. 0 0 0.042 0 0.038 0 0 0.01 1
1.1.1.3 Prop. of Oper. 0 0 0 0 0.020 0 0 0.003
1.1.2.1 Common Fractions 0.040 0 0 0 0.039 0 0 0.01 1
1.1.2.2 Decimal Fractions 0.035 0 0 0 0.023 0 0 0.008
1.1.2.3 Relat. of Fractions 0.031 0 0 0 0.013 0 0 0.006
1.1.2.4 Percentages 0.032 0 0 0 0.034 0 0 0.009
1.1.2.5 Prop. of Frac. 0.031 0 0 0 0 0 0 0.004
1.1.3.1 Negative Numbers 0.039 0 0.050 0 0.040 0 0.073 0.029
1.1.3.2 Rational Numbers 0.038 0 0.043 0 0.027 0 0 0.015
1.1.3.3 Real Numbers 0 0 0.042 0 0 0 0 0.006
1.1.4.1 Binary Arithmetic 0 0 0 0 0 0 0 0.000
1.1.4.2 Exponents 0.045 0 0 0 0.039 0 0 0.012
1.1.4.3 Complex Numbers 0 0 0 0 0 0 0 0.000
1.1.4.4 Number Theory 0.030 0 0.039 0 0 0 0 0.010
1.1.4.5 Counting 0 0 0 0 0 0 0 0.000
1.1.5.1 Estim. Quant.& Size 0 0 0 0 0 0 0 0.000
1.1.5.2 Rounding 0.037 0 0 0 0 0 0 0.005
1.1.5.3 Estim. Comput. 0.029 0 0 0 0.007 0 0 0.005
1.1.5.4 Exponents&Mag. 0.034 0 0 0 0 0 0 0.005
1.2.1 Measurement Unit 0.036 0 0.041 0 0.038 0 0.000 0.017
1.2.2 Per.,Area,Volume 0.036 0 0.047 0 0.068 0 0.104 0.036
1.2.3 Estim. Errors 0 0 0 0 0 0 0 0.000
1.3.] 2D Geo:Coordinate 0.036 0 0.037 0 0.032 0 0.000 0.015
1.3.2 2D Geo:Basics 0.048 1.00 0.046 0 0.052 0.171 0.102 0.203
1.3.3 2D Geo: Polygons 0.043 0 0.058 0 0.094 0 0.153 0.050
1.3.4 3D Geo 0.042 0 0.063 0.500 0.065 0 0 0.096
1 .3 .5 Vectors 0 0 0 0 0 0 0 0.000
1.4.1 Geo. Transform. 0.041 O 0.052 0 0.053 0 0 0.021
1.4.2 Cong. & Sim. 0.039 0 0.049 0 0 0 0 0.012
1.4.3 Constructions 0.030 0 0.037 0 0 0 0 0.010
1.5.1 Proport. Concepts 0.037 0 0.046 0 0 0 0 0.012
1.5.2 Proport. Prob. 0.051 0 0.049 0 0.019 0 0 0.017
1.5.3 Slope & Trig. 0 0 0 0 0 0 0 0.000
1.5.4 Lin. Interp. 0 0 0 0 0 0 0 0.000
1.6.1 Pat., Rel., Func. 0.040 0 0.058 0 0.057 0.187 0.138 0.069
1.6.2 Equat. & Formulas 0.052 0 0.063 0.500 0.196 0.642 0.312 0.252
1.7.1 Data Rep. & Anal. 0.049 0 0.052 0 0.046 0 0.119 0.038
1.7.2 Uncer. & Prob. 0 0 0.038 0 0 0 0 0.005
1.8.1 Inﬁnite Process. 0 0 0 0 0 0 0 0.000
1.8.2 Change 0 0 0 0 0 0 0 0.000
1.9.] Val. & Just. 0 0 0 0 0 0 0 0.000
1.9.2 Struc. & Abs. 0 0 0 0 0 0 0 0.000
1.10.1 Other 0 0 0.047 0 0 0 0 0.007

 

91

guides ranged from around .03 to around .06. Topic weights on the intersection
blueprints based on the textbooks had a larger range of .008 (1.1.5.3 Estimating
Computations) to .196 (1.6.2 Equations and Formulas). Topic weight on the strict-
intersection blueprints varied. Only one topic was included in the strict-intersection
blueprint for the expert mapping; therefore, it received 100% of the weight. Two topics
were included in the curriculum-guide strict-intersection blueprint, each receiving half of
the weight. Three topics were included in the strict-intersection-test blueprint for the
textbook. Two of the topics received around 20% of the weight, and one topic received
around 60% of the weight. Disregarding topics not included in any of the intersection test
blueprints, averages of the topic weights ranged from .005 to .251. Topics with the
highest average weight were 1.3.2 Basic 2D Geometry (.203) and 1.6.2 Equations &
Formulas (.251). Table 20 provides a summary of codes used throughout the remainder

of this section.

Comparisons between the F ield-Trial Instrument and Test Blueprints

I repeated the test-to-curriculum match analyses described earlier comparing the
content of each of the inclusive-test blueprints (i.e., the union, 70%-intersection, and
strict-intersection blueprints) to the content of the ﬁeld-trial instrument. This resulted in
comparisons of the actual test to other tests that could be developed based on a country’s
curriculum. Unique-test blueprints are identical to the data in each of the curriculum
sources. Therefore, a comparison of these blueprints with the ﬁeld-trial instrument would

yield identical results to the initial sets of match analyses.

92

.858 E=_=o_t:o Bawocwwa .«o 5:08:35 8% 05 c8 2:533 32 2t 5 83:8 858 02 a

 

 

 

 

02-3. 03.5. 03-8 03.5 oaowwmuwwmm
moEcsoU
. 5.5 5-8 5-5 EW 5 8:5
mo 5308.85
855.00
:-o< 5:. :60 Kim .6 $2 5 8&3
no 838385
35550 =m mmocom
ZD-O< 75¢: 75-00 75-xm moi—oh mo :oED
owmwoewwxx 3008on 32:0 8:32:50 maxim—2 tonxm :ouaomﬁoﬁ
Mo @0522

8.50m 8:30:50
3.89 azimmaadumrm

on 033.

93

Proportions of items/blueprints covered. Table 21 shows the proportions of
items on the TIMSS ﬁeld-trial instrument that tested topics included in each of the test
blueprints, and the proportion of “items” in each of the test blueprints (i.e., sum of topic
weights) for topics that were tested by items on the ﬁeld-trial instrument. Proportions of
items that tested topics on each of the blueprints ranged from .02 to 1.00 with an average
of .61. All topics tested on the ﬁeld-trial instrument were included in each of the union-
test blueprints. The proportion of ﬁeld-trial items measuring topics included on the 70%-
intersection blueprints ranged from .29 to .84. The variability was quite substantial. The
standard deviation was .38.

The proportions of “items” in each of the test blueprints that were allocated to
topics tested on the ﬁeld-trial instrument ranged from .78 for the curriculum-guide union-
test blueprint to 1.00 for the three strict-intersection-test blueprints and the aggregate-
70%-intersection-test blueprint. This meant that the ﬁeld-trial instrument included all
topics that were included on each of these test blueprints. The average proportion of
emphasis was .91. As expected, proportions of items for topics included on the test
blueprints that were also included on the ﬁeld-trial instrument increased as the test
blueprints became more restricted. The opposite was true when looking at the proportion
of items on the ﬁeld-trial instrument that tested topics in each of the test blueprints.

Diﬂerences in topic inclusion and emphasis. Differences in topic inclusion
between each of the inclusive-test blueprints and the ﬁeld-trial instrument are presented
for all topics in Table 22. A check in the second column of the table indicates which
items were included on the ﬁeld-trial instrument. If a topic was included on the ﬁeld-trial

instrument but not in the blueprint, a value of -1 was entered in the corresponding cell of

4
9

 

co; wwd mod 56 co; and co; 36 36 co; mwd mud oo._ cod mwd .65—«:563E co
@8on 353025
“mom. mo :oanoE

8.. mod wmd Gd omd so; 36 and 2: mod wmd 2: mod 33 2: 3532.5
amok E 38: RE.
.205 no :Btoaoi

 

V932 7:2 Om m>< K-D< ZD-O< 5&2. K-th ZDéC. 5-00 K-va 75-00 ~m-Xm KHXm 75-xw

$53035

 

 

Emﬁahmﬁ NE: .3ka to .8365 3.235% Reg 53% 5 wEmFNe EEIQQER has Etmmém >85 «Noam 3 SSS NSC .EmEXQ arezxemexm

5 03mm.

 

95

Table 22

Diﬂerences in Topic Inclusion between the F ield- Trial Instrument and Each Test Blueprint

 

Prop.
SUM Match it Match

TX-Sl AG-UN AG-7l

EX-Sl CG-UN CG-7l CG-Sl TX-UN TX-7l

Test EX—UN EX-7l

CODE

 

00000000000000000000000000000000000000000000
45845455475362644868434443344465443343421445
lllllll OOOOOIOIOIIIIIOI100101101100000100000

44444440400040404444....44444004404400404400000
JOJJJJJOOI1.04000441240040000000.10000000400001.
OOOOOOOIOIIIOIOIO0000000000100.100.11.0000 lllll
lllllll 0100010lOllllllllOllOllOllOOllllOOOOO

lllllllloll"ll‘lllllllllllllllllllll‘llllillllil

 

22
0.50

30 34 18 34
0.77 0,41 0.77

0.68

17
0.39

16
0.36

35

0.80

32
0.73

it Match

0.61

0.66

Prop. Match
In Blueprint

10

22

14

15

12

I3 27 26 -22

O 28

Not in Blueprint

 

96

the test-blueprint vector. If a topic was not included on the ﬁeld-trial instrument but was
included in the blueprint, a 1 was entered in the corresponding cell of the blueprint
vector. A 0 indicated correspondence between the ﬁeld-trial instrument and the test
blueprint (i.e., the topic was either on both the ﬁeld-trial instrument and the test blueprint
or the topic was off of both).

The proportion of the test blueprints that corresponded with the ﬁeld-trial
instrument in inclusion (or non-inclusion) of each topic (i.e., zeros) ranged from .36 to
.91, The proportion of topics that either were included in a test blueprint and included on
the ﬁeld-trial instrument or not included on both ranged from .36 to .80. The topics with
the lowest correspondence between the ﬁeld-trial instrument and the test blueprints in
topic inclusion were 1.1.1.1 Whole Number Meanings, 1.1.3.2 Real Numbers, 1.1.5.1
Estimating Quantity & Size, 1.3.5 Vectors, and 1.7.2 Uncertainty and Probability. Each
of these t0pics was tested on the ﬁeld-trial instrument but was not included in 7 of the 11
test blueprints. Those with the highest correspondence in topic inclusion were 1.3.2
Basic 2D Geometry, 1.6.2 Equations and Formulas, and 1.8.2 Change. Topics 1.3.2 and
1.6.2 were tested on the ﬁeld-trial instrument and were included in all but one test
blueprint each, and topic 1.8.2 was not included on the ﬁeld-trial instrument and was only
included in one test blueprint. The lowest correspondence of topic inclusion between the
ﬁeld-trial instrument and test blueprints was between the ﬁeld-trial instrument and the
expert-mapping strict-intersection blueprint, and the best correspondence of topic
inclusion was between the ﬁeld-trial instrument and the expert-mapping 70%-intersection

blueprint.

Table 23

97

Diﬂerences in Topic Emphasis between the F ield- Trial Instrument and Each Test Blueprint

 

 

 

Ave Ave
CG- AG- Pos. Neg
CODE EX-UN EX-7I EX-SI UN CG-71 CG—SITX-UN TX-7I TX-Sl UN AGJI AVE SD Dif. Dif.
1.1.1.1 0.00 -0.02 -0.02 0.01 -0.02 -0.02 -0.01 002 -0.02 -0.01 -0.02 -0.01 0.01 0.007 -0.014
1.1.1.2 004 -0.06 -0.06 -0.03 -0.02 -0.06 -0.03 -0.02 -0.06 -0.03 -0.06 —0.04 0.02 0 -0.042
1.1.1.3 0.01 -0.01 -0.01 0.01 -0.01 -0.01 0.01 0.01 -0.01 0.01 -0.01 0.00 0.01 0.010 -0.008
1.1.2.1 011 -0.10 —0.14 -O.12 -0.l4 -0.14 -0.11 -0.10 -0.14 0.12 -0.14 —0.12 0.02 0 -0.124
11.22 -0.04 -0.04 -0.07 -0.05 -0.07 -0.07 005 .005 -0.07 -0.05 -0.07 -0.06 0.01 0 -0.057
1 1.2.3 -0 02 -0.01 -0.05 -0.02 -0.05 -0.05 -0.04 -0.03 -005 -0.03 —0.05 -0.04 0.01 0 0.035
1.1.2.4 0.00 0,00 -0.03 -0.01 -0.03 -003 0.00 0.00 -0.03 0.00 -0.03 -0.01 0.01 0.004 -0.018
1.1.2.5 003 0.03 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.01 0.015 0
1.1.3.1 002 0.03 -0.01 0.02 0.04 ~00] 0.02 0.03 -0.01 0.02 0.06 0.02 0.02 0.029 -00l2
1.1.3.2 003 0.04 0.00 0.03 0.04 0.00 0.02 0.03 0.00 0.03 0.00 0.02 0.02 0.031 0
1.1.3.3 0.02 0.00 0.00 0.03 0.04 0.00 0.02 0.00 0.00 0.02 0.00 0.01 0,01 0,025 0
1.1.4.1 0.00 0,00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.004 0
1.1.4.2 0.02 0.03 -0.01 0.01 -0.01 -0,01 0.02 0.03 —0.01 0.01 -0.01 0.01 0.02 0.021 -0.012
1.14.3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.002 0
1.1.4.4 0.02 0.03 0.00 0.02 0.03 0.00 0.01 0.00 0.00 0.01 0.00 0.01 0.01 0.021 -0.004
1.1.4.5 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.004 0
1.15.1 -0.02 -0.04 -0.04 .003 -0.04 -0.04 -0.04 -0.04 -0,04 -0.03 -0.04 ~0.03 0.00 0 -0.034
115.2 0.00 0.00 -0.03 —0.01 003 .003 -0.03 -0.03 -0.03 -0.02 -0.03 -0.02 0.01 0.004 002?
1.1.5.3 -0.01 0.00 —0.03 —0.01 -0.03 -0.03 -0.02 -0.02 -0.03 -0.02 -0.03 -0.02 0.01 0.000 -0.023
1.1.5.4 002 0.03 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.013 -0.004
1.2.] -0.05 -0.04 -0.07 -0,05 -0.03 -0.07 004 —0.04 -0.07 -0.04 -0.07 -0.05 0.02 0 -0.053
1.2.2 -0.04 0.03 -0.07 -0.04 -0.02 -0.07 -0.01 0.00 -0.07 -0.02 0.04 -0.03 0.03 0.020 0040
1.2.3 0.01 -0.01 -0.0I 0.01 —0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 0.01 0.008 -0.012
1.3.1 0.00 0.01 -0.02 0.00 0.01 —0.02 0.00 0.01 -0.02 0.00 -0.02 -0.01 0.01 0.007 0017
1.3.2 0.01 0.02 0.97 0.00 0.02 -0.03 0.01 0.02 0.14 0.01 0.07 0.11 0.27 0.128 .0029
1.3.3 0.00 0.01 -0.03 0.01 0.02 -0.03 0.04 0.06 -0.03 0.03 0.12 0.02 0.04 0.037 -0.033
1.3.4 002 0.03 -0.02 0.02 0.05 0.48 0.04 0.05 -0.02 0.03 -0.02 0.06 0.14 0.089 0017
1.3.5 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.006 0
1.4.1 -0.01 0.00 -0.04 -0.0l 0.01 -0.04 0.00 0.01 004 0.01 -0.04 001 0.02 0.008 0026
1.4.2 -0.03 -002 -0.06 -0.03 -0.01 -0,06 -0.03 -0.06 -0.06 -0,02 006 -0.04 0.02 0 -0.038
14.3 0 02 0.03 0.00 0.02 0.04 0 00 0 01 0.00 0.00 0.01 0.00 0.01 0.01 0.022 0
1.5.1 000 0.00 -0,03 0.00 0.01 .003 -0.03 -0,03 -0.03 -0.02 -0.03 -0.02 0.02 0.009 0024
1.5.2 -0.05 -0.04 -0.10 -0.06 -0.05 -O.10 -008 -0.08 010 -0.07 —0.10 -0.07 0.02 0 0074
1.5.3 002 0.00 0.00 0.02 0.00 0.00 0.01 0.00 0.00 0,01 0.00 0.01 0.01 0.016 0
15.4 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.008 0
1.6.] -0.02 -0.01 -0.05 -0.01 0.01 -0.05 0.00 0.01 0.14 0.01 0.09 0.01 0.05 0.050 -0.024
1.6.2 -0.10 -0.09 -0,14 -0.10 —0.07 0.36 0.02 0.06 0.50 0.00 0.17 0.06 0.20 0.225 -0.082
1.7.1 -007 —0.06 —0.11 -0.08 -0.06 -O.11 -0.07 -0.07 -0.11 -0.06 0.01 ~0.07 0.03 0.007 -0.081
1.7.2 -0.03 -0.05 -0.05 -0.02 -0.01 -0.05 -0.04 -0,05 -0.05 -0.04 -0.05 0.04 0.01 O -0.038
1.8.] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.002 0
1.8.2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.005 0
1.9 l 0.01 0.00 0.00 0.01 0.00 0.00 0.02 0.00 0.00 0.01 0.00 0.00 0.01 0.011 0
1.9.2 0.01 0.00 0.00 0.02 0.00 0.00 0.02 0.00 0.00 0.01 0.00 0.01 0.01 0.014 0
1.10.1 0.02 0.00 0.00 0.03 0.05 0.00 0.03 0.00 0.00 0.02 0.00 0.01 0.02 0.029 0
SD 0.03 0.03 0.15 0.03 0.04 0.10 0.03 0.03 0,09 0.03 0.05
Ave.Pos.Dif. 0.01 0.02 0.97 0.01 0.03 0.42 0.01 0.02 0.26 0.01 0.08
AveNegDif. -0.03 -0.04 -0.05 -0.04 -0.04 -0.04 -0.04 -0.04 —0.04 -0.03 -0.04

 

98

Table 23 shows the differences between the topic weights in each test blueprint
and the topic weights on the ﬁeld-trial instrument. Positive differences occurred when
topics received a higher weight in the blueprint than on the ﬁeld-trial instrument, and
negative differences occurred when topics received a higher weight on the ﬁeld-trial
instrument than in the blueprint. Tables show standard deviations of absolute weight
differences for each topic and each test blueprint as well as the averages of the positive
and negative weight differences for each topic and each test blueprint.

Looking within blueprints across topics, average positive weight differences
(more weight in the test blueprint than on the ﬁeld-trial instrument) ranged from .01 to
.97. The largest average difference for topics emphasized more in the blueprint than in
the ﬁeld-trial instrument (positive) was for the expert-mapping strict-intersection
blueprint. Strict-intersection blueprints had the largest positive weight differences
(because they included few topics which received much weight), and they had the largest
standard deviations of weight differences. All union and 70%-intersection blueprints had
average positive weight differences of .08 or less, the aggregate70%-intersection
blueprint having the largest difference. Average negative differences (topics emphasized
more in the ﬁeld-trial instrument) were between -.03 and -05.

On average across all test blueprints, nine topics had 0 as an average positive
difference meaning that the topic did not receive more weight in any of the test blueprints
than on the ﬁeld-trial instrument. Topics with the largest average positive weight
difference were 1.3.2 Basic 2D Geometry (.128) and 1.6.2 Equations and Formulas
(.225). The 15 topics not included on the ﬁeld-trial instrument had negative weight

differences of 0. The topics with the highest average negative weight differences (more

99

weight on the ﬁeld-trial instrument than on the test blueprint) were 1.1.2.1 Common
Fractions (.124), 1.6.2 Equations and Formulas (.09), and 1.7.1 Data Representation and
Analysis (.081). On average, topic 1.1.2.1 received over 10% more weight on the ﬁeld-
trial instrument than in the blueprints; topics 1.6.2 and 1.7.1 Uncertainty and Probability
received around 8% to 9% more weight. However, topic 1.6.2 had a higher positive
difference (.225) meaning that, overall, it received more weight on the test blueprints than
on the ﬁeld-trial instrument.

Correlations and Euclidean distance measures. Correlations between the topic
weight patterns (proﬁles) on each test blueprint and the topic weight patterns (proﬁles) on
the ﬁeld-trial instrument are in Table 24.

Table 24
Correlations and Euclidean Distances
between the T opic- Weight Proﬁles for

Each Test Blueprint and the T opic- Weight
Proﬁles for the F ield-Trial Instrument

 

 

 

Euclidean

Correlation Distance
EX-UN 0.590M 0.213
EX-7I 0.560“ 0.208
EX-SI -0.005 1.020
CG-UN 0.454" 0.226
CG-7I 0.374* 0.242
CG-SI 0.277 0.668
TX-UN 0.573" 0.220
TX-7I 0.603M 0.220
TX-SI 0.439" 0.609
AG-UN 0.635** 0.194
AG-7I 0.502M 0.349
Ave 0.45 0.38

 

*p<.05. **p<.01.

100

The correlations ranged from -.005 (expert-mapping strict-intersection blueprint)
to .63 (aggregate-union blueprint). The overall average was .45. In most cases, higher
correlations were between the ﬁeld-trial instnunent and the union blueprints. The
exception was for the textbooks where the largest correlation was between the ﬁeld-trial
instrument topic-weight proﬁles and the 70%-intersection blueprint topic-weight proﬁles.

Euclidean distances between the topic weights on each test blueprint and the topic
weights on the ﬁeld-trial instrument are also shown in Table 24. The distances ranged
from .194 (aggregate-union blueprint) to 1.02 (expert-mapping strict-intersection
blueprint), with an overall average of .38. The smallest distances were between the ﬁeld-

trial instrument and the union blueprints, except for the expert mapping.

Re-Speciﬁcation of Test Blueprints

I had intended to re-compute country scores on the field-trial instrument according
to each test blueprint previously discussed. However, the ﬁeld-trial instrument did not
contain items for every topic of the framework, so I could not obtain country scores for all
topics. I, therefore, had to re-write each test blueprint using only the topics included on
the ﬁeld-trial instrument. Again, I weighted topics according to averages in proportions
of emphasis across countries. I also wrote blueprints for unweighted tests in which I gave
each topic included in a blueprint equal weight. I then compared the correspondence in
topic coverage between the new weighted-test blueprints and each corresponding
curriculum source for each country.

Table 25 provides a summary of the union, 70%-intersection, and strict-

intersection topic weights after removing topics not included on the ﬁeld-trial instrument.

Table 25

101

Topic Weights on Specially-Constructed Test Blueprints

 

 

 

Test

Topic

Code EXl-UN EX1-7I EXl-Sl CG-UN CG-71 CG-Sl TX-UN TX-7l TX-Sl AG-UN AG-71 Average
1.1.1.1 0.015 0 0 0.030 0 0 0.013 0 0 0.007 0 0.006
1.1.1.2 0.021 0 0 0.035 0.042 0 0.036 0.0391 0 0.027 0 0.018
1.1.1.3 0.021 0 0 0.026 0 0 0.019 0.0205 0 0.018 0 0.009
1.1.2.1 0.039 0.044 0 0.022 0 0 0.036 0.0401 0 0.028 0 0.019
1.1.2.2 0034 0.039 0 0.026 0 0 0.021 0.0232 0 0.029 0 0.016
1.1.2.3 0.030 0.034 0 0.027 0 0 0.012 0.013 0 0.019 0 0.012
1.1.2.4 0.031 0.035 0 0.028 0 0 0.032 0.0347 0 0.027 0 0.017
1.1.2.5 0 0 0 0 0 0 0 0 0 0 0 0.000
1.1.3.1 0.039 0.044 0 0.043 0.050 0 0.037 0.0407 0 0.036 0.073 0.033
1.1.3.2 0 0 0 0 0 0 0 0 0 0 0 0.004
1.1.3.3 0 0 0 0 0 0 0 0 0 0 0 0.004
1.1.4.1 0 0 0 0 0 0 0 0 0 0 0 0.000
1.1.4.2 0.044 0.050 0 0.029 0 0 0.037 0.0403 0 0.030 0 0.021
1.1.4.3 0 0 0 0 0 0 0 0 0 0 0 0.000
1.1.4.4 0.030 0.034 0 0.033 0.039 0 0.015 0 0 0.019 0 0.015
1.1.4.5 0 0 0 0 0 0 0 0 0 0 O 0.000
1.1.5.1 0.017 0 0 0.015 0 0 0.002 0 0 0.006 0 0.004
1.1.5.2 0.037 0.042 0 0.025 0 0 0.006 0 0 0.012 0 0.011
1.1.5.3 0.029 0.032 0 0.019 0 0 0.007 0.0077 0 0.013 0 0.010
1.1.5.4 0.033 0.038 0 0.020 0 0 0.006 0 0 0.006 0 0.009
1.2.1 0.035 0.040 0 0.035 0.041 0 0.036 0.0391 0 0.037 0 0.024
1.2.2 0.035 0.039 0 0.040 0.047 0 0.063 0.0696 0 0.051 0.104 0.041
1.2.3 0.022 0 0 0.030 0 0 0.002 0 0 0.006 0 0.005
1.3.1 0.036 0.040 0 0.031 0.037 0 0.030 0.033 0 0.024 0 0.021
1.3.2 0047 0.054 1.00 0.039 0.046 0 0.049 0.0537 0.642 0.049 0.102 0.189
1.3.3 0.042 0.047 0 0.049 0.058 0 0.087 0.0961 0 0.074 0.153 0.055
1.3.4 0.041 0.046 0 0.053 0.063 0.500 0.061 0.0669 0 0.053 0 0.080
1.3.5 0 0 0 0 0 0 0 0 0 0 0 0.000
1.4.1 0.040 0.045 0 0.044 0.052 0 0.050 0.0549 0 0.056 0 0.031
1.4.2 0.038 0.043 0 0.041 0.049 0 0.036 0 0 0.042 0 0.023
1.4.3 0 0 0 0 0 0 0 0 0 0 0.004
1.5.1 0.037 0.041 0 0.039 0.046 0 0.008 0 0 0.019 0 0.017
1.5.2 0051 0.057 0 0.042 0.049 0 0.018 0.020 0 0.031 0 0.024
1.5.3 0 0 0 0 0 0 0 0 0 0 0 0.000
1.5.4 0 0 0 0 0 0 0 0 0 0 0 0.000
1.6.1 0.039 0.044 0 0.049 0.058 0 0.053 0.0588 0.187 0.067 0.138 0.063
1.6.2 0.051 0.057 0 0.053 0.063 0.500 0.183 0.2015 0.171 0.151 0.312 0.158
1.7.] 0.048 0.054 0 0.044 0.052 0 0.043 0.0472 0 0.058 0.119 0.042
1.7.2 0.019 0 0 0.033 0.038 0 0.003 0 0 0.007 0 0.009
1.8.1 0 0 0 0 0 0 0 0 0 0 0 0.000
1.8.2 0 O 0 0 0 0 0 O 0 0 0 0.000
1.9.1 0 0 O 0 0 0 0 0 0 0 0 0.000
1.9.2 0 0 0 0 0 0 0 0 0 0 O 0.000
1.10.1 0 0 0 0 0 0 0 0 0 0 0 0.004

 

102

These proportions were scaled to sum to one across topics. Overall, the highest weights
were given to topic 1.3.2 Basic Geometry and topic 1.6.2 Equations and Formulas. Table
26 provides an overview of the blueprints on which I compared country-level

performance.

Comparisons of Curriculum to Unique Specially-Constructed—Test Blueprints

The ﬁrst sets of comparisons I conducted were between the unique specially-
constructed- (SC) test blueprints developed for each country and each corresponding
curriculum-data source. That was, I compared each unique SC-test blueprint based on the
expert mapping to the corresponding country’s expert-mapping data, I compared each
unique SC-test blueprint based on the curriculum-guide analyses to the corresponding
country’s curriculum-guide data, and so forth. This provided an indication of the best
possible match that could occur between any test developed using the ﬁeld-trial
instrument topics and each country’s corresponding data source.

I conducted much the same analyses as before, but adapted them as needed to ﬁt
the particular situation. I did not compute the proportion of items in each unique SC-test
blueprint that were in each country’s curricula since this would naturally be 100%. I
likewise did not compute the proportion of topics in each country’s curriculum that was
included on the unique SC-test blueprints. No additional topics were included in the
unique SC-test blueprints than were included on the ﬁeld-trial instrument. Therefore, the
proportions of curricula tested would be the same proportions as reported in Table 11.

Differences in topic inclusion and emphasis. Table 27 shows summaries

of differences in topic inclusion between each country’s curriculum data source and the

103

.333 0859030:

05 :0 0050:8000 03 183002 on 2303 $33 003303 05 :0 0050:0050 .80-0205. .2303

3300 03002 32% 83.30E30 05 E 8303 =m 390300 a 55:5 9 .0058 Sac owmwoewwm

05 .00 00303.25 8.5m 05 000 3500033 383 2: E 00508 830. 02 u .52 BEE-03::

05 :0 0058.80.50 03 303002 on 2303 $8 08383 05 :0 oocmsco-toa 6000205-

33303 3000 003002 500 5-00 .80 05.5033 05 E 8&9 030 2:- u .33 00390303

05 :0 005508000 03 .8352 00 2303 “m8 BEE”; 05 :0 005E860 600022:- .5

km 00.0 30:00.33 68 05 :0 08mg 3003 0:0 300 a 2975 0200s 3087.. 0000—050 33 38
00:5 085E030: 0:0 3:0 .20-«22:. .8503 083 05 .0 0031.800 355033 68 00:5 =< a

.38 SEE?» 0 @352 03 $2 a 20.000 00m: ﬂ ..>?.< NNOZ

 

03.3.06 03.5.06 .03-00 033E050 030.60
2000- 000:5
a. 5.550 .300 pain-3 85:80

:0 E 830..-
.00 0030803:—

 

K-oﬁav 3593 K0005 SEEE 85:80
0o £2 5 8&3

.«0 0030088:—

A.23-03$ «ZS-02.3 503-003 «ZS-gm? 830300 Ew $080
8&8. .00 00:5

00308085 8.008on 82:0 33me tomxm 0282-03003
oamwouwwe. 8232530 .30 00502

 

00§0m 8:30:50

 

attkmém 3.0-m .320:h§094$§00&h .8.\ “$000

on 050..-

104

corresponding unique SC-test blueprint only for topics not included on the ﬁeld-trial
instrument. These differences are identical to those in Table 12. All other differences
were 0 since topics not in a country’s curricula would not be included on its unique SC-
test blueprint. What should be noted, however, are the numbers in the ﬁnal row of the
table. These can be compared to future test blueprints to determine if there is an
improvement in test-curriculum match. An ideal match would result in all differences
being 0 and proportions of match being 1.0. The topic inclusion on the test blueprints did
not correspond exactly with the curricula because not all topics were included on the test
blueprints. The inclusive test blueprints I develop, will not have lower differences or
higher matches than these; the goal will be to come as close to these as possible.

Table 28 shows the summary information on the correspondence in topic
inclusion between the test blueprints and the curriculum for each country. The numbers
in the column “In Curr.” are identical to those in Table 13. “Prop. Match” is the
proportion of topics within a country’s curriculum-data source that are included on the
corresponding unique SC-test blueprints. Again, it is the best match expected given the
topics on the ﬁeld-trial instrument. Country agreement in topic inclusion with the ﬁeld-
trial instrument ranged from .66 (country Q, curriculum guide) to 1.00 (countries E and
O, aggregate-data source). Averages of the proportions of topics both in the curriculum
and the field-trial instrument or not in both were around .90 with the exception of country
Q (.74). The average number of countries including topics in a particular data source that
were not in the corresponding unique SC-test blueprint ranged from around three to six,
with the lowest number being for the aggregate-data source and the highest number being

for the textbooks.

Table 27

105

Numbers and Proportions of Countries Including Topics in Curriculum Sources that
are not on Corresponding Unique-Test Blueprints

 

 

 

 

Expert Curriculum
Mapping Guide Textbook Aggregate

Ave # Ave.

# Mis- Prop. # Mis- Prop. # Mis- Prop. # Mis- Prop. Mis— Prop.

Topic Code Match Match Match Match Match Match Match Match Match Match
1.1.2.5 12 0.29 5 0.71 11 0.35 4 0.76 8 0.53
1.1.3.2 14 0.18 13 0.24 12 0.29 8 0.53 11.8 0.31
1.1.3.3 11 0.35 12 0.29 l l 0.35 6 0.65 10 0.41
1.1.4.1 2 0.88 5 0.71 4 0.76 0 1 2.75 0.84
1.1.4.3 0 1 2 0.88 3 0.82 0 1 1.25 0.93
1.1.4.5 2 0.88 4 0.76 7 0.59 1 0.94 3.5 0.79
1.3.5 1 0.94 9 0.47 7 0.59 l 0.94 4.5 0.74
1.4.3 13 0.24 12 0.29 9 0.47 5 0.7] 9.75 0.43
1.5.3 10 0.41 8 0.53 6 0.65 3 0.82 6.75 0.60
1.5.4 6 0.65 6 0.65 4 0.76 0 1 4 0.76
1.8.1 0 1 2 0.88 4 0.76 0 1 1.5 0.91
1.8.2 0 1 3 0.82 0 l O 1 0.75 0.96
1.9.1 7 0.59 4 0.76 10 0.41 3 0.82 6 0.65
1.9.2 8 0.53 7 0.59 9 0.47 5 0.71 7.25 0.57
1.10.1 10 0.41 14 0.18 11 0.35 8 0.53 10.8 0.37
Average 4.64 0.73 5.52 0.68 5.16 0.64 6.20 0.36 5.38 0.60

 

Tables 29 and 30 show the differences between curriculum-data sources and each
corresponding unique SC-test blueprint in topic emphasis. Table 29 highlights
differences across topics, and Table 30 highlights differences across countries. Positive
differences occurred when topics received a higher emphasis in the curriculum than on
the corresponding test blueprint, and negative differences occurred when topics received a
higher emphasis on the test blueprint than in the curriculum. Tables show standard
deviations of absolute emphasis differences for each topic and country as well as the

averages of the positive and negative differences for each topic (Table 29) and each

country (Table 30). The table also shows averages of these numbers across data sources.

 

 

 

106

Table 28

Numbers and Proportions of Topics in Curriculum Sources that are Included on
Corresponding Unique- Test Blueprints

 

Expert Curriculum
Mapping Guide Textbook Aggregate

 

Ave. Ave.
Prop. In Prop. In Prop. In Prop. In Prop. In
Country Match Curr. Match Curr. Match Curr. Match Curr. Match Curr.

 

 

A 0.86 6 0.91 4 0.84 7 0.98 1 0.90 4.5
B 0.86 6 0.89 5 0.91 4 0.95 2 0.90 4.3
C 0.86 6 0.73 12 0.73 12 0.86 6 0.80 9.0
D 0.82 8 0.86 6 0.82 8 0.91 4 0.85 6.5
E 0.84 7 0.95 2 0.86 6 1.00 0 0.91 3.8
F 0.95 2 0.75 11 0.89 5 0.98 1 0.89 4.8
G 0.93 3 0.95 2 0.86 6 0.98 l 0.93 3.0
H 0.89 5 0.95 2 0.84 7 0.95 2 0.91 4.0
l 0.89 5 0.84 7 0.82 8 0.91 4 0.86 6.0
I 0.86 6 0.84 7 0.95 2 0.98 1 0.91 4.0
K 0.93 3 0.89 5 0.89 5 0.98 1 0.92 3.5
L 0.84 7 0.91 4 0.86 6 0.95 2 0.89 4.8
M 0.84 7 0.82 8 0.84 7 0.91 4 0.85 6.5
N 0.77 10 0.86 6 0.80 9 0.86 6 0.82 7.8
O 0.91 4 0.89 5 0.98 1 1.00 0 0.94 2.5
P 0.98 1 0.89 5 0.93 3 0.98 1 0.94 2.5
Q 0.77 10 0.66 15 0.73 12 0.82 8 0.74 11.3
Average 0.87 5.6 0.86 6.24 0.86 6.35 0.94 2.6 0.88 5.21

 

1 07
Table 29

Diﬂerences in Topic Emphasis for Each Topic across Countries on Unique- Test
Blueprints and Corresponding Curriculum Sources

 

 

 

 

Expert Mapping Curriculum Guide Textbook

Ave. Ave. Ave. Ave. Ave. Ave.
Topic SD of P05. Neg. SD of P05. Neg. SD of P05. Neg.
Code All Dif. Dif. Dif. All Dif. Dif. Dif. A11 Dif. Dif. Dif.
1.1.1.1 0.003 0 -0.005 0.006 0 -0.009 0.005 0.004 -0.001
1.1.1.2 0.003 0 -0.005 0.006 0 ~0.010 0.005 0.005 -0.003
1.1.1.3 0.003 0 -0.006 0.006 0 -0.010 0.005 0.004 0004
1.1.2.] 0.008 0 -0.009 0.005 0 -0.008 0.005 0.005 -0.005
1.1.2.2 0.008 0 -0.008 0.004 O -0.008 0.004 0.004 -0.001
1.1.2.3 0.008 0 -0.008 0.008 0 -0.012 0.002 0.002 -0.003
1.1.2.4 0.004 0 -0.007 0.005 0 -0.010 0.008 0.003 -0.017
1.1.2.5 0.026 0.035 0 0.013 0.028 0 0.010 0.009 0
1.1.3.1 0.004 0 -0.007 0.007 0 -0.01 1 0.006 0.004 -0.007
1.1.3.2 0.025 0.037 0 0.018 0.037 0 0.071 0.040 0
1.1.3.3 0.018 0.032 0 0.023 0.039 0 0.064 0.039 0
1.1.4.1 0.008 0.023 0 0.015 0.031 0 0.003 0.004 0
1.1.4.2 0.005 0 -0.009 0.008 0 0 0.013 0.01 I -0.003
1.1.4.3 0.000 0 0 0.008 0.024 0 0.001 0.001 0
1.1.4.4 0.005 0 -0.008 0.008 0 -0.01 1 0.009 0.006 0.000
1.1.4.5 0.011 0.034 0 0.012 0.028 0 0.006 0.006 0
1.1.5.1 0.003 0 -0.005 0.004 0 0 0.001 0.001 0.000
1.1.5.2 0.006 0 -0.008 0.005 0 0 0.001 0.002 -0.001
1.1.5.3 0.004 0 -0.007 0.005 0 0 0.001 0.001 0.000
1.1.5.4 0.005 O -0.008 0.005 O 0 0.001 0.002 0
1.2.1 0.004 0 -0.007 0.005 0 0 0.005 0.004 -0.008
1.2.2 0.004 0 -0.007 0.008 O 0 0.013 0.016 -0.004
1.2.3 0.005 0 -0.007 0.006 O 0 0.001 0.001 0
1.3.1 0.006 0 -0.008 0.005 O -0.010 0.007 0.006 -0.012
1.3 .2 0.005 O -0.009 0.007 0 0 0.010 0.009 -0.003
1.3.3 0.006 0 -0.008 0.007 0 -0.012 0.022 0.023 -0.008
1.3.4 0.005 0 -0.009 0.007 0 -0.012 0.053 0.024 -0.040
1.3.5 0.007 0.028 0 0.017 0.033 0 0.013 0.013 0
1.4.1 0.006 0 -0.008 0.005 0 -0.01 1 0.011 0.014 -0.007
1.4.2 0.007 0 -0.010 0.008 0 -0.012 0.012 0.012 -0.002
1.4.3 0.018 0.031 0 0.017 0.035 0 0.012 0.016 0
1.5.1 0.004 0 -0.007 0.005 0 -0.010 0.001 0.002 -0.002
1.5.2 0.007 0 -0.010 0.005 0 -0.010 0.004 0.004 -0.001
1.5.3 0.022 0.035 0 0.027 0.045 0 0.025 0.041 0
1.5.4 0.017 0.031 0 0.018 0.036 0 0.004 0.007 0
1.6.1 0.007 0 -0.010 0.007 0 -0.012 0.024 0.015 -0.004
1.6.2 0.007 0 -0.01 1 0.007 0 -0.012 0.043 0.039 -0.013
1.7.1 0.007 0 -0.010 0.005 0 -0.01 1 0.005 0.006 -0.004
1.7.2 0.005 0 -0.007 0.005 0 -0.010 0.000 0.001 -0.001
1.8.] 0.000 0 0 0.012 0.034 0 0.001 0.003 0
1.8.2 0.000 0 0 0.01 1 0.028 0 0 0 0
1.9.] 0.018 0.031 0 0.016 0.034 0 0.072 0.037 0
1.9.2 0.014 0.027 0 0.020 0.036 0 0.034 0.040 0
1.10.1 0.015 0.028 0 0.017 0.038 0 0.062 0.056 0
Average 0.008 0.008 -0.005 0.010 0.01 I -0.007 0.015 0.012 -0.004

 

108

 

 

 

Table 29 (Contd.)
Aggregate
Ave. Ave. SD of

Topic SD of P03. Neg. Ave. of Ave.of SD of Neg. SD of
Code All Dif. Dif. Dif. Pos. Dif. NeéDif. Pos. Dif. Dif. All Dif.
1.1.1.1 0.002 0 -0.006 0.001 -0.005 0.002 0.003 0.004
1.1.1.2 0.003 0 -0.005 0.001 -0.006 0.002 0.003 0.004
1.1.1.3 0.002 0 -0.004 0.001 -0.006 0.002 0.002 0.004
1.1.2.1 0.007 0 -0.007 0 -0.007 0.002 0.001 0.005
1.1.2.2 0.014 0 -0.010 0.001 -0.007 0.002 0.004 0.005
1.1.2.3 0.003 0 -0.004 0.000 -0.007 0.001 0.003 0.004
1.1.2.4 0.004 0 -0.006 0.001 -0.010 0.001 0.004 0.006
1. 1.2.5 0.014 0.031 0 0.026 0 0.010 0.000 0.015
1.1.3.1 0.004 0 -0.005 0.001 -0.008 0.002 0.002 0.005
1.1.3.2 0.076 0.068 0 0.045 0 0.013 0.000 0.025
1.1.3 .3 0.028 0.048 O 0.040 0 0.006 0.000 0.020
1.1.4.1 0 0 0 0.014 0 0.013 0.000 0.012
1.1.4.2 0.006 0.000 -0.008 0.003 -0.008 0.005 0.003 0.007
1.1.4.3 0 0 0 0.006 0 0.010 0.000 0.008
1. 1.4.4 0.005 0 -0.006 0.002 -0.006 0.003 0.004 0.005
1.1.4.5 0.007 0.031 0 0.025 0 0.01 1 0.000 0.015
1.1.5.1 0.002 0 -0.003 0.000 -0.004 0.000 0.003 0.003
1.1.5.2 0.002 0 -0.003 0.000 -0.005 0.001 0.004 0.004
1.1.5.3 0.002 O -0.004 0.000 -0.005 0.001 0.003 0.003
1.1.5.4 0.002 0 -0.004 0.000 -0.005 0.001 0.004 0.004
1.2.] 0.003 0.000 -0.005 0.001 -0.007 0.002 0.002 0.005
1.2.2 0.006 0.000 -0.007 0.004 -0.008 0.007 0.003 0.008
1.2.3 0.001 0 -0.003 0.000 -0.005 0.000 0.004 0.004
1.3.1 0.004 0.000 -0.005 0.001 -0.009 0.002 0.002 0.006
1.3 .2 0.009 0.000 -0.009 0.002 -0.008 0.004 0.003 0.006
1.3.3 0.016 0.000 -0.014 0.006 -0.01 1 0.010 0.003 0.01 1
1.3.4 0.018 0.000 -0.017 0.006 -0.020 0.01 1 0.012 0.017
1.3.5 0.009 0.037 0 0.028 0 0.009 0.000 0.015
1.4.1 0.030 0.000 -0.021 0.003 -0.012 0.006 0.005 0.009
1.4.2 0.017 0.000 -0.016 0.003 -0.010 0.005 0.005 0.008
1.4.3 0.013 0.027 0 0.027 0 0.007 0.000 0.014
1.5.] 0.002 0 -0.004 0.000 -0.006 0.001 0.003 0.004
1.5.2 0.003 0.000 -0.004 0 -0.006 0.002 0.004 0.005
1.5.3 0.028 0.054 0 0.044 0 0.007 0.000 0.022
1.5.4 0 0 0 0.018 0 0.015 0.000 0.014
1.6.1 0.021 0 -0.017 0.004 -0.010 0.007 0.005 0.009
1.6.2 0.043 0.000 -0.033 0.010 -0.017 0.017 0.009 0.019
1.7.1 0.014 0 -0.01 1 0 -0.009 0.003 0.003 0.006
1.7.2 0.002 0 -0.003 0.000 -0.005 0.000 0.003 0.004
1.8.1 0 0 0 0.009 0 0.014 0.000 0.01 1
1.8.2 0 0 0 0.007 0 0.012 0.000 0.009
1.9.1 0.024 0.047 0 0.037 0 0.006 0.000 0.019
1.9.2 0.019 0.037 0 0.035 0 0.005 0.000 0.018
1.10.1 0.032 0.046 0 0.042 0 0.010 0.000 0.022
AveLage 0.01 I 0.010 -0.006 0.010 -0.005 0.006 0.002 0.010

 

109

As was the case earlier, three topics (1.1.2.1 Common Fractions, 1.5.2
Proportionality Problems, 1.7.1 Data Representation and Analysis) had 0 as an average
positive difference meaning that the topic was not more emphasized in the curriculum of
any country than in the test blueprint (see Table 29). Topics with the highest positive
differences were 1.1.3.2 Rational Numbers (.045); 1.1.3.3 Real Numbers (.04); 1.5.3
Slope and Trigonometry (.044); and 1.10 Other Content (.042).

The only topics with 0 as an average negative difference in emphasis were those
averages for topics not included on the test blueprint. Topics with the highest negative
differences were 1.3.4 3-D Geometry and 1.6.2 Equations and Formulas. Their average
were -.02 and -.017 respectively. The average of the positive average differences was .01
while the average of the negative average differences was -.005. These numbers were
much smaller than in Table 14. As was the case earlier, the largest differences were
between topic emphasis in the textbook-data source and topic weight on the
corresponding unique SC-test blueprints. Column sums and averages will be compared
with those in future analyses.

Table 30 compares variability in topic emphasis differences within countries.
Across data sources, standard deviations and means were similar. Averages across
country differences in textbook emphasis versus emphasis in the unique SC-test
blueprints based on the textbooks were smaller than the differences for other data sources
and corresponding tests. Differences were largest between topic emphasis in the
aggregate of the data sources and topic weight on the corresponding unique SC-test
blueprint. Average positive differences in emphasis (more weight in curriculum-data

source) within countries across topics ranged from 0 (countries E and O, aggregate) to

110

 

 

 

 

 

08.0 000.0 28.0 08.0- 080 08.0- 080 20.0 80.0- 08.0 20.0 20.0- 80.0 08.0 000.0- 08.0 v8.0 000.852
20.0 80.0 80.0 80.0- 080 000.0. 08.0 08.0 0 80.0 08.0 20.0- 80.0 08.0 000.0- 80.0 20.0 0
08.0 80.0 08.0 000.0- 08.0 80.0- 08.0 80.0 80.0- 800 80.0 80.0- 800 20.0 80.0- 800 20.0 0
08.0 000.0 20.0 000.0- v8.0 0 0 0 0 20.0 20.0 20.0- 000.0 08.0 80.0- 080 20.0 0
08.0 08.0 v8.0 20.0- 80.0 08.0- 08.0 08.0 0 80.0 20.0 80.0. 080 08.0 80.0- 080 20.0 2
20.0 80.0 80.0 80.0- 080 000.0. 08.0 08.0 0 08.0 08.0 .80- 800 08.0 20.0- 80.0 20.0 E
08.0 80.0 20.0 80.0- 080 80.0- 80.0 80.0 0 80.0 80.0 000.0- 80.0 20.0 80.0- 080 20.0 A
08.0 000.0 _8.0 80.0- 08.0 80.0- 0000 20.0 0 000.0 20.0 20.0- 080 08.0 80.0- 08.0 20.0 v—
080 80.0 08.0 08.0- 800 80.0- 800 80.0 80.0.. 80.0 80.0 08.0- 0000 08.0 08.0- 20.0 08.0 a
20.0 80.0 80.0 80.0- 08.0 80.0- v8.0 20.0 0 08.0 08.0 000.0- ~80 20.0 80.0- 080 000.0 _
20.0 20.0 08.0 20.0- 08.0 800- 0000 v8.0 0 80.0 000.0 08.0- 80.0 _8.0 80.0- 800 20.0 I
02.0 80.0 22.0 08.0- 820 80.0- 2.8.0 08.0 0 08.0 20.0 80.0- 080 20.0 v8.0- 080 08.0 0
08.0 80.0 80.0 80.0- 08.0 80.0- 08.0 80.0 20.0- 08.0 08.0 20.0- 08.0 20.0 80.0- 080 000.0 m
08.0 08.0 08.0 08.0- 080 08.0- 0 08.0 80.0- 08.0 000.0 000.0- 20.0 08.0 20.0- 80.0 80.0 m
v8.0 80.0 000.0 000.0- 08.0 80.0- 080 000.0 0 000.0 20.0 000.0- 08.0 20.0 80.0- 080 08.0 D
20.0 80.0 000.0 80.0- 080 80.0- 080 20.0 0 20.0 20.0 20.0- 08.0 08.0 80.0- 08.0 08.0 0
08.0 80.0 20.0 08.0- 08.0 08.0- 800 80.0 08.0- 080 08.0 20.0- 80.0 08.0 000.0- 08.0 20.0 m
08.0 80.0 20.0 80.0- 80.0 80.0- 08.0 80.0 0 80.0 80.0 000.0- 08.0 20.0 000.0- 08.0 08.0 <
.05—8. .05 .05 .05 .05 .05 .05 .05—2 .05 .05 .05—2 .05 .05 .05—2 .05 .05 05:0. 80:00
00 Q0. .82 0.00 .002 .80 .002 .80 00 Q0. 002 .000 00 Q0. .002 .80 00 Q0. .82 .80 00 Q0.
00 Q0- 00 Q0- .o>< .0>< .0>< .o>< .o>< .o>< .0>< .o>< .o>< .o><
J§< 0.000080- 0200 82:00.50 2 $350002 tomxm

 

000.500. 55:02.59 0.2888330 0050 0.35-5.03% 20.0-030.005 :0 0030.0 0.0.0.80 0530.0 £00m L0\&3&QEQ 030.0 5 000:0g0-00Q

0m 030,—-

111

.327 (country G, aggregate), with an average of around .04. Average negative differences
in emphasis (more weight in unique SC-test blueprint) ranged from 0 (Country 0,
aggregate; most countries on textbooks) to -.08 ( Country G, aggregate), with an average
of -.01. Smaller overall positive and negative differences were noted for countries A and
D, and larger differences in emphasis were noted for countries G and H.

Correlations and Euclidean distance measures. Correlations between the
proportions-of-topic-emphasis proﬁles in each curriculum-data source and topic-weight
proﬁles on each corresponding unique SC-test blueprint are in Table 31. The correlations
ranged from 0 (Country Q, curriculum guide) to 1.00 (countries J and P, Textbooks;
countries E and 0, Aggregate) with an overall average of .84. The average correlation
between topic emphasis proﬁles and ﬁeld-trial topic proﬁles within data sources across
countries was highest for the textbook- and aggregate-data-source topic-emphasis-proﬁles
and the corresponding unique SC-test blueprints. The lowest was between the
cuniculum-guide-data—source topic-emphasis-proﬁles and the corresponding test-
blueprint topic-weight proﬁles. However, this data source for each country consisted of
either a O or a proportion which was always the same proportion across all topics included
in a country’s curriculum-guide-data source. Average correlations for countries across
data sources varied. These ranged from a low of .655 (country Q) to a high of .925
(country 0).

Euclidean distances between each country’s topic emphasis proﬁles and topic
weight proﬁles on the corresponding test blueprint are also shown in Table 31. The
distances ranged from 0 (countries E and O, aggregate) to .54 (country N, textbooks),

with an overall average of .10. The largest average distance was found between the

112

textbook-data-source topic-emphasis proﬁles and the corresponding unique SC-test

blueprint topic weight proﬁles.

The smallest was between the aggregate of the data

sources and the corresponding unique SC-test blueprint. The smallest average distances

for a country were for countries A (.049), L (.058), and P (.055). The largest were for

countries N (.249) and G (.237). Average standard deviations of the Euclidean distances

across countries (within curriculum-data sources) were between .24 and .34.

Table 31

Correlations and Euclidean Distances between The Proportions-of-Topic-Emphasis
Proﬁles for Each Country in Each Curriculum-Data Source and the Topic-Weight
Proﬁles for Each Corresponding Unique-T est Blueprint

 

Correlations

Euclidean Distance

 

 

 

Country EX- CG- TX— AG- EX- CG- TX- AG-

UQ UQ UQ UQ Average UQ UCL UQ UQ Average
A 0.873 0.825 0.994 0.989 0.920 0.068 0.074 0.026 0.029 0.049
B 0.727 0.796 0.811 0.821 0.789 0.103 0.105 0.243 0.143 0.148
C 0.668 0.451 0.969 0.872 0.740 0.104 0.107 0.101 0.079 0.097
D 0.881 0.705 0.870 0.875 0.833 0.063 0.077 0.095 0.061 0.074
E 0.723 0.913 0.994 1.000 0.907 0.144 0.067 0.042 0.000 0.063
F 0.961 0.541 0.803 0.990 0.824 0.040 0.111 0.161 0.031 0.086
G 0.776 0.909 0.795 0.749 0.807 0.179 0.083 0.308 0.378 0.237
H 0.842 0.878 0.974 0.912 0.902 0.090 0.142 0.064 0.158 0.113
I 0.892 0.709 0.861 0.869 0.833 0.063 0.097 0.144 0.083 0.097
J 0.836 0.725 1.000 0.993 0.889 0.134 0.125 0.009 0.046 0.078
K 0.892 0.788 0.995 0.979 0.913 0.070 0.129 0.080 0.071 0.088
L 0.746 0.821 0.996 0.958 0.880 0.086 0.072 0.027 0.047 0.058
M 0.832 0.678 0.985 0.958 0.863 0.102 0.106 0.159 0.068 0.109
N 0.619 0.728 0.815 0.793 0.739 0.087 0.179 0.538 0.192 0.249
O 0.931 0.791 0.977 1.000 0.925 0.078 0.122 0.110 0.000 0.077
P 0.917 0.788 1.000 0.976 0.920 0.073 0.085 0.013 0.047 0.055
Q 0.820 0 0.932 0.868 0.655 0.088 0.108 0.101 0.090 0.097
Ave 0.820 0.709 0.928 0.918 0.843 0.092 0.105 0.131 0.090 0.104

 

Note. All correlations (but country Q, tx-uq) are signiﬁcant; p<.01.

113

Comparisons of the Curriculum to Inclusive Specially-Constructed—T est Blueprints

1 evaluated the correspondence in topic coverage between test blueprints and the
curriculum one ﬁnal time. For these ﬁnal analyses, I evaluated the correspondence
between topic coverage on each specially-constructed union- and 70%-intersection-test
blueprint and topic coverage in each country’s corresponding curriculum-data source. I
did not use any of the strict-intersection-test blueprints in these analyses. These test
blueprints were limited in scope, and I would not expect to ﬁnd a high quantitative match
between them and the data sources.

Proportions of items/curricula covered. Table 32 shows the proportions of
“items” (i.e., sum of topic weights) on each inclusive SC-test blueprint that measured
topics included in each of the corresponding curriculum-data sources for each country.
These proportions ranged from .30 to 1.00 (BO-100%) with an average of .82. Only two
of the averages of the proportions of “items” on the inclusive SC-test blueprints
measuring topics included in the corresponding curriculum-data sources were below .80
for any of the test blueprints. These exceptions were for the curriculum-guide union SC-
test blueprint and aggregate union SC-test blueprint. For the countries, average
proportions of “item” coverage ranged from .57 to .96. Country D had proportions of 1.0
for nearly all test blueprints. This meant that all topics included in most test blueprints
also were included in the corresponding curriculum-data source.

The proportions of each country’s curricula that were covered on the
corresponding inclusive SC-test blueprints are shown in Table 33. Union-test blueprints
had the same items as the ﬁeld-trial instrument so proportions would be the same as in

Table 11 and are not shown here. The proportions of coverage in Table 33 were more

114

 

 

 

 

00. 2 mm .0 0 2 .0 00.0 00. 2 00. 2 00.0 00. 2 00. 2 00. 2 00. 2 00.0 00.0 00. 2 80 00. 2 00. 2 00. 2 00. 2 000 00.0 0322
00.0 000 202.0 00.0 02.0 00.0 00.0 02.0 02.0 000 80 200 N00 000 000 00.0 :0 8.0 00.0 00.0 02-0 :22
00.0 8.0 8.0 00.0 0 2 .0 v 2 .0 m 2 .0 0~0 00.0 80 0 2 .0 2 .0 00.0 80 2 .0 v 2 .0 v2 .0 00.0 80 00.0 00.0 020
00.2 30 020 N00 200 N00 02-0 00.0 00.0 00.0 N00 000 200 :0 000 000 00.0 00.0 000 00.0 00.0 0080><
00. 2 02.0 h 2 .0 N00 0200 00.2 m m .0 >00 00.0 00. 2 002 80 N00 00. 2 8.0 m 0 .0 00. 2 8.0 00. 2 00.0 00.0 20-0.00
00.0 000 020 00.0 200 00.0 00.0 00.0 00.0 2.00 8.0 200 N00 000 00.0 02.0 5.0 00.0 00.0 020 2.2.0 ZD-0<
00. 2 2m .0 2V 2 .0 00.0 00. 2 000 00.0 00.0 00. 2 00.0 8.0 2 m .0 00.0 2.00 02.0 00.0 200 00. 2 00. 2 000 200 22.-0P2.
00. 2 N00 20 2 .0 00.0 00.0 00.0 00.0 000 00. 2 200 200 8.0 00.0 20.0 00.0 050 00.0 00. 2 00.0 00.0 00.0 75-08-
00. 2 2.0 0 2 .0 v0.0 00. 2 00.0 00.0 00.0 00.0 00. 2 000 00.0 20.0 2.0 00.0 00. 2 3-0 00. 2 00.0 000 000 20-00
00. 2 000 0 2 .0 :0 00. 2 2000 N00 0200 000 2.0.0 200 8.0 00.0 00.0 00.0 8.0 00.0 00. 2 >00 00.0 00.0 75-00
00. 2 0200 m 2 .0 00.0 00.0 8.0 8.0 00. 2 00.0 00. 2 N00 00 .0 00.0 000 0200 000 80 00. 2 2 0.0 00.0 00.0 20-0mm
00. 2 020 0 2 .0 N00 00.0 22-0 00.0 00. 2 00.0 00.0 000 N0 .0 8.0 .000 20 ~00 00.0 00. 2 00.0 00.0 00.0 75-08
0922 7:2 020 m2>< O .2 0 Z 22 2 v2 2 2 I 0 "2 m D 0 m < 08.2.
.9580

 

00.300. 55:02.50 0502002000000 000M 2.2 22.220033 000.0-03.3002: =0 0800~sx0 020.22.200.0LQ

8 0200.2-

 

115

 

 

 

 

N00 000 2.0 00.0 2.0 8.0 000 000 N00 80 00.0 :0 00.0 000 05.0 20.0 00.0 02.0 02.0 2.0 200 0822
00.0 8.0 00.0 2.0 .000 0.00 30.0 00.0 500 2.0.0 00.0 200 N00 200 00.0 8.0 00.0 0N0 8.0 00.0 5.0 :22
00.0 2.20 8.0 2.0 02.0 0N0 2.2.0 020 2.0 0N0 220 2000 02.0 2.0 2.0.0 20 02.0 0N0 2N0 2.0 2-20 00
02.0 2.20 00.0 000 2000 20.0 >00 00.0 00.0 00.0 00.0 00.0 00.0 00.0 00.0 8.0 00.0 00.0 00.0 2.00 00.0 0080><
00.0 8.0 2.0 200 2000 0N0 3.0 2-00 50.0 2.00 00.0 00.0 8.0 20.0 00.0 8.0 00.0 0N0 8.0 00.0 :0 20-000
80 00.0 00.0 00.0 2-0 N00 000 000 N00 N00 00.0 202-0 20.0 20.0 8.0 200 000 00.0 02.0 00.0 20.0 22.-00.2.
2.0 00.0 00.0 00.0 00.0 0200 2.0.0 000 N00 000 00.0 00.0 0200 200.0 80 2.200 00.0 0200 30.0 00.0 00.0 20-00
000 N00 00.0 2.0 00.0 00.0 000 N00 000 00.0 20.0 00.0 00.0 00.0 00.0 00.0 2.00 80 00.0 00.0 000 22.-X0
09.22222 00 m2>< 0 n2 0 Z 22 2 v2 2. 2 I 0 ...2 m D 0 m < “mo-2-
9:200

 

3.2.3035 000.0-03.3205 00.20.203.300 :0 00000-0 5530.200 0 0.002.500 2200.20 020.000.0LQ

mm 0200-2-

 

 

 

116

variable than those in previous similar tables. They ranged from .23 to .92. The average
was .63. Average proportions of coverage on SC test-blueprints for each data source
were around .43 to .80. Average proportions of coverage on SC-test blueprints for each
country ranged from .53 to .75. The highest proportions of coverage were for the
textbook data sources. The lowest proportions were for the aggregate-data source.

Differences in topic inclusion and emphasis. Differences in topic inclusion
between each curriculum-data source and the corresponding inclusive SC-test blueprints
are presented for topics in Table 34. This table can be compared to Table 12 and Table
22. On average, the inclusive specially-constructed-test blueprints had 30% more of a
mis-match to the curriculum in topic inclusion (topics on the test blueprint and not in the
curriculum or in the curriculum and not in the test blueprint) than did the unique
specially-constructed-test blueprints (see Table 22). The improvement over the
correspondence in topic inclusion between the ﬁeld-trial instrument and the curriculum
was minimal (see Table 12). The most improvement was for the aggregate-test blueprint.

The average of the proportions of countries with a correspondence in topic
inclusion between the inclusive SC-test blueprints and the corresponding data source
ranged from .31 (1.3.2. 2-D Geometry) to .97 (1.6.2 Equations and Formulas) for topics
and from .64 to .72 for curriculum sources. The topics with the lowest and highest rates
of match were the same as reported in Table 12. The lowest rate of correspondence in
inclusion or non-inclusion for data sources was for the curriculum-guide-data source.
The highest rate of correspondence was for the expert mapping.

Table 35 shows the summary information on correspondence in topic inclusion

between the ﬁeld-trial instrument and the curriculum-data sources for countries. This

117

Table 34

Diﬂkrences in Topic Inclusion between Each Inclusive— Test Blueprint and Each
Corresponding Curriculum Source for Each Topic

 

 

 

 

EX 71 CG 71 TX 71 AG 71
Ave. #

Topic # Mis- Prop. # Mis- Prop. # Mis- Prop. # Mis- Prop. Mis- Prop.
Code Match Match Match Match Match Match Match Match Match Match
1.1.1.1 8 0.53 10 0.41 11 0.35 3 0.82 8.0 0.53
1.1.1.2 9 0.47 5 0.71 2 0.88 8 0.53 6.0 0.65
1.1.1.3 8 0.53 9 0.47 2 0.88 6 0.65 6.3 0.63
1.1.2.1 3 0.82 9 0.47 1 0.94 8 0.53 5.3 0.69
1.1.2.2 3 0.82 10 0.41 2 0.88 9 0.47 6.0 0.65
1.1.2.3 5 0.71 10 0.41 2 0.88 9 0.47 6.5 0.62
1.1.2.4 5 0.71 11 0.35 3 0.82 8 0.53 6.8 0.60
1.1.2.5 12 0.29 5 0.71 11 0.35 4 0.76 8.0 0.53
1.1.3.1 3 0.82 2 0.88 2 0.88 5 0.71 3.0 0.82
1.1.3.2 14 0.18 13 0.24 12 0.29 8 0.53 11.8 0.31
1.1.3.3 11 0.35 12 0.29 11 0.35 6 0.65 10.0 0.41
1.1.4.1 2 0.88 5 0.71 4 0.76 0 1.00 2.8 0.84
1.1.4.2 2 0.88 6 0.65 3 0.82 10 0.41 5.3 0.69
1.1.4.3 0 1 2 0.88 3 0.82 0 1.00 1.3 0.93
1.1.4.4 4 0.76 5 0.71 11 0.35 8 0.53 7.0 0.59
1.1.4.5 2 0.88 4 0.76 7 0.59 1 0.94 3.5 0.79
1.1.5.1 9 0.47 6 0.65 9 0.47 4 0.76 7.0 0.59
1.1.5.2 3 0.82 10 0.41 10 0.41 7 0.59 7.5 0.56
1.1.5.3 5 0.71 8 0.53 5 0.71 7 0.59 6.3 0.63
1.1.5.4 5 0.71 8 0.53 7 0.59 3 0.82 5.8 0.66
1.2.1 2 0.88 4 0.76 2 0.88 11 0.35 4.8 0.72
1.2.2 2 0.88 3 0.82 4 0.76 5 0.71 3.5 0.79
1.2.3 11 0.35 11 0.35 7 0.59 4 0.76 8.3 0.51
1.3.1 3 0.82 5 0.71 3 0.82 8 0.53 4.8 0.72
1.3.2 0 1 3 0.82 0 l 3 0.82 1.5 0.91
1.3.3 1 0.94 l 0.94 1 0.94 2 0.88 1.3 0.93
1.3.4 2 0.88 0 1 4 0.76 11 0.35 4.3 0.75
1.3.5 1 0.94 9 0.47 7 0.59 1 0.94 4.5 0.74
1.4.1 2 0.88 2 0.88 4 0.76 11 0.35 4.8 0.72
1.4.2 3 0.82 3 0.82 l 1 0.35 9 0.47 6.5 0.62
1.4.3 13 0.24 12 0.29 9 0.47 5 0.71 9.8 0.43
1.5.1 2 0.88 3 0.82 10 0.41 9 0.47 6.0 0.65
1.5.2 1 0.94 2 0.88 5 0.71 10 0.41 4.5 0.74
1.5.3 10 0.41 8 0.53 6 0.65 3 0.82 6.8 0.60
1.5.4 6 0.65 6 0.65 4 0.76 0 1.00 4.0 0.76
1.6.1 2 0.88 l 0.94 0 1 2 0.88 1.3 0.93
1.6.2 1 0.94 0 1 0 l l 0.94 0.5 0.97
1.7.1 1 0.94 2 0.88 3 0.82 4 0.76 2.5 0.85
1.7.2 9 0.47 12 0.29 6 0.65 4 0.76 7.8 0.54
1.8.1 0 1 2 0.88 4 0.76 0 1.00 1.5 0.91
1.8.2 0 1 3 0.82 0 l 0 1.00 0.8 0.96
1.9.1 7 0.59 4 0.76 10 0.41 3 0.82 6.0 0.65
1.9.2 8 0.53 7 0.59 9 0.47 5 0.71 7.3 0.57
1.10.1 10 0.41 14 0.18 11 0.35 8 0.53 10.8 0.37
Sum 210 267 238 233 237

Average 4.77 0.72 6.07 0.64 5.41 0.68 5.30 0.69 5.39 0.68

 

 

118

 

 

 

 

 

80.N- 000.22 00.0 02- v.N2 00.0 0N- N22 00.0 0n.N- 0.2 000 8.0- N0.0 N2..0 0w80><
0N0- 0.2N 20.0 0 0N 020.0 0 0N 00.0 0 2.N 000 2- 2 00.0 0
m- 0.0 200 N- 2 00.0 2- 2. N00 0. 2 00.0 0- v 2L0 n2
0N0. 0.0 02.0 m- n 2.0 m- 0 N00 20. 0 00.0 h- m 8.0 O
mNN- 0.2 2 00.0 2- 22 80 2- 2 00.0 2-- n 00.0 0 02 200 Z
2- 0.2 00.0 0 02 00.0 0 2 00.0 2- 2 v0.0 0. 2. 02.0 E
0N0- 0.02 00.0 0 02 00.0 2- 2 00.0 0 2 2.0 0 N2 020 2
02-.N- 0.2. :0 N- 0 N00 N- 02 8.0 m- 0 2.0 N- 0 N00 2
mN.2.- m .0 00.0 N- 20 00.0 22- m 2.00 N- 02 02.0 2 2- n 00.0 2.
02.0- 0.22 00.0 0 0 2 0m .0 2- 22 00.0 2- 2 v0.0 2- 0 2.00 2
v- 0.0 2.2.0 0- 20 2000 2- 22 8.0 0 2- 20 00.0 N- 0 N00 I
02.0- m.m 8.0 v- N 00.0 m- 02 00.0 m- 0 02.0 2- v 20.0 0
00.2- 002 N00 0 02 :0 m- 0 00.0 0 02 80 N- 2- 00.0 "2
0N0- 0.0 000 m- m 00.0 2 2- 2. 0m .0 v- 0 2-0 2.- 0 00.0 m
0 0.02 00.0 0 mN 0200 0 02 00.0 0 02 00.0 0 202 00.0 D
000- 0.02 000 2- N 2.0 0 02 2.0.0 0 NN 0.0 N- 22 02-0 0
2.2- 002 80 2- 2 00.0 2- 2. N00 0. 02 2.0 N- 02 8.0 m2
m2- m.N2 00.0 0 N2 020 N- 2 00.0 N- v2 000 2- 22 02.0 <
2202 “220 “22022230. .020 .020 .0202 ..2202 .0202 020 .05 .020 22< .020 .0202 .0202 .3560
.0072 .0052 00.032 .002 .80 22< .002 .80 Ex .002 .822 00 .0072 .80 22<
00 .0>< 00 .0>< .0>< .0>< 00 Q0. .0>< .0>< 20 Q0. .0>< .0>< Q0. .0>< .0>< 20 Q0.
20-00.. 20-02. 20-00 22--Xm2
2005200

22000 .20\ 00300. 052002.230 0222020000000 0 £00m 20:0 222.20.003 20050222002022 220002 200300 20.20.3203 02020.2 :2 000=0K0§2Q

mm 02002-

 

119

table can be compared to Tables 13 and 23. Average proportions of correspondence in
topic inclusion between the curricula and the corresponding inclusive SC-test blueprint
ranged from .51 to .77 for countries. Correspondence in topic inclusion for the
curriculum-data sources was the same as in Table 34. The average proportion of
correspondence was .68, which again was a slight improvement over the ﬁeld-trial
instrument. However, it was .20 less than the correspondence in topic inclusion (or non-
inclusion) between the curriculum and the unique SC-test blueprints. The average
numbers of topics included in a country’s curriculum but not on the corresponding
inclusive-test blueprint was 11, and the average number of topics included on a test
blueprint but not in the corresponding curriculum source was 3. The numbers of topics in
curriculum sources not on the blueprints (positive differences) were about the same for all
data sources. The lowest numbers of topics in blueprints not in a corresponding
curriculum source (negative differences) were for topics included in the aggregate-data
source. Average rates of non-tested topics ranged from 5 to 21 and rates for topics on the
inclusive-test blueprints not in the curriculum ranged from 0 to 7.

Tables 36 and 37 show the differences between the curriculum-data sources and
the corresponding inclusive SC-test blueprints in topic emphasis. Table 36 highlights
differences for topics, and Table 37 highlights differences for countries. Positive
differences occurred when t0pics received a higher emphasis in the curriculum than on
the test blueprints, and negative differences occurred when topics receive a higher
emphasis on the test blueprints than in the curriculum.

On average across all curriculum sources, topics with the lowest average positive

difference in emphasis (more weight in the curriculum) were 1.8.1 Inﬁnite Processes

1 20
Table 36

Diﬂerence in Topic Emphasis between Each Inclusive- Test blueprint and Each Corresponding Curriculum Source for Each Topic

 

EX-UN EX-7l CG-UN CG-7l TX-UN

 

Ave. Ave. Ave. Ave. Ave. Ave. Ave. Ave. Ave. Ave.
Topic SD of P05. Neg. SD of P05. Neg. SD of P05. Neg. SD of P05. Neg. SD of P05. Neg.
Code all Dif. Dif. Dif. all Dif. Dif. Dif. all Dif. Dif. Dif. all Dif. Dif. Dif. all Dif. Dif. Dif.

 

 

1.1.1.1 0.004 0.011 -0.015 0.013 0.026 0 0.016 0.019 -0.020 0.025 0.040 0 0.020 0.031 001]
1.1.1.2 0.013 0.015 -0.017 0.020 0.033 0 0.016 0.022 -0.017 0.023 0.020 -0.022 0.032 0.060 -0.026
1.1.1.3 0.012 0.021 -0.018 0.022 0.037 0 0.015 0.016 -0.021 0.024 0.038 0 0.013 0.031 -0.01 3
1.1.2.1 0.016 0.029 0.018 0.015 0.032 -0.022 0.007 0.010 -0.022 0.016 0.032 0 0.019 0.035 -0.023
1.1.2.2 0.017 0.021 -0.015 0.017 0.023 -0.018 0.011 0.011 -0.020 0.018 0.034 O 0.014 0.031 -0.013
1.1.2.3 0.018 0.025 —0.015 0.018 0.021 -0.019 0.011 0.012 -0.022 0.020 0.036 0 0.005 0.013 -0.007
1.1.2.4 0.011 0.016 -0.017 0.012 0.012 -0.021 0.011 0.009 -0.020 0.017 0.034 0 0.020 0.030 -0.025
1.1.2.5 0.026 0.035 O 0.026 0.035 0 0.013 0.028 0 0.013 0.028 0 0.010 0.009 0
1.1.3.1 0.013 0.011 -0.019 0,014 0.012 -0.019 0.012 0.010 -0.015 0.016 0 0.020 0.019 0.033 -0.028
1.1.3.2 0.025 0.037 0 0.025 0.037 0 0.018 0.037 0 0.018 0.012 0 0.071 0.040 0
1.1.3.3 0.018 0.032 0 0.018 0.032 0 0.023 0.039 0 0.023 0.033 0 0.064 0.039 0
1.1.4.1 0.008 0.023 0 0.008 0.023 0 0.015 0.031 O 0.015 0.031 0 0.003 0.004 0
1.1.4.2 0.012 0.010 0.018 0.014 0.008 -0.020 0.012 0.011 -0.019 0.019 0.035 0 0.022 0.038 -0.025
1.1.4.3 0000 0 0 0.000 0 0 0.008 0.024 0 0.008 0.024 0 0.001 0.001 0
1.1.4.4 0.011 0.011 -0.014 0.012 0.0” -0.016 0.013 0.012 -0.018 0.019 0,011 -0.021 0.014 0.026 -0.011
1.1.4.5 0011 0,034 0 0.011 0.034 0 0.012 0,028 O 0.012 0.028 0 0.006 0.006 0
1.1.5.1 0.005 0.009 -0.017 0.013 0.026 0 0.004 0.018 -0.015 0.016 0.032 0 0.002 0.004 —0.002
1.1.5.2 0.011 0.013 -0.01 8 0.013 0.010 -0.021 0.009 0.009 -0.022 0.017 0.033 0 0.005 0.007 -0.006
1.1.5.3 0.011 0.017 -0.0|4 0.011 0.013 -0.018 0.005 0.013 -0.019 0.016 0.032 0 0.006 0.009 -0.005
1.1.5.4 0.013 0.014 -0.020 0.014 0.012 -0.023 0.005 0.013 -0.020 0.017 0.033 0 0.012 0.016 0006
12.1 0.010 0.011 -0.014 0.010 0.009 0017 0.013 0.013 -0.014 0.017 0.010 -0.019 0.028 0.044 -0.023
1.2.2 0.010 0.009 -0.013 0.011 0.006 -0.016 0.012 0.011 -0.017 0.017 0.006 -0.023 0.028 0.054 0045
1.2.3 0.012 0.009 -0.015 0.016 0.028 0 0.012 0.012 -0.020 0.019 0.036 0 0.002 0.006 -0.00|
1.31 0.012 0.012 -0.017 0.012 0.013 -0.018 0.012 0.007 -0.019 0.017 0.007 -0.019 0.021 0.027 0023
1.3.2 0.007 0.011 -0.017 0.010 0.010 -0.020 0.012 0.010 -0.016 0.017 0.008 -0.021 0.023 0.043 -0.028
1.3.3 0.009 0.012 -0.016 0.011 0.007 0021 0.012 0.019 -0.017 0.019 0.017 -0.025 0.035 0.049 0034
1.3.4 0.011 0.013 -0.016 0.013 0.007 -0.021 0.010 0.015 -0.018 0.016 0.028 -0.024 0.095 0.177 0045
1.3.5 0007 0.028 0 0.007 0.028 0 0.017 0.033 0 0.017 0.033 0 0.013 0.013 0
1.4.1 0.010 0.015 -0.019 0.011 0.012 -0.023 0.014 0.014 -0.017 0.020 0.039 -0.021 0.044 0.049 0043
1.4.2 0.011 0.017 -0.020 0.012 0.019 -0.021 0.013 0.009 -0.019 0.018 0.006 -0.021 0.042 0.081 -0.028
1.4.3 0.018 0.031 0 0.018 0.031 0 0.017 0.035 0 0.017 0.008 0 0.012 0.016 0
1.5.1 0.010 0.010 -0.014 0.011 0.009 -0.016 0.012 0.008 -0.018 0.017 0 0020 0.005 0.010 0007
1.5.2 0.012 0.012 -0.021 0.015 0.009 .0023 0.012 0.008 -0.015 0.015 0 -0.019 0.016 0.020 0014
1.5.3 0022 0.035 0 0.022 0.035 0 0.027 0.045 0 0.027 0.045 0 0.025 0.041 0
1.54 0.017 0.031 0 0.017 0.031 0 0.018 0.036 0 0.018 0.036 0 0.004 0.007 0
1.6.] 0.010 0.014 -0.018 0.011 0.011 -0.022 0.012 0.014 -0.019 0.019 0.017 -0.025 0.041 0.051 0025
1.6.2 0.012 0.009 -0.019 0.015 0.008 -0.021 0.010 0.015 -0.018 0.016 0.028 -0.024 0.060 0.134 0078
1.7.1 0.011 0.010 -0.022 0.014 0.008 -0.022 0.014 0.014 -0.017 0.020 0.039 -0.021 0.015 0.029 0028
1.72 0.008 0.012 0.017 0.016 0.029 0 0.013 0.009 -0.019 0.018 0.009 0 0.007 0.015 0003
1.8.1 0.000 0 0 0.000 0 0 0.012 0.034 0 0.012 0.034 0 0.001 0.003 0
1.8.2 0.000 0 0 0.000 0 O 0.011 0.028 0 0.011 0.028 0 0.000 0 0
19.1 0.018 0.031 0 0.018 0.031 0 0.016 0.034 0 0.016 0.034 0 0.072 0.037 0
1.9.2 0014 0.027 0 0.014 0.027 0 0.020 0.036 O 0.020 0.036 0 0.034 0.040 0
1.10.1 0.015 0.028 0 0.015 0.028 0 0.017 0.038 0 0.017 0.005 0 0.062 0.056 0
Average 0.012 0.018 -0.01 I 0.014 0.019 -0.010 0.013 0,020 0012 0.017 0.025 -0.010 0.024 0.033 -0.014

 

121

 

 

 

 

Table 36 (Contd.)
TX-7l AG-UN AG-7I
Ave. Ave. Ave. Ave. Ave. Ave. SD of

Topic SI) of Pos. Neg. SD of P05. Neg. SD of P05. Neg. Ave. of Ave. of SD of Neg. SD of
Code all Dif. Dif. Dif. all Dif. Dif. Dif. all Dif. Dif. Dif. Pos. Dif. Neg. Dif. Pos. Dif. Dif. All Dif.
1.1.1.1 0.026 0.023 0 0.010 0.028 -0.007 0.014 0.035 0 0.026 -0.007 0.009 0.007 0.018
1.1.1.2 0.035 0.058 -0.021 0.019 0.024 -0.027 0.032 0.051 O 0.035 -0.016 0.017 0.010 0.029
1.1.1.3 0015 0.034 -0.012 0.010 0.027 -0.018 0.023 0.044 0 0.031 -0.010 0.009 0.008 0.022
1.1.2.1 0.019 0.038 0025 0.008 0.025 0028 0.027 0.053 0 0.032 -0.017 0.011 0.010 0.027
1.1.2.2 0.015 0.029 -0.014 0.020 0.022 -0.026 0.031 0.048 O 0.027 -0.013 0.010 0.009 0.022
1.1.2.3 0.006 0.015 -0.006 0.007 0.015 -0.017 0.017 0.032 0 0.021 -0.01 I 0.009 0.008 0.018
1.1.2.4 0020 0.027 -0.029 0.016 0.024 -0.027 0.030 0.051 0 0.025 -0.018 0.013 0.011 0.024
1.125 0.010 0.009 0 0.014 0.031 0 0.014 0.031 0 0.026 0.000 0.010 0.000 0015
1.1.3.! 0.019 0.035 -0.030 0.015 0.019 -0.025 0.022 0 -0.049 0.017 0026 0.010 0.010 0.024
1.1.3.2 0.071 0.040 0 0.076 0.068 0 0.076 0.068 0 0.042 -0.003 0.017 0.007 0.026
1.1.3.3 0.064 0.039 0 0.028 0.048 0 0.028 0.048 O 0.039 -0.003 0.006 0.007 0.022
1.1.4.1 0.003 0.004 0 0.000 0 0 0.000 0 0 0.014 0.000 0.013 0.000 0.012
1.1.4.2 0021 0.040 0025 0.016 0.019 -0.023 0.026 0.045 0 0.026 -0.016 0.014 0.010 0.024
1.1.4.3 0.001 0.001 0 0.000 0 0 0.000 0 0 0.006 0.000 0.010 0.000 0.008
1.1.4.4 0.022 0,025 0 0.007 0.016 -0.019 0.019 0.035 0 0.018 -0.012 0.009 0 008 0.017
1.1.4.5 0.006 0.006 0 0.007 0.031 0 0.007 0.031 O 0.025 0.000 0.011 0.000 0.015

1.1.5.1 0.003 0.005 0 0.006 0.017 0006 0.010 0.023 0 0.017 0005 0.010 0.007 0.014
1.1.5.2 0.008 0.012 0 0.005 0.014 -0.012 0.014 0.027 0 0.016 -0.010 0.009 0.009 0.016
1.1.5.3 0.006 0.010 -0.005 0.004 0.014 0013 0.014 0.027 0 0.017 -0.009 0.008 0.007 0.015
1.1.5.4 0.015 0.017 0 0.007 0.023 -0.006 0.011 0.029 0 0.020 -0.009 0.007 0.009 0.017
1.2.1 0.028 0.044 0025 0.016 0.022 -0.028 0.029 0.050 0 0.025 -0.018 0.016 0.008 0.025
12.2 0.030 0.053 -0.053 0.026 0.045 -0.027 0.031 0.024 -0.071 0.026 -0.033 0.020 0.020 0.036
1.2.3 0.003 0.005 0 0.005 0.016 -0.006 0.010 0.022 0 0.017 -0.005 0.011 0.007 0.014
1.3.1 0.021 0.026 0024 0.015 0.021 -0.024 0.027 0.046 0 0.020 -0.018 0.012 0.007 0.021

1.32 0.026 0.049 -0.025 0.019 0.020 -0.024 0.025 0 -0.063 0.020 -0.027 0.015 0.014 0.028
1.3.3 0035 0.048 -0.035 0.021 0.032 0031 0.037 0 -0.087 0.023 -0.033 0.017 0.022 0.034
1.3.4 0.106 0.138 -0.030 0.031 0.063 -0.035 0.053 0.072 0 0.064 -0.024 0.059 0.013 0.06]

1.3.5 0.013 0.013 0 0.009 0.037 0 0.009 0.037 0 0.028 0.000 0.009 0.000 0015
1.4.1 0.043 0.058 0045 0.040 0.067 -0.038 0.061 0.076 0 0.041 -0.026 0.024 0.014 0.039
1.4.2 0.060 0.062 0 0.034 0.077 -0.030 0.053 0.069 0 0.043 -0.017 0.030 0.011 0.038
1.4.3 0.012 0.016 0 0.013 0.027 0 0.013 0.027 O 0.024 -0.002 0.009 0.006 0.015
1.5.] 0.010 0.014 0 0.008 0.013 -0.019 0.017 0.031 0 0.012 -0.012 0.008 0.008 0.014
1.5.2 0016 0.023 -0.018 0.018 0.027 -0.021 0.029 0.046 O 0.019 -0.016 0.013 0.007 0.020
1.5.3 0.025 0.041 0 0.028 0.054 0 0.028 0.054 O 0.044 0.000 0.007 0.000 0.022

1.5.4 0.004 0.007 0 0.000 0 0 0.000 0 O 0.018 0.000 0.015 0.000 0014
1.61 0.042 0.045 -0.025 0.020 0.032 -0.036 0.039 0.000 0079 0.023 -0.031 0.017 0.019 0032

1.6.2 0.088 0.167 -0.043 0.045 0.099 -0.067 0.080 0.026 -0.192 0.061 0058 0.059 0.055 0.082

1.7.1 0.021 0.019 0033 0.024 0.029 -0.032 0.035 0.014 —0.073 0.020 -0.031 0.010 0.017 0.029
1.7.2 0.008 0.009 0 0.006 0.020 -0.007 0.012 0.027 0 0.016 -0.008 0.007 0.008 0.015

1.8.1 0.001 0.003 0 0.000 O 0 0.000 0 0 0.009 0.000 0.014 0.000 0.011

1.8.2 0.000 0 0 0.000 0 0 0.000 0 0 0.007 0.000 0.012 0.000 0.009
1.91 0.072 0.037 0 0.024 0.047 0 0.024 0.047 0 0.037 0.000 0.006 0.000 0.019
1.9.2 0.034 0.040 0 0.019 0.037 0 0.019 0.037 0 0.035 0.000 0.005 0.000 0.018
1.10.1 0.062 0.056 0 0.032 0.046 0 0.032 0.046 0 0.038 0.000 0.016 0.000 0.022

Average 0.026 0.033 -0.012 0.016 0.029 -0.015 0.025 0.033 -0.014 0.026 -0.012 0.014 0.008 0.023

 

 

122
(.009) and 1.8.2 Change (.007). The highest averages of positive differences were for

1.3.4 3-D Geometry (.064) and 1.6.2 Equations and Formulas (.06). Aside from the
topics not in the blueprints, topics with the lowest negative difference were 1.1.5.1
Estimating Quantity and Size (-.005) and 1.2.3 Estimation Errors (—.005). The largest
was for 1.6.2 Equations and Formulas - which also had a high positive difference. The
average of the positive average differences was .026 while the average of the negative
average differences was -.012. The averages of the average positive differences for the
curriculum sources were all around .02; average negative differences were around -.01.
In general, topics received more weight in the curriculum than on the test blueprints.

Table 37 shows the variability in topic emphasis differences for countries.
Average positive differences in topic emphasis for countries ranged from .012 to .049,
and negative differences ranged from -.013 to -.025. For data sources, the positive
differences ranged from about .02 to .06. Negative differences ranged from .015 to .087.
The poorest correspondence in topic emphasis was with the aggregate 70%-intersection-
test blueprint, followed by the union aggregate-test blueprint. Lower numbers were for
the expert-mapping and curriculum-guide blueprints.

Correlations and Euclidean distance measures. Correlations between the
proportions-of-topic-emphasis proﬁles in each curriculum-data source and the topic-
weight proﬁles in each inclusive SC-test blueprint are in Table 38. The correlations
ranged from .00 to .90 with an overall average of .58. The average correlation within data
sources across countries was highest between the text union- and 70%-intersection-test
blueprint topic weight proﬁles and the topic emphasis proﬁles for the corresponding data

source and lowest between the curriculum-guide union-test blueprint topic weight proﬁles

123

 

 

 

 

08.0- 08.0 08.0 08.0- 08.0 .080 08.0- N80 280 08.0- 080 N80 0 8.0- 2. 2 0.0 08.0 00000322
20.0- 080 N80 m80- 080 2080 v8.0- 0200 08.0 08.0- 080 080 28.0 08.0- 08.0 0
080- N80 080 0 80- 208.0 08.0 0 2 0.0- 0 8.0 0 8.0 280- 2 8.0 80.0 08.0- 08.0 2.80 m
08.0- 000.0 08.0 08.0- 08.0 2.80 v80- N80 2080 v8.0. 080 N80 080- 080 N80 O
080- 2000 N000 2.8.0- 080 08.0 08.0- N80 2 8.0 0200- 2080 08.0 2 20.0- 0 8.0 m 200 Z
.080- 80.0 08.0 :00- N80 280 20.0- 080 08.0 080- N80 08.0 080- N80 080 22
08.0- 080 08.0 080- 080 0200 20.0. 20.0 080 08.0- 080 08.0 20.0- 080 080 2
:00- 280 08.0 08.0- 08.0 N80 080. N80 080 08.0- 080 080 08.0- 080 080 2
v8.0- 0000 0000 N80- 080 >200 08.0- 280 280 080- 08.0 080 08.0- 08.0 08.0 2.
08.0- 080 28.0 08.0- N80 N80 0 8.0- 200 08.0 08.0- 2200 08.0 m 8.0- 2 8.0 m 20.0 2
080- 08.0 2 8.0 300.0- Nm00 80.0 2 8.0- 0000 08.0 2. 80- 0 8.0 h 2 0.0 m 8.0- 2.8.0 2. 2 0.0 I
08.0- 080 000.0 08.0- 080 080 080- 08.0 08.0 08.0- 080 08.0 N80- 2200.0 080 0
080- 2-80 080 08.0- 08.0 N80 0 2 0.0- 28.0 0 2 0.0 2. 2 0.0- 080 08.0 220.0- 0 80 m 8.0 ”2
280- 08.0 02000 080- 2 80 08.0 280- m 8.0 0 8.0 08.0- 080 08.0 08.0- 080 080 m2
080- 080 080 08.0- 08.0 280 2 80- m 80 2.8.0 2200- 08.0 20.0 080- 2 200 N80 Q
080- 08.0 08.0 080- 08.0 2080 N80- 280 0 8.0 08.0- 08.0 08.0 m 80- N80 08.0 0
N80- 0000 N80 08.0- 080 N80 08.0- N80 N80 08.0- 28.0 28.0 20.0- 080 08.0 m
080- 08.0 08.0 08.0- v8.0 v8.0 080- 200 08.0 08.0- 280 08.0 v8.0- 080 20.0 <
.020 .0202 .05 220 .020 .0222 ..2202 220 .0202 .0202 .020 220 .0202 .0202 .020 220 .0202 .020 .0202 220 0
.0072 .80 00 Q0. .0072 .80 b8 Q0. .0072 .80 00 Q0. .0072 .832 h2o Q0. .0072 .0022 00 Q0. b22000
.0>< .0>< .0>< .0>< .0>< .0>< .0>< .0>< .0>< .0><
20-02-2- 20-00 75-00 K-Xm ZD-Xm

 

2005200 000m .20\.00.0200. 8020025020 0220000005200 2.200% 20:0 22.202003 20.0-N -03-3205 2.200% «2003.200 020022QEQ 0.220-m 2.2 0000000§Q

00 0200-2.

 

124

 

 

 

 

 

85 85 ~85 585- 85 85- 855 285 22 85- :55 285 85- 585 :55 852
285 585 5.55 ~85- 255 85- 85 85 2:55- 5255 5255 85- 0255 85 o
585 85 :55 85- ~85 825- 585 555 85- 585 ~85 85- 85 555 .2
~55 85 85 ~85- :55 855- 285 585 285- :55 85 ~85- 85 ~55 0
~55 85 85 585- 85 585- ~85 585 285- 555 85 ~85- ~85 85 z
585 585 855 85- ~85 585- 85 :55 5255- 5255 ~85 285- :55 ~55 .2
585 85 85 ~85- 285 585- 85 555 585- ~85 585 85- 0255 285 0
555 85 ~85 585- 585 85. 85 585 85- 355 85 585- 85 :55 v2
85 ~85 85 85- 355 555- 85 ~85 85. 855 ~85 555- 855 ~85 2
~85 85 5555 85- 85 ~85- 85 85 085- 5255 85 85- 285 585 2
~55 285 585 85- 85 555- 2525 85 85- 585 585 85- 085 ~85 :
555 85 555 ~85- 855 585- 825 855 85- :55 855 85- ~85 ~85 0
585 85 5255 585- 285 85- 85 ~85 285- 285 85 ~85- 85 555 ..2
555 :55 585 85- :85 ~85- 525 ~85 555- 85 ~55 585- 055 :55 m
285 585 855 ~85- 5255 2525- 85 855 555- :55 ~85 85- 5255 ~85 0
85 585 85 ~85- 85 5:5- ~85 ~85 085- 255 85 ~85- 555 585 0
555 85 85 85. ~85 585- 555 585 585- ~85 ~85 ~85- 585 555 m
585 5255 5255 85- 85 85- 85 ~55 ~85- ~85 ~85 585- £55 5255 <
.25.? .05 .25 .05 .25 .05 .55 05:5 .25 .25 .05:2“ .25 .520 05:5 8550
00 Q0. .0072 .0032 .002 .80 .0072 .852 00 Q0. .002 .852 00 Q0. .0072 .0032 00 Q0.
..wo Q94 MO QM. MO .O>< m0 .D>< .0>< .0>< .0>< .0>< .0>< .0><
2~-o< z:-o< 233

 

2 8800 - 2552.

 

 

 

Table 38

125

Correlations between the Proportions-of- T opic-Emphasis Proﬁles for Each Country in Each

Curriculum-Data Source with the T opic- Weight Proﬁles for Each Corresponding Inclusive- Test

 

 

 

Blueprint

Country EX-UN EX-7I CG-UN CG-7I TX-UN TX-7I AG-UN AG-7I Ave. SD
A 0.59M 0.51" .56M 34* .76** .77" .73" .66” 0.61 0.17
B 0.45" 0.47" .38" .54M .42” .43M .53" .34* 0.44 0.06
C 034* 0.31* .31* 0.20 .65" .64” .58" .30* 0.42 0.19
D 0.77M 0.74" .63" .47M .59M .60" .67" .46M 0.62 0.07
E 0.28 0.36* .62" .45" .67" .64" .69" .59” 0.54 0.08
F 0.69" 0.57" .33“ .44" .41" .44" .62" .51M 0.50 0.10
G 0.18 0.22 .57M .53“ .56” .57" .45" .54M 0.45 0.05
H 0.60** 0.67“ .39M .39“ .64" .65" .60" .53“ 0.56 0.12
1 0.66M 0.67“ .43M .43M .61M .61" .55" .49“ 0.56 0.08
J 033* 035* .38* .73“ .78" .70" .75” .78M 0.60 0.15
K 0.62” 0.64M .38* .61" .90” .89M .72" .75" 0.69 0.19
L 0.51" .48" .69" .64“ .85M .87" .76“ .63" 0.68 0.09
M 0.52" .62" .35* .52M .87" .86" .82" .79" 0.67 0.21
N 0.56" .49** 0.18 .42“ .56“ .56” .50" .48“ 0.47 0.14
O 0.57“ .61** 36* .39" .79" .76" .77“ .58" 0.60 0.19
P 0.49M .40** .47M .43" .34* .36* .29* 0.12 0.36 0.06
Q 0.55" .58" 0.05 0.25 .81“ .82” .73" .70M 0.52 0.39
Ave 0.51 0.51 0.41 0.44 0.66 0.66 0.63 0.54 0.59 0.11
SD 0.15 0.14 0.17 0.16 0.16 0.15 0.13 0.17 0.09 0.08

 

*p <05.

*1) <.01.

126

and the corresponding curriculum proﬁles. Average correlations for countries varied.
These ranged from a low of .36 (country P) to a high of .68 (country K). Standard
deviations of correlations for countries varied from .05 (country G) to .39 (country Q).
Euclidean distances between the proportion of topic-emphasis proﬁles in each
curriculum-data source and topic-weight proﬁles for the corresponding test blueprints are
shown in Table 39. The distances ranged from .08 to .66, with an overall average of .25.
The largest average distance was found between the aggregate 70% intersection-test
blueprint-topic proﬁles and the aggregate-data-source topic proﬁles. The smallest were
between the expert-mapping- and cuniculum-guide-test blueprint topic proﬁles and the
corresponding data-source topic proﬁles. The smallest average distance for countries was
country L (.14). The largest was for country N (.31). Average standard deviations of
distances were generally less than .10. Table 40 shows differences between the Euclidean
distances in Table 39 and those computed earlier using the unique-test blueprints. The
largest difference was for the aggregate 70%-test blueprint. The smallest was for the

curriculum-guide union-test blueprint.

Variations in Performance across Specially-Constructed Tests
Scores and Ranks
I computed country scores on SC tests using the following steps:
1. Identify topics included on each SC—test blueprint (i.e., either from the union,
70%-intersection, strict-intersection, or unique test blueprints).
2. Find average percent of students passing the items measuring each topic

included on each test for each country by averaging across the percent of

Table 39

127

Euclidean Distances between the Proportions-of— T opic-Emphasis Proﬁles for Each Country in Each

Curriculum-Data Source with the T opic- Weight Proﬁles for Each Corresponding Inclusive- Test

 

 

 

Blueprint

Country EX-UN EX-7I CG-UN CG-7I TX-UN TX-7I AG-UN AG-7I Ave. SD
A 0.10 0.13 0.11 0.16 0.16 0.16 0.14 0.30 0.16 0.02
B 0.12 0.14 0.14 0.14 0.35 0.35 0.20 0.38 0.23 0.09
C 0.13 0.15 0.12 0.16 0.21 0.22 0.15 0.37 0.19 0.04
D 0.08 0.10 0.09 0.14 0.19 0.20 0.13 0.35 0.16 0.04
E 0.17 0.17 0.12 0.16 0.26 0.27 0.29 0.35 0.23 0.07
F 0.10 0.13 0.12 0.15 0.25 0.26 0.17 0.33 0.19 0.06
G 0.25 0.25 0.15 0.17 0.42 0.42 0.44 0.43 0.31 0.13
H 0.11 0.12 0.24 0.25 0.21 0.21 0.26 0.35 0.22 0.02
I 0.10 0.1 l 0.12 0.15 0.22 0.23 0.15 0.34 0.18 0.04
J 0.19 0.19 0.14 0.11 0.32 0.34 0.26 0.25 0.23 0.09
K 0.11 0.12 0.16 0.15 0.25 0.23 0.23 0.26 0.19 0.04
L 0.11 0.13 0.09 0.12 0.13 0.13 0.12 0.32 0.14 0.01
M 0.13 0.13 0.13 0.14 0.27 0.26 0.11 0.26 0.18 0.07
N 0.10 0.13 0.21 0.19 0.66 0.66 0.21 0.34 0.31 0.22
O 0.15 0.14 0.16 0.18 0.26 0.26 0.19 0.32 0.21 0.04
P 0.16 0.18 0.12 0.15 0.29 0.30 0.42 0.22 0.23 0.11
Q 0.11 0.12 0.12 0.16 0.15 0.16 0.31 0.22 0.17 0.07
Ave 0.13 0.14 0.14 0.16 0.27 0.27 0.22 0.32 0.25 0.06
SD 0.04 0.04 0.04 0.03 0.12 0.12 0.09 0.06 0.05 0.05

 

128
Table 40

Diﬂerences in Euclidean Distances between the Proportions-of-Topic-Emphasis Proﬁles for Each
Country in Each C urriculum-Data Source with the Topic- Weight Proﬁles for Each Corresponding
Incl usive- T est Blueprint

 

 

 

Country EX-UN EX-7I CG-UN CG-7I TX-UN TX-7I AG-UN AG-71 Average SD
A 0.03 0.06 0.03 0.09 0.13 0.14 0.1 l 0.27 0.1 l 0.04
B 0.02 0.04 0.04 0.04 0.10 0.1 1 0.06 0.23 0.08 0.03
C 0.03 0.05 0.01 0.05 0.1 1 0.12 0.07 0.29 0.09 0.04
D 0.01 0.04 0.02 0.07 0.09 0.1 l 0.07 0.29 0.09 0.03
E 0.02 0.03 0.05 0.10 0.22 0.23 0.29 0.35 0.16 0.09
F 0.06 0.09 0.01 0.03 0.09 0.10 0.14 0.30 0.10 0.05
G 0.07 0.07 0.07 0.08 0.1 l 0.1 1 0.06 0.05 0.08 0.02
1-1 0.03 0.03 0.10 0.11 0.14 0.15 0.10 0.19 0.11 0.02
l 0.03 0.05 0.02 0.05 0.07 0.08 0.07 0.26 0.08 0.02
J 0.05 0.06 0.02 -0.01 0.31 0.33 0.21 0.21 0.15 0.15
K 0.04 0.05 0.03 0.02 0.17 0.15 0.16 0.19 0.10 0.07
L 0.02 0.05 0.02 0.05 0.10 0.10 0.07 0.27 0.08 0.03
M 0.03 0.02 0.02 0.03 0.1 1 0.10 0.05 0.19 0.07 0.04
N 0.01 0.04 0.03 0.01 0.12 0.12 0.02 0.15 0.06 0.05
O 0.07 0.07 0.04 0.05 0.15 0.15 0.19 0.32 0.13 0.06
P 0.08 0.10 0.03 0.07 0.27 0.28 0.37 0.18 0.17 0.13
Q 0.02 0.03 0.01 0.05 0.05 0.06 0.22 0.13 0.07 0.07
Sum 0.64 0.88 0.55 0.89 2.36 2.42 2.26 3.87 2.36 0.80

Ave 0.04 0.05 0.03 0.05 0.14 0.14 0.13 0.23 0.14 0.05
Stdev 0.02 0.02 0.02 0.03 0.07 0.07 0.09 0.08 0.03 0.04

 

SL

th

6)
he
3c

hi

PO
de

101

129

students within a country passing each item with codes corresponding to each
topic on a given test blueprint.

3. If the test was an unweighted test, obtain an average of the topic averages for
each topic on a given test blueprint. This was the country score on the
unweighted test.

4. If the test was a weighted test, multiply the topic averages by the
corresponding weight then sum over topics included on a given test blueprint.
This was the country score on the weighted test.

Table 41 presents country scores on the ﬁeld-trial instrument as well as a
summary across scores on each specially-constructed test. Appendix D contains country
scores on all tests. The ﬁeld-trial instrument was scored by averaging over all items on
the test. The unweighted union test represents an average of all topic scores on the test.

All country scores on the total ﬁeld-trial instrument were higher than the average
of scores on all other tests. Differences were around two to four points with the
exception of country N (less than ‘/2 a point) and country P (almost 6 points). Country M
had the lowest scores on both the ﬁeld-trial instrument and the average of all other test
scores, and country J had the highest scores. The difference between the lowest and
highest scores on both the ﬁeld-trial instrument scores and the average of all other scores
was nearly 30 percent. The difference between the average of each country’s scores on
the ﬁeld-trial instrument and the grand average of all average country scores was three
points. Standard deviations of the two sets of scores were almost identical. Standard
deviations of each country’s scores across all tests were around two to three percent. The

lowest standard deviation was 1.5; the largest was 3.7.

130
Table 41

Summary of Country Scores on Field-Trial Instrument and across
Specially-Constructed Tests

 

Across Tests

 

 

 

Country Field Trial AVE SD MIN MAX DIF
A 50.5 46.4 1.6 42.2 49.5 7.4
B 56.4 54.9 2.3 50.9 62.2 11.2
C 53.6 50.4 1.5 47.5 53.1 5.6
D 45.2 40.9 1.6 36.1 42.9 6.8
E 45.9 42.7 2.0 38.9 48.8 9.9
F 49.6 47.9 1.5 42.4 49.9 7.4
G 48.1 45.6 3.1 41.7 53.2 11.4
H 43.5 40.7 3.0 29.4 43.7 14.3
I 52.8 49.8 1.6 44.9 52.0 7.1
.1 64.0 62.2 3.7 55.7 71.9 16.1
K 56.0 53.9 2.2 48.6 58.3 9.8
L 51.5 48.9 1.9 44.2 53.3 9.1
M 35.4 32.8 1.5 29.8 38.3 8.5
N 45.0 45.1 2.4 41.6 53.5 11.9
0 61.9 58.3 1.7 55.3 62.0 6.7
P 45.8 40.2 2.9 32.4 47.4 15.0
Q 56.4 52.6 1.9 45.5 55.9 10.4
AVE 50.7 47.8 2.1 42.8 52.7 9.9

SD 6.9 7.1 0.7 7.6 7.8 3.0

 

131
Test scores ranged from a low of 29.4 for country H to a high of 71.9 for country

J. Two of the countries’ minimum scores (countries J and O) were higher than the
average maximum score (52.7), and one country’s maximum score (country M) was
lower than the average minimum score (42.8). Differences between minimum and
maximum scores for each country ranged from 5.6 points (country C) to 16.1 points
(country J). The average difference was almost 10 points.

Results of country ranks on tests are presented in Table 42. The second column
shows each country’s rank on the ﬁeld-trial instrument, and the third column shows each
country’s average rank across all specially-constructed tests. On average, not much
difference in ranks existed between the ﬁeld-trial instrument and other tests. Most
differences for countries ranged from less one than to slightly more than one rank.
Standard deviations of ranks across tests for each country were around one to two ranks.
One country (M) had a standard deviation of .20 ranks. Country G had the highest
standard deviation (1.9 ranks).

Differences between minimum and maximum ranks showed much more
variability than the averages did. No country received the same rank across all tests.
However, two countries had a difference of only one rank across all tests. One country
(country J) ﬂuctuated between the ﬁrst two ranks, while the other country (country M)
ﬂuctuated between the last two ranks. Three of the countries (E, G, and Q) had
differences that were eight or more ranks out of 17. Six additional countries had
differences of ﬁve or more ranks. The average difference between minimum and

maximum ranks was 4.8.

132
Table 42

Summary of Country Ranks on Field-Trial Instrument and across

Specially-Constructed Tests

 

Across Tests

 

 

 

Country Field Trial AVE SD MIN MAX DIF
A 9 10.6 0.9 9 13 4.0
B 3 3.3 0.6 2 4 2.0
C 6 6.5 0.6 5 8 3.0
D 14 14.8 0.8 13 16 3.0
E 12 13.2 1.6 8 16 8.0
F 10 9.2 1.1 7 12 5.0
G 11 10.8 1.9 S 13 8.0
H 16 14.4 1.3 11 17 6.0
I 7 7.0 1.4 6 12 6.0
J 1 1.3 0.4 1 2 1.0
K 5 4.0 0.9 3 7 4.0
L 8 8.3 1.1 4 10 6.0
M 17 17.0 0.2 16 17 1.0
N 15 11.0 1.9 6 13 7.0
O 2 1.8 0.6 1 3 2.0
P 13 15.0 1.5 10 16 6.0
Q 4 5.1 1.6 3 12 9.0
AVE 9.0 9.0 1.1 6.5 11.2 4.8
SD 4.9 4.8 0.5 4.1 4.7 2.4

 

133

Tables 43 and 44 show correlations between country performance on the ﬁeld-
trial instrument and on each of the specially-constructed tests. Table 43 shows score
correlations using a Pearson product-moment correlation; Table 44 shows rank
correlations using the Spearman rank-order correlation. Average correlations in both
cases were quite high and signiﬁcant (p < .01 in all cases). Only the correlation between
ﬁeld-trial scores and the expert-mapping strict-intersection-test scores was under .90, and
only three of the rank correlations were below .90. These were for the expert mapping
strict-intersection test (.85), the unweighted curriculum-guide strict-intersection test (.83),

and the weighted textbook strict-intersection test (.85).

Performance Differences

I computed differences between each country score on the ﬁeld-trial instrument
and their score on each specially-constructed test, and I did the same with the ranks.
Summaries are in Tables 45 to 48. Positive differences indicate higher performance on
the ﬁeld-trial instrument than on the specially—constructed test; negative differences
indicate the opposite.

Tables 45 and 46 present the summary results for the score differences. Table 45
presents results for tests, and Table 46 presents results for countries. Most countries had
positive score differences (i.e., higher performance on the ﬁled-trial instrument). The
main exception was for the strict-intersection test based on the curriculum guide.
Differences were split in half for this test. The highest average absolute score differences
were between the ﬁeld-trial instrument and the weighted strict-intersection test based on

the textbook data (6.32). This test also had high average positive differences and high

134
Table 43

Correlations between Country Scores on
the the F ield- T rial Instrument and Scores
on Each Specially-Constructed Test

 

 

 

Test Correlation
UNION 0.98 '
EX 1-71 0.96
EX 1 -SI 0.87
CG-71 0.98
CG-SI 0.90
TX-7I 0.98
TX-SI 0.94
AG 1 -UN 0.97
AG-7 1 0.97
WEX-UN 0.98
WEX-7I 0.97
WCG-UN 0.99
WCG-7I 0.98
WTX-UN 0.97
WTX-7I 0.97
WTX-SI 0.90
WAG-UN 0.98
WAG-71 0.96
EX-UQ 0.97
WEX-UQ 0.97
CG-UQ 0.97
TX-UQ 0.99
WTX-UQ 0.93
AG-UQ 0.99
WAG-UQ 0.95
Average 0.96

 

Note. All corelations are
signiﬁcatn, p <.01.

135

Table 44

Correlations between Country Ranks on
the the F ield- Trial Instrument and Ranks
on Each Specially-Constructed Test

 

 

 

Test Correlation
UNION 0.97
EX-7I 0.94
EX-SI 0.85
CG-7l 0.97
CG-SI 0.83
TX-7I 0.96
TX-SI 0.94
AG-7l 0.96
WEX-UN 0.96
WEX-7I 0,94
WCG-UN 0.97
WCG-7I 0.96
WTX-UN 0.94
WTX-7I 0.94
WTX-SI 0.85
WAG-UN 0.96
WAG-71 0.94
EX-UQ 0.95
WEX-UQ 0.94
CG-UQ 0.92
TX-UQ 0.96
WTX-UQ 0.90
AG-UQ 0.95
WAG-UQ 0.91
Average 0.93

 

Note. All correlations are
signiﬁcant, p <.01.

Table 45

136

Summary of Diﬂerences in Scores on the the F ield- T rial Instrument and Scores on Each

Specially-Constructed Test

 

 

 

AVE
Test ABSa SD MIN MAX Ave +b Ave -° Count +b Count -°
UNION 2.17 1.46 0.19 6.65 2.29 -0.01 16 1
EX-71 4.09 1.88 1.57 8.29 4.09 0 17 0
[EX-81 4.66 3.67 0.32 14.12 4.87 -0.79 14 3
CG-71 4.38 1.30 2.18 6.49 4.38 0 17 0
CG-SI 2.76 2.13 0.20 8.49 1.94 -3.92 8 9
TX-71 1.37 1.13 0.08 4.42 1.66 -0.13 13 4
TX-SI 4.44 1.90 1.67 9.34 4.74 -0.28 15 2
AG-71 3.31 1.60 0.59 6.29 3.45 -0.07 16 1
WEX-UN 3.08 1.47 0.85 6.69 3.08 0 17 0
WEX-71 3.82 1.80 1.44 7.67 3 .82 0 17 0
WCG-UN 2.86 1.15 0.67 5.09 2.86 0 17 0
WCG-7I 3 .99 1.3 5 1.36 6.03 3.99 0 17 0
WTX-UN 3.60 1.53 0.81 7.41 3.78 -0.05 16 1
WTX-71 3.07 1.53 1.16 7.22 3.15 -0.11 16 1
WTX-SI 6.32 2.85 0.02 13.41 6.93 -0.24 15 2
WAG-UN 3.26 1.35 0.07 6.36 3.46 0.00 16 1
WAG-71 4.42 1.88 0.01 7.34 4.96 -0.05 15 2
EX-UQ 2.41 1.39 0.35 4.92 2.58 -0.15 15 2
WEX-UQ 2.35 1.48 0.23 5.79 2.37 -0. 12 16 1
CG-UQ 2.38 1.44 0.20 5.18 2.65 -0.05 15 2
TX—UQ 1.72 0.97 0.46 3 .46 1.72 0 17 0
WTX-UQ 2.98 1.54 0.80 5 .76 3 .25 -0.65 13 4
AG-UQ 2.04 1.02 0.08 3.90 2.14 -0.02 16 1
WAG-UQ 2.23 1.28 0.54 5.02 2.28 -0.87 12 5
Average 3.24 1.63 0.66 6.89 3.35 -0.31 15.25 1.75
SD 1.1 0.6 0.6 2.5 1.2 0.8 2.0 2.0

 

3Average of the absolute value of the differences. bAverage/Number of positive

differences. cAverage/N umber of negative differences.

137
Table 46

Summary of Differences in Scores on the the F ield- T rial Instrument and Scores on Each
Specially-Constructed T est for Each Country

 

 

 

AVE
Country ABSa SD MIN MAX Ave +b Ave -° Count +b Count 5
A 4.11 1.61 0.98 8.37 4.11 0 24 0
B 2.23 1.61 0.22 5.77 2.33 -1.86 19 5
C 3.16 1.47 0.51 6.06 3.16 0 24 0
D 4.37 1.58 2.33 9.12 4.37 0 24 0
E 3.57 1.25 0.84 7.02 3.70 -2.12 22 2
F 1.73 1.42 0.19 7.11 1.94 -0.27 21 3
G 3.61 1.84 0.08 6.42 3.70 -3.14 20 4
H 2.87 3.01 0.20 14.12 2.98 -0.20 23 1
1 2.95 1.60 0.79 7.91 2.95 0 24 0
.1 3.36 2.39 0.02 8.29 3.85 -2.36 16 8
K 2.53 1.72 0.13 7.41 2.78 -1.29 20 4
L 2.88 1.44 1.21 7.34 2.98 -1.76 22 2
M 2.88 0.96 1.42 5.63 2.88 ~2.86 23 1
N 1.71 1.70 0.01 8.49 1.38 -2.16 14 10
O 3.64 1.69 0.08 6.61 3.79 -0.08 23 1
P 5.72 2.62 1.61 13.41 5.90 -1.61 23 1
Q 3.74 1.88 0.41 10.83 3.74 0 24 0
AVE 3.24 1.75 0.65 8.23 3.33 -1.16 21.53 2.47
SD 0.95 0.49 0.65 2.39 1.00 1.11 2.85 2.85

 

"Average of the absolute value of the differences. bAverage/N umber of positive
differences. cAverage/N umber of negative differences.

138

maximum differences. High maximum differences also existed for the strict-intersection
test based on the expert-mapping data. The strict-intersection test for the curriculum
guide had an average negative score difference of -3.9. This was higher than any of the
other average negative score differences. Lower average absolute score differences were
associated with the unweighted 70%-intersection test based on the textbook data (1.37)
and the unweighted unique test based on the textbook data (1 .72).

The overall average of average absolute score differences was around 3 points.
The overall average of average positive score differences was about the same; the average
of average negative score differences was only around -1. Across the 36 tests, most
differences were positive, indicating higher scores on the ﬁeld-trial instrument than other
tests. Country N had the largest number of negative differences (10). Average absolute
score differences ranged from around two to six points, and standard deviations were
around two points. Score differences ranged from a minimum of less than one point to 14
points. High differences were found for countries H, P, and Q.

Tables 47 and 48 present summary information on the differences in ranks. Test
information was in Table 47. All but two average absolute rank differences across
countries within tests were around one rank or less. The two exceptions were the strict-
intersection test based on the expert mapping (1.8) and the strict-intersection test based on
the curriculum guides (1.9). Most differences were fairly evenly distributed among
positive differences, negative differences, and no differences. The exceptions were for
the unweighted 70%-intersection test based on the expert mapping (only two zero
differences), the strict-intersection test based on the curriculum guides (only 3 zero

differences), the unweighted strict-intersection test based on the textbook (2 positive and

139
Table 47

Summary of Differences in Ranks on the the F ield- T rial Instrument and Ranks on Each
Specially-Constructed T est for Each Test

 

 

 

AVE
Test ABSa SD MAX Ave +b Ave -° Count +b Count -° Count 0‘1
UNION 0.9 0.9 3.0 1.3 -1 6 6 5
EX-71 1.4 1.0 4.0 2.0 -2 6 9 2
EX-SI 1.8 2.0 8.0 2.5 -3 6 5 6
CG-71 0.9 0.7 2.0 1.6 -2 5 7 5
CG-SI 1.9 2.1 9.0 2.7 -3 6 8 3
TX-71 1.1 0.9 3.0 1.8 -2 5 7 5
TX-SI 0.8 1.5 6.0 3.5 -4 2 4 11
AG-71 1.1 1.0 4.0 2.3 -2 4 8 5
WEX-UN 0.9 0.9 3 .0 1.6 -2 5 6 6
WEX-71 1.3 1.1 4.0 2.2 -2 5 8 4
WCG-UN 0.8 0.9 3.0 1.4 -1 5 5 7
WCG-71 1.1 0.9 3.0 1.8 -2 5 7 5
WTX-UN 1.1 1.3 5.0 2.3 -2 4 6 7
WTX-71 1.1 1.3 5.0 2.3 -2 4 6 7
WTX-SI 1.8 2.1 9.0 3.8 -4 4 9 4
WAG-UN 0.9 1.1 4.0 2.0 -2 4 6 7
WAG-71 1.2 1.2 5.0 2.0 -2 5 7 5
EX-UQ 1.1 1.1 3.0 1.8 -2 5 5 7
WEX-UQ 1.2 1.3 3.0 2.5 -3 4 5 8
CG-UQ 1.4 1.4 5.0 2.4 -2 5 7 5
TX-UQ 0.8 1.0 3.0 1.8 -2 4 4 9
WTX-UQ 1.5 1.6 6.0 2.2 -2 6 6 5
AG-UQ 1.1 1.1 4.0 1.5 -2 6 5 6
WAG-UQ 1.4 1.5 6.0 3 .0 -3 4 8 5
Average 1.2 1.3 4.6 2.2 -2.2 4.8 6.4 5.8
SD 0.3 0.4 1.9 0.6 0.6 1.0 1.4 1.9

 

8|Average of the absolute value of the differences. bAverage/Number of positive

differences. cAverage/Number of negative differences. dNumber of ranks with no
difference.

140

11 zero differences), and the unweighted unique test based on the textbook (9 zero
differences). Some tests with higher maximum differences were the strict-intersection
test for the expert mapping (8), the strict-intersection test based on the curriculum guides
(9), and the weighted strict-intersection test based on the textbook (9).

The country information is in Table 48. It also shows minimal differences in
ranks across tests. The average of average absolute rank differences was one rank. Most
of the average absolute rank differences for each country were one rank or less. The
exceptions were country H (1.7), country N (4.2), and country P (2.5). Large maximum
rank differences were found for country N (9) and country Q (8). Countries B, C, J, L,
M, and 0 had more zero differences than any other difference. Countries D, E, G, P, and
Q had more negative rank differences, and countries F, H, K, and N had more positive
rank differences. Country I had about as many positive rank differences as non-
differences. Countries I and P had higher average negative rank difference than the other
countries (-2.5 and -2.6 respectively), and country N had higher positive rank differences

(4.0).

Variations in Topic Performance

Little variation existed within countries across total scores and ranks on the
specially-constructed tests. However, signiﬁcant variation did exist when looking at
scores on individual topics. Table 49 presents the country scores on each topic. Within
countries, standard deviation of topic scores ranged from 9 to 16 points, with an average
of 9. Differences between minimum and maximum topic scores for a country were from

around 30 to up to 70 points. Variations in scores for each topic across countries also

141

Table 48

Summary of Diﬂerences in Ranks on the the F ield- T rial Instrument and Ranks
on Each Specially-Constructed T est for Each Country

 

AVE
Country ABSa SD MAX Ave +b Ave -° Count +b Count--c Count 0d

 

 

A 1.6 0.9 4.0 0.0 -1.7 O 23 1
B 0.4 0.5 1.0 1.0 -l.O 2 8 14
C 0.6 0.6 2.0 1.0 -l.1 1 ll 12
D 0.8 0.7 2.0 1.0 -l.3 1 15 8
E 1.6 1.0 4.0 2.0 -1.8 3 20 1
F 1.2 0.8 3.0 1.5 -1.3 16 3 5
G 1.1 1.0 3.0 2.7 -l.3 7 10 7
H 1.7 1.2 5.0 2.1 -l.0 l9 1 4
I 0.9 1.1 5.0 1.0 -2.5 10 4 10
J 0.3 0.4 1.0 0.0 -l .0 0 6 18
K 1.0 0.6 2.0 1.3 -2.0 19 1 4
L 0.7 0.9 4.0 2.5 -1.1 2 10 12
M 0.0 0.2 1.0 1.0 0.0 l O 23
N 4.2 1.9 9.0 4.0 0.0 24 O O
O 0.4 0.5 1.0 1.0 -l.0 6 '2 16
P 2.5 0.8 3.0 2.0 -2.6 2 20 2
Q 1.2 1.5 8.0 1.0 -l.5 2 20 2
AVE 1.2 0.9 3.4 1.5 -l.3 6.8 9.1 8.2
SD 1.0 0.4 2.3 1.0 0.7 7.6 7.7 6.6

 

8Average of the absolute value of the differences. bAverage/N umber of positive

differences. cAverage/N umber of negative differences. dNumber of ranks with
no difference.

142

 

dda

 

 

 

a.aa v._d e.e_ ade e.ae v.ae dda _de edd a.aa e.aa ada nae mde ade oda d._a o.oe o.ea e.ve X52
m.vv ad ad ddd m.vm dd— a.ad dd. ad odd add 93 adm a.e~ od_ ddd ad_ d._d v.v_ e._m _.ad 2:2
ma d.o~ ed v.a _.o_ ad_ m.a a.a a.o_ _d_ d.: o.e- e.a 92 dd_ m.: vd_ v.o_ d.: a.: o.: Om
Nee d._m d.a ddv d.vd d._v e.ad _.vv ddm mod e.vd v.ad a._d ddv _dv a.av adv adv o._m ddd a.av m><
mdd dim me ..ev ddd ddv ddv mdm d._m add d.av mdd vow a.ov v.av :d v.ov adv ddv vdv d.av d.a._
d.ae ddv a.e doe ade v.ve mde ddv edd doe _.ae d.ae dde a._d ddd mde ddd vdd d._e e.ee e.ve ..a._
d._e ded d.a vdv d._d e.ad d.vd mod ded mov ddv d._e :v mam a.vv a.am adm a._m d.vv mod dam Ne.—
ede odm d.a don doe o.vv a.ed adv odm adv e.vd ede o.ed dam mdv add ddv _.av a.vd d.av a.vd _.e._
o.ad Wad de d._v ddv edm o.ad v.am d.ad d.am ddv a.vd mdv adm ..am dam v.am dam o.av a.vv a.am dd.—
d.vd o._d vd adm _dv Wad d.vd ddm o._d odm adm d.vd adm ddm adm mom ~dd mdm ddv adm adm .d;
o.ae mom ad e.ov d._v dam adv e.em mom edv e.ev o.ae a._v _.am den o.em a.vm d._m ddm e.av _dm d.v._
vde e.vm ed a.ev a.av ddm d.vd d.vv a.vm d.av med vde ddv e.vm odv mod ddv edm ddd edd v.av _.v._
ada v.av ma doe _.oe ddd ede a.ed d.av ..ee o.oa ada mad v.av ddd _de ode _.vd a.oe dda e._e vd..
a.ae dad a.a e._v ddm odm a.vd adv ddd _dv _dv a.ae mdv adv dad add v.ov o.om d._v d.vv m.vm dd.—
a. _a v.ad d.a m.av ddv dam den e.vv a. _ m mdd ddd a._a e.ev v.ad dev _dv mav e.ov don add edv dd.—
e.ve dad e.o~ adv add dad m.av mdm dad vdm dde e.ve _.mv ddv edm add o.ev den d.vm a.ev a.ev :2—
a.aa e.vm md~ :e v.ad v.ae _de mdm add a.aa e.aa den mde e.vm odd oda d._a add edd dda a.ad Md;
o.oe vdd o.a a.am ddm v.vm o.oe Wan vdd e.ov o.ev v.ed adm m._m ddm dov _.vm mom m._v a.av o.am dd.—
dda ddv _.a ade e.oe dde v._a v.vd ddv a.ve d._a mda vde med o._e dde vde v.ad dda eda dde _d._
a.ae o.v- vd. adv a._d a.e_ a.ae ddv ddm a.oe vdv o.v_ a.vd ddd ..av mdd d.od o.od mad ddd .dm vd. 2
_._a o.ov dd dad ede e.ve a.ee a.ev o.ov :a o._e _dd _de odd v.av ddd d._d odd d.vd vde a. .e md._._
odd d._m a.a adv v.ed _.em v.ed vdv dim a.av a._d mdm vdd a.vv odv d.av Nam a.ov d.av odd d.vv dd.:
o.ae v._d ad m. _e dde o.ae ade v. _ d edd ade e._e v.ve m.ae _dd odd dad dde o.oe ddd nee d.ve _d._._
m.vv ad e.a ddd vdm dd. a.ad dd. ad odd add d._d m.vv a.e_ od_ ddd ad_ d. _d v.v_ e._m don v.v. _ .—
o.ea mdd e.e_ ddd vde a.ed dda _de _.em ddd .de mod e.vd mde :e a.oe o.em e.ov dde o.ea a.am d.v._._
a.oa ddd od~ don dde adv ddd edm ddd odv a.oa d._e e.ee o.am o.em a.oe mam vdv a.od d. _e d. _ d 3...:
e.ad md~ ad e._m m.vm ddd e.ad vdd dd. new den v.em edm m.vd d.a_ odd o.ad d.vd adm v.vm _.ad vd.—.—
d.ae dam e.a ddd o.oe v.vv d.ae ddv dam vdv e.vd d.ve v.vm mdv e.od _dv v.ev d._d ddd v.ed add md._._
eda v.am _d o.ed d._e ddd eda odv v.am med d._e a._a d.vd a.av add new add a.od doe ddd odd dd...—
d.ae o.ed a.a ddv ddd v.am d.ae d.vv o.ed d.vv mdd _.me ddd m._v _dv odv a.am _.mv v.vd v.ed a.av _d._._
e.ae d.vd _.v_ v.av e.ae d.vd v.ee a._e add o.em adv d.ae o.ov d.av ade adv don adm edd mad _dv 0:.—
dde mdm a.a ddd v.ad d.vv dde ddv mdm add mvd e.oe v.ad vdv ddd d.av dev _.ev a.ad ade ﬁnd d._._._
Nee edv o.a e.ad ddd _de dad odd edv a._e ade e.oe dde ddv v.ad 2% ..av d._d o.ad dee mde _. E.—
o.ve vdm ae a.od v.ed ddv age odv vdm d._d o.ed o.ve ddd ddv gdv e.av adv ddv edd v.ed mod «mob
x<2 7:2 Om m>< O a 0 Z 2 u— x a _ E O a m G U m < 0&8.
.3550

 

8:2on {.6on 353.89% Mica

av oEma

143

existed. Standard deviations ranged from 6 to 17 points, with an average of 9.
Differences between minimum and maximum scores within each topic were around 30 to
40 points.

Table 50 shows country ranks on each topic. Again, much variability existed.
Nine countries had a rank of one on at least one topic. This meant that these nine
countries had better performance than all other countries on at least one topic. Six
countries ranked last on at least one topic. Two countries had ranks that placed them ﬁrst
on at least one topic and last on at least one. Standard deviations of ranks were larger
across topics than they were across tests. They ranged from two to ﬁve places. Aside
from country M, the lowest difference between minimum and maximum ranks was six
places, with the next lowest being 10. All but four of the average ranks and ﬁve of the
modal ranks differed from the same country’s rank on the ﬁeld-trial instrument.

Table 51 shows differences between the ﬁeld-trial instrument total score and each
topic score for each country. The average of absolute score differences was eight points,
with a standard deviation of about four. The average minimum difference was 1.8 points;
the average maximum was 20 points. Average positive differences (score higher on ﬁeld-
trial instrument) were larger than the average negative differences (score lower on ﬁeld-
trial instrument). The average of the positive differences was 7 points; the average of the
negative differences was 5 points. The average number of countries with positive
differences within a topic was nine; the average number of countries with negative
differences was eight. Table 52 presents the same data for each country. This table
reports differences that range from almost nothing to 50 points. Numbers and averages of

positive and negative differences were about the same across countries.

Table 50

Country Ranks on Each Topic

1

44

 

 

 

 

Country

Topic A B C D E F G H l J K L M N O P Q
Test 9 3 6 14 12 10 11 16 7 l 5 8 17 15 2 13 4
1.1.1.1 5 1 9 14 15 12 8 16 4 7 2 6 17 13 11 3 10
1.1.1.2 10 2 4 14 13 12 7 16 5 3 8 9 17 11 l 15 6
1.1.1.3 11 7 6 15 8 12 4 9 13 2 10 14 16 5 3 17 1
1.1.2.1 8 3 5 12 15 13 9 14 6 2 7 10 17 11 1 l6 4
1.1.2.2 13 6 5 14 11 7 12 15 10 2 3 8 17 16 1 9 4
1.1.2.3 8 5 4 9 14 13 10 12 7 2 6 11 17 15 1 16 3
1.1.2.4 10 6 8 14 11 9 16 15 2 5 3 4 17 12 1 13 7
1.1.3.1 8 4 9 11 15 6 16 13 2 5 1 10 17 14 7 12 3
1.1.4.2 13 1 3 12 15 9 8 6 10 17 4 11 14 7 2 16 5
1.1.4.4 4 3 13 9 11 6 16 12 1 10 7 8 17 14 5 15 2
1.1.5.1 6 4 13 11 9 12 15 16 3 7 10 5 14 17 8 l 2
1.1.5.2 9 1 8 13 14 6 12 10 4 16 5 7 17 11 2 15 2
1.1.5.3 7 6 ll 12 14 9 15 13 4 10 8 1 17 16 2 5 3
1.1.5.4 l3 5 3 9 15 4 12 5 7 17 11 2 14 10 1 16 8
1.2.1 10 2 1 14 7 11 12 15 9 3 4 8 17 16 5 6 13
1.2.2 11 3 5 16 13 7 14 15 10 2 4 6 17 8 l 12 9
1.2.3 9 4 13 10 5 3 14 17 8 12 2 1 15 16 7 6 11
1.3.1 7 6 14 12 8 3 15 10 9 l 2 11 16 13 5 l7 4
1.3.2 11 3 6 14 8 7 10 17 9 1 5 4 16 13 2 15 12
1.3.3 14 6 9 15 10 ll 17 7 3 1 8 5 16 4 2 13 12
1.3.4 8 2 9 l4 7 6 11 17 12 1 3 4 16 13 5 15 10
1.4.1 9 4 5 14 12 6 13 17 11 1 2 8 16 10 3 15 7
1.4.2 15 3 9 16 14 13 12 8 6 1 4 5 17 11 2 10 7
1.5.1 9 12 3 13 16 14 9 7 11 1 5 8 17 6 2 15 4
1.5.2 8 5 3 12 14 13 ll 15 7 2 6 9 17 10 1 16 4
1.6.1 5 11 6 12 15 8 10 16 4 1 7 9 l7 l3 3 l4 2
1.6.2 13 4 8 15 14 12 7 11 9 1 6 10 17 5 2 16 3
1.7.1 5 4 10 13 12 7 14 16 9 l 3 11 15 17 8 6 2
1.7.2 6 11 12 13 15 4 9 14 5 l 7 3 l7 16 10 8 2
AVE 9.1 4.6 7.4 12.8 12.1 8.8 11.7 12.9 6.9 4.7 5.3 7.2 16.3 11.8 3.6 12.2 5.6
SD 2.9 2.8 3.5 1.9 3.0 3.3 3.2 3.6 3.2 5.1 2.6 3.3 1.0 3.7 2.9 4.6 3.6
MIN 4 1 l 9 5 3 4 5 1 1 l l 14 4 1 l 1
MAX 15 12 14 16 16 14 17 17 13 17 11 14 17 17 11 17 13
MODE 8 4 9 14 15 12 12 16 9 1 2 8 17 13 1 15 2

 

145

Table 51
Summary of Diﬂerences in Scores on the F ield- T rial Instrument and Scores on Each
Topic for Each Topic

 

 

 

AVE
Tqiic ABSa SD MIN MAX Ave +b Ave -° Count +b Count -°
1.1.1.1 7.83 4.54 1.13 19.28 3.77 -8.37 2 15
1.1.1.2 2.76 2.16 0.09 7.02 1.33 -3.54 6 11
1.1.1.3 9.75 5.42 0.86 21.65 11.69 -8.03 8 9
1.1.2.1 3.25 3.04 0.04 9.42 3.72 -1.70 13 4
1.1.2.2 5.34 2.62 1.50 10.72 0 -5.34 0 17
1.1.2.3 2.67 2.10 0.04 7.30 1.47 -3.17 5 12
1.1.2.4 19.06 5.26 4.30 28.62 19.06 0 17 0
1.1.3.1 6.54 4.12 0.94 14.70 5.39 -8.63 11 6
1.1.4.2 13.19 9.85 0.65 43.71 17.60 -11.35 5 12
1.1.4.4 27.92 7.83 8.47 42.81 27.92 0 17 0
1.1.5.1 10.66 5.97 0.41 23.17 0 -10.66 0 17
1.1.5.2 5.46 6.31 0.02 28.70 6.72 -1.39 13 4
1.1.5.3 7.68 5.24 0.78 19.57 4.83 -8.06 2 15
1.1.5.4 10.84 12.43 1.04 50.01 14.73 -6.46 9 8
1.2.1 13.20 4.18 4.26 20.19 0 -13.20 0 17
1.2.2 11.02 3.84 1.93 17.52 11.02 0 17 0
1.2.3 13.10 8.77 0.03 26.37 7.57 -14.29 3 14
1.3.1 8.42 5.64 0.04 19.10 9.97 -3.37 13 4
1.3.2 4.66 3.67 0.32 14.12 4.87 -3.66 14 3
1.3.3 9.85 5.63 0.77 20.95 10.72 -3.34 15 2
1.3.4 10.11 4.35 3.65 17.43 0 -10.11 0 17
1.4.1 4.61 3.19 0.33 9.99 5.19 -1.89 14 3
1.4.2 10.68 3.83 4.39 18.40 11.03 -5.01 16 1
1.5.1 14.93 4.33 7.68 23.74 14.93 0 17 0
1.5.2 9.18 2.40 4.91 12.43 9.18 0 17 0
1.6.1 3.03 1.96 0.20 8.62 3 .91 -2.41 7 10
1.6.2 8.87 4.11 2.52 18.24 9.09 -5.30 16 1
1.7.1 10.42 4.51 0.41 18.61 2.73 -10.90 1 16
1.7.2 5.04 4.44 0.25 16.12 5.82 -1.39 14 3
Average 9.31 4.89 1.79 20.29 7.73 -5.23 9.38 7.62
SD 5.2 2.3 2.3 10.4 6.5 4.2 6.3 6.3

 

8Average of the absolute value of the differences. bAverage/N umber of positive

differences. cAverage/Number of negative differences.

146

 

 

 

Table 52
Summary of Diﬂerences in Scores on the F ield- T rial Instrument and Scores on Each
Topic for Each Country

AVE
Country ABSa SD MIN MAX Ave +b Ave -° Count +b Count -°
A 9.20 6.27 0.55 21.39 10.52 -8 17 12
B 9.14 7.28 0.04 24.79 9.99 ~8.84 15 14
C 8.72 8.09 0.03 39.20 12.01 -6 14 15
D 8.83 5.82 0.52 23.73 9.97 -8 17 12
E 9.69 7.72 0.04 27.04 11.12 -8.60 16 13
F 9.13 6.30 0.02 25.48 9.56 -9.29 14 15
G 9.42 8.38 0.09 36.14 10.86 -8.13 17 12
H 7.63 6.37 0.09 26.81 7.82 -7.98 17 12
I 8.01 5.14 0.06 19.92 10.42 -7 13 16
J 10.72 13.29 0.41 50.01 16.05 -4.91 l6 13
K 9.52 6.78 0.33 30.06 9.57 -10.20 17 12
L 9.63 7.06 0.25 26.50 10.11 -9.74 16 13
M 8.19 6.82 0.11 29.75 8.10 -8.99 18 11
N 6.93 6.65 0.32 31.17 7.72 -6.54 15 14
O 7.23 5.98 0.37 32.20 8.84 -5.79 16 13
P 13.44 8.57 0.34 33.30 12.98 -15.90 20 9
Q 7.91 6.34 0.06 22.10 10.71 -6 14 15
AVE 9.02 7.23 0.21 29.39 10.37 -8.20 16.00 13.00
SD 1.46 1.76 0.18 7.23 1.96 2.43 1.68 1.68

 

aAverage of the absolute value of the differences. bAverage/N umber of positive

differences. cAverage/N umber of negative differences.

147

Table 53 presents a summary of differences between each country’s rank on the
ﬁeld-trial instrument and its rank on each topic. The average of absolute differences
across topics was 2.7 ranks. The average maximum difference was 8 ranks and the
average minimum difference was 3. An average of three countries for each topic had no
difference in ranks. Table 54 shows the summary across countries. The average number
of topics on which countries had no difference was ﬁve.

Table 55 reports topic ranks within countries. Seven of the topics had ranks of
one for at least one country. No topic had an average rank of 1. The highest average
ranks were for topics 1.1.5.1 Estimating Quantity and Size (5 - out of 29), 1.2.2
Perimeter, Area, Volume (3.6), 1.3.4 3-D Geometry (5.4), and 1.7.1 Data Representation
and Analysis (5.9). The lowest average ranks were for 1.1.2.4 Percentages (26.8), 1.1.4.4

Number Theory (28.2), and 1.5.1 Proportionality Concepts (25.6).

Performance Expectations

The TIMSS mathematics framework code not only contained codes for topics, but
also included codes for expected performance (See Appendix A). Textbook blocks were
coded with topic and performance-expectation codes as was each test item. Therefore,
country performance can also be evaluated in light of the performance expectations and
the combination of topic by performance expectation.

Table 56 presents the proportion of textbook blocks devoted to each performance
expectation. The highest of the average proportions was devoted to 2.1.3 Recalling
Mathematical Objects and Property (.313). This was followed by 2.2.2 Performing

Routine Procedures (.294), 2.3.3 Solving Problems (.114), and 2.1.1 Representing (.112).

148

Table 5 3
Summary of Diﬂerences in Ranks on the F ield- T rial Instrument and Ranks on Each
Topic for Each Topic

 

 

 

AVE
Test ABSa SD MAX Ave +b Ave -° Count +b Count -° Count 0d
1.1.1.] 3.4 2.8 10.0 3.6 -4 8 6 3
1.1.1.2 1.6 1.2 4.0 2.3 -2 6 8 3
1.1.1.3 3.8 2.7 10.0 5.3 -5 6 10 1
1.1.2.1 1.6 1.1 4.0 1.8 -2 8 6 3
1.1.2.2 1.5 1.3 4.0 1.9 -2 7 6 4
1.1.2.3 1.8 1.4 5.0 2.1 -2 7 ' 7 3
1.1.2.4 2.1 1.6 5.0 2.3 -2 8 6 3
1.1.3.1 2.7 1.6 5.0 2.6 -3 9 7 1
1.1.4.2 3.9 3.9 16.0 3.7 -4 9 7 1
1.1.4.4 3.3 2.6 9.0 3.5 -4 8 6 3
1.1.5.1 3.9 2.7 12.0 4.1 -4 8 8 1
1.1.5.2 2.6 3.5 15.0 2.9 -3 8 5 4
1.1.5.3 3.2 2.6 9.0 3.4 -3 8 7 2
1.1.5.4 4.6 3.8 16.0 5.0 -5 8 8 1
1.2.1 2.4 2.5 9.0 3.3 -3 6 8 3
1.2.2 2.0 1.7 7.0 2.1 -2 8 7 2
1.2.3 4.4 3.1 11.0 5.3 -5 7 9 1
1.3.1 3.2 2.1 8.0 3.4 -3 8 7 2
1.3.2 1.8 2.0 8.0 2.5 -3 6 5 6
1.3.3 3.5 3.2 11.0 5.0 -5 6 8 3
1.3.4 2.4 1.8 6.0 2.5 -3 8 6 3
1.4.1 1.6 1.6 5.0 2.8 -3 5 7 5
1.4.2 2.4 2.1 8.0 3.3 -3 6 7 4
1.5.1 2.8 3.2 9.0 4.8 -5 5 5 7
1.5.2 1.5 1.3 5.0 2.2 -2 6 7 4
1.6.1 1.9 1.9 8.0 2.3 -2 7 6 4
1.6.2 2.4 2.4 10.0 5.0 -5 4 10 3
1.7.1 2.5 1.9 7.0 3.0 -3 7 7 3
1.7.2 3.3 2.5 8.0 3.1 -3 9 6 2
Average 2.7 2.3 8.4 3.3 -3.3 7.1 7.0 2.9
SD 0.9 0.8 3.3 1.1 1.1 1.3 1.3 1.5

 

aAverage of the absolute value of the differences. bAverage/Number of positive

differences. cAverage/N umber of negative differences. dNumber of ranks with no
difference.

149

Table 54

Summary of Diﬂerences in Ranks on the F ield- T rial Instrument and Ranks on Each
Topic for Each Country

 

AVE
Country ABS” SD MAX Ave +b Ave -° Count +b Count -° Count 0d

 

 

A 2.3 1.7 6.0 0.0 -3.0 13 12 4
B 2.2 2.3 9.0 1.5 -3.1 6 18 5
C 3.0 2.2 8.0 2.3 -4.3 11 15 3
D 1.6 1.5 5.0 2.6 -1.4 16 5 8
E 2.6 1.5 7.0 3.4 -2.4 11 16 2
F 3.0 1.7 7.0 3.7 -2.3 17 12 0
G 2.6 1.8 7.0 2.7 -3.1 11 16 2
H 3.3 3.4 11.0 4.7 -1.0 20 4 5
I 2.7 1.7 6.0 3.2 -3.0 13 13 3
J 3.5 5.0 16.0 0.0 -5.9 0 18 11
K 2.1 1.5 6.0 2.2 -2.8 13 13 3
L 2.6 2.1 7.0 3.9 -2.5 13 11 5
M 0.6 1.0 3.0 1.7 0.0 11 0 18
N 3.7 3.1 11.0 4.8 0.0 21 7 1
O 2.1 2.5 9.0 1.0 -4.2 8 13 8
P 3.6 2.8 12.0 5.9 -2.6 11 16 2
Q 2.8 2.6 9.0 1.7 -5.0 11 13 5
AVE 2.6 2.3 8.2 2.7 -2.7 12.1 11.9 5.0
SD 0.7 0.9 3.0 1.6 1.5 4.8 4.9 4.2

 

2'Average of the absolute value of the differences. bAverage/Number of positive

differences. cAverage/N umber of negative differences. dNumber of ranks with
no difference.

150

 

 

 

ON O O O.O2 2 O ON ON 2 2 O2 2N O2 O2 O2 O2 2 O2 NN ON 2 N.O.2
O2 2 O O.O N O 2 2 O O O O O O O N O O O O 2 E2
ON O O O.ON ON ON ON O ON ON ON 2 ON 2 2 ON ON ON N 2 ON N.O.2
2N O O O.N2 O 2 2 2 O O2 O2 2 2 2 N2 2 O2 22 2 2N O 32
ON 2 N O.2N NN 2N 2 N 2 ON ON ON ON NN 2N ON ON ON ON ON ON N.O.2
ON NN N O.ON ON NN ON ON ON ON ON ON ON ON ON ON ON ON ON ON ON 2.O.2
ON O O 2.NN ON 2 ON ON 2 N 2N O ON ON NN ON ON ON ON NN ON N.O.2
ON O O N.O2 2N 2 ON 2 2 2 2 O NN ON ON 2 O2 2 O2 O2 2 32
O2 2 O O.O 2 O O O O O O 2 O 2 O O O O O N O O.O2
ON O O O.ON ON 2 NN 2 N 2 ON O 2 O2 ON ON 2 ON ON ON ON O.O.2
ON O O N2 ON 2 2 2 2 22 2 O 2 ON 2 2 O 2 2 O2 2 N.O.2
ON O O O2 2 ON ON ON ON ON O 2 2N 2 ON 22 2 N ON ON 2 22
ON 2 O S 2 2 22 ON N 2 2 2 O ON O 2 2 N 2 O O O.N.2
ON O2 O O.ON ON ON O2 ON ON NN NN ON ON ON ON NN ON ON ON ON ON N.N.2
O 2 N O.O O O O O O O N N O N O O N O 2 O O 2.N.2
ON O O O2 2 ON O O O2 O ON ON 2 O 2 O ON 2 O 2 NN O.O.2.2
NN N O O.O O O O 22 O N 22 NN O O 2 2 O O 2 O O O.O.2.2
ON 22 O 2.2 2 2 ON 2 2 O2 2 ON 22 N2 2 O2 2 O2 2 2 O2 N.O.2.2
2 2 O O.O O N 2 O 2 O O 2 2 O O O O 2 O O N 2.O.2.2
ON ON N N.ON ON ON ON ON ON ON ON ON ON ON ON ON ON ON ON ON ON O.O.2.2
ON 2 O N.2 O ON 2 2 22 2 O ON O2 2 N O NN 2 N 2 2N N.O.2.2
ON N O O.O2 O 2 2N NN NN 2 O 2 N N ON O 2N 2 O2 2 N2 2.O.2.2
ON 2 O O.ON ON ON 2 ON ON ON ON ON ON ON ON ON ON ON ON ON ON O.N.2.2
2 O O O22 22 22 O 2 O 2 2 22 2 O 22 O2 2 O 2 O2 2 O.N.2.2
2 N O NO O O N 2 2 O 2 O 2 O 2 2 O O O 22 22 N.N.2.2
ON O O O2 O2 O2 O 2 ON ON 2 O2 2 O2 2 2N O2 O2 O2 O2 O2 2.N.2.2
ON 2 O O.O2 2 ON O N ON ON ON O ON O 2 ON O NN 22 2 O2 222
2 O O N.22 O2 O2 2 O 2 2 O2 2 O 2 O 2 22 2 O O O N.2.2.2
2 O O O.O 2 O 2 O O O O 2 O 22 O O 2 O O O O 2.2.2.2

522 222 9O 23.. o .2 o z 22 .2 2 O 2 22 o 2 O2 2 0 O2 < 02222.

2222200

 

828a. maﬁa \o 228% 522269-235:

mm 225.

151

 

 

 

 

a2 ddd dmo So So Mdmo dvmo deo ddmo vvdo mvvo demo vddo odvo eado End and mvdo d2v.o memo ovv.o avdo 522
dd So 36 moo ao.o vao.o _o_.o av_.o md_.o vm_.o :o maoo ed_.o d2.o do_.o adoo 2:.o mv_.o eaoo edoo v__.o am2.o Om
d.v_ m~.o moo eo.o ao.o eeo.o ddoo adoo vd_.o do_.o ddo.o vdoo ddoo daoo deo.o odoo ddoo edoo 2eo.o avo.o ddoo meoo owﬁo><
22 doo oo.o doo 2o.o d2o.o 2oo.o o Mdoo dooo o eoo.o o o dooo Coo o o dooo edoo dooo ooo.o vdd
v2 d2.o 2o.o vo.o moo 22o dSo o deoo adoo :oo adoo mood mooo ddoo ddoo o o a2o.o amoo o2o.o aoo.o mdd
3 So So vo.o moo :oo o o av2.o a2o.o vooo d2o.o aooo avo.o ooo.o 326 o o wood wood 2oo.o mood ddd
3 So So vo.o moo oao.o o dooo v2o.o dm_.o mood 2do.o _oo.o dv_.o o d2o.o dooo o d_o.o dmoo ddo.o a2o.o 2dd
d 2o.o oo.o oo.o oo.o o o o o o o dooo o o o d2o.o o o o d2o.o dooo 2oo.o e.vd
2 Ed 5o moo moo d2o.o o :oo dd2.o dvoo o ddoo 22o :oo ddoo a2o.o o o _oo.o Mdoo ooo.o eoo.o d.vd
2 do oo.o doo doo d2o.o 2oo.o o dodo oaoo o2o.o dmoo voo.o o 2oo.o o2o.o o o dooo a2o.o dooo 2oo.o v.vd
.2 So So moo doo edo.o o o vao.o d2o.o 2oo.o edoo o o 2do.o 2o_.o o o2o.o d2o.o m2o.o v2o.o 2oo.o m.vd
2 do So ao.o vo.o dooo o o dvdo wood wood ddoo dooo dvoo dooo dd_.o o o odoo d2o.o eoo.o :oo d.vd
Q 2o 86 vo.o moo 2oo.o dooo o dvoo 23o o ao2.o o2o.o o ev2.o veoo o mood amoo aao.o o dmoo _.vd
d2 ao.o So moo doo e.oo dooo o ddoo eoo.o o ddoo aooo dooo eoo.o mdoo o o vooo ddoo o 2oo.o ddd
e2 moo eo.o So So 28o mooo 2oo.o e2o.o a_o.o mooo aooo dooo dooo _oo.o dooo a2o.o o d2o.o dooo 2oo.o mood vdd
a2 dmo doo o_.o 22o daoo 22o ddoo daoo dd2.o mdoo aa2.o v2o.o dd2.o eado aaoo ado.o vdmo daoo vmoo emoo adod mdd
d— So So moo doo m2o.o o 2oo.o dmoo 2do.o o daoo dooo Mdoo dooo do_.o dooo :oo m_o.o vdoo dooo _oo.o ddd
e. do ao.o eo.o vo.o aooo amoo dooo ddoo vm2.o eoo.o daoo mvoo d2o.o _oo.o 2ddo o aeoo aooo 2mo.o eoo.o 2oo.o 2dd
e2 ddo ao.o doo doo m2d.o 2o2.o 2mo.o dd_.o eao.o deoo deoo ddoo dmoo dddo 2oo.o adoo o ad2.o adoo mdoo deoo mdd
e2 vdo ado v_.o ado vedo dado mmnd e2v.o vvdo ando demo addo ev2.o dddo maoo 2dmo o d2v.o memo eado End ddd
e2 26 moo moo vo.o mmoo a2o.o vvoo wood ddoo vao.o edoo aoo.o a2o.o a2o.o _oo.o mmoo o 2ao.o emoo om_.o amoo 2dd
e2 ddo dmo So So mdmo dvmo dovo dddo ddmo mvvo avdo vddo odvo o ed_.o aamo eoo.o dd_.o ddoo ovv.o avdo n.2d
e2 omo doo ao.o vo.o dmoo edoo o dado avo.o o2o.o ddoo 2oo.o mvoo 28o o2o.o v2o.o mood a2o.o eeo.o aoo.o ddoo d._d
a2 vdo moo So So emoo vdoo dooo aado ad2.o 2do.o d2o.o dooo em2.o ddoo 2om.o Mdoo mvdo aZo mdoo dooo vmoo 2.2.2....
222280 E ZOE Gm m>< O a 0 Z 2 .2 v2 2. 2 I O ..2 m D U m < mDOU
b25200

 

A2589 «96% 3 825.8me mozcﬁxoxsk $62M 8 238622 238% «6328a \o Ozotuomoxk

ed 2an

152
The average of the proportions of textbook blocks for 2.4.6 Axiomatizing, 2.3.4

Predicting, and 2.5.4 Critiquing were small (.002, .008, .009 respectively). Aside from
the most frequently reported performance expectations, countries clearly had different
levels of expectations. Performance Expectation 2.1.1 Representing and 2.3.3 Solving
were the only performance expectations included in the textbooks of all countries

Tables 57 and 58 show country performance on items with the same performance-
expectation codes, regardless of content. As with topic scores, much variation in
performance clearly existed. For most countries, differences in minimum and maximum
scores on performance expectations were 50 to 70 percentage points. Differences in
minimum and maximum ranks were around 10 ranks. Six countries had a rank of 1 on at
least one performance expectation. Average ranks for each country differed from average
total score ranks. Also, ranks on the two performance expectations included in the
intersection of all countries (2.1.1, 2.3.3) differed from the ranks on the ﬁeld-trial
instrument for many countries.

Tables 59 and 60 present the same information as in Tables 57 and 58, except that
performance expectations have been grouped into six categories: knowing (2.1 and
2.5.1), performing routine procedures ( 2.2.1 and 2.2.2), performing complex procedures
(2.2.3), problem solving (2.3), reasoning (2.4), communicating (2.5 - except 2.5.1).
Again, some striking variation exists. Country J ranks ﬁrst in all categories except
category 5 communication. Country K ranks ﬁrst on this category. Country C had lower
ranks on category 3 performing complex procedures and category 5 reasoning. Country
0 had its lowest rank on category 1 knowing, while country Q had its highest rank in this

category.

153

 

 

 

d.va v. 2 m a.vm d.vm vde d.vm m.vm e. 2e v.am mde m.av d.oa dde d.va Nae d. E. a.ee 582
2dm o.o Won dd~ odm od a.v~ odv add o.o edd ddv edm a.av o.ov ﬁvv o._v £2
a.e 2d v.e v.a Ne a.a m.a md m.a a.a~ 2e N.a dd e.a ad a.e me Om
Nee dd_ d.vv e.vd mdm m.a_ mdm add adv v.om mdm med edv d.ve mdm v. _e med o><
v.ee v._m add odd add adm v.3. mdm odv mAm mdm mde _dm a._e add v.ee mde O
d.va me vdv _dd o.om me New dam o.ov m. _ m e. _ m ad m a.am d.va a.av aem mdm A
d._a a.vd add ﬁvm ddm a.vd adm odd v.am add m.av mde ede mdm odd N._a 2de 0
2mm o.o 2dm dd; m.ev edd ddm d.av 26v o.o v.vm Ndv _.av a.av m.av _dm a.om Z
vdm d.v mdm dd2 odm od a.v2 odv Ndd d.v edd ddv edm vdm o.ov _.vv o. v .2
2.va v.e2 m.av _.vm e.om v.e_ d.ad add ﬁvv mde e.ov a.am N.av _.va mdm m.ve dew A
v.va odd add d.vm Noe o.om e.om dde adv ddv a.em vde add v.va vdm a.ae 2de v_
dda a.a_ a.vm odd vde d.vm m.vd Noe a.am a.a_ adv d.oa dde dda Nae o.ee a.ee 2.
o.oe Ned a.av _.vm N.av _.ad Ned a.vm edv mdm ddm mdm N._m a.am _.vm o.oe odd 2
ddm o.a_ mdm e. _d Ndv o.a_ v.vd o.av e.am m.a_ _. 2 m mdv _. _v ddm o.av ddm m.av I
_.am a.a o.ov vd— e.om d.v_ a.a2 a.om d.am a.a mdm .dm a.ev mdm o.vm mam vdm O
o.ae ed mdv vd_ m6m e.a_ o.ed vdm adv ed a.oM mdm m.ev o.ae edv e.ve edm m
ade e.e_ m.ov ddd a.vv e.e_ m6. e2m e.am ddd a._m add _._v ade add vdm m._m N
_.am dd— o.ov v.od Ndv md_ ed2 adv a.em a.vv mdd odd _dv 2am adv o.em add 0
2.2a md2 mdv edm mdm md~ a.ad vdm ddv ed_ mdm aem mdm :a add dae _.oe U
mda mdd e._m Nam a.ad mdd odv e._e N.av odd aem vde add dda vdm dde ade m
dde ad_ m.vv mdd _dv ad~ m.vd e._m adv a.vm ddm dam a.av mde add d.am d.am <

a 2 d m a de 2 o. _e oa m mm 2 ad mac: m2
3 7:2 m>< mdd ~dd m.vd m.vd vdd mdd ddd ﬁmd mdd ddd ~dd mdd N._d _._d 5:300

 

mCO Sgooaxm Doggotonm

 

828m. totedommﬁddonoﬁxetask

ad 053.

154

 

 

3 N e.v m e N d v e v v d v 3 d m d 0
e2 2 e.o2 d 2 e2 3 e 2 a m2 22 e2 2 N2 M2 N2 2
v2 2 dd d v v v m 2 d 2 N d v2 m 2 d 0
a2 m ed2 a2 2 m m 2 22 a2 d e2 3 a2 22 e2 2 Z
a2 e d.e2 e2 a2 a2 a2 a2 a2 e2 a2 a2 a2 e2 a2 a2 a2 22
N2 2 e.e d a 2 a a d 2 m e a m a d a ..2
a 2 ad 2 d a e d v e m m e N v d m 2
d2 2 vd a 2 2 2 m d d2 d 2 2 e 2 e 2 2.
22 d 2.a d 22 d. a d m d E d a 22 a a a 2
e2 o o.v2 22 m2 222 2 e2 2 2 v2 2 2 2 e2 2 e2 I
2 a a.22 2 d v2 v2 «.2 m2 2 a 2 22 2 d 22 22 0
2 a v.o2 v2 a d 22 a 3 v2 2 a Q a 2 a 222 2
e2 d 2d2 d2 v2 22 e2 22 v2 2 N2 N2 v2 d o2 2 v2 m
e2 m ad2 m2 m2 m2 2 v2 e2 m e2 v2 9 2 v2 v2 2 0
2 m d.a 3 m 2 d 3 a 22 22 2 m m e v e U
a 2 2.v e m e m 2 m a e v d v m d v m
N2 a d.a a 222 a d2 Q a d a a d a 22 S d <
g 2.222 m2>< mdd 2dd mvd mvd vdd mdd Ndd 2dd mdd Ndd 2dd m.2d d.2d 2.2d 5:200

 

8022228me 858202.52

 

2.222% tomotmmxméotoﬁxotbmk

dd 02222.

155

 

 

 

”on m.2m o.m2 ﬁvm a.vm v.ev me ”on 0.3 2.8 x32
N.Nv o.2 2 m.o2 vdm 22.9 0.2 2 w.wm N.Nv N.vm ﬁov :22
m6 w.m v.2 m6 v.2. a.a 22.2. N2. .22. ed Om
2.5m m. 2N m.m2 v.av e.vm a.vm Wmv Wow v.av Nmm o><
e.oe m. 2 m 2.22 2.wv o.mm m.2m bsv mdo 0mm e.oe O
own N222 w.m2 32m 2.mm N222 mdv Qmm v.ev a. 2 m a
OS 2.6m v.m2 O.Nm 2.vm ﬁom 3% «:3 «do 22.2w 0
vdm 22.2 m. 22 2.5m w.N2 hdm mdm N.mv mﬁv vdm z
N.Nv o. 2 2 v.2 2 v.wm N.m2 o. 2 2 w.mm N.Nv N.vm Eov 22
5% v.vm m. 22 v.vv 2.vm v.vm wvv 2.5m o.ov Qmm 2
v.ae v.em v.N2 a.vv a.vm v.em va v.8 2.vm ”29 v.
wdn QMN od 2 v.vm odm o.ov mém wdn 0.3 2.8 2.
m.wm 0.3 :22 0.3 2.vm o.om v.vv m.wm w. 2 m 9mm 2
va v. 2N W222 mém o. 2 N v. 2N o.om m.wv Omv va I
N.vm v.m2 o.v2 me v.m2 22.2.2 a.om 2.mm m.2.v a.vm 0
ham v.2: 2.52 mdv v.w2 vdm a.2v m.wm wﬁv «.mm ...2
Qmm 2.2.2 2.2 v.3. wdm 2.2 Q? mam o.mv 2.Nm m
o. 2 m 2.02 0.2 mém v.em 2.02 0.5m o. 2 m 2.mv 2.22m n2
9% 0. 2m m.v2 Wmv 0. 2m QNN m.vv o.om mdm 0.22m 0
v.ae mdm m. 2 2 Qwv mam vdm v.av v.ae Nmm w. 20 m
22.? N.NN 2.m2 v.av mam N.Nm vdv wﬁm 2.ov v.vm <

2. m ow 20 m2. 5 2282 222
29.22 2232 am m2>< 52282:: chOmmom wc2>2om 8522805 $522882 $223052 $550

.5800 82995 52229200 0:22:32

 

 

motowouao 2522989212 comma—.8282

 

3.89m. bemﬁab =83~83Q$§c§3§mm

on 2an

156

 

 

:2 m.v m m m w v m 0
m.N o.N2 w 22 22 22 02 2 n.
22.2 m.N N v 2 N N v 0
9m w.N2 2.2 m 2 o2 N2 v2 2
md ”62 02 2.2 2.2 2 n2 n2 .2
2.N m6 N w n o o n .2
Q2 o.v 2 n v m c m V.
5N m.N o 2 N 2 2 2 2.
0.2 No N o o w n 22 _
22.2 N.v2 22 N2 $2 2 2 $2 I
0.2 m.N2 m2 v2 N2 m2 22 02 0
ON N.o2 v2 o 222 n 222 22 ..2
2.2 N.m2 N2 2 v2 N2 v2 N2 m
22.2 m.v2 m2 222 2 v2 22 m2 0
v.N 2: O2 222 w E m o 0
v.2 2mm c N m v m N m
2.2 w.» n 22 a a w a <
Qm m2>< 52282 $225832 w=2>2om 8.52585 $522305 $2230.02 9550
-EEESU 6022282 53:80 0:22:32

 

motowoumo cotﬁooaxm oocagotom

 

 

SEEN A3339 =o28§§xﬂ$§a$3xwmn~

co 2an

157

I used the textbook information on performance expectations as well as the
combination of performance expectation by topic and constructed unique tests for each
country. The results were in Table 61. Again, variation clearly existed when tests were
developed that closely match the topics students were taught and the performance that
was expected of them. Scores and ranks on these tests for some countries are different
from those on the ﬁeld-trial instrument. Some of the scores are 10 or more points

different, and some of the ranks are up to 7 places different.

158
Table 61

Country Performance on Unique Tests based on Performance Expectaions and Topics
Crossed with Performance Expectations

 

 

 

 

Field Trial Scores Field Trial Ranks
Scores PE Unique CxPE Unique Ranks PE Unique CxPE Unique
A 50.5 48.9 49.6 9 5 9
B 56.4 55.3 57.1 3 3 3
C 53.6 39.8 53.0 6 13 5
D 45.2 43.2 44.1 14 12 15
E 45.9 44.3 47.0 12 9 12
F 49.6 46.1 48.1 10 6 10
G 48.1 33.9 46.9 11 16 13
H 43.5 34.0 41.2 16 15 16
I 52.8 44.9 53.0 7 7 6
J 64.0 56.8 65.3 1 2 1
K 56.0 43.9 51.5 5 10 7
L 51.5 52.1 50.8 8 4 8
M 35.4 28.7 33.5 17 17 17
N 45.0 35.1 47.4 15 14 11
O 61.9 57.0 58.9 2 l 2
P 45.8 44.8 46.2 13 8 14
Q 56.4 43.3 54.1 4 11 4

 

CHAPTER V

Discussion, Summary, and Recommendations
I set out in this study to answer the following questions:

1. How much variation in content exists across the 17 nations in the mathematics
curricula for 13-year-old students? How well does the content of the TIMSS ﬁeld-
trial instrument match these curricula?

2. What test speciﬁcations provide a curricular match across countries? How well does
the content of the TIMSS ﬁeld-trial instrument match these test speciﬁcations?

3. What test speciﬁcations would improve the content match between the TIMSS ﬁeld-
trial instrument and the countries’ math curricula? How well do these speciﬁcations
match the curricula?

4. How stable are country scores and ranks across tests that increase the correspondence
between the TIMSS ﬁeld-trial instrument and the curricula of the 17 countries? How
stable are country results across topics and performance expectations?

Each of these questions is discussed below. Following the discussion is a summary of

conclusions and recommendations for future work.

How Much Variation Exists in Curricular Content?
A surprising amount of variation in curricular content is present both within and
across countries as well as within and across data sources. However, some commonalties

do exist.

159

160

Variation in Coverage for T0pics within Each Data Source

In the expert-topic-mapping data source, for example, just over half of the 44
topics in the mathematics framework are intended for inclusion in instruction at age 13 in
at least 70% of the countries; however, only one topic is intended for inclusion in all
countries. Likewise, only 3 topics are intended to be excluded from instruction at age 13
in all countries; two topics are intended to be excluded in all but one or two countries.
The average number of countries intending instruction on a topic is 11 (65%); on the
other hand, the average number of countries intending that a topic receive special focus is
only 4.

The patterns in the other data sources are similar. The average number of
countries including a topic in their corresponding curriculum-guide and textbook sample
is l 1. No topic is excluded from the curriculum-guide samples of all countries, and only
one is excluded from all country textbook samples. Only two topics appear in the
curriculum-guide samples of all countries, and only three appear in the textbook samples
of all countries. In both these data sources, less than half of the topics are included in the
document samples of at least 70% of the countries. Variation in topic emphasis is about
twice as large in the textbook-data source as in the curriculum-guide- or expert-mapping-
data sources.

Several topics were rarely included in the curriculum. These topics are the
Number topics of Binary Arithmetic (1.1.4.1), Complex Numbers (1 .1.4.3), and
Systematic Counting (1.1.4.5) as well as the Calculus topics of Inﬁnite Processes (1.8.1)
and Change (1.8.2). These are generally included in the data sources of only four or less

countries each. On the other hand, Basic 2D Geometry (1.3.4) is included in the expert-

161

topic-mapping- and textbook-data sources of all countries, and Equations and Formulas
(1 .6.2) is included in the curriculum-guide and textbook samples of all countries. Other
topics with high inclusion rates across data sources are Polygons and Circles (1.3.3), 3D
Geometry (1.3.2), Proportionality Problems (1.5.2), Patterns, Relations, and Functions
(1.6.1), and Data Representation and Analysis (1 .7.1). These appear in the data sources
of around 15 to all 17 of the countries each. In each of the data sources, at least 10
countries include Other Content (1.10). Overall, the topic that seems to have the highest
inclusion rate and most emphasis across data sources is the algebra t0pic 1.6.2 Equations

and Formulas.

Variation in Topic Coverage for Countries within Each Data Source

Topic inclusion and emphasis also vary across countries within each data source.
The difference between the largest number of topics included by a country in a data
source and the minimum number of topics in the same data source is around 25 to 30
t0pics. However, the average number of topics included. across the country-data sources
is 28. The average of the average proportion of emphasis countries devote to topics
included in the expert-topic-mapping and curriculum-guide-data sources is .04, or about 7
class periods. The average proportions of emphasis for countries across all data sources
range from .02 (4 class periods) to .09 (16 class periods). The average proportion of
textbook blocks devoted to topics included in textbooks is .05, with a range of .03 to .09.

The countries include different numbers of topics in each of the three data
sources. At the most extreme, country N has an inclusion difference of 22 topics across

data sources (39 topics included in the expert-topic-mapping and 17 topics in the

162

curriculum-guide-data sources). On average, countries included about 18 topics in all
three data sources, and 7 in none of them. Thus, countries had an average rate of

agreement of topic inclusion across data sources of just over half the topics.

Potential Explanations of Variation

Variations in topic coverage across countries is to be expected. Countries
approach schooling in different ways. Some cover many topics over a period of many
years, and others focus on select topics for shorter periods of time. Some countries begin
with the “basics,” adding topics only as students grasp necessary concepts; others want to
continually challenge their students. Schooling at age 13 is just a slice in the pattern of
schooling that for students in most countries began eight years earlier. However, the
commonalties in topic inclusion and exclusion (e.g., the focus on geometry and algebra,
the exclusion of certain complex numbers and calculus topics) show that there may be an
underlying pattern of mathematics sequencing followed by most countries.

What is surprising to see is the great variation within countries. Some of this may
be explained by the differing roles of curriculum guides and textbooks across countries as
well as the structure of the educational systems. It is useful to think of educational
systems as falling along a continuum of centralization. More centralized systems have
common curricula that are often mandated. In these cases, curriculum guides, textbooks,
and often lesson plans, may be written from the same “blueprint.” One would expect to
see higher agreement among documents in countries with highly centralized educational
systems than in other less centralized systems. Less centralized systems often leave

curriculum development completely up to local authorities, resulting in collections of

163

curriculum guides and textbooks within the country. As a result, it may be more difficult
to ﬁnd agreement of topic coverage across data sources in less centralized countries.

Additionally, the experts who completed the expert topic mapping in some
countries may have had better knowledge of their country’s curricula than experts in other
countries. It is also possible that expert-mapping data in a particular country may reﬂect
recent reform movements; movements that may not have found their way yet into the
curricular documents. On the other hand, new curriculum guides may have been written
in a country that has not gone through the process of writing new textbooks.
Furthermore, in some classrooms, textbooks may be used only as a resource, while in
others they may provide a daily map for instruction.

The variations in curricula, and potential reasons behind these variations,
demonstrate the need to consider multiple sources of information to obtain a complete
picture of curricular intentions across countries. These variations in patterns both within
and across countries need to be considered in test development. Additionally, the
variations stress the need for test developers and researchers to be speciﬁc about the
inferences they intend to make from the tests they develop or use. Tests that are used to
demonstrate student “achievement of mathematics cuniculum” may face criticism
concerning their validity because “achievement of mathematics curriculum” is Open to

many interpretations.

1 64
How Well Does the Content of the F ield-Trial Instrument Match the Content of the

C urriculum-Data Sources?

Topics

Deﬁnite gaps exist between country-level treatment of topics in each data source
and topic inclusion on the ﬁeld-trial instrument. However, three of the topics with the
highest coverage across data sources (1.6.1 Patterns, Relations, Functions, 1.6.2
Equations & Formulas, 1.7.1 Data Representation & Analysis) also have high numbers of
items on the ﬁeld-trial instrument. Topic 1.6.2, which is heavily emphasized in all three
curriculum sources, is one of the topics included in the ﬁeld-trial instrument that is also
included in most data sources for most countries (Table 12). On the other hand, the three
geometry topics that are prevalent in the curriculum sources (1.3.2 Coordinate Geometry,
1.3.3 Polygons Circles, 1.3.4 3D Geometry) have few items on the ﬁeld-trial instrument.
Topic 1.3.4 is covered by only four items even though over 70% of the countries include
it in each data source. Other topics that deserve more items are 1.1.3.1 Negative
Numbers, 1.1.3.3 Real Numbers, 1.1.4.2 Exponents, and 1.1.4.4 Number Theory. Topics
deserving less items are 1.1.5.1 Estimating Quantity and Size and 1.7.2 Uncertainty and
Probability. The highest proportion of test items on the ﬁeld-trial instrument is for topic
1.1.2.1 Common Fractions which appears in the curriculum-guide samples of only nine
countries. This topic also has an average of 11% more emphasis on the ﬁeld-trial
instrument than across the curriculum sources.

On the other hand, none of the topics previously mentioned as having low

coverage across data sources (1.1.4.1, 1.1.4.3, 1.1.4.5, 1.8.1, and 1.8.2) are included on

165
the ﬁeld-trial instrument. However, topics 1.1.3.1 Negative Numbers and 1.1.3.3 Real

Numbers also have no items on the ﬁeld-trial instrument but are prevalent in the

curricula.

Data Sources

Determining to which data source the content ﬁeld-trial instrument is most similar
depends upon how one chooses to evaluate the similarity. On average, 87% of the items
on the ﬁeld-trial instrument test topics included in the textbook-data source; only 55% of
the items test topics in the aggregate-data source. The data source with the lowest
coverage on the ﬁeld-trial instrument is the curriculum-guide-data source. The expert-
mapping data source has the best correspondence of topic inclusion (or non-inclusion)
with the ﬁeld-trial instrument. The worst correspondence of topic inclusion with the
ﬁeld-trial instrument is for the aggregate-data source. The difference between topic
emphasis in each data source and topic weight on the ﬁeld-trial instrument is fairly even
across all data sources, except that the textbooks and aggregate-data sources tend to have
higher emphasis on certain topics than did the ﬁeld-trial instrument. The largest
correlations between topic-weight proﬁles on the ﬁeld-trial instrument and topic-
emphasis proﬁles in the data sources are between the ﬁeld-trial instrument and the
textbook- and aggregate-data sources; however, this is likely due to the fact that more
variation exists in the proportions in the textbook-data source as opposed to the other data
sources. The lowest correlations are between the ﬁeld-trial instrument and the

curriculum-guide—data source.

166
The largest Euclidean distance between topic-emphasis proﬁles for each data

source and topic-weight proﬁles for the ﬁled-trial instrument is between the topic weights
on the ﬁeld-trial instrument and topic emphasis in the textbooks. Other Euclidean
distances are fairly similar to one another.

The Euclidean distances, which take into take into account topic means, standard
deviations, and rank ordering of topic emphasis within countries, indicate that the
textbook would seem to have the overall poorest match of topic coverage to the ﬁeld-trial
instrument. However, not all “mis-matches” are the same. One type of topic-coverage
mis-match occurs when students receive instruction on more than is tested; the other type
of mis-match occurs when students are tested on topics they have not been taught. These
have different consequences for validity. Mis-matches between the ﬁeld-trial instrument
and the textbooks generally result when more emphasis is placed on certain topics in the
textbooks than topics on the ﬁeld-trial instrument. This represents the ﬁrst type of mis-
match (instruction on topics not tested). However, the higher correlation between topic-
emphasis proﬁles in the textbooks and topic-weight proﬁles on the ﬁeld-trial instrument
suggests that the relative ranking of topics on the ﬁeld-trial instrument is more similar to
the textbook topic rankings than the rankings from the expert-mapping and curriculum-
guide-data sources. The same is true of the ranking of topics in the aggregate-data source
for each country, but, on average, the aggregate-data sources contain fewer topics than are
contained on the ﬁeld-trial instrument. The curriculum-guide-topic proﬁles have a poor
correlation with the ﬁeld-trial-instrument-topic proﬁles because their proﬁles are
basically ﬂat — all topics included in a country’s curriculum-guide sample have the same

proportion. Correlations probably are not the best measure for the similarity between the

167
content of the curriculum guides and the content of the ﬁeld-trial instrument. The expert

mapping seems to fare well across all analyses.

Countries

The correspondence between topic coverage on the ﬁeld-trial instrument and in
the data sources varies also across countries. The ﬁeld-trial instrument covers about 70%
to 90% of what most countries include across the three data sources. For one country (D)
almost 100% of the items on the ﬁeld-trial instrument test topics that are included in the
curriculum; however, for another country (J), less than 50% of the items test topics that
are in the curriculum. On average, 20% of the items on the ﬁeld-trial instrument test
topics not included in the curriculum of a country. On the other hand, an average of 10%
to 30% of the curriculum of the countries is untested by the ﬁeld-trial instrument. Overall
negative and positive differences between each country and the ﬁeld-trial instrument in
emphasis are fairly even, although variation exists across countries and data sources. A
difference of .40 exists across countries between the highest and lowest average
correlations between topic-weight proﬁles on the ﬁeld-trial instrument and topic-
emphasis proﬁles in each of the data sources (SD of .07). The average correlation is only
.36. Differences in Euclidean distances between topic—weight proﬁles on the ﬁeld-trial

instrument and topic-emphasis proﬁles in each data source are .20 (SD .04).

Conclusions about Test-to-Curriculum Match
The summaries of most indices show reasonable correspondence between topic
coverage on the ﬁeld-trial instrument and in the data sources. On average, 75% of tested

items are in the curriculum-data sources; an average of 84% of the curricula are tested on

168
the ﬁeld-trial instrument; an average of a 68% of the topics are included on the ﬁeld-trial

instrument and included in the curriculum or not included in both; an average of only an
8% absolute difference exists in topic emphasis (4% positive and 4% negative).
However, some data sources or countries do not show as much consistency. The ﬁeld-
trial instrument is more similar to some data sources in topic coverage than others.
Therefore, a ﬁnal evaluation of content validity depends upon the purpose of the test and
which curriculum source represents the most appropriate comparison. A more serious
problem is the differential match in topic coverage of the ﬁeld-trial instrument to each
country. Such differences raise serious questions about validity. If the test content is
more similar to the curriculum of some countries than it is to the curriculum of others, it
would constitute a better measure of the intended or potentially implemented curriculum
in those countries. Comparisons referenced to such a test would be difﬁcult to interpret

across countries.

How Does the Content of the Test Blueprints Compare with the Content of the F ield-T rial
Instrument?
Focus of the Test Blueprints
The test blueprints described above were based on different testing purposes and
differed in two respects. First, they differed in the curricular domain they represent. Test
blueprints based on the expert mapping and curriculum-guide-data sources represent the
intended curriculum of the nations. Test blueprints based on the textbooks and aggregate-

data sources represent the potentially implemented curriculum of the nations.

169
Second, the blueprints differed in speciﬁcity of the intended inferences. Intended

inferences from tests developed using the union-test blueprints relate to student
achievement of the topics covering the full range of math topics. They represent all of
what is possible in the mathematics curriculum. Tests based on the 70% intersections
yield inferences related to achievement of “prevalent” math topics for 13-year-old
students. Tests based on the strict intersections yield inferences related to those topics
that all countries ﬁnd important. Finally, inferences based on the “unique” tests relate to
student achievement of the topics they were intended to learn. Each combination of
speciﬁc domain and type of inference is meaningful. Validity needs to be evaluated in
light of the purposes behind each particular combination.

The amount of variation in the content of the test blueprints showed that they were
not equivalent. The blueprints differed on topic inclusion as well as emphasis. The 70%-
intersection-test blueprints included about half of the 44 framework topics for all test
blueprints except those based on the aggregate of the data sources. The strict-
intersection-test blueprints included only a few topics. No strict-intersection—test
blueprints could be developed for the aggregate-data source because no topic appeared in

all three data sources of all countries.

Variation in Correlations between the Test Blueprints and the F ield-Trial Instrument
Although I compared the content of the ﬁeld-trial test instrument to the content of

all three types of test blueprints (union, 70% intersection, and strict intersection), I did not

expect a good quantitative match with the strict-intersection tests. Overall, however,

approximately 60% of the topics on average appeared in both the ﬁeld-trial instrument

170

and the blueprints. An average of 61% of the ﬁeld-trial items were covered by topics in
the test blueprints, and an average of 91% of the “items” on the test blueprints were
covered by topics on the ﬁeld-trial instrument. Differences in topic emphasis on the
ﬁeld-trial instrument and topic emphasis on the test blueprints is about 4% across topics,
but 21% across curriculum sources. However, disregarding the strict intersection
blueprints would lower the difference in emphasis across curriculum sources to
approximately 3%.

On average, the new test blueprints place more emphasis on topics 1.3.2 Basic
Geometry, 1.3.4 3D Geometry, and 1.6.2 Equations and Formulas than is on the ﬁeld-
trial instrument. However, when looking at data sources, the expert-topic-mapping and
curriculum-guide blueprints place less emphasis on topic 1.6.2 than it receives on the
ﬁeld-trial instrument. The ﬁeld-trial instrument also places more emphasis on topics
1.1.2.1 Common Fractions and 1.7.1 Data Representation and Analysis than is in the new
blueprints. Correlations between topic-weight proﬁles on the ﬁeld-trial instrument and
topic-weight proﬁles on the test blueprints average around .45 and Euclidean distances
between topic-weight proﬁles for the test blueprints and the topic-weight proﬁle for the
ﬁeld-trial instrument average around .38 (larger than the distances seen earlier). Topic
coverage on the ﬁeld-trial instrument seems most similar to topic coverage on the union-
test blueprint based on the aggregate of the data sources and the 70%-intersection-test
blueprint based on the textbook-data source; topic coverage on the ﬁeld-trial instrument
seems least similar to topic coverage on the 70%-intersection test blueprints based on the
curriculum-guide-data source and the aggregate of the data sources (disregarding the

strict-intersection-test blueprints).

171

How Well Does the Content of the Specially Constructed Test Blueprints Match the
Content of the C urriculum-data sources?

The content of the unique specially-constructed-test blueprints written using only
topics included on the ﬁeld-trial instrument is more similar to the content of the
curriculum than was the content of the ﬁeld-trial instrument. An average of 80%-90% of
the topics were either included in the unique-test blueprints and in curriculum-data
sources or not included in both. This is a 10% to 15% increase in the correspondence in
topic inclusion between the ﬁeld-trial instrument and the curriculum-data sources across
topics and a 15%-30% increase across countries. Less than a 1% difference between
topic emphasis on the unique—test blueprints and in the curriculum-data sources exists
across topics (2% less difference than between the ﬁeld-trial instrument and the
curriculum), and across countries a 4% difference in topic emphasis between specially-
constructed tests and the curriculum sources results in 2%-3% less difference than
between the ﬁeld-trial instrument and the curriculum. Correlations between topic-weight
proﬁles on the test blueprints and topic-emphasis proﬁles in the curriculum sources
improve by .50 (to an average of .84) and distances between the topic-weight proﬁles of
the test blueprints and the topic-emphasis proﬁles of the curriculum sources shrink by .17
(to an average of .11). Unique-test blueprints based on the aggregate-data source were
most similar to the curriculum on topic inclusion and topic-pattem proﬁles but was most
dissimilar from the ﬁeld-trial instrument on overall topic emphasis. The unique-test
blueprints based on the textbook data have the highest of the correlations of topic

emphasis with the corresponding curriculum and the most similarity in topic emphasis,

172

and the unique-test blueprints based on the curriculum-guide data has the lowest of the
topic-emphasis correlations with the corresponding data source and the least similarity
with the data source in topic inclusion.

The content of the inclusive test blueprints was more similar to the content of
each corresponding curriculum source than was the content of the ﬁeld-trial instrument.
In some cases the improvements are small, but overall improvement in content similarity
looks good; however, not as good as was seen with the unique tests. The proportion of
tested items included in each country’s curricula increases by 5% with the unique-test
blueprints, to a new overall proportion of 82%. However, the tests based on 70%
intersections are only testing an overall average of 63% of the curricula - a decline of
21%. This coverage ranges, however, from 43% for the aggregate-data source to 80% for
the textbook-data source. No improvement over the ﬁeld-trial instrument is seen in the
similarity of topic inclusion between the and only a slight improvement is seen in the
similarity of topic emphasis averaged across topics. However, the similarity between
topic inclusion in the aggregate-data source and the corresponding test blueprints was
better than between the data source and the ﬁeld-trial instrument; topic emphasis on the
tests based on the expert-mapping and curriculum-guide-data sources was more similar to
these data sources than was topic emphasis on the ﬁeld-trial instrument. Differences
between the curriculum and the tests in topic emphasis decreased by 2.5% over countries
with most improvement for the expert mapping and textbook-data sources. A decline
was seen for the aggregate-data source.

The largest improvement for specially-constructed test blueprints over the ﬁeld-

trial instrument is seen in the correlations between the topic-weight proﬁles on the test

173

blueprints and the topic-emphasis proﬁles in the curricula. This means that the “proﬁles”
of the specially-constructed tests had a more similar shape (i.e., relative weighting of
topics) to the curriculum-topic-emphasis proﬁles than did the ﬁeld-trial-instrument topic-
weight proﬁle. The average correlation between topic emphasis on a specially-
constructed-test blueprint and topic-emphasis proﬁles in the corresponding data source
rises to .58, an increase of .20. However, this is still .30 less than the average correlation
between topic-weight proﬁles on the unique specially-constructed-test blueprints and
topic-emphasis proﬁles in the data sources.

Euclidean distances between topic-weight proﬁles on the test blueprints and topic-
emphasis proﬁles in the curricula improved slightly over those between the ﬁeld-trial
instrument and the curriculum-topic-emphasis proﬁles. As stated earlier, Euclidean
distances are best interpreted as relative measures. However, I provided several
benchmarks earlier. Two topic proﬁles with no difference between them would have a
distance of 0 and topic proﬁles of tests and cunicula emphasizing completely different
topics would have a distance of 1.4. The original distances between the topic proﬁles of
the ﬁeld-trial instrument and the topic proﬁles of the curricula ranged from .30 to .04.
The distances between the test blueprints and the curricula ranged from .10 to .20. If all
topics differed in emphasis by .01, the distance would be .07; if all topics differed in
emphasis by .025, the distance would be .165; if all topics differed by .05, the distance
would be .33. Additionally, if 1A of the topics differed by .1 and the others had no
difference, the distance would be .33. One percent of a 180 period mathematics school
year is around 2 class periods; 2 V2 percent of the school year is almost 5 class periods;

ﬁve percent if almost 10 class periods, and ten percent is 18 class periods.

174

The distances between test-blueprint t0pic-weight proﬁles and curriculum-topic-
emphasis proﬁles for the expert mapping and curriculum guides are very close to those
between the unique test-blueprint topic-weight proﬁles and the curriculum-t0pic-
emphasis proﬁles. The textbook- and aggregate-union-test blueprint Euclidean distances
are about .10 higher than those for the unique-test blueprints. The distance between the
aggregate 70%-intersection-test-blueprint topic-weight proﬁles and the aggregate-data-
source topic-emphasis proﬁles is larger than the distance between the ﬁeld-trial-
instrument topic-weight proﬁles and that data source’s topic-emphasis proﬁle.

The new test blueprints based on ﬁeld-trial topics did improve the similarity in
topic coverage between the tests and the curriculum for all but the tests based on the strict
intersections. However, data sets and countries did not all show the same degree of
similarity with the ﬁeld-trial instrument or the test blueprints. This was due in part to the
missing topics on the ﬁeld-trial instrument. The missing topics, though, can be treated as
topics for which “good” items do not exist or that may have been negotiated out of the
test. Thus, the test blueprints may represent the best overall match possible between the
test and country-curriculum sources. Even under the best circumstances, mis-match will

exist and needs to be considered in test interpretation.

How Does Country Performance Vary?
I evaluated variation in test performance across the specially-constructed tests to
obtain a sense of the impact that test-curriculum mis-matches in content coverage might
have on test interpretation. I used data on the proportions of students passing each item

to calculate potential country scores on the new “tests.”

175

Dijﬁrences in Total Scores and Ranks

When scores and ranks are averaged over all the specially-constructed tests, little
difference is seen from the original ﬁeld-trial scores and ranks. Average differences in
passing rates on the ﬁeld-trial instrument and passing rates on other tests are only about
3%, and all country ranks are nearly identical. Correlations of all specially-constructed-
test scores and ranks with the ﬁeld-trial scores and ranks are near .90 and above.
However, as much as a 16% difference in passing rate is seen across all specially-
constructed tests, with an average of a 10% difference. Differences in highest and lowest
ranks within a country are as high as 9 places with an average of 5. This means that, on
average, countries would rank in different quartiles (of this distribution of 17 countries)
based on their highest and lowest ranks, with some countries ranking in the top half of the
distribution for some tests and in the bottom half for others. Largest differences in scores
are between the ﬁeld-trial instrument and the strict-intersection tests and the smallest are
between the ﬁeld-trial instrument and the unique tests. The lowest correlations between
the ﬁeld-trial instrument and specially-constructed tests are also with the strict-
intersection tests. On average little difference could be found between performance on
the ﬁeld-trial instrument and performance on the specially-constructed tests. Although,
some countries (H, J, P) do display score ﬂuctuations, and others (E, G, Q) display rank

ﬂuctuations.

Diﬂerences in Topic Scores and Ranks
More variation is evident within countries, however, when looking at individual

topic scores. Within countries, standard deviations across topic scores are 10%, and

176

differences between minimum and maximum passing rates within a country are as high as
50%. Countries J and P have larger score ﬂuctuations than other countries. Additionally,
nine different countries rank ﬁrst on at least one topic, and six countries rank last on at
least one topic. This shows clear patterns of strength and weakness across countries.
Larger differences in performance are seen on topics 1.1.4.4 Number Theory and 1.1.2.4
Percentages than other topics. Differences in scores for these two topics are around 10%.
Both these topics are included by over half the countries in the three primary data sources.
However, 1.1.4.4 had only one item on the ﬁeld-trial instrument.

Topic ranks within countries varied across countries. Countries, on average,
performed their best on topic 1.2.1 Measurement Units and their worst on topic 1.1.4.4
Number Theory. Reviews of prior coverage of these two topics in each country’s
curricula many explain these results. Countries N and J have different patterns of

performance than other countries.

Differences in Performance-Expectation Results

Even more signiﬁcant are performance results when looking at performance
expectation codes and topic by performance-expectation intersections. Overall, students
perform better on basic-understanding and routine-computation items (which had the
highest proportions of textbook blocks) and more poorly on reasoning and
communication items. More of these items also happened to be extended response. Six
countries rank ﬁrst on at least one performance expectation and two rank last on at least
one. Average ranks across performance-expectation categories were within one to two

places of ﬁeld-trial ranks. However, when developing unique tests for countries that only

177

include items with performance-expectation codes included in their curriculum, slightly

more variation is seen.

Within-Country Variation

Within-country variation was noticeably larger in performance at the item and
topic level than at the total score level. Figure 2 shows box plots of country scores across
all specially-constructed tests, across all TIMSS reporting sub-scales, across all topics,
across all performance expectations, and across all items within two of the topics. Topic
1.6.2 Equations and Formulas is a topic emphasized in the curriculum across all
countries and data sources. Topic 1.7.2 Uncertainty and Probability is a topic that is not
highly emphasized.

The reduction in variation as scores get further away from individual items is
striking. Some of this variation may be explained by measurement error - especially
when looking at the item level. However, measurement error most likely does not
account for all the variation. Table 62 provides estimated reliabilities and standard errors
for the scores in each set of box plots. These are estimates treating each country as a case
and each total, scale, topic, performance-expectation, or item score as an item. Standard
errors for each individual country were estimated under the assumption that the
reliabilities were the same for each country as it was for the group. Reliabilities were all
.90 and above. Standard errors for the scales ranged from .31 to almost 2, producing
error bands of :6 to :4 points). Across countries though, standard errors could be close
to 9 points. These would need to be investigated further to determine the effect of

measurement error on score variation.

178

 

100

90

80

60

40

30

20

 

Country Scores on All

Specially-Constructed Tests

 

70'

 

$59

A B C D E

is

955%?

F G H I J K

Countries

L

'2'

M

i

N

é
é

 

 

90

80

70

60

50

Scores

40

30

20

10

 

 

Country Scores on TIMSS Sub-Scales

 

100 "

 

 

 

 

FGHIJKLMNOP

Countries

 

Q

 

Figure 2. Boxplots of scores on all specially—constructed tests, TIMSS sub-scales,
topics, performance expectations, items for topic 1.6.2, and items for topic 1.7.2.

 

 

 

 

179

 

Country Scores on Topics

 

 
 

 

    
   

100"
so
I
60 _-I
a .I- I
r II=:' I. .-
40' .-
20'
0|
ABCDEFGHIJKLMNOP
Countries

 

 

Country Scores on Performance Expectations

 

100 '-

Scores

 

 

 

 

ABCDEFGHIJKLMNO

 

Countries

 

P

 

Figure 2. (Cont ’d.)

 

 

 

 

180

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Country Scores on Items for
Topic 1.6.2 Equations and Formulas
100 '
so '
60 '
Q
8
(I)
40 '
20 '
0 I
ABCDEFGHIJKLMNOPQ
Country
Country Scores on Items for
Topic 1.7.2 Uncertainty & Probability
100 ' -—
-- " it
so '

 

 

 

 

 

 

 

Scom
-_
_
Ii
.-
-
-

. l I
6°' ' II. I I
4“ - I'I'|I
20 . H I J- . I
ll

ABCDEFGHIJKLMNOPQ

 

 

Country

 

 

 

Figure 2. (Cont ’d.)

181
Table 62

Estimated Reliabilities and Standard Errors

 

 

Estimated

Std.Err.

SD Range Range

Reliability across across
(a) SD Std.Err. Countries Countries
All Tests 0.99 7.1 0.31 1.7- 3.7 .07-.16
Scales 0.95 6.8 1.38 4.9-1 1.9 .99-2.4
Topics 0.96 9.2 1.84 9.3-16.0 1.9-2.4
PEs 0.90 6.4 1.99 12-18.0 3.7-5.6
1.6.2 0.96 9.2 1.73 17-25.0 3.2-4.7
1.7.2 0.90 6.3 1.95 l7—28.0 5.3-8.7

 

Note. Reliabilities and standard errors were calculated treating each

country as a case and each total, scale, topic, performance expectation, or

item score as an item. Because individual student results were not

available, reliabilities and standard errors could not be calculated for each

country. However, reliabilities and standard errors were estimated for

each country under the assumption that the test was as reliable for each

country as it was for the group of countries as a whole.

The variation in performance on items, topics, and performance expectations
highlights the complexity of curriculum-to-test matching, and, hence, evaluations of test
validity. The items on the ﬁeld-trial instrument each had their own “signature” which
identiﬁes the unique nature of each item. Few items had the exact same signature. This
signature was created by a particular combination of topic and performance-expectation
codes and was further enriched by item format. No one country have consistent
performance or consistent rankings across items. Evaluations of the signatures of the
items on which countries demonstrate strength and weakness provide more information

than knowing that students were “low in geometry.” The item signatures allow one to

evaluate the speciﬁc nature of a problem on which students are excelling or with which

182

they are having difﬁculty. Additionally, items represent the ways in which the curriculum
is delivered in the classroom. Teachers do not teach “math” or “geometry.” They teach
complex interrelated topics and performances. Thus, the “implemented” curriculum for
geometry and the expectations for performance will not be the same from class to class.
Aggregations of items and topics, however, begin to mask the multi-dimensional nature
of mathematics. What is left over is general-math achievement.

Schools do not directly teach global constructs, but instead try to develop

speciﬁc skills, introducing one skill at a time for the student to integrate

with previously acquired skills...Hence, there are more apt to be

differences between schools and programs at the speciﬁc objective level

than at the total score level. (Airasian & Madaus, 1983, p. 105-106)

While rankings of countries on general math achievement may provide
information of interest to some stakeholders, such rankings certainly do not provide very
descriptive information on student achievement. They also may complicate evaluations
of test-curriculum match. Certainly each country’s mathematics curriculum is more than
a collection of isolated topics. Therefore, variation in performance across countries
cannot be explained by variation in topic coverage alone. Variations in expectations for

performance and the complex blending of topics and performance expectations also must

be considered.

Summary
I set out in this study to develop test blueprints for cross-national assessments that
validly measure student achievement of topics in the mathematics curriculum for 13-year-
old students. However, the variation within and across nations in curriculum and lack of

an adequate item pool complicated this goal. Through my analyses, I found that

183

The intended and potentially implemented mathematics curriculum varies across
nations and also varies within nations. Some countries include few topics in their
curricula (as indicated by the data sources), and others include many. Some countries
focus on particular topics; others spread their focus across many topics. However,
some commonalties do exist, with a handful of topics either missing from most
countries’ cuniculum sources or being highly emphasized in most countries.
Variations within each country’s data sources point out the need for multiple
representations of math curricula.

The content of the ﬁeld-trial instrument is more similar to the content of the
curriculum of some countries than others and is more similar to the content of some
of the data sources than others. This “differential match” has implications for the
validity of inferences made from the test, but ﬁnal conclusions about test validity will
depend on the purpose for which the test will be used.

Test blueprints varied according to test purpose. Topic coverage and emphasis were
inconsistent across the blueprints due to the variability in the curriculum sources.
Some blueprints, though, were very similar to one another (e.g., all the union
blueprints), while others were very different (e.g., the strict intersections). Each
blueprint provides a different look at student achievement.

The content of the test blueprints for specially-constructed tests were more similar to
the curriculum sources than was the content of the ﬁeld-trial instrument, especially in
the relative emphasis of topics. Thus, an increase in validity and less bias would be
expected. However, variations in the similarity of the content of these tests to the

content of the curriculum of particular countries still existed - primarily due to the

184

missing topics. Impact of the mis-match needs to be balanced with other information
provided by the tests. For example, because the strict-intersection tests do not
represent the entire curriculum of any country, the weighting of the topics on the test
relative to other topics in a country’s curriculum is lost. However, the strict-
intersection tests do provide information on how students perform on topics included
in the curriculum of all countries. Furthermore, the unique tests provide an indication
on how students performed on their unique curricula, and these tests have a good ﬁt to
the curriculum of each country. However, comparisons of student performance when
all students do not take the same test are complicated.

Variation in county scores and ranks on specially-constructed tests was minimal;
however, some isolated differences did exist. Patterns suggest that tests covering a
comprehensive range of math topics are unlikely to produce striking variations in
performance, suggesting that, at the total-score level, the impact of test-curriculum
mis-match is likely to be minimal. However, variation in performance across topics
and performance expectations indicate that country ranks of total scores may be
reﬂecting a general-math achievement, rather than achievement of a particular
curriculum. Performance across countries did vary when unique tests were
developed based on topics and performance expectations. The concept of test-
curriculum match is more complex than merely matching on topic coverage. The
content of the curriculum is made up not only of topics but also of expectations for
performance on each topic. Both vary separately and together across countries. Thus,
all countries may include “algebra” topics in their curriculum, but may have widely

different intentions for student achievement. These differing expectations,

185

undoubtedly, would result in subtle differences in goals, textbooks, and instruction.

Such differences will need to be considered in evaluating test-content validity.

Limitations

Some limitations of the results should be pointed out. First, the study used pilot
data. As such, the student samples were not random within countries and items were
being tested for inclusion on the ﬁnal TIMSS assessment. However, population sample
sizes were within or close to the IEA guidelines in all countries. Furthermore, the item
pool on the ﬁeld-trial instrument was much larger than on the ﬁnal TIMSS assessment.
In fact, one of the main purposes of the ﬁeld trial was to try out the extended-response
items, and many of these items were dropped from the ﬁnal assessment due to testing-
time constraints.

Another limitation was the lack of student-level data for the ﬁeld trial. Only
country-level p-values were provided. Thus, it was impossible to construct scores for
individual students or evaluate with any certainty variation on items. For this reason,
signiﬁcance tests were not conducted for most analyses. It is difﬁcult to determine,
therefore, the statistical signiﬁcance of these results.

The lack of items for some topics was another limitation. Country-level
performance was not available on all topics, some of which factored highly in the
curriculum of some countries. Results may have changed had these data been available.
The lack of depth of coverage in all topic areas was also a problem. A full range of items
covering all topics crossed with all performance expectations would be ideal. However,

the item sample from the ﬁeld-trial instrument likely reﬂects the reality of test

186

development. The item pool of rich items covering all topics and performances is not yet
available (Garden & Orpwood, 1996).

Differences in the level of speciﬁcity of topic codes presents another issue to
consider. Some select topics on the framework are coded in detail (e.g., the fractions
topics). Thus, a richer picture of curriculum is available for these topics. Other topics are
reported at the more global level. For example, algebra is covered by only two codes.
However, algebra (1.6.2) is prevalent in the curriculum of the countries. A ﬁner
distinction of algebra topics may have pointed out more variation in curriculum coverage
and differences in the similarity of the test items to these curricula.

Finally, my analyses are based on the assumption that all items are good measures
of the behaviors they represent. As discussed earlier, content validity is just one aspect of
validity, and should never be taken as a ﬁnal indication of test validity. It is important

that items not only represent the content of the domain, but also do it well.

Recommendations and Conclusion

My ﬁrst recommendation is that a higher quality item pool be developed for cross-
national work. Several topics important to many countries were missing from the test,
and items measuring complex applications of topic knowledge and understanding were
not available for all topics. The items were not a comprehensive representation of the
performance expectation aspect of the framework. It is difficult to determine how
country performance might vary if more items measuring higher order skills were
included on the test. Many countries expect their students to demonstrate complex use of

subject matter. If researchers want to adequately measure such skills, better items will

187
need to be developed. Fortunately, within the US. research is being conducted on

content standards as well as performance standards (Linn & Baker, 1995). Cross-national
researchers should look to these studies to guide their research. Until better item pools
are developed, results of cross-national achievement testing should be interpreted with
caution.

Second, researchers developing cross-national achievement tests should clearly
state the purpose and the domain of the instrument. Without such information, it is not
possible to evaluate how well a test represents a domain. This is one of the ﬁrst rules of
thumb taught in any measurement course. Unfortunately, it is not often followed, and
consumers are left to guess at the domain, or researchers imply that the test represents
more than it actually does. Secondary analysts may also be guilty of applying test results
to too broad a domain. These situations can be avoided by clearly describing the item
domain.

Third, this study has shown variation in curriculum, in test-curriculum match, and
in performance on topics and performance expectations. These variations should be
reported with the results of cross-national achievement tests so stakeholders can better
interpret ﬁndings. The study has demonstrated the importance of the ﬁrst rule in test
development - identify the purpose of the testing. Simply starting with collections of
items and piecing them together to ﬁt a content map is not adequate. Test developers
need to clearly articulate what they are attempting to measure and what types of
inferences are appropriate and inappropriate.

Finally, my recommendation is that researchers take into account the complexity

of the curriculum and items when evaluating test-curriculum match. A clear match with

188

curriculum is unlikely to emerge by focusing only on topics. Two countries may
demonstrate the same level of coverage on a topic, but have different expectations for
performance. Likewise, two items may measure the same topic, but be very different in
the type of performance or application expected. Replications of the analyses in this
study may produce different results if performance expectations were included in the
analyses.

In the current period of educational reform, cross-national studies are receiving
renewed attention as educational systems across the world strive for “world class”
standards and ﬁght to maintain or gain competitive economic footing (Linn & Baker,
1995; Porter, 1990; Schmidt & Valverde, 1995). The results of such studies are useful for
both accountability and school improvement. However, researchers and policy-makers
cannot allow themselves to be lured into the international horse race and to be swayed by
public demands for simplistic results and explanations. The international educational
system is varied and complex, and analyses of this system should reﬂect this complexity.

My answer to people who want comparative standings is to give them

comparative standings - lots of them: in different topics, at different ages,

with different kinds of tasks, both unadjusted and adjusted for factors such

as national curricula and proportion of students in school. Recognizing

that no single index of achievement can tell the full story and that each has

its own limitations, we increase our understanding of how nations compare

by increasing the breadth of our vision. Even so, however, simply

ascertaining nations’ relative standing tells us little about how to set

educational policy or improve instructional practice.(Mislevy, 1995,
p.419)

APPENDICES

APPENDIX A

Appendix A

Mathematics Curriculum-Framework Categories’

Content

1.1 Numbers
1.1.1 Whole numbers
1.1.1.1 Meaning
1.1.1.2 Operations
1.1.1.3 Properties of operations

1.1.2 Fractions and decimals
1.1.2.1 Common fractions
1.1 .2.2 Decimal fractions
1.1.2.3 Relationships of common and decimal fractions
1.1.2.4 Percentages
1.1.2.5 Properties of common and decimal fractions

1.1.3 Integer, rational, and real numbers
1.1.3.1 Negative nrunbers, integers, and their properties
1.1.3.2 Rational numbers and their properties
1.1.3.3 Real numbers, their subsets, and their properties

1.1.4 Other numbers and number concepts
1.1.4.1 Binary arithmetic and/or other number bases
1.1.4.2 Exponents, roots, and radicals (integer, rational, and real
exponents)
1.1.4.3 Complex numbers and their properties
1.1.4.4 Number theory
1.1.4.5 Counting

1.1.5 Estimation and number sense
1.1.5.1 Estimating quantity and size
1.1.5.2 Rounding and signiﬁcant ﬁgures
1 .1 .5.3 Estimating computations
1.1.5.4 Exponents and orders of magnitude

 

‘ from Robitaille, D.F., McKnight, C., Schmidt, W.H., Britton, E., Raizen, S., & Nicol, C. (1993). Curriculum frameworks for
mathematics and science. Vancouver, Canada: Paciﬁc Educational Press.

189

 

190

1.2 Measurement
1.2.1 Units
1.2.2 Perimeter, area, and volume
1.2.3 Estimation and errors

1.3 Geometry: position, visualization, and shape
1.3.1 Two-dimensional geometry: coordinate geometry
1.3.2 Two-dimensional geometry: basics
1.3.3 Two-dimensional geometry: polygons and circles
1.3 .4 Three-dimensional geometry
1.3.5 Vectors

1.4 Geometry: symmetry, congruence, and similarity
1.4.1 Transformations
1.4.2 Congruence and similarity
1.4.3 Constructions using straight-edge and compass

1.5 Proportionality
1.5.1 Proportionality concepts
1.5.2 Proportionality problems
1.5.3 SlOpe and Trigonometry
1.5.4 Linear Interpolation and Extrapolation

1.6 Functions, relations, and equations
1.6.1 Patterns, relations, and functions
1.6.2 Equations and formulas

1.7 Data representation, probability, and statistics
1.7.1 Data representation and analysis
1.7.2 Uncertainty and probability

1.8 Elementary analysis
1.8.1 Inﬁnite processes
1.8.2 Change

1.9 Validation and structure
1.9.1 Validation and justiﬁcation
1.9.2 Structuring and abstracting

1.10 Other Content
1.10.1 Informatics

191

Performance Expectations

2.] Knowing
2.1.1 Representing
2.1.2 Recognizing equivalents
2.1.3 Recalling mathematical objects and properties

2.2 Using routine procedures
2.2.1 Using equipment
2.2.2 Performing routine procedures
2.2.3 Using more complex procedures

2.3 Investigating and problem solving
2.3.1 Formulating and clarifying problems and situations
2.3.2 Developing strategy
2.3.3 Solving
2.3.4 Predicting
2.3.5 Verifying

2.4 Mathematical reasoning
2.4.1 Developing notation and vocabulary
2.4.2 Developing algorithms
2.4.3 Generalizing
2.4.4 Conjecturing
2.4.5 Justifying and proving
2.4.6 Axiomatizing

2.5 Communicating
2.5.1 Using vocabulary and notation
2.5.2 Relating representations
2.5.3 Describing/discussing
2.5.4 Critiquing
Perspectives

3.1 Attitudes towards science, mathematics, and technology

3.2 Careers involving science, mathematics, and technology

3.2.1 Promoting careers in science, mathematics, and technology
3.2.2 Promoting the importance of science, mathematics, and technology in non-

technical careers

3.3 Participation in science and mathematics by underrepresented groups

3.4 Science, mathematics, and technology to increase interest
3.5 Scientiﬁc and mathematical habits of mind

APPENDIX B

Table Bl

Topic Coverage on the TIMSS Mathematics Item F ield-Trial Instrument for Population 2

Appendix B

TIMSS Field-Trial Instrument Content Coverage

 

 

Total

 

 

ER

 

 

SA

 

 

 

 

Book 8

T me sa

81'

 

 

Book 6
sa

81'

me

 

T

 

Book 5

Book 3

Cf

 

 

 

 

 

 

2
0
4
l
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

O
l
9
3
2
l
0
l
O
0
0
l
0
0
0
l
0
l
1
4
l

 

 

 

1.1.2.3
1.1.2.4
1.1.2.5
1.1.3.1
1.1.3.2
1.1.3.3
1.1.4.1
1.1.4.2
1.1.4.3
1.1.4.4
1.1.4.5
1.1.5.1
1.1.5.2
1.1.5.3
1.1.5.4

1 2.1
1.2.2

 

 

 

 

 

1.1.1.2
1.1.1.3
1.1.2.1

1.1.2.2

1.2.3

 

 

p—A
\O
N

Bolded items indicate where linked items occur.

mc=multiple choice; sa=short answer; er=extended response;

Shading highlights cells with no items.

linked items

unique items; I

u:

193

.2253 222 8 2222 222222 22222 .32: 2222:2228 .0228 222228 2222 2222222 2222: 2:222 .2222:
2E8. 838m .282 222 28 222 222222 2.2228 E22223 2:282 20 202222.222: 222 88222222 222228 .20 38 22222 2F
.2882 c: 22223 2228 283222222 922222228 .2268 22:82 3222222 22223 28222222 2.222.282 22022282

2222222 229222221 22:28.2 22222222222":
20222222222 Bv=o2xoﬂ20 2222322222 tonmnum 622222222 022222222qu

 

  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

F22 22N 2 222 N MN 222 82 22 N 2 -222 2.2 N N 2.2 _ 22 2. 2 22 22 2 22 222. 228+
22 22 22 2:22 2222 22 22 22 22...22 22 22 22 22 22, 22 22 22.22 22 22 22 222.2
22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22.2
22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22, 22 22 22.2
2 22 22 22 2 2 N 222 w 22 22 22 ,22 2 22 22 2 22 22 2 222 N 22 2 2 N.N.2
22 N 22 2. 22 N 22 N 22 2 22 N 2 22 22 2 2 22 2 2 22 N 2 2. 2.2.2
22 22 22 2. 2 N 22 N 22 2. 2 22 22 22 2 22 2 22 22 2 22 22 2 22 N22
2. N2 22 2 22 22 2. 22 2 2 22 2. 2 22 22 2 N 22 22 N. N 22 22 N 2.2.2
22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 . 22 22 22 22 22 22 22 22 22. 2.2.2
22 22 22 22 22 22 22. 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 2.2.2
2 N 22 N 22 2. 2 N2 2 22 22 2 22 N N 22 N 22 22 N 2. 22 N N N22
N 22 2 N 22 2 2 2 2 2 22 22 2 22 2 2 N 2 22 2 2 2 22 22 2.2.2
22 22 22 22 22 22 22 22 22 22 22 22 22 22 22‘ 22 22 22 22 22. .22. 22 22 22 2.2.2
2 2.2 2 N 22 22 2. N2 22 22 22 22 N 22 22 N 2 2 22 2 22 N 22 22 N22
2 222 22 N 22 22 2 N N 22 22 N N 2 22 N 2. 2 22 n N 2 22 2 2.2.2
22 .22. 22 22‘ 22 22 22 22 22 22 22 22,122 22 22 w 22 22 22 22 .22 22 22 22 22 2.2.2
22 2. 22 22 22 22 22 2. 2 22 22 2 2 22 22 2 2 22 22 2 2 22 22 2 2.2.2
2 22 22 22 22 22 2 22 2 22 22 2 2 22 22 2 N 22 22 N 2 22 22 2 2.2.2
2 N 22 2 22 2 2 2 2 22 22 2 2. 2 22 N. N 22 2 N 22 22 22 22 N.N.2
2 2 22 2 22 22 2 2 N 2 22 2 .222 22 22 :22. 2. 22 22 2. 2 22 22 2 2.2.2
2 22 2 22 2 22 2 22 .2. .20 222.2 2: _. .2. .20 am 08 .2. .20 222.. 08 .2. .222 am 222 022
.2228 mm <2 02 2 28m 2 2 228m 2 2822 N v2222222

 

 

 

 

 

 

 

 

 

2 22.32.22 222 222222

 

APPENDIX C

Appendix C

Curriculum Data for Each Country and Each Data Source
Table C 1

Expert T optc Mapping Proportions for 13 Year Old Students

 

 

 

Country

Topic

Code A B C D E F G H l J
1.1.1.1 0 0.024 0.027 0.020 0 0.027 0 0 0.020 0
1.1.1.2 0.025 0.024 0.027 0.020 0 0.054 0 0 0.020 0
1.1.1.3 0.050 0 0.027 0.020 0 0.054 0 0 0.020 0
1.121 0.025 0.024 0.027 0.039 0.030 0.054 0.105 0.024 0.041 0
1.1.2.2 0.025 0.024 0.027 0.039 0.030 0.027 0.105 0.024 0.041 0
1.1.2.3 0.025 O 0.027 0.039 0 0.027 0.105 0.024 0.041 0
1.1.2.4 0.050 0.048 0.027 0.020 0.030 0.054 0 0 0.041 0
1.1.2.5 0.025 0.024 0.027 0.020 0.030 0.027 0.105 0.024 0 0
1.1.3.1 0.025 0.024 0.027 0.039 0 0.027 0 0.049 0.041 0
1.1.3.2 0.025 0.024 0.054 0.020 0.030 0 0.105 0.049 0.020 0
1.1.3.3 0 0.048 0.054 0.020 0.030 0 0.053 0.024 0.020 0
1.1.4.] 0 0 0 0 0 0 O 0 0 0.032
1.1.4.2 0.025 0.048 0.027 0.039 0 0.027 0.053 0.049 0.041 0
1.1.4.3 0 O 0 0 0 O 0 0 0 0
1.1.4.4 0.025 O 0.027 0.020 0.030 0.027 0 0.024 0.041 0.032
11.4.5 0 0 0 0 0 0 0 0 0 0
1.1.5.1 0.025 0.024 0.027 0.020 0 0 0 0 O 0
1.1.5.2 0.050 0.048 0.027 0.020 0 0.027 0 0.049 0.041 0.065
1.1.5.3 0.025 0.024 0.027 0.020 0 0 0 0.049 0.041 0
1.1.5.4 0 0.024 0.054 0.039 0 0.027 0 0 0.041 0
1.2.1 0.025 0.024 0.054 0.020 0.030 0.054 0 0.049 0.041 0
12.2 0.025 0.048 0.027 0.020 0.030 0.027 0 0.024 0.041 0
1.2.3 0.025 0.024 0.027 0.020 0 0.027 0 0.024 0.020 0.065
1.3.1 0.025 0.024 0.000 0.039 0 0.000 0.053 0.024 0.020 0065
1.3.2 0025 0.024 0.027 0.039 0.030 0.027 0.053 0.024 0.041 0.065
1.3.3 0.025 0.048 0.027 0.039 0.030 0.054 0 0.049 0.020 0065
1.3.4 0.050 0.048 0.027 0.039 0.030 0.027 0.053 0.049 0 0.032
1.3.5 0 0 0 0 0 0 0 0 0 0
1.4.] 0.025 0.024 0.027 0.020 0.061 0.054 0 0.049 0.020 0.032
1.4.2 0025 0.024 0.027 0.020 0.061 0.027 0 0.024 0.041 0.065
1.4.3 0.025 0.024 0.027 0.020 0.061 0.027 0 0 0.020 0.032
1.5.1 0050 0.024 0.054 0.039 0.030 0.027 0 0.049 0.041 0
1.5.2 0.050 0.024 0.027 0.039 0.061 0.027 O 0.049 0.020 0.065
1.5.3 0025 0.048 O 0.020 0.061 O 0 0.049 0 0065
1.5.4 0 0 0 0.020 0.061 0 0 0 0 0.032
1.6.1 0.050 0.048 0.027 0.020 0.061 0.027 0.053 0.024 0.041 0065
1.6.2 0.025 0.048 0.027 0.039 0.061 0.054 0.053 0.049 0.041 0.065
1.7.] 0.025 0.024 0 0.039 0.061 0.054 0.053 0.049 0.020 0065
1.7.2 0.025 0 0 0.020 0.030 0.027 0.053 0 0 0
1.8.1 0 0 0 0 0 0 0 0 0 0
1.8.2 0 0 0 0 0 0 0 0 0 0
1.9.1 0 0 0.027 0 0.030 0 0 0 0.020 0.065
1.9.2 0.025 0 0.027 0.020 0 0 0 0 0.041 0
1.10.1 0.025 0.048 0 0.020 0 0 0 0.024 0 0.032

 

Averageb 0.030 0.032 0.031 0.032 0.042 0.036 0.071 0.037 0.032 0.053
Deviation 0.016 0.018 0.016 0.013 0.023 0.02 0.036 0.02 0.017 0.028
Count' 33/7 31/11 32/5 37/14 24/9 28/9 14/5 27/14 31/18 19/12

" Number ofnon-zero topics'nurnber of emphasized topics. ‘Average of non-zero numbers.

194

Table Cl((.'ont'd. )

195

 

 

 

 

Topic Max

Code K L M N O P Q X SD Median Prop. Count'
1.1.1.1 0.027 0 0 0.027 0.034 0 0 0.012 0.013 0 0.034 8/1
1.1.1.2 0 0.024 0 0.027 0 0.071 0 0.017 0.020 0.020 0.071 9/3
1.1.1.3 0 0.024 0 0.027 0 0.071 0 0.017 0.022 0 0.071 8/4
1.1.2.1 0027 0.024 0.028 0.027 O 0.071 0 0.032 0.025 0.027 0.105 14/6
1.1.2.2 0.027 0.024 0 0.027 0 0.036 0.020 0.028 0.023 0.027 0.105 14/4
1.1.2.3 0027 0.024 0 0.027 0 0.036 0.020 0.025 0.024 0.025 0.105 12/4
1.1.2.4 0054 0.024 0 0.027 0 0.036 0.020 0.025 0.019 0.027 0.054 12/6
1. 1.2.5 0 0.024 0 0.027 0 0.071 0.020 0.025 0.026 0.024 0.105 12/3
1.1.3.1 0.054 0.049 0.028 0.027 0.034 0.071 0.041 0.032 0.019 0.028 0.071 14/8
1.1.3.2 0.054 0.024 0.028 0.027 0.034 0 0.020 0.030 0.025 0.025 0.105 14/5
1.1.3.3 0 0.024 0 0.027 0.034 0 0.020 0.021 0.018 0.020 0.054 1 1/3
1.1.4.1 0 0 0 0.014 0 0 0 0.003 0.008 0 0.032 2/0
1.1.4.2 0054 0.024 0.056 0.027 0.069 0.036 0.041 0.036 0.018 0.039 0.069 15/9
1.1.43 0 0 0 0 0 0 0 0 0 0 0 0/0
1.1.4.4 0 0.049 0.056 0.027 0.034 0 0.020 0.024 0.016 0.027 0.056 13/4
1.1.4.5 0 0 0.028 0 0 0 0.041 0.004 0.011 0 0.041 2/1
1.1.51 0.027 0.024 0 0.027 0 0.036 0.020 0.014 0.013 0.020 0.036 9/1
1.1.5.2 0.027 0.049 0.028 0.027 0 0.036 0.020 0.030 0.018 0.027 0.065 14/7
1.1.5.3 0054 0.049 0.028 0.027 0 0.036 0.020 0.023 0.018 0.025 0.054 12/5
1.1.5.4 0.027 0.024 0.056 0.027 0.069 0.036 0.041 0.027 0.021 0.027 0.069 12/7
1.2.1 0.027 0.024 0.028 0.027 0.034 0.036 0.020 0.029 0.015 0.027 0054 15/5
1.2.2 0.054 0.024 0.028 0.027 0.034 0.036 0.041 0.029 0.014 0.027 0054 15/5
1.2.3 0 0.024 0 0.027 0 O 0.020 0.018 0.016 0.020 0.065 “/2
1.3.1 0.054 0.024 0.028 0.027 0.034 0.036 0.041 0.029 0.018 0.027 0.065 14/5
1.3.2 0.054 0.049 0.028 0.027 0.034 0.071 0.041 0.039 0.014 0.034 0.071 17/8
1.3.3 0027 0.024 0.056 0.027 0.034 0.036 0.020 0.034 0 016 0.030 0065 16/7
13.4 0.027 0.024 0.028 0.027 0.069 0 0.041 0.034 0.017 0.030 0.069 15/7
1.3.5 0 0 0.028 0 0 O 0 0.002 0.007 0 0.028 1/0
1.4.1 0054 0.024 0.028 0.027 0.069 0 0.041 0.033 0.019 0.027 0.069 15/7
1.4.2 0 0.024 0.056 0.027 0.069 O 0.041 0.031 0.021 0.027 0.069 14/7
1.4.3 0 0.049 0.056 0.014 0.034 0 0.020 0.024 0.018 0.024 0.061 13/3
1.51 0.027 0.024 0.028 0.027 0.034 0.036 0.020 0.030 0.015 0.028 0.054 15/6
1.5.2 0.054 0.024 0.056 0.027 0.069 0.071 0.041 0.041 0.019 0.041 0071 16/11
1.5.3 0.027 0 0.028 0.014 0 0 0.020 0.021 0.022 0.020 0.065 10/4
1.5.4 0.027 0 0 0.027 0 0 0.020 0.011 0.017 0 0.061 6/2
1.6.1 0.027 0.024 0.028 0.027 0 0 0.020 0.032 0.018 0.027 0.065 15/6
1.6.2 0.027 0.024 0.056 0.027 0.069 0 0.041 0.041 0.017 0.041 0.069 16/11
1.7.1 0027 0.024 0.056 0.027 0.069 0.036 0.041 0.039 0.018 0.039 0.069 19/9
1.7.2 0027 0.024 0 0.014 0 0 0.041 0.015 0.016 0.014 0.053 9/1
1.8.1 0 0 0 0 0 0 0 0 0 0 O 0/0
1.8.2 0 0 0 O 0 0 0 0 0 0 0 0/0
1.9.1 0 0.024 0 0.027 0 0 0.020 0.013 0.018 0 0.065 7/2
19.2 0 0.024 0.028 0.027 0 0 0.020 0.012 0.014 0 0.041 l2
1.10.1 0 0.024 0.028 0.027 0.034 0 0.020 0.017 0.015 0.020 0.048 10/2
Averageb 0.037 0.029 0.037 0.026 0.048 0.048 0.029 0.024 0.018 0.027 0.064 12/5
Deviation 0.021 0.014 0.021 0.009 0.026 0.026 0.015 0.012 0.006 0.013 0.025 5/3
Count. 27/10 35/6 27/9 39/35 21/8 21/7 35/14 41 41 33 41 41/39

 

" Number of non-zero toptcsr-‘number of emphasized topics. bAverage of non-zero numbers.

Table C2

Curriculum Guide Topic Coverage Data

196

 

 

 

 

Country

Topic

Code A B C D E F G H I J
1.1.1.1 0.034 0 0.026 0.029 0.045 0.028 0.056 0.091 0.032 0
1.1.1.2 0034 0.042 0.026 0.029 0.045 0.028 0 0.091 0.032 0
1.1.1.3 0.034 0 0.026 0.029 0.045 0.028 0 0.091 0.032 0
1.1.2.1 0.034 0 0.026 0.029 0.045 0.028 0 0 0.032 0
1.1.2.2 0.034 0 0.026 0.029 0.045 0.028 0.056 0 0.032 0
1.1.2.3 0 0 0.026 0.029 0.045 0.028 0 0 0.032 0
1.1.2.4 O 0.042 0.026 0.029 0.045 0.028 0 0 0.032 0
1.1.2.5 0 0 0.026 0.029 0.000 0.028 0 0 0 0
1.1.3.1 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0 0.032 0.04
1.1.3.2 0 0.042 0.026 0.029 0.000 0.028 0.056 0 0.032 0.04
1.1.3.3 O 0.042 0.026 0.029 0.000 0.028 0 0.091 0.032 0.04
1.1.4.1 0.034 0 0.026 0 0 0 0 0 0 0.04
1.1.4.2 0 0.042 0.026 0.029 0.000 0.028 0 0 0.032 0
1.1.4.3 0 0 0.026 0 0 0 0 0 0 0
1.1.4.4 0.034 0 0.026 0.029 0.045 0.028 0.056 0 0.032 0.04
1.1.4.5 0034 0 0.026 0 0 0.028 0 0 0 0
1.1.5.1 0.034 0.042 0 0.029 0.000 0 0 0 0 0
1.1.5.2 0.034 0.042 0.026 0.029 0.000 0 0 0 0.032 0
1.1.5.3 0.034 0.042 0.026 0.029 0.000 0 0 0 0 0
1.1.5.4 0.034 0.042 0 0.029 0.000 0 0 0 0.032 0.04
1.2.1 0.034 0 0.026 0.029 0.000 0.028 0.056 0 0.032 0.04
1.2.2 0.034 0 0.026 0.029 0.000 0.028 0.056 0 0.032 0.04
1.2.3 0.034 0 0.026 0.029 0.000 0.028 0.056 0 0 0.04
1.3.1 0.034 0.042 0.026 0.029 0.045 0.028 0 0 0.032 0.04
1.3.2 0.034 0.042 0.026 0.029 0.000 0.028 0.056 0 0.032 0.04
1.3.3 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0.091 0.032 0.04
1.3.4 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0.091 0.032 0.04
1.3.5 0.000 0.042 0.026 0.029 0.000 0.028 0 0 0.032 0.04
1.4.1 0.034 0.042 0.026 0.029 0.045 0.028 0 0.091 0.032 0.04
1.4.2 0 0.042 0.026 0.029 0.045 0.028 0.056 0 0 0.04
1.4.3 0 0.042 0.026 0.029 0.000 0.028 0 O 0.032 0.04
1.5.1 0034 0.042 0.026 0.029 0.045 0.028 0.056 0 0.032 0.04
1.5.2 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0 0.032 0.04
1.5.3 0 0 0.026 0 0 0 O 0.091 0.032 0
1.5.4 0 0 0.026 0 0 0.028 0 0 0 0.04
1.6.] 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0.091 0.032 0.04
1.62 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0.091 0.032 0.04
1.7.1 0.034 0.042 0.026 0.029 0.045 0.028 0 0.091 0.032 0.04
1.7.2 0.034 0 0.026 0.029 0.045 0.028 0.056 0 0 0.04
1.8.1 0 0 0 0 0.045 0 0 0 0 0
1.8.2 0.034 0 0 0 0 0.028 0 0 0 0
1.9.1 0 0 0.026 0 0 0.028 0 0 0 0
1.9.2 0 0 0.026 0 0 0.028 0.056 0 0.032 0
1.10.1 0.034 0.042 0 0.029 0.045 0.028 0 0 0.032 0.04
Average‘ 0.034 0.042 0.026 0.029 0.045 0.028 0.056 0.091 0.032 0.040
Standard

Deviation 0.016 0.021 0.008 0.012 0.023 0.011 0.027 0.039 0.015 0.020
Count 29 24 39 35 22 36 18 l 1 31 25

 

'Average of non-zero numbers.

197

Table C2 (Cont’d. )

 

 

 

Country

Topic Max.

Code K L M N O P Q X SD Median Prop. Count
1.1.1.1 0 0.033 0 0 0 0 0.023 0.023 0.025 0.026 0.091 10
1.1.1.2 0.05 0.033 0 0 0 0.034 0.023 0.027 0.023 0.029 0.091 12
1.1.1.3 0 0 0 O 0 0.034 0.023 0.020 0.024 0.023 0.091 9
1.1.2.1 0 0.033 0 0 0 0.034 0.023 0.017 0.016 0.023 0.045 9
1.1.2.2 0 0.033 0 0 0 0.034 0.023 0.020 0.018 0.026 0.056 10
1.1.2.3 0.05 0.033 0 0.059 0 0.034 0.023 0.021 0.020 0.026 0.059 10

1.1.2.4 0 0.033 0.032 0 0.048 0.034 0.023 0.022 0.017 0.028 0.048 11

1 . 1.2.5 0 0 0 0 0 0.034 0.023 0.008 0.013 0.000 0.034 5
1.1.3.1 005 0.033 0.032 0.059 0 0.034 0.023 0.033 0.016 0.033 0.059 15
1.1.3.2 0.05 0.033 0.032 0.059 0 0.034 0.023 0.028 0.018 0.032 0.059 13
1.1.3.3 0 0.033 0.032 0.059 0 0.034 0.023 0.028 0.023 0.029 0.091 12
1.1.4.1 0 0 0.032 0 0 0 0.023 0.009 0.015 0 0.040 5
1.1.4.2 0 0.033 0.032 0.059 0.048 0.034 0.023 0.023 0.019 0.028 0.059 11
1.1.4.3 0 0 0 0 0 0 0.023 0 0 0 0 2
1. 1.4.4 0 0.033 0.032 0.059 0 0 0.023 0.026 0.019 0.029 0.059 12
1.1.4.5 0 0 0 0 0 0 0.023 0.007 0.012 0.000 0.034 4
1.1.5.1 0 0 0.032 0 0 0.034 0.023 0.011 0.016 0.000 0.042 6
1.1.5.2 0 0.033 0.032 0 0.048 0.034 0.023 0.020 0.017 0.026 0.048 10
1.1.5.3 0 0.033 0.032 0 0 0.034 0.023 0.015 0.016 0.000 0.042 8
1.1.5.4 0 0 0.032 0 0 0.034 0.023 0.016 0.017 0.000 0.042 8
1.2.1 005 0.033 0.032 0 0.048 0.034 0.023 0.027 0.017 0.032 0.056 13
1.2.2 0.05 0.033 0.032 0.059 0.048 0.034 0.023 0.031 0.017 0.032 0.059 14
1.2.3 0.05 0.033 0.032 0 0.048 0 0.023 0.023 0.019 0.028 0.056 11
1.3.1 0 0.033 0.032 0 0.048 0 0.023 0.024 0.017 0.029 0.048 12
1.3.2 0 0.033 0.032 0.059 0.048 0.034 0.023 0.030 0.017 0.032 0.059 14
1.3.3 0 0.033 0.032 0.059 0.048 0.034 0.023 0.038 0.019 0.034 0.091 16
1.3.4 0.05 0.033 0.032 0.059 0.048 0.034 0.023 0.041 0.016 0.034 0.091 17
1.3.5 0 0 0.032 0 0.048 0 0.023 0.018 0.017 0 0.048 9
1.4.1 0.05 0.033 0.032 0 0.048 0.034 0.023 0.035 0.020 0.033 0.091 15
1.4.2 0.05 0.033 0.032 0.059 0.048 0.034 0.023 0.032 0.018 0.033 0.059 14
1.4.3 0.05 0.033 0.032 0 0.048 0.034 0.023 0.024 0.017 0.029 0.050 12
1.5.1 0.05 0.033 0.032 0 0.048 0 0.023 0.030 0.017 0.032 0.056 14
1.5.2 0.05 0.033 0.032 0 0.048 0.034 0.023 0.032 0.015 0.033 0.056 15
1.5.3 0.05 0 0.032 0.059 0.048 0 0.023 0.021 0.027 0.000 0.091 8
15.4 0.05 0 0 0 0.048 0 0.023 0.013 0.018 0 0.050 6
1.6.1 0.05 0.033 0.032 0.059 0 0.034 0.023 0.038 0.019 0.034 0.091 16
1.6.2 0.05 0.033 0.032 0.059 0.048 0.034 0.023 0.041 0.016 0.034 0.091 17
1.7.1 0.05 0.033 0.032 0 0.048 0.034 0.023 0.035 0.020 0.033 0.091 15
1.7.2 0.05 0.033 0.032 0 0 0.034 0.023 0.025 0.018 0.029 0.056 12
1.8.1 0 0 0 0 0 0 0.023 0 0 0 0 2
1.8.2 0 0 0 0 0 0 0.023 0 0 0 0 3
1.9.1 0 0 0 0.059 0 0 0.023 0.008 0.016 0 0.059 4
1.9.2 0 0 0.032 0.059 0 0 0.023 0.015 0.020 0 0.059 7
1.10.1 0.05 0.033 0.032 0.059 0.048 0.034 0.023 0.031 0.017 0.033 0.059 14

 

Average' 0.050 0.033 0.032 0.059 0.048 0.034 0.023 0.023 0.017 0.030 0.060 11
Standard
Deviation 0.025 0.016 0.015 0.029 0.024 0.016 0.000 0.010 0.003 0.014 0.019 4

Count 20 30 3 l 17 21 29 44 44 44 3 l 44 44

3Average of non-zero numbers.

Table C3

198

Proportion of Blocks Devoted to Topics in Each C ountry's Textbook(s)

 

 

 

 

Country

Topic

Code A B C D E F G H I J
1.1.1.1 0.003 0 0.040 0.018 0 0 0.004 0 0.106 0
1.1.1.2 0.043 0.005 0.072 0.035 0 0.010 0.106 0.008 0.093 0
1.1.1.3 0.031 0.001 0.069 0.016 0.009 0.057 0.049 0.004 0.061 0
1.1.2.1 0.038 0.016 0.070 0.076 0.002 0.057 0.126 0.007 0.074 0
1.1.2.2 0.019 0.001 0.061 0.064 0 0.005 0.014 0.004 0.062 0
1.1.2.3 0.006 0.004 0.030 0.021 O 0.029 0.010 0.019 0.010 0
1.1.2.4 0.035 0.052 0.037 0.072 0 0.129 0.002 0 0.046 0
1.1.2.5 0.010 0.001 0.042 0.001 0.014 0.005 0.002 0 0.001 0
1.1.3.1 0.050 0.001 0.010 0.081 0.076 0.110 0.098 0.083 0.035 0
1.1.3.2 0.001 0 0.031 0.008 0.012 0.010 0.306 0.013 0.010 0
1.1.3.3 0.001 0.001 0.022 0.000 0.024 0 0.003 0.011 0.020 0
1.1.4.1 0 0 0.001 0 0.012 0 0 0.002 0 0
1.1.4.2 0038 0.024 0.041 0.034 0 0 0.100 0.002 0.024 0
1.1.4.3 O 0 0.001 0 0 0 0 0.002 0 0
1.1.4.4 0.029 0 0.015 0.026 0 0 0.025 0.002 0 0.010
1.1.4.5 0.001 0 0.001 0.001 0 0.005 0.000 0.002 O 0
1.1.5.] 0 0.001 0 0.011 0 0 0.002 O 0.004 0.007
1.1.5.2 0 0 0.001 0.018 0 0 0 0.008 0.028 0
1.1.5.3 0 0.001 0.010 0.032 0 0 0.005 0.017 0.006 0
1.1.5.4 0.002 0 0 0.004 0 0 0.012 0.062 0.023 0
1.2.1 0.060 0.002 0.077 0.031 0 0.067 0.012 0.035 0.062 0
1.2.2 0.145 0 0.041 0.083 0 0.024 0 0.141 0.075 0
1.2.3 0001 0 0.009 0 0 0 0 0 0 0
1.3.1 0.092 0.081 0.038 0.022 0 0 0.036 0.032 0.043 0
13.2 0.076 0.012 0.130 0.043 0.142 0.024 0.004 0.004 0.094 0.104
1.3.3 0.067 0.093 0.086 0.065 0.099 0.043 0 0.084 0.111 0.110
1.3.4 0.001 0.282 0.064 0.025 O 0 0 0.136 0.008 0
1.3.5 0 0 0.020 0.002 0 0 0 0 0.001 0
1.4.1 0 0.021 0.163 0.060 0.243 0.067 0.007 0.098 0.052 0.007
1.4.2 0 0.008 0.046 0.012 0.090 0 O 0 0.007 0.231
1.4.3 0010 0 0.028 0.031 O O 0.015 0.003 0.035 0
1.5.1 0.011 0.008 0.020 0.025 0 0 0 0.028 0.019 0
1.5.2 0.024 0.003 0.032 0.020 0 0 0 0.040 0 0
1.5.3 0 0 0.009 0 0 0 0 0.058 0.046 0
1.5.4 0 0 0.006 0 0 0.014 0 0 0 0
1.6.1 0.031 0.012 0.014 0.049 0.076 0.053 0.084 0.075 0.038 0.061
1.6.2 0.134 0.120 0.110 0.064 0.194 0.086 0.374 0.091 0.123 0.388
1.7.] 0.070 0.030 0.012 0.059 0.054 0.033 0 0.087 0.026 0.07]
1.7.2 0 0 0.001 0.015 0 0.005 0 0 0 0
1.8.1 0 0 0 0.001 0 0 0 0 0.002 0
1.8.2 0 0 0 0 0 0 0 O 0 0
1.9.1 0 0.008 0.003 0.002 0.007 0 0.001 0 0 0.003
1.9.2 0.022 0 0.016 0 0.024 0 0.014 0 0.117 0
1.10.1 0.001 0.223 0.000 0.087 0 0.148 0 0 0 0.006
Average 0.024 0.023 0.033 0.028 0.025 0.022 0.032 0.026 0.033 0.023
Standard

Deviation 0.035 0.057 0.037 0.027 0.053 0.037 0.075 0.039 0.037 0.070
Max 0.145 0.282 0.163 0.087 0.243 0.148 0.374 0.141 0.123 0.388
Count 30 26 39 36 16 21 25 30 33 ll

 

199

Table C3 (Cont'd. )

 

 

 

Country
Topic Max.
Code K L M N O P Q X SD Median Prop. Count
1.1.1.1 0 0.014 0.004 0.009 0.003 0.042 0.011 0.015 0.026 0.004 0.106 11
1.1.1.2 0.010 0.073 0.002 0.009 0.005 0.184 0.022 0.040 0.049 0.010 0.184 15
1.1.1.3 0 0.002 0.004 0.013 0.031 0.001 0.007 0.021 0.023 0.009 0.069 15

1.1.2.1 0.007 0.067 0.030 0.009 0.013 0.065 0.036 0.041 0.034 0.036 0.126 16
1.1.2.2 0.007 0.065 0.021 0.003 0.013 0.040 0.022 0.024 0.024 0.014 0.065 15
1.1.2.3 0.005 0.031 0.019 0.006 0.002 0.024 0.008 0.013 0.010 0.010 0.031 15
1.1.2.4 0.013 0.047 0.015 0.003 0.017 0.073 0.059 0.035 0.034 0.035 0.129 14
1.1.2.5 0 0.002 0.015 0 0.006 0.010 0.001 0.042 11
1.1.3.1 0.040 0.047 0.002 0.014 0.013 0.043 0.041 0.036 0.040 0.110 15
1.1.3.2 0.011 0 0.002 0.046 0 0.025 0.028 0.071 0.010 0.306 12
1.1.3.3 0.040 0 0.002 0.278 0 0.032 0.026 0.064 0.002 0.278 1 1

o
.C
o
o
w
0

1.1.4.1 0 0 0 0 0 0.001 0.001 0.003 0 0.012 4
1.1.4.2 0.062 0.018 0.117 0.101 0.055 0.041 0.038 0.034 0.117 14
1.1.4.3 0 0 0 0 0 0.001 0 0 0 O 3
1.1.4.4 0.003 0.007 0.064 0.072 O 0.026 0.016 0.022 0.007 0.072 1 1
1.1.4.5 0 0.006 0 0 0 0.025 0.002 0.006 0.000 0.025 7

ooooooooo
.0
o
o
#

1.1.5.1 0.003 0 0.008 0 0.004 0.001 0.002 0.003 0.001 0.011 9

1.1.5.2 0.011 0.014 0.008 0 0 010 0.018 0.007 0.007 0.008 0.007 0.028 10
1.1.5.3 0.001 0.023 0.008 0.006 0 0.007 0.017 0.008 0.009 0.006 0.032 12
1.1.5.4 0 0.007 0.008 0 0 0 0 0.007 0.015 0.000 0.062 7

1.2.1 0.010 0.083 0.013 0.014 0.005 0.167 0.038 0.040 0.042 0.031 0.167 15
1.2.2 0.148 0.023 0.127 0.092 0.164 0.081 0.059 0.071 0.057 0.075 0.164 13
1.2.3 0 0.001 0.006 0.006 0.001 0 0.008 0.002 0.003 0.000 0.009 7
1.3.1 0.010 0.010 0.015 0.039 0.112 0.002 0.038 0.034 0.032 0.032 0.112 14
1.3.2 0.057 0.063 0.025 0.073 0.022 0.033 0.023 0.055 0.042 0.043 0.142 17
1.3.3 0.199 0.118 0.202 0.174 0.125 0.037 0.049 0.098 0.054 0.093 0.202 16
1.3.4 0.037 0.051 0.047 0.469 0.014 0.004 0.019 0.068 0.121 0.019 0.469 13

1.3.5 0 0 0.053 0.010 0 0.001 0.001 0.005 0.013 0 0.053 7
1.4.1 0 0 0.062 0.069 0.079 0 0.021 0.056 0.064 0.052 0.243 13
1.4.2 0.025 0 0.034 0.101 0.118 0 0.015 0.040 0.060 0.012 0.231 11
1.4.3 0 0.009 0 0.002 0 0 0.008 0.008 0.012 0.002 0.035 9

1.5.1 0.011 0.001 0.004 0 0 0 0.016 0.008 0.010 0.004 0.028 10
1.5.2 0.040 0.017 0.011 0.006 0.095 0.024 0.030 0.020 0.023 0.017 0.095 12
1.5.3 0 0 0 0.032 0.083 0 0.016 0.014 0.025 0.000 0.083 6
1.5.4 0 0.001 0 0.006 0 0 0 0.002 0.004 0 0.014 4
1.6.1 0.055 0.037 0.174 0.208 0.001 0.003 0.046 0.060 0.054 0.049 0.208 17
1.6.2 0.356 0.174 0.323 0.371 0.296 0.043 0.236 0.205 0.118 0.174 0.388 17

1.7.1 0.058 0.057 0.066 0 0 0.099 0.094 0.048 0.032 0.057 0.099 14
1.7.2 0 0.001 0 0 0 0.001 0.034 0.003 0.008 0.000 0.034 6
1.8.1 0 0 0 0 O 0 0.003 0 0 0 0 4
1.8.2 0 0 0 0 0 0 0 0 0 O 0 0
1.9.1 0.011 0 0.013 0.309 0 0 0.016 0.022 0.072 0 0.309 10
1.9.2 0.015 0 0.051 0.098 0 0 0.007 0.021 0.034 0 0.117 9
1.10.1 0.001 0.020 0.006 0.029 0 0.006 0.085 0.036 0.062 0.006 0.223 11

 

Average 0.028 0.025 0.035 0.061 0.029 0.022 0.029 0.029 0.032 0.020 0.119 10.95
Standard
Deviation 0.063 0.036 0.063 0.107 0.058 0.041 0.039 0.035 0.029 0.032 0.109 4

Max 0.356 0.174 0.323 0.469 0.296 0.184 0.236 0.205 0.121 0.174 0.469 17

Count 28 32 35 32 22 26 4O 43 43 33 43 43

200

Table C4

Number of Data Sources in which Topics Appear within a

Country

 

Country

 

Topic

E

ABCD

Code

 

2
3
3
3
3
2
2
2
3
2
1
1
2
0
3
2
2
2
2

1.1.1.1
1.1.1.2
1.1.1.3
1.1.2.1
1.1.2.2
1.1.2.3
1.1.2.4
1.1.2.5
1.1.3.1
1.1.3.2
1.1.3.3
1.1.4.1
1.1.4.2
1.1.4.3

1.1.4.4
1.1.4.5
1.1.5.1
1.1.5.2
1.1.5.3
1.1.5.4

1.2.1

2

1.2.2

1.2.3

1.3.1

1.3.2

1.3.3

1.3.4

1.3.5

1.4.1

1.4.2

1.4.3

1.5.1

1.5.2

1.5.3

1.5.4

1.6.1

1.6.2

1.7.1

1.7.2
1.8.1

1.8.2
1.9.1

1.9.2

3
2.1

1.10.1

Ave

 

1.3

1.5 2.2

1.9 1.3

1.4

1.8 2.5 2.5

Med

17 5 8 25
10

7
10

19 19 30 32
0.55 0.64 0.75 0.80 0.39 0.52 0.34 0.39 0.70 0.55

#35
#05

15

 

semit-

20 1
Table C4 (C ont'd. )

 

 

 

 

Country

Topic

Code K L M N O P Q AVE MED # 3s # Os Am
1.1.1.1 1 2 l 2 2 1 2 2 2 3 1 0.24
1.1.1.2 2 3 1 2 l 3 2 2 2 8 1 0.53
1.1.1.3 0 2 l 2 l 3 2 2 2 6 2 0.47
1.1.2.1 2 3 2 2 l 3 2 2 2 8 l 0.53
1.1.2.2 2 3 l 2 l 3 3 2 3 9 l 0.59
1.1.2.3 3 3 l 3 I 3 3 2 3 9 1 0.59
1.1.2.4 2 3 2 2 2 3 3 2 2 8 2 0.59
1.1.2.5 O 2 l l 0 3 2 2 2 4 3 0.41
1.1.3.1 3 3 3 3 l 3 3 3 3 12 0 0.71
1.1.3.2 3 2 3 3 l 1 3 2 2 8 0 0.47
1.1.3.3 1 2 2 3 1 l 3 2 2 6 0 0.35
1.1.4.1 0 0 l l 0 0 2 l 0 0 9 0.53
1.1.4.2 2 3 3 3 3 3 3 2 3 10 2 0.71
1.1.4.3 0 0 0 0 0 0 2 0 0 0 14 0.82
1.1.4.4 1 3 3 3 l 0 3 2 2 8 2 0.59
1.1.4.5 0 l l 0 0 0 3 l 0 l 9 0.59
1.1.5.1 2 l 2 l 0 3 3 1 1 4 4 0.47
1.1.5.2 2 3 3 1 2 3 3 2 2 7 2 0.53
1.1.5.3 2 3 3 2 0 3 3 2 2 7 4 0.65
1.1.5.4 l 2 3 l l 2 2 2 1 3 1 0.24
1.2.1 3 3 3 2 3 3 3 3 3 11 0 0.65
1.2.2 3 3 3 3 3 3 3 2 3 12 0 0.71
1.2.3 1 3 2 2 2 0 3 2 2 4 2 0.35
1.3.1 2 3 3 2 3 2 3 2 2 8 0 0.47
1.3.2 2 3 3 3 3 3 3 3 3 14 0 0.82
1.3.3 2 3 3 3 3 3 3 3 3 15 0 0.88
1.3.4 3 3 3 3 3 2 3 3 3 ll 0 0.65
1.3.5 0 0 3 l l l 2 l 1 1 6 0.41
1.4.1 2 2 3 2 3 1 3 3 3 11 0 0.65
1.4.2 2 2 3 3 3 1 3 2 3 9 0 0.53
1.4.3 1 3 2 2 2 1 3 2 2 5 0 0.29
1.5.1 3 3 3 l 2 l 3 2 3 9 0 0.53
1.5.2 3 3 3 2 3 3 3 3 3 10 0 0.59
1.5.3 2 0 2 3 2 0 3 l l 3 4 0.41
1.5.4 2 1 0 2 1 0 2 l 1 0 7 0.41
1.6.1 3 3 3 3 l 2 3 3 3 15 0 0.88
1.6.2 3 3 3 3 3 2 3 3 3 l6 0 0.94
1.7.1 3 3 3 1 2 3 3 3 3 13 0 0.76
1.7.2 2 3 1 l 0 2 3 2 2 4 4 0.47
1.8.1 0 1 0 0 0 0 2 0 0 0 12 0.71
1.8.2 0 0 0 0 0 0 l 0 0 0 14 0.82
1.9.1 1 1 1 3 0 0 3 1 1 3 4 0.41
1.9.2 1 1 3 3 0 0 3 1 l 5 5 0.59
1.10.1 2 3 3 3 2 2 3 2 2 8 2 0.59
Ave 1.7 2.2 2.1 2 1.5 1.7 2.7 1.905 1.98 7 2.7 0.57
Med 2 3 3 2 1 2 3 2.088 2 8 1 0.59
# 3s 11 25 23 17 ll 18 32 0 l6 4 1 0

#05 8 5 4 4 11 ll 0 0 5 5 l7 0
agnnt. 0.43 0.68 0.61 0.48 0.50 0.66 0.73 0.00 0.48 0.20 0.41 0.00

 

Table C5

202

Average Emphasis Devoted to Topics across Expert Topic Mapping,
Curriculum Guides, and Textbooks

 

 

 

 

Country

Topic

Code A B C D E F G H 1 J
1.1.1.1 0.000 0.000 0.030 0.022 0.000 0.000 0.000 0.000 0.053 0.000
1.1.1.2 0.047 0.029 0.040 0.028 0.000 0.046 0.000 0.000 0.048 0.000
1.1.1.3 0.053 0.000 0.040 0.021 0.000 0.070 0.000 0.000 0.038 0.000
1.1.2.1 0.045 0.000 0.040 0.048 0.054 0.070 0.000 0.000 0.049 0.000
1.1.2.2 0.036 0.000 0.037 0.044 0.000 0.030 0.122 0.000 0.045 0.000
1.1.2.3 0.000 0.000 0.027 0.029 0.000 0.042 0.000 0.000 0.028 0.000
1.1.2.4 0.000 0.059 0.029 0.040 0.000 0.107 0.000 0.000 0.040 0.000
1.1.2.5 0.000 0.000 0.031 0.016 0.000 0.030 0.000 0.000 0.000 0.000
1.1.3.1 0050 0.028 0.020 0.049 0.000 0.083 0.000 0.000 0.036 0.000
1.1.3.2 0.000 0.000 0.036 0.019 0.000 0.000 0.327 0.000 0.021 0.000
1.1.3.3 0.000 0.037 0.033 0.000 0.000 0.000 0.000 0.074 0.024 0.000
1.1.4.1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.1.4.2 0.000 0.047 0.030 0.034 0.000 0.000 0.000 0.000 0.032 0.000
1.1.4.3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.1.4.4 0.041 0.000 0.022 0.025 0.000 0.000 0.000 0.000 0.000 0.045
1.1.4.5 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.1.5.1 0.000 0.028 0.000 0.020 0.000 0.000 0.000 0.000 0.000 0.000
1.1.5.2 0.000 0.000 0.017 0.022 0.000 0.000 0.000 0.000 0.034 0.000
1.1.5.3 0.000 0.028 0.020 0.027 0.000 0.000 0.000 0.000 0.000 0.000
1.1.5.4 0.000 0.000 0.000 0.024 0.000 0.000 0.000 0.000 0.032 0.000
1.2.1 0.055 0.000 0.051 0.026 0.000 0.075 0.000 0.000 0.045 0.000
1.2.2 0.094 0.000 0.030 0.043 0.000 0.040 0.000 0.000 0.049 0.000
1.2.3 0.028 0.000 0.020 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.3.1 0.070 0.061 0.000 0.030 0.000 0.000 0.000 0.000 0.032 0.000
1.3.2 0.062 0.032 0.059 0.037 0.000 0.040 0.079 0.000 0.056 0.114
1.3.3 0.058 0.076 0.045 0.044 0.121 0.063 0.000 0.131 0.054 0.117
1.3.4 0.039 0.154 0.038 0.031 0.000 0.000 0.000 0.161 0.000 0.000
1.3.5 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.4.1 0.000 0.036 0.070 0.036 0.242 0.075 0.000 0.139 0.035 0.043
1.4.2 0.000 0.030 0.032 0.020 0.136 0.000 0.000 0.000 0.000 0.183
1.4.3 0.000 0.000 0.026 0.026 0.000 0.000 0.000 0.000 0.029 0.000
1.5.1 0.044 0.030 0.032 0.031 0.000 0.000 0.000 0.000 0.031 0.000
1.5.2 0.050 0.028 0.028 0.029 0.000 0.000 0.000 0.000 0.000 0.000
1.5.3 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.116 0.000 0.000
1.5.4 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.6.1 0053 0.042 0.022 0.032 0.126 0.054 0.135 0.111 0.037 0.090
1.6.2 0.089 0.087 0.053 0.044 0.208 0.085 0.338 0.135 0.065 0.269
1.7.1 0.059 0.040 0.000 0.042 0.111 0.058 0.000 0.133 0.026 0.096
1.7.2 0.000 0.000 0.000 0.021 0.000 0.030 0.000 0.000 0.000 0.000
1.8.1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.8.2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.9.1 0.000 0.000 0.018 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.9.2 0.000 0.000 0.022 0.000 0.000 0.000 0.000 0.000 0.063 0.000
1.10.1 0.028 0.129 0.000 0.045 0.000 0.000 0.000 0.000 0.000 0.043
Standard

Deviation 0.028

Count

19 19

30 32

0.035 0.019 0.016 0.057 0.032 0.073 0.049 0.022 0.055

7 17 5 8 25 9

 

203

Table C5 (C ont'd. )

 

Country
Topic Max.
Code K L M N O P Q X SD Vledian Prop. Count

 

 

1.1.1.1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.006 0.014 0.000 0.053
1.1.1.2 0.000 0.049 0.000 0.000 0.000 0.121 0.000 0.024 0.032 0.000 0.121
1.1.1.3 0.000 0.000 0.000 0.000 0.000 0.045 0.000 0.016 0.023 0.000 0.070
1.1.2.1 0.000 0.047 0.000 0.000 0.000 0.072 0.000 0.025 0.027 0.000 0.072
1.1.2.2 0.000 0.046 0.000 0.000 0.000 0.046 0.023 0.025 0.031 0.023 0.122
1.1.2.3 0.048 0.034 0.000 0.023 0.000 0.039 0.018 0.017 0.017 0.018 0.048
1.1.2.4 0.000 0.040 0.000 0.000 0.000 0.060 0.036 0.024 0.030 0.000 0.107
1.1.2.5 0.000 0.000 0.000 0.000 0.000 0.046 0.000 0.007 0.014 0.000 0.046
1.1.3.1 0.083 0.049 0.021 0.025 0.000 0.050 0.038 0.031 0.027 0.028 0.083
1.1.3.2 0.067 0.000 0.021 0.034 0.000 0.000 0.024 0.032 0.076 0.000 0.327
1
1
1
l
1
1
1
1
1

~

.133 0.000 0.000 0.000 0.093 0.000 0.000 0.027 0.017 0.028 0.000 0.093
14.1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 0.000
14.2 0.000 0.029 0.068 0.048 0.089 0.030 0.042 0.026 0.026 0.030 0.089
.143 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 0 0 O
.144 0.000 0.034 0.050 0.040 0.000 0.000 0.024 0.017 0.019 0.000 0.050
.145 0.000 0.000 0.000 0.000 0.000 0.000 0.031 0.002 0.007 0.000 0.031
15.1 0.000 0.000 0.000 0.000 0.000 0.031 0.016 0.006 0.010 0.000 0.031
15.2 0.000 0.036 0.022 0.000 0.000 0.037 0.018 0.011 0.014 0.000 0.037
15.3 0.000 0.040 0.022 0.000 0.000 0.032 0.021 0.011 0.014 0.000 0.040
1.1.5.4 0.000 0.000 0.032 0.000 0.000 0.000 0.000 0.005 0.011 0.000 0.032

—
N—WQQJiﬂOOOOOO‘OONAOOOOWO‘OObJ

 

1.2.1 0.050 0.053 0.024 0.000 0.039 0.099 0.029 0.032 0.029 0.029 0.099 1
1.2.2 0.146 0.031 0.062 0.045 0.111 0.063 0.043 0.045 0.041 0.043 0.146 1
1.2.3 0.000 0.022 0.000 0.000 0.000 0.000 0.018 0.005 0.010 0.000 0.028 4
1.3.1 0.000 0.026 0.025 0.000 0.087 0.000 0.036 0.022 0.027 0.000 0.087 8
1.32 0.000 0.055 0.028 0.041 0.047 0.058 0.031 0.043 0.028 0.041 0.114 14
1.3.3 0.000 0.066 0.096 0.066 0.093 0.045 0.033 0.065 0.037 0.063 0.131 15
1.3.4 0.066 0.041 0.035 0.142 0.059 0.000 0.029 0.047 0.053 0.035 0.161 11
1.3.5 0.000 0.000 0.037 0.000 0.000 0.000 0.000 0.002 0.009 0 0.037 1
1.4.1 0.000 0.000 0.040 0.000 0.088 0.000 0.030 0.049 0.061 0.036 0.242 11
1.42 0.000 0.000 0.040 0.048 0.106 0.000 0.028 0.037 0.053 0.020 0.183 9
1.4.3 0.000 0.034 0.000 0.000 0.000 0.000 0.018 0.008 0.013 0.000 0.034 5
1.5.1 0.051 0.022 0.021 0.000 0.000 0.000 0.021 0.017 0.017 0.021 0.051 9
1.5.2 0.083 0.028 0.033 0.000 0.095 0.054 0.033 0.027 0.029 0.028 0.095 10
1.5.3 0.000 0.000 0.000 0.027 0.000 0.000 0.021 0.010 0.028 0.000 0.116 3
15.4 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 0.000 0
1.6.1 0.076 0.036 0.077 0.075 0.000 0.000 0.032 0.059 0.039 0.053 0.135 15
1.6.2 0.251 0.088 0.136 0.117 0.186 0.000 0.106 0.133 0.087 0.106 0.338 16
1.7.1 0.078 0.043 0.051 0.000 0.000 0.071 0.056 0.051 0.038 0.051 0.133 13
1.7.2 0.000 0.022 0.000 0.000 0.000 0.000 0.035 0.006 0.012 0.000 0.035 4
1.81 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 0 O 0 0
1.8.2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 0 0 0 0
1.9.1 0.000 0.000 0.000 0.101 0.000 0.000 0.021 0.008 0.024 O 0.101 3
1.9.2 0.000 0.000 0.037 0.047 0.000 0.000 0.018 0.011 0.019 0 0.063 5
1.10.1 0.000 0.029 0.022 0.029 0.000 0.000 0.045 0.022 0.032 0.000 0.129 8
Standard

Deviation 0.049 0.023 0.030 0.036 0.044 0.031 0.020 0.024 0.019 0.023 0.075 4
Count 11 25 23 17 ll 18 32 39 39 16 39 39

APPENDIX D

Appendix D

Scores and Ranks on Specially-Constructed Tests

 

 

 

 

vmd mmé cud vmﬁ v0.2. 32.2. 2N2. mms 222.2. vwd 86 me Om
20.2m omév $0.2 Nodv 292v mhdv Nndv 20.2m cmdv Nmsv a.vv mmdv m2><
no.3 mmwv No.2.“ cmNm Nm.Nm mva mN.mm moﬁm Nc.2w mw.mv N2.Nm o2.vm O
mh.2v oven 2w.2 mmdm de ovdm 2.2m. 2v eo.ov mndm vwﬁm Sim w5.2v 22
23.8 2v.mw m 2 .N N.N.wm 2. 2 Km 3.3 no. 20 222.220 2v.nn omen 2.2 .am Nodn O
cvgmm mm.2v Nv.m vvﬁv vw.mv madv 2 26v ovemm mm.2v vovv 2w.Nv m2.vv Z
2m.wm $.2m moN nm.mm NwNm ”N.Nm $6.2“ 2m.wm ”O.Nm va.2m 2b.; mwﬁm 22
omdm moNv mmN modv 2 2.wv 2m.2.v 206v MN.mm mosv omxnm maﬁv mNdm 2
NaNm 3.22m hoN 8.2mm wovn cwdn mwﬁm NaNm 2w.Nm 2.2.Nm oN.mm anvm .22
3.22. N53 86 vv.mo ”2.3 once 220.3 ”0.50 32¢ 8.22. Nbdm 922.2% 2.
25.2m Rev 2.2 36v om.2w Nasv cm.2m m2dv 226v hmév vo.om 25.2w 2
2m.mv amdN omv ”2.9” oodm vawm o2.mv 2m.mv vadm mmdN mm.2v MNNv I
Nw.2m mw.2v v2.m MN.mv mw.2v cvév omdv Nw.2m mw.2v a26v mNNv m2.mv O
wwdv hvdv 2 .2 vmdv adv hvdv vwdv wwdv wmﬁv equ amSv mhdv ..2
vwdv vo.ov moN 236v vN.Nv ow.2v o2.mv vN.Nv mNNv aNNv vo.ov ocdv m
ova owdm ON.2 m2.2v oNdv owdm 2v.Nv oan wadm amdv mcdv weNv n2
aodm vwﬁv No. 2 mmdm ovdv 3.3 229mm Non vwﬁv owdm hmdv co. 2 m U
2.2.Nw ov.2m v2.m 26m aﬁmm ov.2m Gem CNo oNNm 86m 09mm avdm m
mmdv 2 2.mv mv.2 mwév avev 2o.ov hmdv mmdv Nodv mmwv 2 2.mv moNv <
X<2>2 Z232 Om m>< 2 5.0222. 2m-X.2. 25-22. 5-00 2 5-00 2m-Xm2 22.232 20275 2222222200

 

 

238% 22.2.2 2222o=L22=oD-A22e2ooo.m 2222223292225

2Q 22222222.

204

205

 

 

 

2N.» vo.o mvd mNN $0.2. no.2. 2N.” vv.2. ”N2. v92. vo.o vad 2.0.0 Gm
Nw.2.v ”v.vv cad 36v mmév mv.2.v m2..vv Nw.2.v w2ﬁv ouév Nw.2.v adv onﬁv m2><
29mm 2.N.2m eo.o emNm mm. 2 n mmNm «N.Nm mwNm N2 .Nm 5N. 2 m mem 2m.Nm 29mm 0
82v mmNm vm.N omdm mvdm mvdm aQNM wmdm amdm Nodv no.2v m2.wm modv 22
Exam mN.mm Nov. 2 vaﬁm omen vw.wn aN.nm 2.22m Nbdm hwdm «mam MNdm comm O
mvﬁv Nv.Nv ”2.. 2 a.vv havv 86v avdv Nhev «vdv Nv.Nv ohmv hmdv oodv Z
vv.mm deN 20.2 NmNm Nb. 2 m mmNm Nde EuNm mmNm mon vv.mm acNm aqmm 22
2N.mv o 2 .vv mm. 2 mmﬁv Nhév 36v 2 .vv voﬁv wmsv mvﬁv 2N.mv wmﬁv wadv A
movm mm.wv 292 o2..Nm 2a. 2 m 0N.mm mndv Nw.mm 2 2.mm moan movm ov.mm 5.9 v2
Nave ”new mnN M220 Nave 220.20 move 8N0 220.20 moNo Naﬁw wmém N03 2.
3.22m owvv mm . 2 aodv mNdv dev cwvv w 2 .av oadv mNdv bmdm vadv 2mm.ow 2
no.2v omsm mm.2 vvdv eo.om vo. 2v omﬁm Nm.2v o2.2v dem we. 2v wo.2v 00.2v 2.2
ﬂumv 25.2v w 2 .2 2.5.mv 22.. 2v 2 2.vv ﬂamv N2..vv 2o.vv aNNv vav oan 222.vv O
wwdv mva 222.2 n2.2.v mmév ovﬁv nva ooﬁv voﬁv 2m.2.v wm.wv N2.wv Enwv ...2
NN.mv Nm.wm 2 m2 32v 22v 2.2..Nv Nadm 2N.mv vav eva NN.mv oo.2v 2m.Nv m2
Ne. 2v 2 2.02m No.2 928m mm.wm m2 .ov 2 2.0m vbdm 020mm 2N.ov No.2v ondv on. 2v Q
awdm vm.2.v ac. 2 mmdv mN.wv N2 .8 vm.2.v 3.22m adv mvdv a2 .3 8.8 Nvdm U
mN.wm No.8 wN.2 25mm 09% NN.vm Nada mN.mm vmvm 2v.Nm cmvm 2o.vm mmvm m
eo.ov o2 .Nv 2m.2 2m.mv mvvv ohdv $2 .Nv mhﬁv N2 .mv w2dv eo.ov a2 .mv omev <
g 2222 Om m2>< 22.-NO<3 ZD-NO<3 5-22.3 22302.3 75-22.? 221-003 222-003 25-22%? ZD2Xm3 22.222500

 

 

 

8.292% 222k BquhEoDAEEomQW 222222.2M2mx222

NO 0222222.

206

 

 

 

 

m2 ”3 3o ts a? 8.2 m: X: who «3 3.2 mm
3% 3.3 93 2.3. 3% a3: 8.3. 8.? an? 3.: an? m><
8% 2% as 3.? 2d «Nan 2.3 5% 2.x 2.? 3% d
8.? 8.8 RN 2.? 3.? 2.3 8.: 3.? 3:. 8.8 3.8 m
2.6 5.8 S; 3% 8.3 3% SS N20 8% 3% 2.3 o
3.3 3.? a: 3.3. 32. 3.? 3.3 3.3 3.9. 3.3 2.3 2
8.3. 3.: 8.2 92m 8.3. 8.9. a: 8.2 3.2 F: 3% 2
2.8 3% ed 3% 3% E? 2.8 8.8 32% 3% 3.3 4
«2% 8.8 SN 2.? ~23 3.6m 3% 3.? 23m 3.? 2.2 M
2.8 8.3 m3 8% 38 2.8 3.3 3.8 8% 8.8 8.8 2
8.3 $2 $3 3.8 3% 3.8 3.8 3.: $2 3.8 8.8 2
2.2. 2.: ao.o 8.? 8.? on? 5.; 3% on? 3.9 3.; m
2.2 3.? 3m 3: 2.2 8.3 a: 9.3 8.9 ~29. 8.8 o
8.3 8.3 as 2.? 8.3 is. 3.2 23v 8.? 53 No.3 “2
3.? on: 85 3.? 3.2 3.? 8.? ad“ 3.; 8.? 8.; m
2.? 3.8 23 8.; 8.8 8.? 8.8 8.? 3.? 3S 8.? D
3% 3.8 5.0 3. _ m ”Zn 3.8 :ﬂ 3% :1; 8.9 3.: o
2% 3% 2: 8.3 2.3 3% 3% 5.2 3% 2.8 :3 m
a.a. RS on; 2.: N33 3.: :3 8.3 2.3 one. 2.: <
5% E2 am m>< 03-8,; o:-8< 02-5.3 03? 02-8 022x95 03-58 9550

 

 

8.28m Ems hw2u=hm=oU4©2Equm $3.35

ma 2an

207

 

 

 

 

 

 

N. N N N N N N N N N. N N o
N. N. . N. N. N. N. N. N. N. N. N. N
N . . N N N . N N N . . o
N. N N .. .. N N. N N. N. .. N. z
N. N. N N. N. N. N. N. N. N. N. N. .2
N N N N N N N N N N N N ..
N N . N N N N N N N N N v.
N . N . . . N . . . N N N
N. N N N N N N N. N N N N .
N. N. . N. N. N. N. N. N. N. N. N. N.
N. N . .. N. .. .. N N. N. N. .. o
N. N . N N N. N N. N N N N ...
N. N N N. N. N. N. N. .. N N. N. N.
N. N. . N. N. N. N. N. N. N. N. N. a
N N . N N N N N N N N N u
N N . N N N N N N N N N m
N. N. . .. N. N. N. .. N. .. N. N. <
5:). 2.2 cm m>< .N-N0< 5.5 .3? 5-8 .N-ou .N-.xm. .N-.xm 2223 F55

 

 

2.8K ..Nmtzhwtob-b~§um&m ..NNNEMBNNNS NS 833%

vQ oEmP

 

208

 

 

N N . N N N N N N x N N N N a
N. N. N N. N. N. N. N. N. N. N. N. N. N
N . N N N N N N N N N . . o
N. N N N. N. .. N N. N. N. N. .. N. z
N. N. N N. N. N. N. N. N. N. N. N. N. 2
N. N . N N N N. N N N N N N a.
N N N N N N N N N N N N N v.
N . N . . . . . . . . N N N
N N . N N N N N N N N N N .
N. N. . N. N. N. N. N. N. N. N. N. N. z
N. N . .. N. N. N N. N. N. .. N. .. o
.. N . N N N .. N N N N N N N
N. .. . N. N. N. N. N. N. .. N. N. N. N.
N. N. . N. N. N. N. N. N. N. N. N. N. N.
N N N N N N N N N N N N N o
N N N N N N N N N N N N N N.
N. N. . .. .. N. N. .. .. N. N. N. N. <
x5. 2.2 .8 m>< .N-No<3 z:-No<3 5-53 .333 ESE... $83 75-83 .N-.xm3 22.53.... 3.80

 

NNNNN .NNNNNENNNNQU-AN2NN...NNN~W kmeBxx NS 8.36%

mm 2an

209

 

 

 

 

N N . N N N N N N N N o
N. N. N N. N. N. N. N. N. N. N. N
N . . N N N N N . N N o
N. N . N. N. N. N N. N. N. N. z
N. N. N N. N. N. N. N. N. N. N. s.
N N N N N N N N N N N ..
N N . N N N N N N N N v.
N . N . . . . . N . . N
N N . N N N N N N N N .
N. .. . N. N. N. N. N. .. N. N. N.
N. N N N N N. .. .. N. N N 0
N. N . N. .. N N. N N. N. N. N
N. N. . N. N. N. N. N. N. N. N. N.
N. N. . N. N. N. N. N. N. N. N. a
N N . N N N N N N N N o
N N . N N N N N N N N m
N. N . .. N. .. N. N. N .. .. <
5...). 2:). QN m>< 0:-No<3Jo:-No< dank... 02-5. 02-8 02.5.3 02.x... 9.58

 

SEN ..NNNNNNNNtoDAQSNNNNm $3.25 NE 82232

on 03;

LIST OF REFERENCES

LIST OF REFERENCES

Anastasi, A. (1982). Psychological testing. New York: MacMillan Publishing Co., Inc.

Airasian, P.W., & Madaus, GP. (1983). Linking testing and instruction: Policy issues.
Journal of Educational Measurement, 20(2), 103-118.

American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education. (1985). Standards for
educational and psychological testing. Washington, DC: Author.

Baker, DP. (1993, April). Compared to Japan the US. is a low achiever. . .really: New
evidence and comments on Westbury. Educational Researcher, 22(3), 18-20.

Berliner, DC. (1993, Fall). lntemational comparisons of student achievement: A false
guide for reform. National Forum, 25-29.

Bracey, G.W. (1995, Oct.). The ﬁfth Bracey report on the condition of public education.
Phi Delta Kappan, 149-160.

Burstein, L. (1991). Conceptual considerations in instructionally sensitive assessment.
Los Angeles: Center for Research in Evaluation, Standards, and Student Testing.
(ERIC Document Reproduction Service No. ED 335367)

Burstein, L. (1993). Studying learning, growth, and instruction cross-nationally:
Lessons learned about why and why not engage in cross-national studies. In
Burstein, L. (Ed.) The IEA Study of Mathematics III: Student Growth and

Classroom Processes. New York: Pergamon Press. (p.xxvi-lii).

Burstein, L., Aschbacher, P., Chen, 2., Lin, L., & Sen, Q. (1990). Establishing the

content validity of tests designed to serve multiple purposes: Bridging secondary-
postsecondary mathematics. Los Angeles, CA: UCLA Graduate School of
Education. CSE Dissemination Ofﬁce.

Cattell, RB. (1949). r and other coefﬁcients of pattern similarity. Psychometrica, 14(4),
279-298.

210

211

Cronbach, L.J. & Gleser, G.C. (1953). Assessing similarity between proﬁles. The
Psychological Bulletin, 50(6), 456—473.

Cronbach, L.J. (1971). Test validation. In Thorndike, R.L. (Ed.), Educational
Measurement. Washington, DC: American Council on Education.

Cohen, M. (1988, April). Designing state assessment systems. Phi Delta Kappan, 583-
588.

Crocker, L.M., Miller, M.D., & Franks, EA. (1989). Quantitative methods for assessing
the ﬁt between test and curriculum. Applied Measurement in Education, 2Q),
179-194.

Crocker, L., Llabre, M., & Miller, MD. (1988). The generalizability of content validity
ratings. Journal of Educational Measurement, 25, 287-299.

Fitzpatrick, AR. (1983). The meaning of content validity. Applied Psychological
Measurement, 7(1 ), 3-1 3.

Freeman, D.J., Belli, G.M., Porter, A.C., Floden, R.E., Schmidt, W.H., Schwille, J .R.
(1983, Fall). The inﬂuence of different styles of textbook use on the instructional
validity of standardized tests. Journal of Educational Measurement, 20(3), 259-
270.

F rederikson, J.R. & Collins, A. (1989). A systems approach to .educational testing.
Educational Researcher, 18(9), 27-32.

Gamoran, A., Porter, A.C., Smithson, J., & White, RA. (1996, March). Upgrading high
school math instruction: Improving opportunities for low-achieving, low income
youth. A paper presented at the annual meeting of the American Education
Research Association, New York, NY.

Garden, R.A., & Orpwood, G. (1996). Development of the TIMSS achievement tests. In
IEA (Eds.), Third International Mathematics and Science Study Technical Report
Volume I: Design and Development. Boston: Bocton College.

Guion, RM. (197 8). Scoring of content domain samples: The problem of fairness.
Journal of Applied Psychology, 63(4), 499-506.

Guiton, G., & Oakes, J. (1995). Opportunity to learn and conceptions of educational
equality. Educational Evaluation and Policy Analysis, 1 7(3), 323-336.

Guskey, T.R., & Kifer, E.W. (1990). Ranking school districts on the basis of statewide
test results: Is it meaningful or misleading? _Educational Measurement: Issues
and Practice, 9(1), 11-16.

212

Guthrie, J .T. (1986). Roles of the National Assessment of lntemational Progress in
international studies (Publication NO. TM 870 049). The Nation ’s Report Card.
(ERIC Document Report Service NO. ED279678).

Haertel, E. & Calfee, R. (1983, Sum.). School achievement: Thinking about what to
test. Journal of Educational Measurement, 20(2), 119-132.

Husen, T. (1982). A cross-national perspective on assessing the quality of learning.
Washington, DC: National Commission on Excellence in Education. (ERIC
Document Reproduction Service NO. ED225992).

Husen, T. (1983). Are standards in US schools really lagging behind those in other
countries? Phi Delta Kappan, 64, 455-461.

Husen, T. (1987). Policy impact of IEA research. Comparative Education Review, 20,
81-92.

lntemational Association for the Evaluation of Educational Achievement. (1994a).
Teacher questionnaire population 2 math. (Doc. Ref. ICC 880/NRC417). The
Hague: Author.

lntemational Association for the Evaluation of Educational Achievement. (1994b).
TIMSS ﬁeld trial data analysis plan. The Hague: Author.

lntemational Association for the Evaluation of Educational Achievement. (1994c).
TIMSSﬁeld trial manual for national research coordinators. (Doc. Ref. ICC
714/NRC277). The Hague: Author.

Kaestle, C. (1985, F eb.). Education reform and the swinging pendulum. Phi Delta
Kappan, 422-423.

Kupermintz, H., Ennis, M.M., Hamilton, L.S., Talbert, J.E., Snow, RE. (1995, Fall).
Enhancing the validity and usefulness of large-scale educational assessments: I.
NELS: 88 mathematics achievement. American Educational Research Journal,

32(3), 523-554.

LaPointe, A.E. (1991). NAEP: A national report card for education and the public. The
Assessment of National Goals: Proceedings of the I 990 E TS Invitational
Conference, 47-62.

Leinhardt, G., & Seewald, A.M. (1981). Overlap: What’s tested, what’s taught?
Journal of Educational Measurement, 18(2), 85-96.

213

Leinhardt, G. (1983). Overlap: Testing whether it is taught. In Madaus, G.F. (Ed.), The
Courts, Validity, and Minimum Competency Testing. Boston: Kluwer-Nijhoff
Publishing.

Linn, R.L. (1987). State-by-state comparisons of student achievement: The deﬁnition of
the content domain for assessment. (Technical report #275). Los Angeles, CA:
University of California Los Angeles, Center for Research on Evaluation,
Standards, and Student Testing.

Linn, R.L. (1988). Accountability: The comparison of educational systems and the
quality of test results. Educational Policy, 1, 181-198.

Linn, R.L., & Baker, EL. (1995). What do international assessments imply for world-
class standards? Educational Evaluation and Policy Analysis, 1 7(4), 405-418.

Linn, R.L. Baker, E.L., & Dunbar, SD (1991). Complex, performance-based
assessment: Expectations and validation criteria. Educational Researcher, 20(8),
5-21.

Maeroff, G. (1991). The public’ 3 expectations for assessment of National Educational
Goals. The Assessment of National Goals. Proceedings of the 1990 E T S
Invitational Conference, 87- 95.

McDonnell, L. M. (1995). Opportunity to learn as a research concept and a policy
instrument. Educational Evaluation and Policy Analysis, 1 7(3), 305-322.

McKnight, C.C., Crosswhite, F.J., Dossey, J .A., Kifer, E., Swafford, 1.0., Travers, K.J.,
Cooney, TJ. (1987). The underachieving curriculum: Assessing U.S. school
mathematics from an international perspective. Champaign, IL: Stipes
Publishing Company.

Mehrens, WA. (1984, Fall). National tests and local curriculum: Match or mismatch?
Educational Measurement: Issues and Practice. 9-15.

Mehrens, W.A., & Lehmann, 1.1. (1991). Measurement and evaluation in education and
psychology. Fort Worth: Holt, Rinehart and Winston, Inc.

Mehrens, W.A., & Phillips, SE. (1987). Sensitivity of item difﬁculties to curricular
validity. Journal of Educational Measurement, 24(4), 357-370.

Messick, S. (1994). The interplay of evidence and consequences in the validation of
performance assessments. Educational Researcher, 23(2), 13-23.

214

Messick, S. (1989). Validity. In R.L. Linn (Ed.), Educational Measurement (pp. 13-
103). New York, New York: American Council on Education, Macmillan
Publishing Company.

Millman, J. & Greene, J. (1989). The speciﬁcation and development of tests of
achievement and ability. In R.L. Linn (Ed.), Educational Measurement (pp. 335-
366). New York, New York: American Council on Education, Macmillan
Publishing Company.

Mislevy, R]. (1995). What can we learn from international assessments? Educational
Evaluation and Policy Analysis, 1 7(4), 419-43 7.

Moss, RA. (1992). Shifting conceptions of validity in educational measurement:
Implications for performance assessment. Review of Educational Research,
62(3), 229-258.

Muthen, B., Huang, L., Jo, B., Khoo, S., Goff, G., Novak, J., & Shih, J. (1995).
Opportunity-to-learn effects on achievement: Analytical aspects. Educational
Evaluation and Policy Analysis, 1 7(3), 371-403.

National Commission on Excellence in Education. (1983). A Nation at Risk.
Washington, DC: US. Dept. of Ed.

Nitko, A.J. (1989). Designing tests that are integrated with instruction. In R.L. Linn
(Ed.), Educational Measurement (pp. 453-474). New York, New York:

American Council on Education, Macmillan Publishing Company.
Passow, AH. (1984). The IEA national case study. Educational Forum, 48, 469-487.

Pelgrum, W.J. (1989). Educational assessment: Monitoring, evaluation and the
curriculum. Enschende, The Netherlands: University of Twente Department of
Education.

Phillips, S.E., & Mehrens, WA. (1988). Effects of curricular differences on achievement
test data at item and objective levels. Applied Measurement in Education, 1(1),
33-51.

Porter, AC. (1990). Assessing national goals: Some measurement dilemmas. The
Assessment of National Goals: Proceedings of the I 990 E TS Invitational
Conference, 21-42.

Postlethwaite, TN. (1987). Comparative educational achievement research: Can it be
improved? Comparative Education Review, 31, 150-163.

215

Purves, AC. (1987). The evolution of the IEA: A memoire. Comparative Education
Review, 31, 10-28.

Raizen, S.A. & Jones, L.V. (1985). Indicators of precollege education in science and
mathematics: A preliminary review. Washington, D.C.:’ National Academy
Press.

Resnick, L.B., Nolan, K.J., & Resnick, D,P. (1995). Benchmarking education standards.
Educational Evaluation and Policy Analysis, 1 7, 4, 43 8-461.

Robitaille. D.F. & Gardden. RA. (1996). Research questions & study design. The Third
lntemational Mathematics and Science Study Monograph No. 2. Vancouver,
Canada: Paciﬁc Educational Press.

Robitaille, D.F., McKnight, C., Schmidt, W.H., Britton, E., Raizen, S., & Nicol, C.
(1993). Curriculum frameworks for mathematics and science. Vancouver,
Canada: Paciﬁc Educational Press.

Romberg, T.A., & Wilson, L.D. (1992). Alignment of tests with the Standards. The
Arithmetic Teacher, 40(1), 18-22.

Schmidt, W.H. (1983). Content biases in achievement tests. Journal of Educational
Measurement, 20(2), 165-178.

Schmidt, W.H., & McKnight, CC. (1995). Surveying educational opportunity in
mathematics and science: An international perspective. Educational Evaluation
and Policy Analysis, 1 7(3), 337-353.

Schmidt, W.H., McKnight, C.C., Valverde, G.A., Houang, R.T., & Wiley, DB. (in press).
Many visions, many aims: A cross-national investigation of curricular intentions.
Third lntemational Mathematics and Science Study, Michigan State University,
MI.

Schmidt, W.H., Porter, W.H., Schwille, J .R., Floden, RE, & Freeman, DJ. (1983).
Overlap: Testing whether it is taught. In Madaus, G.F. (Ed.), The Courts,
Validity, and Minimum Competency Testing. Boston: Kluwer-Nijhoff
Publishing.

Schmidt, W.H., & Valverde, GA. (1995). National policy and cross-national research:
United States participation in the Third International Mathematics and Science
Study. Manuscript in preparation, East Lansing, MI: Michigan state University,
Third lntemational Mathematics and Science Study.

216

Sireci, S.G. (1990). Applying empirical analyses to the evaluation of test content. Paper
presented at the annual meeting of the Northeastern Educational Research
Association, Ellenville, NY, November, 1990.

Skinner, HA. (1978). Differentiating the contribution of elevation, scatter, and shape in
proﬁle similarity. Educational and Psychological Measurement, 38, 297-308.

Stedman, LC. (1994, Oct.). Incomplete explanations: The case of US. performance in
the international assessments of education. Educational Researcher, 23(7), 24-32.

Survey of Mathematics and Science Opportunities (SMSO). (1993, Nov.). A description
of the T IMSS ’ achievement test content design test blueprints. East Lansing, MI:
Michigan State University.

Walker, D. & Schaffarzick, T. (1974). Comparing curricula. Review of Educational
Research, 44(1), 83-11.

Westbury, I. (1992, June-July). Comparing American and Japanese achievement: Is the
United States really a low achiever? Educational Researcher, 21(5), 18-24.

Westbury, I. (1993, April). American and Japanese achievement. . .again. Educational
Researcher, 22(3), 21-25.

Wolf, RM. (1988, April). The NAEP and international comparisons. Phi Delta
Kappan, 69, 580-582.

"Illlllllllllllllf