THE MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM: A STUDY OF THE RELATIONSHIP BETWEEN MICHIGAN'S EXPERIMENTAL READING TEST AND SELECTED READING INSTRUCTIONAL PROGRAMS

By

Paul Dean Erwin

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Administration and Higher Education

1980

Copyright 1980 by Paul Dean Erwin. All Rights Reserved.


ABSTRACT

THE MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM: A STUDY OF THE RELATIONSHIP BETWEEN MICHIGAN'S EXPERIMENTAL READING TEST AND SELECTED READING INSTRUCTIONAL PROGRAMS

By Paul Dean Erwin

Purpose of the Study

This study was an attempt to establish the degree of concurrence between the Michigan Educational Assessment Program Experimental Reading Test, Grades Four and Seven, and the K-6 reading instructional programs most commonly used in Michigan.
The purpose of the study was four-fold: (1) to determine the concepts presented in the K-6 reading instructional programs, (2) to determine the concepts measured by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven, (3) to analyze and compare the concepts presented in the K-6 reading instructional programs and the concepts measured by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven as measured by the Reading Concepts Checklist (RCC), and (4) to establish the degree of congruence between the K-6 reading instructional programs as measured by the Reading Concepts Checklist (RCC).

Procedure and Design

The Reading Concepts Checklist (RCC) was developed as a means of describing, within a common framework, the concepts presented in the reading instructional materials and the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven. The checklist was developed on the basis of conceptual consensus and the work of recognized authorities in the field of reading. The checklist formed the basis of two instruments: (1) a classification of K-6 instructional concepts matrix, and (2) a classification of tested concepts, Grades Four and Seven, matrix.

The data for the reading instructional programs were collected by surveying the sixty-five teachers' manuals of the five reading instructional programs. As a concept was presented in a program's manual, it was recorded in the matrix according to the appropriate grade level and concept. The data from the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven were collected through a review and evaluation of the test by a panel of reading experts. The panel determined the reading process being measured by each test item and recorded the test item in the tested concepts matrix according to the appropriate concept and test level.

The analysis leading to the comparison of the concepts presented in the reading instructional programs to the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven required data from all levels of the K-6 reading instructional programs. The criteria which guided the selection of the reading instructional programs were as follows: (1) the reading instructional programs must be used by a majority of Michigan's K-6 students, (2) the term majority was defined as a clearly definitive number, not simply "more than half," and (3) the majority must be large enough that it represented a reasonable cross-section of Michigan's rural, urban, and large-city K-6 students. Therefore, the lower acceptable limit which defined a majority of students using the reading instructional programs to be included in the study was established as seventy-five percent.

The final selection of the reading instructional programs was based on a national survey of K-8 reading specialists and reading supervisors. The reading instructional programs selected to be compared with the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven, and which were used by at least seventy-five percent of Michigan's K-6 students, were (1) Ginn and Company, (2) Harcourt, Brace and Jovanovich, (3) Holt, Rinehart, and Winston, (4) Houghton-Mifflin Company, and (5) Scott, Foresman Company.
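The checklist and its two matrices were paper instruments. Purely as an illustration of the bookkeeping they imply, the following is a minimal Python sketch of recording which concepts a program presents and computing a proportion of matches against the concepts a test measures. The concept names, program entries, and the agreement rule are hypothetical stand-ins, not the study's actual 103-concept checklist or its coding rules.

```python
# Minimal sketch (not the study's instrument): a checklist-by-program record
# and a proportion-of-matches computation. All data below are invented.

CHECKLIST = ["initial consonants", "vowel digraphs", "root words",
             "context clues", "main idea", "drawing conclusions"]

# Concepts presented somewhere in a program's K-3 teachers' manuals.
presented = {
    "Ginn":             {"initial consonants", "root words", "main idea"},
    "Houghton-Mifflin": {"initial consonants", "vowel digraphs", "context clues"},
}

# Concepts measured by at least one item on the Grade 4 experimental test.
tested = {"initial consonants", "main idea", "drawing conclusions"}

def proportion_of_matches(program_concepts, tested_concepts, checklist):
    """Assumed rule: a checklist concept 'matches' when the program and the
    test agree on it (both include it or both omit it); otherwise mismatch."""
    matches = sum(
        (concept in program_concepts) == (concept in tested_concepts)
        for concept in checklist
    )
    return matches / len(checklist)

for program, concepts in presented.items():
    print(program, round(proportion_of_matches(concepts, tested, CHECKLIST), 2))
```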
Usable data were acquired from sixty-five teachers' manuals and the independent ratings of the researcher and three reading experts of the two levels of the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven.

The two major hypotheses, developed and tested, were stated as follows:

I. There will be no difference between the five reading instructional programs in grades K-3 in the concepts they present or in the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades K-3 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as shown in the Reading Concepts Checklist (RCC).

II. There will be no difference between the five reading instructional programs in grades 4-6 in the concepts they present or in the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades 4-6 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as shown in the Reading Concepts Checklist (RCC).

A non-parametric, distribution-free test, Cochran's Q test, referred to a chi-square distribution, was used to test the significance of the observed differences among the proportions of matches and mismatches across the Reading Concepts Checklist (RCC). The magnitude and direction of the significant differences between the proportion scores were determined by multiple comparisons of the means of the proportion scores using the Dunn-Bonferroni pairwise comparison technique. The Cochran Q test was also employed to determine the level of reliability and the degree of inter-rater agreement of the panel of reading experts.

Major Findings and Conclusions

The following appraisal of the findings was reached:

1. The findings of the study indicate a lack of concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven and the K-6 reading instructional programs. Total proportion scores of the matches and mismatches across the Reading Concepts Checklist (RCC), and proportion scores from the nine subcategories in the Reading Concepts Checklist (RCC), indicate a lack of concurrence between the Michigan Educational Assessment Program Experimental Reading Test and each of the five reading instructional programs.

2. The findings of the study indicate the degree of concurrence present among the reading instructional programs is significantly greater between (1) Ginn and Company, (2) Harcourt, Brace and Jovanovich, and (3) Holt, Rinehart, and Winston, and is significantly greater between (4) Houghton-Mifflin Company and (5) Scott, Foresman Company, thus forming two distinct groups.

3. The findings of the study indicate significant differences exist in the K-3 reading instructional programs in categories V, Comprehension: Vocabulary Development; VII, Inferential Comprehension; and IX, Study Skills of the Reading Concepts Checklist (RCC), while significant differences exist in the 4-6 reading instructional programs in categories III, Phonic Analysis; IV, Structural Analysis; VI, Literal Comprehension; and VII, Inferential Comprehension of the Reading Concepts Checklist (RCC).
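The statistic named above, Cochran's Q referred to a chi-square distribution, is not reproduced in the abstract. As a worked illustration only, the following sketch computes Cochran's Q on hypothetical binary match/mismatch codings; the data values, the column layout, and the use of scipy are assumptions for demonstration, not the study's actual analysis.

```python
# Illustrative sketch of Cochran's Q on binary codings (match = 1, mismatch = 0)
# of checklist concepts across k related "treatments" (e.g., reading programs
# and the experimental test). Data are invented; the study coded 103 concepts.
from scipy.stats import chi2

# rows = checklist concepts, columns = treatments
data = [
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 1],
]

def cochrans_q(rows):
    k = len(rows[0])                                   # number of treatments
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)                                # grand total of 1s
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    q = numerator / denominator
    p_value = chi2.sf(q, df=k - 1)                     # referred to chi-square, k-1 df
    return q, p_value

q, p = cochrans_q(data)
print(f"Q = {q:.3f}, p = {p:.4f}")
```

A Dunn-Bonferroni follow-up of the kind described above would then compare pairs of mean proportion scores while dividing the overall significance level across the number of pairwise comparisons (for example, 0.05/10 for ten pairs).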
In general, the findings of a significant lack of concurrence between the K-6 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven should be of importance to everyone concerned with the assessment of reading concepts and skills in Michigan's K-6 grades.


DEDICATION

One of America's best-loved poets wrote:

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.*

This work is dedicated to my wife and son, Mary Jane and Brian, for their love, encouragement, and support. They have sacrificed so much that I might have an opportunity to travel the less traveled path.

*"The Road Not Taken" by Robert Frost


ACKNOWLEDGEMENTS

Every doctoral program is unique. Yet every doctoral program shares a commonality: it could not have been accomplished without the assistance of many concerned and enthusiastic people. It is hoped that all who shared in the completion of this program realize their support and assistance is deeply appreciated. There are some, however, whose contribution needs special recognition.

Special recognition and appreciation is extended to Dr. Herbert C. Rudman, chairman of my guidance committee, for the many hours spent in guiding this candidate through the program and especially the work contained here. He was to become not only an advisor but a friend as well.

To Drs. William Durr, Keith Groty, and Frederick R. Ignatovich for their service on the Guidance Committee. Each has been willing to provide necessary guidance and help when and where needed.

To Drs. Gerald Duffy, William Durr, and George Sherman for their willing assistance and the hours they gave in evaluating and reviewing the Michigan Experimental Reading Test.

To Dr. Edward D. Roeber of the Michigan Department of Education for his cooperation and assistance with assessment information and materials.

To Necia Black, who served as a source of encouragement and information through the statistical aspects of this endeavor.


TABLE OF CONTENTS

LIST OF TABLES

LIST OF APPENDICES

Chapter

I. STATEMENT OF THE PROBLEM
    Introduction
    Statement of the Problem
    Statement of the Purpose
    Significance of the Study
    Theory and Supportive Research
    Limitations and Assumptions
    Hypotheses
    General Hypothesis I
    General Hypothesis II
    Organization of the Thesis

II. RELATED LITERATURE
    Introduction
    Purpose of Evaluation
    Objectives Referenced Tests
    Distinctions Between Test Types
    Characteristics of Objectives
    Criterion-Referenced Test Construction
    Model for Test Construction
    Task Analysis
    Test Plan
    Test Construction
    Item Analysis
    Test Validity
    Types of Validity
    Construct Validity
    Criterion-Related Validity
    Related Studies
    Summary

III. METHODOLOGY OF THE STUDY
    Development of the Instrument and Its Use
    The Instrument
    The Use of the Instrument
    Selection of Instructional Materials
    Treatment of the Data
    Statistical Methodology and Research Design
    Research Design
    Statistical Methodology
    Summary

IV. ANALYSIS OF RELATIONSHIPS BETWEEN VARIOUS READING PROGRAMS AND THE MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM EXPERIMENTAL READING TEST
    Analysis
    General Hypothesis I
    Summary of Hypothesis I Results
    Statistical Test and Treatments
    Results and Evaluation of Statistical Treatment
    Total Proportion Scores
    Pairwise Comparison Scores
    Analysis
    General Hypothesis II
    Summary of Hypothesis II Results
    Statistical Test and Treatment
    Results and Evaluation of Statistical Treatment
    Total Proportion Scores
    Pairwise Comparison Scores
    Inter-Rater Reliability Classification of Test Concepts
    Summary of Inter-Rater Reliability Tests
    Statistical Tests and Treatments
    Results and Evaluation of Statistical Treatment

V. SUMMARY, CONCLUSIONS, IMPLICATIONS AND RECOMMENDATIONS
    Summary
    Purpose and Major Hypotheses
    Selection of Instructional Materials
    Instrumentation and Data Collection
    Treatment of Data and Analysis
    Scope and Delimitations of the Study
    Major Findings
    Conclusions
    Relationships Between Michigan Experimental Reading Test and K-6 Reading Instructional Program
    Relationship Between Inter-Rater Reliability Study to the Michigan Educational Assessment Program Experimental Reading Test Grades 4 and 7
    Relationship Between the K-6 Reading Instructional Programs
    Implications
    Recommendations
    Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven
    Development of a Communication Process and Favorable Attitudes
    Revisions, Continued Development and Use of the Reading Concepts Checklist (RCC)

APPENDICES

BIBLIOGRAPHY


LIST OF TABLES
1. Summary of the Total Proportion Scores of the Matches and Mismatches of the K-3 Reading Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as Measured by the 103 Concepts Contained in the Reading Concepts Checklist (RCC)

2. Interval Estimate of the Multiple Comparison of Proportion Scores for the K-3 Reading Programs and the Experimental Reading Test, Grade 4

3. Differences in Total Proportion Scores of the K-3 Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4

4. Interval Estimate of the Multiple Comparison of Proportion Scores for the K-3 Reading Programs and the Experimental Reading Test Grade 4

5. Interval Estimate of the Pairwise Comparison of Proportion Scores Between Harcourt, Brace and Jovanovich and Three K-3 Reading Programs and the Experimental Reading Test Grade 4

6. Interval Estimate of the Pairwise Comparison of Proportion Scores Between Holt, Rinehart, and Winston and the Two K-3 Reading Programs and the Experimental Reading Test Grade 4

7. Interval Estimate of the Pairwise Comparison of Proportion Scores Between Houghton-Mifflin Company and Scott, Foresman Company K-3 Reading Programs and the Experimental Reading Test Grade 4

8. Interval Estimate of the Pairwise Comparison of Proportion Scores Between the K-3 Reading Program Published by Scott, Foresman Company and the Experimental Reading Test Grade 4

9. Summary of the Interval Estimate of the Pairwise Comparisons of the Means of the Proportion Scores Between the K-3 Reading Programs and Each of the K-3 Reading Programs and the Experimental Reading Test Grade 4

10. Interval Estimate of the Multiple Comparison of Proportion Scores for the K-3 Reading Programs and the Experimental Reading Test Grade 4 by Individual Categories in the Reading Concepts Checklist (RCC)

11. Summary of the Total Proportion Scores of the Matches and Mismatches of the 4-6 Reading Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, as Measured by the 103 Concepts Contained in the Reading Concepts Checklist (RCC)

12. Interval Estimate of the Multiple Comparison of Proportion Scores for the 4-6 Reading Programs and the Experimental Reading Test Grade 7

13. Differences in Total Proportion Scores of the 4-6 Reading Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7

14. Interval Estimate of Pairwise Comparison of Proportion Scores Between Ginn and Company and Four 4-6 Reading Programs and the Experimental Reading Test Grade 7

15. Interval Estimate of the Pairwise Comparison of Proportion Scores Between Harcourt, Brace and Jovanovich and Three 4-6 Reading Programs and the Experimental Reading Test Grade 7

16. Interval Estimate of the Pairwise Comparison of Proportion Scores Between Holt, Rinehart, and Winston and Two 4-6 Reading Programs and the Experimental Reading Test Grade 7
17. Interval Estimate of the Pairwise Comparisons of Proportion Scores Between Houghton-Mifflin Company and Scott, Foresman Company 4-6 Reading Programs and the Experimental Reading Test Grade 7

18. Interval Estimate of the Pairwise Comparison of Proportion Scores Between the 4-6 Reading Program Published by Scott, Foresman Company and the Experimental Reading Test Grade 7

19. Summary of the Interval Estimate of the Pairwise Comparisons of the Means of the Proportion Scores Between the 4-6 Reading Programs and Each of the 4-6 Reading Programs and the Experimental Reading Test Grade 7

20. Interval Estimate of the Multiple Comparison of Proportion Scores for the 4-6 Reading Programs and the Experimental Reading Test Grade 7, by Individual Categories in the Reading Concepts Checklist (RCC)

21. Inter-Rater Reliability Total Proportion Scores for the Experimental Test Grades 4 and 7


LIST OF APPENDICES

A. Reading Concepts Checklist: Classification of Instructional Concepts; Reading Concepts Checklist: Classification of Tested Concepts

B. Communication Skills Objectives

C. Proportion Scores of the Reading Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as Measured by the Reading Concepts Checklist (RCC)

D. Differences Between Proportion Scores Between the Reading Instructional Programs and Between the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as Measured by the Reading Concepts Checklist (RCC)

E. Summary of the Values of the Pairwise Comparison of the Means of the Proportion Scores Between the K-3 Reading Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 by Individual Category Scores Within the Reading Concepts Checklist (RCC)

F. Proportion Scores of the Reading Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as Measured by the Reading Concepts Checklist (RCC)

G. Differences Between Proportion Scores Between the Reading Instructional Programs and Between the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as Measured by the Reading Concepts Checklist (RCC)

H. Summary of the Values of the Pairwise Comparisons of the Means of the Proportion Scores Between the 4-6 Reading Instructional Programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 by Individual Category Scores Within the Reading Concepts Checklist (RCC)


CHAPTER I

STATEMENT OF THE PROBLEM

Introduction

Nearly every element of the mass media has published or broadcast a news item discussing the downward trend of achievement levels in America's schools. The increased publicity about the quality of American educational programs has caused taxpayers to question what they are receiving for the money they are spending. The public believes the schools are certifying incompetent students as competent by passing them along, graduating them, and granting them diplomas.[1] Increased concern about the quality of American education has led to a renewed interest in competency based education.

[1] Robert L. Ebel, "The Case for Minimum Competency Testing," Phi Delta Kappan (April, 1978), p. 546.
The concept of competency based education suggests the existence of standards or a desired level of performance. A result of citizen interest in competency based education has been to place the pressure of accountability upon all levels of the educational system.

The pressure of accountability is evidenced in the document, State Activity: Minimal Competency Testing, prepared by Pipho in October, 1978.[2] Thirty-six states were involved in some phase of an accountability program. Michigan is one of those thirty-six states. It has a comprehensive six-step accountability model: 1) identify goals, 2) develop performance objectives, 3) assess needs, 4) analyze delivery systems, 5) test and evaluate, and 6) make a final recommendation for change or recycle to step one.[3]

[2] Chris Pipho, State Activity: Minimal Competency Testing, Education Commission of the States, Denver, Colorado, October 5, 1978, pp. 1-12.
[3] Michigan Department of Education, Michigan Accountability 1976-77 (Lansing, Michigan: undated), p. 3.

A portion of Michigan's model is known as the Michigan Educational Assessment Program. The Michigan Educational Assessment Program (MEAP) was initiated by the State Board of Education, supported by the Governor, and funded first by Act 307 of the Public Acts of 1969 and then under Act 38 of the Public Acts of 1970.[4] Initially, it took the form of a norm-referenced test. It was changed to an objective-referenced test in 1973-1974 because 1) the accountability model specifically called for objective-referenced assessment, 2) the development of performance objectives and tests tied directly to them is a useful process for educators for the classification of instructional intentions, and 3) objective-referenced test data are much more specific and more useful in assisting teachers to respond to individual student needs.[5]

[4] Michigan Educational Assessment Program, First Report of the 1977-78 Michigan Educational Assessment Program, Interpretive Manual (Lansing, Michigan: Michigan Department of Education, 1978), Foreword.
[5] Philip Kearney, David L. Donovan, and Thomas H. Fisher, "In Defense of Michigan's Accountability Program," Phi Delta Kappan 56 (September, 1974), p. 16.

The statement that the development of performance objectives and tests tied directly to them is a useful process for educators for the classification of instructional intentions is accurate only to the extent of the relationship between the test and the field of study. The extent of that relationship has been debatable. The debate centers on issues such as whether there is a consensus of opinion among educators that the objectives constitute the worthwhile objectives local districts should be striving to attain, and who was involved in writing the objectives. The claim that hundreds of Michigan teachers, curriculum specialists, and administrators were involved in the writing of the objectives[6] was countered by the claim that only a few persons were involved and that the objectives chosen do not represent consensual choices of even the small group who were involved in developing the objectives.[7] Both issues are important because they underscore the problem this research seeks to address.

[6] William Mehrens, Technical Report: The Fifth Report of the 1973-74 Michigan Educational Assessment Program (New York: ERIC Document Reproduction Services, ED 120218, July, 1976), p. 18.
[7] Ernest R. House, Wendell Rivers, and Daniel L. Stufflebeam, "A Counter-Point to Kearney, Donovan, and Fisher," Phi Delta Kappan 56 (September, 1974), p. 19.
The rationale that objective-referenced test data are more specific and more useful in assisting teachers to respond to individual student needs has a legitimate basis. Objectives are specific. Objectives provide direction for the teacher. They assist the teacher in planning instruction and guiding student learning, and they provide the criteria by which to evaluate student outcomes.[8] The debate concerning the degree of concurrence between the test content and the instructional materials tends to raise questions concerning the usefulness of the Michigan Educational Assessment Program in assisting teachers to respond to individual student needs.

[8] William A. Mehrens and Irvin J. Lehmann, Measurement and Evaluation in Education and Psychology, 2nd ed., New York: Holt, Rinehart and Winston, 1978, p. 19.

The general problem this research project seeks to address is the insufficiency of available data concerning the relationship of the Michigan Educational Assessment Program's content to the instructional programs used throughout the State of Michigan. Within this general framework, of specific concern is the need to identify the relationship between the concepts being measured by the Michigan Educational Assessment Program Experimental Reading Test for grades four and seven and the concepts presented through local instructional programs.

Statement of the Purpose

Little appears to have been done to investigate the relationship between the Michigan Educational Assessment Program's Experimental Reading Test for grades four and seven and the concepts presented through local reading instructional programs. One result of the contested relationship between the test content and the concepts presented in the instructional materials has been continued questioning of the content validity of the assessment test. The Michigan Department of Education appears to be moving toward resolving the question. In September, 1979, the Michigan Department of Education conducted its annual assessment program. Concurrently, the Department pilot tested an experimental assessment program. However, the experimental assessment program has been prepared and pilot tested along the same procedural lines as the current assessment program. Therefore, the potential for the debate over the content validity of the experimental assessment test remains.

The purpose, then, of this research project will be to establish the degree of concurrence between the concepts measured in the Michigan Educational Assessment Program Experimental Reading Test and the concepts presented in the selected instructional programs used in Michigan. Specifically, the researcher will undertake to determine:

1. What knowledge, skills, abilities, or behaviors (tasks) are presented in the selected instructional programs used in Michigan?

2. What knowledge, skills, abilities, or behaviors (tasks) are presented in the experimental reading objectives and items in the Michigan Educational Assessment Program Experimental Reading Test, Grades Four and Seven?

3. What is the degree of overlap between the selected instructional reading programs and the Michigan Educational Assessment Program Experimental Reading Test?

Significance of the Study

Measurement and evaluation play a vital role in education. The predominant mode of evaluation is through written tests.
More recently those tests tend to be objective-referenced tests, that is, tests based upon a set of objectives assumed to be representative of the content domain from which they have been taken. From the perspective of program evaluation, use of such test results for either diagnostic and prescriptive or summative purposes depends upon the degree to which the test is a representative sample of the content domain.

The significance of this research project lies in its attempt to identify and appraise the relationship between the selected reading instructional materials and the Michigan Educational Assessment Program Experimental Reading Test. The identification and appraisal of the relationship between the instructional materials and the Experimental Test is significant to several groups: 1) the Michigan Department of Education, 2) the local school districts that are using the test, and 3) the publishers of the instructional materials in use throughout the State of Michigan.

The Michigan Department of Education is attempting to create a new assessment test. The intended outcome is that the new test will more nearly reflect the instructional materials used in Michigan. The results of this study will provide the Michigan Department of Education with data showing the degree of concurrence between the Michigan Educational Assessment Program Experimental Reading Test and the instructional materials used in this study. Therefore, the actual identification and appraisal of the degree of concurrence between the Experimental Reading Test and the instructional reading materials will have a direct impact on the policy and practice of the Michigan Department of Education in its attempt to revise and implement the Experimental Reading Test for grades four and seven.

The debate centering around the content validity issue has caused some problems for teachers and administrators at the local school district level. The public already believes the schools are granting diplomas to incompetent students.[9] In some instances, publication of test results seems to indicate the public is correct. In their own defense, school officials attempt to explain their test results on the basis of the test's lack of content validity, even though neither side of the debate has been substantiated. Local school district officials and teachers need empirical data that illustrate the relationship of the instructional programs used in their district to the Experimental Reading Test being developed by the State of Michigan. The significance of this research project, then, for administrators and teachers of the local school district is that it will provide them the data concerning the relationship of the selected instructional reading materials to the Experimental Reading Test Grades Four and Seven and the relationship which exists between the instructional programs themselves.

[9] Ebel, "The Case for Minimum Competency Testing," p. 546.

As the Experimental Reading Test is an objective-referenced test, many local districts are developing objectives upon which to base their instructional programs. Textbook selection is becoming more sophisticated, and the final selection is more frequently based on the degree to which the district objectives and the textbook objectives match. Knowledge of the relationship of a given instructional program to the other instructional programs, or of its relationship to the Experimental Reading Test, is significant to the publishers of the instructional programs used in this project.

Theory and Supportive Research

The insufficiency of available data concerning the relationship of the Michigan Educational Assessment Program's content to instructional programs used throughout the State of Michigan has been identified as the general problem this study will address. One aspect of the lack of available data is a lack of evidence to support the relationship between the concepts measured by the Michigan Educational Assessment Program Experimental Reading Test and local district instructional programs. The degree to which the relationship exists is determined by how well the test items measure the objectives and sample the content domain.[10] Whether the test is norm-referenced or criterion-referenced, the test items should be keyed to a set of objectives and should be representative of a specified content domain. If that is the case, the test is likely to have content validity.[11]

[10] William A. Mehrens and Robert L. Ebel, Some Comments on Criterion-Referenced and Norm-Referenced Achievement Tests, NCME Measurement in Education, Vol. 10, No. 1 (Washington, D.C.: National Council on Measurement in Education, Winter, 1979), pp. 4-5.
[11] Ibid., p. 3.

Magnusson discusses content validity as the extent to which a test covers a field of study.[12] In this instance, the test items serve as a sample taken from a domain representing the content or aims of the course. Content validity is established by the extent to which the sample is representative of the total domain. Before one can estimate content validity, one must explicitly define the aims of instruction given in the field and the material which the students should have grasped.[13]

[12] David Magnusson, Test Theory, trans. by Hunter Mabon, Reading, Mass.: Addison-Wesley Publishing Company, 1967, p. 129.
[13] Ibid.

In his chapter "The Validity of Classroom Tests," Ebel discusses two categories of validity: 1) primary or direct validity, and 2) derived or secondary validity.[14]

[14] Robert L. Ebel, Essentials of Educational Measurement, 2nd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972, p. 438.
Theory and Supportive Research The insufficiency of available data concerning the relationship of the Michigan Educational Assessment Program's content.to instructional programs used throughout the State of Michigan has been identified as the general problem this study will address. One aspect of the lack of available data is a lack of evidence to support the relationship be­ tween the concepts measured by the Michigan Educational Assessment Program Experimental Reading Test and local district instructional programs. The degree to which the relationship exists is determined by how well the test items measure the objectives and sample the content domain.^ Whether the test is norm-referenced or criterion-referenced, William A. Mehrens and Robert L. Ebel, Some Comments on Criterion-Referenced and Norm-Referenced Achievement Tests, NCME Measurement in Education, Vol. 10, No. 1 (Washington, D. C.: National Council on Measurement in Education, Winter, 1979), pp. 4-5. 10 the test items should be keyed to a set of objectives and should be representative of a specified content domain. If that is the case, the test is likely to have content validity.^ Magnusson discusses content validity as the extent to which a test covers a field of study. 12 . In this instance, the test items serve as a sample taken from a domain representing the content or aims of the course. Content validity is established by the extent to which the sample is representative of the total domain. Before one can estimate content validity, one must explicitly define the aims of instruction given in the field and the material which the 13 students should have grasped. In his chapter "The Validity of Classroom Tests," Ebel discusses two categories of validity: 1) primary or direct validity, and 2) derived or secondary validity. 14 13,Ibid., p. 3. 12 David Magnusson, Test Theory, Trans, by Hunter Mabon, Reading, Mass.: Addison-Wesley Publishing Company, 1967, p. 129. 13Ibid. 14 Robert L. Ebel, Essentials of Educational Measurement. 2nd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972, p. 438. Direct validity is defined as the extent to which the i tasks included in the test faithfully represent and in the proper proportion, are the kinds of tasks that provide an operational definition of the trait or achievement in question, whereas derived validity is the extent to which the scores it yields correlate with criterion scores that possess direct validity. 15 Lists of various types and definitions of validity have been suggested by numerous authors in the field of educational measurement and psychology. Of particular interest are content validity, defined as being concerned with the adequacy of sampling a specified universe of content,1^ and curricular validity, defined as being determined by an examination of the content of the test itself and judging to what degree it is a true measure of the important objectives of the course, or is a truly representative sampling of the essential materials of instruction. 17 The importance of the correlation between 16 American Psychological Association, Inc., Technical Recommendations for Psychological Tests and Diagnostic Techniques, Washington, D. C: APA 1954 in Ebel, Robert L. Essentials of Educational Measurement. 2nd ed., Englewood Cliffs, New Jersey:Prentice-Hall, Inc., 1972, p. 437. 17 C. C. Ross and Julian C. 
The importance of the correlation between these two definitions is made apparent from the point of view that a test author may succeed to some degree in attaining his goal if he defines his domain and writes items to represent the domain. However, from the point of view of the one who uses the test, content validity is situation-specific. Teachers teaching the same course titles are not necessarily teaching the same content domain. The result is that the test would have high content validity for one teacher and low content validity for another.[18] Content validity and curricular validity are determined by the test content, the extent to which the test content is a representative sample of the essential materials of instruction, and the extent to which it is a representative sample in proportion to the total population.

[18] William A. Mehrens and Irvin J. Lehmann, Measurement and Evaluation in Education and Psychology, 2nd ed., New York: Holt, Rinehart and Winston, 1978, pp. 111-112.

Spool,[19] Magnusson,[20] Lennon,[21] and Tanenbaum et al.[22] have all suggested various components or measures of the appraisal of content validity. The basic elements of those components are: 1) the behavior to be exhibited in the performance domain, 2) the behavior to be demonstrated in testing, and 3) the relationship between the two. The relationship between behaviors in the performance domain and behaviors required by the test determines the test's validity. The goals of the test must match the goals of the instructional program. This does not constitute teaching to the test; rather, it is the selection of a test capable of measuring growth in the specific objectives of the instructional program.[23]

[19] Mark D. Spool, Performing a Content Validity Study, Paper Presented at the Annual Meeting of the Southeastern Psychological Association (21st, Atlanta, Ga.), 1975, p. 3.
[20] Magnusson, Test Theory, p. 129.
[21] Robert T. Lennon, "Assumptions Underlying the Use of Content Validity," Readings in Measurement and Evaluation in Education and Psychology, edited by William A. Mehrens, New York: Holt, Rinehart and Winston, 1976, p. 47.
[22] Arlene B. Tanenbaum and Christine A. Miller, The Use of Congruence Between the Items in a Norm-Referenced Test and the Content in Compensatory Educational Curricula in the Evaluation of Achievement Gains, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977, pp. 1-10.
[23] Roger Farr, Reading: What Can Be Measured? (Newark, Delaware: International Reading Association, 1969), p. 36.
The work of Jenkins and Pany underscores the need for a high correlation of relationship between the behaviors in the performance domain and those to be demonstrated in testing. While they do raise some questions regarding the manipulation of the tests against the curriculum used to cause the scores to reflect the users bias, these questions deal with the issue of misuse of test results. Constructors of achievement tests have always emphasized the importance of defining the content domain and sampling from it in an appropriate fashion. Therefore, whether they are norm- referenced or criterion-referenced, good achievement test 24 Joseph R. Jenkins and Darlene Pany, "Curriculum Biases in Reading Achievement Test," Journal of Reading Behavior, Vol. X, No. 4 (Winter, 1978), p. 348. 25Ibid., p. 353. 15 items should be based on a set of objectives and represent a specified content domain. 26 The extent to which that relationship exists will determine how much content validity a test has for a particular purpose. 27 Limitations and Assumptions Any comparative research faces a limitation in the extent to which terms used have a shared definition across individuals and subject groups. It is an assumption of this research that the terms used will have a high degree of meaning and similarity of meaning across reading specialists and test constructors. This research is also limited by the fact that the source of information used to select the comparions instructional materials only provides information for the national and regional levels. The assumption is that the regional information provides a reasonable approximation of the most commonly used materials in Michigan. Another limitation in this study is the fact that the publishers have more than one edition in use at the same time. The assumption is that skills presented tend to remain constant from one edition to the next and that the latest edition may be used for analysis. 26 Mehrens and Ebel, Some Comments on CriterionReferenced and Norm-Referenced Achievement Tests, p. 3. 27Ibid., p. 5. 16 The research is limited in that no attempt shall be made to address the issue of instructional validity, that is the degree of emphasis placed on concepts taught within and between classrooms. The assumption is that teachers tend to follow the instructional reading programs which they use. Hypotheses General Hypothesis I There will be no difference between the five reading instructional programs in grades K-3 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades K-3 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as shown in the Reading Concepts 9p Checklist, (RCC). Operational Hla There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Harcourt, Brace and Jovanovich according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) Operational Hlb There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Holt, Rinehart and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 
Operational H1c

There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1d

There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1e

There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1f

There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1g

There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1h

There will be no difference between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1i

There will be no difference between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1j

There will be no difference between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1k

There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1l

There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).
Operational H1m

There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1n

There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

Operational H1o

There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).

General Hypothesis II

There will be no difference between the five reading instructional programs in grades 4-6 in the concepts they present or in the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades 4-6 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as shown in the Reading Concepts Checklist (RCC).

Operational H2a

There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich according to the proportion of matches and mismatches across the Reading Concepts Checklist (RCC).
Operational H2f There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2g There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 21 Operational H2h There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . Operational H2i There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . Operational H2j There will be no difference between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2k There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concetps Checklist, (RCC) . Operational H21 There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 22 Operational H2m There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mis­ matches across the Reading Concepts Checklist, (RCC). Operational H2n There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 
Operational H2o There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Organization of the Thesis This chapter contains a statement of the problem, a statement of the purpose, the significance of the study, and the theory and research upon which the study is based. It also includes the limitations and assumptions of the. study and the testable hypotheses. In Chapter II, a review of related literature is presented. The review includes objective-referenced test construction, related studies of the relationship of test content to instructional materials, and the theory of content validity. 23 In Chapter III, the procedure and methodology of the study are presented. The detailed description includes selection of instructional materials, data collection, the instrumentation, and the statistical analysis treatment. The results of the analysis of the data are presented in Chapter IV. In Chapter V, the summary, discussion of the major findings, recommendations, and areas for further research are presented. CHAPTER II ' ' RELATED LITERATURE Introduction Inflation, technological advancements, the complex­ ities of attempting to meet the needs of students have placed a strain on the imagination of educators across America. Educators’ efforts seem to be achieving less and parents' complaints seem stronger as evidence appears to mount in support of the notion that the cost of education continues to rise while its achievements seemingly decline annually. It has become the opinion of the citizens of the community that it is necessary and proper to hold the school board members, the school administrators, the teachers, and the students accountable for their successes or failures in the learning process.^ One result of the demand for accountability has been renewed interest in the Competency Based Education (CBE) movement. The move toward CBE has renewed interest in the ■'■Robert L. Ebel, Essentials of Educational Measurement 3rd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1979, p. 3. 24 25 field of measurement and evaluation, specifically in the form of Criterion Referenced Tests and mandated assessment programs by Departments of Education at the state level. In 1978, thirty-six states were involved in some phase of an accountability program and the evidence indicates this number will increase rather than decline. 2 Questions concerning the adequacy of these mandated tests arise from teachers and administrators. A major question concerns the correspondence between test content and instructional content. A review of the literature concerning the theory behind test validity would be inadequate if it did not include a discussion of the purpose of evaluation and the procedures involved in test construction. Purpose of Evaluation Tests are used in a variety of situations. They may be administered prior to instruction as a survey of prior knowledge. They may be administered during the course of instruction to monitor student understanding of the material being presented. Tests may be administered at the conclusion of the course of instruction to assess the level of student achievement. 2 These processes can be . 
Chris Pipho, State Activity Minimal Competency Testing, Educaton Commission of the States, Denver, Colorado, October 5, 1978, pp. 1-12. 26 adopted separately or in any or all combinations. The purpose of the evaluation process in each case is to provide a 3 description or a representation of a person. The function of the evaluation is to aid in the decision-making process. If a test does not aid in the decision-making process, the test is useless. 4 The term "decision" is defined as all possible courses of action 5 which might follow from test scores. Linking the two terms, function and decision-making process, adds yet another dimension to the overall view of educational evaluation. The function of the evaluation can assume different meanings depending upon the perspective from which the evaluation is viewed. As Cronbach^ defines the functions of evaluation, there are five: 1) learner feedback, 2) learner reinforcement, 3) teacher feedback, 4) counseling decision, and 5) administrative decisions. William A. Mehrens and Irvin J. Lehmann, Measurement and Evaluation in Education and Psychology 2nd ed., New York: Holt, Rinehart and Winston, 197 8, p. 110. 4 Jum C. Nunnally, Educational Measurement and Evaluation 2nd ed., New York: McGraw Hill Book Company, 1972, p. 4. ^Ibid., p. 5. ®Lee J. Cronbach, Educational Psychology 2nd ed., New York: Harcourt, Brace and World, Inc., 1963, p. 539. 27 The idea behind learner feedback is to assist the student to realize how he should change or develop his behavior, while learner reinforcement provides the student with confirmation of his own assessment regarding his level of achievement. Students who receive high scores on tests are encouraged to continue the work habits and methods of study which brought them success. Students who score poorly on tests are warned to work harder, change their methods of study, and seek help. As the students mature, the cummulative effect of the years of testing will enable the students to learn where their strengths and weaknesses are and allow them to plan for their future. Tests also provide feedback to student about 7 the key concepts in instruction. Patterns begin to develop. Concepts emphasized on previous examinations become the emphasis for students to study for future examinations. The information provided to teachers through the use of tests helps them judge the adequacy of teaching methods. Student performance on the tests indicates to teachers what needs to be retaught and which methods are effective and g can be used again to teach to the same objective. Only to the extent that the average student meets the objectives 7 Nunnally, Educational Measurement and Evaluation, p. 126. Q Cronbach, Educational Psychology, p. 540. 28 can the teacher feel satisfaction with the instruction as a whole, and the progress of individual students is judged largely by how well they perforin with respect to the objectives. Lacking the knowledge of that progress, intelligent decisions about the individual or the class g as a whole cannot be made. Opportunities for promotion within school or to advance studies in colleges and universities, recom­ mendations to pursue a particular type of employment or types of employment not to consider are the types of decisions frequently made by counselors and administrators on the basis of test results. Some of those decisions are reached with the students and some are reached for the students. Administrative decisions concerning the total school program are based on the use of test results. 
Analysis of test scores provides indications of the program's strengths and weaknesses. Inferior areas will need to be brought up to standard through a change in instructional materials, different instructional strategies, or b o t h . ^ The purposes and functions of educational evaluation instruments are predicated upon the assumption that the tests have been constructed according to the requirements for constructing tests. 9 It is widely recognized that Nunnally, Educational Measurement and Evaluation, p. 124. ■^Cronbach, Educational Psychology, p. 540. 29 teachers and administrators are encouraged to use standard­ ized test results to assess achievement, identify learning problems, and evaluate the effectiveness of instructional strategies. The use of test results to achieve any of these functions can be considered only in view of the teachers' knowledge of the extent to which the content of the test corresponds to the content of instruction.^ The same caution holds true to a somewhat lesser degree of teacher-made tests. If the procedures of test construction are followed for teacher-made, or tailor-made achievement tests, the underlying assumption is that the teacher-made test will be more directly linked to the course objectives than the more global standardized test in that it is an assumption that the goals and objectives of tailor-made tests are tied more closely to the smaller units of instruction. The tailor-made tests are constructed for a specific purpose and are a sample of a more constricted domain. 12 The caution as stated here will be more Donald Freeman, Therese Kuhs, Lucy Knappen, and Andrew Porter, A Closer Look at Standardized Tests, Institute for Research on Teaching, East Lansing, Michigan, November 1978, p. 1. 12 William A. Mehrens and Robert L. Ebel, Some Comments on Criterion-Referenced and Norm-Referenced Achievement Tests, NCME Measurement in Education, Vol 10, No. 1 (Washington, D. C.: National Council on Measurement in Education, Winter, 1979), p. 4. 30 appropriately expanded and treated fully in the section later in this chapter concerning test validity. Objective Referenced Tests Distinctions Between Test Types Tests, generally, can be classified into two major categories: 1) essay tests and 2) objective tests. Essay tests are answered in the narrative form by the examinee. The essay test requires less time to prepare, but a greater amount of time to grade. The grading of essay tests is subjective in nature and dependent upon the judgment of the rater as to whether the question has or has not been answered and the degree to which the question has or has not been answered. Objective tests contain the distinctive character­ istics of providing a greater number of items which allows for a more extensive sampling of the content domain; of not usually requiring the student to produce an answer all on his own, but rather only requiring that he recognize the correct answer by one method or another; of having rules 13 for scoring that are absolutely clear. Objective tests are those tests which are usually classified as standardized tests. They are standardized in the sense that they conform to specific criteria. 13Nunnally, Educational Measurement and Evaluation, p. 155. 31 Within those criteria, there are several tests which can be classified as "standardized tests." The problem seems to be one of definition when reference is made to the various types of objective tests. 
The generally accepted classification of objective tests is 1) standardized achievement tests, 2) tailor-made achievement tests, 3) objective-referenced tests, and 4) domain-referenced 14 tests. In some instances, the distinctions between these tests are major and in other instances the dis­ tinctions are much more subtle. Within the standardized achievement test category, classifications are subdivided into criterion-referenced tests, norm-referenced tests, objective-referenced tests, and domain-referenced tests. While all good achievement tests are objective based, the major distinction is the manner in which the user wishes to use the data gathered. It is a misconception that objective-referenced tests, criterion-referenced tests, and domain-referenced tests are not "standardized tests." Rather, it is the interpretation of their use which differentiates them from the other standardized test, a "norm-referenced" test. All are commerically prepared and draw their sample from a broad domain of general interest. All can be used for normative referencing or criterion 14 Mehrens and Ebel, Some Comments on CriterionReferenced and Norm-Referenced Achievement Tests, p. 4. referencing.^ The difference between the normative reference interpretation and the criterion reference interpretation is that the meaning of an individual's score gains its meaning through comparison to some specific criterion of proficiency. If the comparison is to scores of other individuals in a particular group, it is normative referencing. If the comparison is to specific criterion of proficiency, it is criterion referencing. Further confusion lies in the fact that the terms criterionreferenced and objective-referenced are used interchange­ ably. An objective-referenced test, simply stated, is a test in which the tasks have been related directly to a set of objectives. 16 Another major distinction between norm-referenced tests and criterion-referenced tests is that the normreferenced test is descriptive and predictive in nature and the criterion-referenced test is generally diagnostic and prescriptive. 17 The criterion-referenced test reflects the examinee's standing relative to the curriculum. The discrimination is between the level of mastery or non­ 15Ibid. 17Glen E. Roudabush, Item Selection for CriterionReferenced Tests, Paper Presented at the Annual Meeting of The American Educational Research Association, (57th, New Orleans, La.) 1973, p. 2. 33 mastery of the objectives making up the curriculum of interest from which the criterion-referenced test was constructed. From the information gathered as to which objectives have or have not been mastered (diagnostic information), decisions for further instruction can be made (prescriptive information). Following additional instruction based on the decisions made from the previous criterionreferenced examination, another criterion-referenced test can be administered which would reflect changes in the examinee's capability to perform. The implications of this major difference are that the items for a criterion-re­ ferenced test should be sensitive to instruction, while the items of a norm-referenced test should be sensitive to individuals.^ The purpose of the criterion-referenced test involves the classification of individuals into one of several 19 mutually exclusive categories. The mutually exclusive categories may be masters and non-masters, instructed and uninstructed students, or some other group in which there is a control group and a random group. 
By so placing the 18Ibid. 19 Douglas A. Smith, The Effects of Various Item Selection Methods on the Classification Accuracy and~ Classification Consistency of Criterion-Referenced Instruments, Paper Presented at the Annual Meeting of the American Educational Research Association (62nd, Toronto, Ontario, Canada) 1978, p. 3. 34 individual into a mutually exclusive category, the intended behavior or instructional objective can be said to have been measured.20 Criterion-reference measurement differentiates from normative-reference measurement in that criterionreference measurement is more likely to be undimensional or homogeneous. 21 Criterion-referenced tests are composed of clusters of items. Those clusters of items are keyed directly to specific objectives and are intended to indicate whether or not the objective has or has not been achieved. 22 Therefore, a criterion-referenced test is one that is deliberately constructed to yield measurements that are directly interpretable in terms of specified performance 23 standards. 20 Ronald A. Berk, A Consumers' Guide to CriterionReferenced Test Item Statistics, Paper Presented at the Annual Meeting of the National Council on Measurement in Education (Toronto, Ontario, Canada), 1978, p. 2. 21 Albert C. Crambert, Estimation of Validity for Criterion-Referenced Tests, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977, p. 9. 22 Ebel, Essentials of Educational Measurement, 1979, p. 351. 23 R. Glaser and A. J. Nitko, Measurement m Learning and Instruction. In R. L. Thorndike ed. Educational Measurement, Washington: American Council on Education, 1971, pp. 625-670, in Ronald K. Hambleton and William P. Gorth, Criterion-Referenced Testing: Issues and Applications, Paper Presented at the Annual Meeting of the Northeastern Educational Research Association (Liberty, New York), 1970, p. 1. 35 Because of the homogeneity of the test and the clustering of items around specified objectives, less emphasis is placed on item analysis in the item selection process; however, item analysis is used to a degree. The uses of the two types of tests, norm-referenced and criterion-referenced, depend largely on what infor­ mation the user wishes to obtain. The distinctions be­ tween the two types of tests are, primarily, what Airasian 25 has called formative evaluation and summative evaluation. Formative evaluation indicates how students are changing with respect to their attainment of the instructional goals. Summative evaluation is end-of- instruction evaluation, primarily to grade student achievement. It provides information with respect to how students have changed relative to course objectives. significant difference is the verb. The Formative evaluation attempts to provide data relative to weaknesses and direct corrective teaching action. Formative evaluation should 26 When being used in occur frequently during instruction. 24 Crambert, Estimation of Validity for CriterionReferenced Tests, p. 9. 25Peter W. Airasian, "The Role of Evaluation m Mastery Learning," in Mastery Learning Theory and Practice, James H. Block, ed. New York: Holt, Rinehart, and Winston, Inc., 1971, p. 78. 26Ibid., p. 79. 36 formative evaluation, criterion reference measurement provide their most important information. In this stage of the evaluation process, data are used by those in charge of developing curriculum to make judgments about how to maximize the probability of learning ar 27 of objectives. 
established set Both the tailor-made achievement test and the domainreferenced test can be used and inferences can be drawn from them in the same manner as the norm-referenced and criterion-referenced tests. There are, however, some differences between the tailor-made tests, the domainreferenced tests, and the norm-referenced and criterionreferenced tests. The tailor-made test and the domain- referenced test sample opposite ends of the spectrum. The tailor-made test's primary distinction is that it is built for a specific purpose and samples from a constricted domain. Such a test could be commercially prepared or prepared at the local school district level. The domain- referenced test consists of tasks that are sampled from a thoroughly defined population of tasks in such a manner that one can estimate the proportion of tasks in the population 27John A. Emrick, The Experimental Validation of an Evaluation Model for Mastery Testing, Final Report, Office of Education, Washington, D. C., November, 1971, p. 1. 37 ' 28 at which the student is likely to succeed. Tailor-made tests tend to be program oriented while domain-referenced are more global representing the entire domain. What can be concluded concerning the distinctions between the various types of objective tests is: 1) they are based on a set of objectives, 2) at least as far as administration procedures, they are all "standardized," 3) they may be used as instruments to gather norm referenced or criterion referenced data. Therefore, the proper distinctions are between the more global standardized tests and the more constricted tailor-made tests, and whether the intrepretation is to be criterion-referenced or norm-referenced.2® Characteristics of Objectives Ebel 30 has said that a result of an educational achievement test should be to measure what the process of education has sought to achieve. Therefore, the test constructor must concern himself with educational objectives, objectives that relate to the total process of education 28Mehrens and Ebel, Some Comments on CriterionReferenced and Norm-Referenced Achievement Tests, p. 4 29Ibid. 30 Robert L. Ebel, Essentials of Educational Measurement, 2nd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972, p. 57. 38 and objectives that relate specifically to the course, subject, or unit of instruction for which the test was constructed. The test designed should be consistent with the objectives of society, the school, and the test constructor.^ The objective characteristic, then, is relevance. 32 The advancement in technology since the early 1950's should cause current educators to re-examine their curricular offerings. The relevance issue of objectives raises questions about society's needs and the needs of students. Automation in industry has lessened the demand for great amounts of workers. What, if any, impact does this have on the aims of education? relevant objective deal with career planning? Would a Should students be taught to deal with leisure time because the possibility exists that they will spend less and less time at work? Another characteristic of an educational objective is feasibility. ations. Feasibility is an umbrella of consider­ It takes into consideration striving for goals that are parallel with what psychologists know about how children develop, how they learn and how they differ one from another in these two respects, as well as whether or 31Ibid., p. 58. 32 Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 20. 
39 not the resources are available to achieve these goals successfully. 33 Objectives provide guidance. They answer such questions as "Where do I want to go?", "How do I get 34 there?", and "How do I know 1 have arrived?" In this situation, objectives serve a multiple purpose. They direct the educational process toward the intended educa­ tional outcome and at the same time are the desired out­ come in stated form. An outcome has been defined as what 35 occurs as a result of an educational experience. In its stated form, an objective directs both the teacher and the learner through the learning process. To be complete, to provide the means to evaluate successful achievement of the objective, the objective needs to be specific. By adding the element of stated observable performance in which the learner will be engaged during the evaluation process, it becomes possible to determine whether or not the learner has achieved the 33Ibid., p. 21. 34 Albert R. Wight, "Beyond Behavioral Objectives," Readings in Measurement and Evaluation in Education and Psychology, Edited by William A. Mehrens, New York: Holt, Rinehart and Winston, 1976, p. 90. 35Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 18. 40 objective. The objective has become a "behavioral objective."3® A behavioral objective is specific and contains an action verb. The behavioral objective describes what the learner will be doing during evaluation. A behavioral objective should not contain the statement that the student will gain an appreciation for the American form of govern­ ment because the learner can not be observed "appreciat37 ing" during the evaluation process. In discussing the construction of criterion-referenced tests, Roudabush 38 states objectives are coherent, clearly stated and specifically describe the behavior the examinee will be able to perform if he has mastered the objective, that is, each objective specifies a limited domain of behaviors. Behavioral objectives provide a basic plan of action for the teacher and the learner from either a pre-instruction or a post-instruction vantage point. Objectives provide both, the teacher and the learner, with the information as to what is expected during the course of instruction and with the information as to the level of achievement after 36Ibid., p. 19. 37Ibid., p. 19. 38 Roudabush, Item Selection for Criterion-Referenced Tests, p. 3. 41 instruction. By providing directive guidance to them, objectives take the surprises out of the teaching-learning process. In their writing of an army training manual, Swezey and Pearlstein maintain that an objective only covers a single task, not a combination of tasks, that the main intent of the objective is clear, and the performance indicators are simple, direct, and part of what the trainee can already do. 39 An objective is composed of three parts: 40 1) a performance, 2) a condition, and 3) a standard. The performance is what is to be accomplished. It is the task, action, knowledge, skill, or ability required for the job. The condition is the circumstance or situation under which the performance is to be accomplished. The condition might be the tools and equipment required, the materials required, or where it is to be accomplished. For a military trainee, the condition could conceivably be under simulated conditions in the classroom or on a training field "battle­ ground." In educational terms, the condition refers to a classroom setting on the one hand or to a "job" situation under other circumstances. 
quality of performance. The standard is the level or It can be stated in terms of how 39 Robert W. Swezey and Richard B. Pearlstein, Guidebook for Developing Criterion-Referenced Tests, Army Research Institutefor the Behavioral and Social Sciences, Arlington, VA., August, 1975, p. 2:9. ^ I b i d . , p. 2:3. 42 well the performance is to be accomplished or in terms 41 of time, how quickly it is to be accomplished. An author of behavioral objectives must keep in mind several features and attributes if the objective is to be adequate. It is not enough to include some aspects and exclude the rest. All must be considered. To be relevant, an objective must meet the needs of the society, the student, the school, the instructor. Some modification may be necessary to insure that the objective is feasible. Constraints of money, time, space, materials, and most importantly, the growth and development and the abilitites of the students concerned affect the feasibility of an objective. The objective must be written with enough specificity so as to define and describe its intent and limit it to a single task. The specificity should provide instructional guidance to both the learner and the teacher. It should contain an action verb describing what observable performance will take place during the evaluation process. 41Ibid., p. 2:7. 43 Criterion-Referenced Test Construction Model for Test Construction The construction techniques of objective-referenced tests, based on the principles of "standardization", is debatable. It is generally accepted that instrument adequacy depends on the extent to which the instrument is capable of assigning individuals to their true level of performance, for example, pass-fail or master-non-master, and the degree to which decisions made are consistent across 42 repeated administrations of the instrument. These considerations conform to what Cronbach calls the diagnostic purpose of testing, that is, a test appraises the pupil's performance by observing his work on a sample of tasks or items. The sample must be representative of the area being tested and must contain enough items to give evidence which is dependable. To yield dependable evidence, the test must be given in the same way to all 43 students. For an instrument to assign individuals to their true level of performance, it must have objectivity. 42 Smith, The Effects of Various Item Selection Methods of the Classification Accuracy and Classification Consistency of Criterion-Referenced Instruments, p. 1. 43Cronbach, Educational Psychology, pp. 549-550. 44 A measurement is said to be objective if it can be verified by another independent evaluator. Objectivity is not the process by which the measures are obtained, but rather a 44 characteristic of the measure obtained. Most experts are able to agree with the above definitions and requirements. The methodology for con­ structing criterion-referenced tests on the basis of conventional statistical processes is the questionable issue. Mehrens and Lehmann represent the opposing point of view to the use of conventional item-analysis procedures in criterion-referenced tests construction quite well. 
A summary of their point of view is 1) a test item should not be discarded because it does not discriminate providing it does reflect an important attribute of the criterion, 2) a negative discriminator may be caused by one of the several reasons: a) a faulty item, b) ineffective instruc­ tion, c) inefficient learning on the pupil's part, and 3) more research is needed before any conclusive answer can be obtained regarding the usefulness of conventional item analysis procedures for criterion-referenced tests. 44Ebel, Essentials of Educational Measurement, 1979, p. 62. 45Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 334. 45 45 There appears to be, however, a consensus of opinion that conventional item analysis procedures are of value in the construction of criterion-referenced tests, and the practice is, in fact, common practice. Douglas U. Smith defends the practice by contending that it is presumptous to think each item comprising a domain of items measures it equally well. The items will vary in difficulty as well as their relationship to the domain. The use of empirical methods of item selection may enhance test characteristics by alleviating some of the subjective judgments in the item writing process.^ Since it is generally common practice to use con­ ventional procedures in criterion-referenced test con­ struction, the following steps for criterion-referenced test construction can be developed, based on the work of Rubinstein and Nassif-Royer^ and Gavin.^ 46 Smith, The Effects of Various Item Selection Methods on the Classification Accuracy and Classification Consistency of Criterion-Referenced Instruments, . 2-3. 47Sherry Ann Rubinstein and Paula Nassif-Royer, The Outcomes of Statewide Assessment; Implications for Curriculum Evaluation, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977, p. 4. 48 Anne T. Gavin, Guide to the Development of Written Tests for Selection and Promotion; The Content Validity Model. Technical Memorandum 77-6^ Civil Service Commission, Washington, D. C.: Personnel Measurement Research and Development Center, June, 1977, p. 6. 46 Step I: Step II: Step III: Step IV: Task Analysis Test Plan Test Construction Estimate Test Reliability and Content Validity Task Analysis Task analysis is defined as the process of determining the purpose and parameters of the test in terms of the subject area and domain to be assessed. 49 An underlying assumption involved in this definition is that the develop­ ment of the objectives and the definition of the domain to be assessed ,can be clearly and specifically stated. When the criterion-referenced test is designed to evaluate learning outcomes relative to objectives for a specific curriculum, the likelihood for success of the task analysis process is increased. The reason is that the criterion- referenced test was pioneered for use in the classroom, that is, criterion-referenced tests are generally administered before or after small units of instruction.^® The greater 49 Rubinstein and Nassif-Royer, The Outcomes of Statewide Assessment: Implications for Curriculum Evaluation, p . 5. 50Ronald K. Hambleton and M. R. Norick, "Toward an Integration of Theory and Method for Criterion-Referenced Tests," Journal of Educational Measurement, 1973, 10, 159-170, in Rubinstein and Nassif-Royer, The Outcomes of Statewide Assessment; Implications for Curriculum Evaluation, p. 6. 
47 the diversity of curricula/ the broader the task analysis must be defined. Diversity of curricula modifies the purpose of task analysis to imply that the domain being defined is one to which all students have been exposed, a "common ground" area. 51 This appears counter-productive. The more thoroughly defined the domain, the greater the possibility 52 of building a domain referenced test. The closer one comes to building a domain referenced test, the closer one comes to constructing a test sensitive to instruction. A criterion-referenced test begins with a set of objectives representing some curriculum and ends with reporting per­ formance on each of those objectives. It should discriminate well between mastery and non-mastery of each of the objectives making up the curriculum of interest as opposed to a good norm referenced test discriminating well between examinees who have differing amounts of achievement in a general area of interest.^ Rubinstein and Nassif-Royer, The Outcomes of Statewide Assessment: Implications for Curriculum Evaluation, p. 8. 52Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 110. 53 Roudabush, Item Selection for Criterion-Referenced Tests, p. 2. 48 Test Plan If one wishes to travel from New York to California by car, one has several options available. One can randomly strike out and hope his sense of direction is sufficient to plot a course which will lead him to California. One can install a compass in his car and use it as a guide until he finally reaches California. In each case the probability of reaching the destination rests on several considerations. One would have to ask oneself if he were willing to invest the time and money, not to mention patience, to embark on such a journey. The logical course of action to follow if one wished to complete such a journey in an efficient and effective manner would be to use a map showing the major highways and the most direct route from New York to California. The construction of a test is no different than planning a trip. One must have a plan of action, a guide determining the direction the test will take. becomes the directing force for the test. outlines, maps out the test. The test plan It defines, The test plan is, indeed, the table of specifications for the test constructor. Using a table of specifications provides that a) only the objectives involved in the instructional process will be assessed, b) each objective will receive a proportional amount of emphasis on the test in the same relation as the emphasis placed on that objective instruction, and c) no important objective or content area will be accidentally 49 omitted. 54 To be assured the table of specifications will yield these provisions, a set of explicit specifications should be observed. The following list is a summary of Ebel's suggestion as to what a table of specification should contain: 1. 2. 3. 4. 5. 6. 7. The forms of the test items to be used The number of items of each form The kinds of tasks the items will present The number of tasks of each kind The areas of content to be sampled The number of items in each area 55 The level and distribution of item difficulty As the level of difficulty of intellectual objectives varies, so does the level of difficulty of test items vary. The form of the test item becomes one of the determiners of the level of difficulty. 
The form may be of the true-false variety, the completion (fill-in-the-blank) type, matching one column of items to their correct response in another 56 column, or the multiple-choice method. . The decision must be made as to which type (form) of item is to be used. 54Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 179. 55Ebel, 1979, p. 69. 56 Essentials of Educatxonal Measurement, Swezey and Pearlstein, Guidebook for Developing Criterion-Referenced Tests, p. 3:14-15. 50 Matching, completion, classification types of items, and short answer can be effectively used, but they have more limited applicability. The true-false form and the multiple-choice form will measure any aspect of cognitive educational achievement. What is measured by the true- false item or the multiple-choice item is determined more by its content than its form. 57 The kinds of tasks the items will present will be determined by the objectives as defined through the task analysis process. Practical constraints such as time and cost will have a bearing on the number of items selected 58 to measure the individual objectives. The purpose of the test and the information desired, as well as the scope of the area to be measured, will determine the number of objectives to be measured. Measuring too many objectives, each with several items, causes the test length to increase. Decreasing the number of items per objective effects the reliability of the test. The reliability of a test is its ability to measure the same thing through repeated administrations of the test. 59 For the estimate of reliability to be held stable, an objective must be measured 57Ebel, Essentials of Educational Measurement, 1972, p. 103 5 8Swezey and Pearlstein, Guidebook for Developing Criterion-Referenced Tests, p . .1:10. ^ I b i d . , p. 1:11. 51 by at least four items. This would allow up to twenty-five objectives to be measured. However, varying item lengths would realistically bring the number of objectives closer to fifteen. Defining the content domain becomes a definition based on practical concerns. The content validity of a test has been defined as based on a hypothetical universe of 61 situations. A "universe of situations" is the whole collection of measurements that might have been made. 62 An attempt to define all possible situations would be subject to severe criticism. It would be subjective rather than objective. It would be prohibitively costly in terms of human effort. It would be unmanagably long and detailed to the extent its usefullness would be questionable. The result is that most criterion-referenced test are of the "content-specified" approach on the basis of a listing of the intended educational outcomes of the institution, a ^Rubinstein and Nassif-Royer, The Outcomes of State­ wide Assessment; Implications for Curriculum Evaluation, p. 11. 61 Roger T. Lennon, "Assumptions Underlying the Use of Content Validity," Readings in Measurement and Evaluation in Education and Psychology, Edited by William A. Mehrens, New York: Holt, Rinehart and Winston, 1976, p. 46. 62 Marsha M. Linehart, Content Validity in Behavioral Assessment, Paper Presented at the Annual Meeting of the American Psychological Association (84th, Washington, D. C.), 1976 , p. 3. 52 table of specifications, or some other means of detailing the intended content of the test. In a criterion-referenced test, the universe of items can be described, but not fully defined. 
The criterion-referenced test is considered to be only illustrative of the universe and not a sample of i t . ^ For a test to be content valid, the table of specific­ ations requirement for the determination of the number of items to be used in each of the content areas to be sampled takes on added importance. A factor in determining the content validity of a test is documenting that the behaviors demonstrated in the test constitute a representative sample of the behaviors to be exhibited in the desired content domain. 64 If a reading instructional program devotes twenty percent of its presentation to structural analysis, ten percent of its presentation to phonic analysis, sixty percent of its presentation to the various aspects of comprehension skills, and ten percent of its presentation to study skills, the number of items should be appropriately proportioned. 63 Crambert, Estimation of Validity for CriterionReferenced Tests, p. 6. 64Michigan Educational Assessment Program, Technical Report, (Lansing, Michigan : MDE), 1977, p. 13. 53 Test Construction At the very heart of a criterion-referenced test, specifically, or any test in a more general sense, is the "item," the "thing" that is scored as correct or incorrect. It is the item which ultimately determines the content validity of the test. 65 . It is the item which, joined with other items, measures the educational objective, the desired outcome toward which the learning process is being directed. The selection of the item(s) for a test, and a criterionreferenced test in particular, is of prime importance in the test construction process. The match between the item and its objective is determined by the objective. The specificity of the objective is the factor which determines the restrictions placed on an item writer's freedom to alter the original intent of the objective. Generally, objectives written in vague generalities give item writers latitude to define the tasks required by the objective. The greater the specificity of an objective, the more likely 66 will be the precision of the item which measures it. The item which is selected for inclusion in the final form of the test comes from an item pool. Swezey CC Michigan Educational Assessment Program, Report, (Lansing, Michigan: MDE), 1977, -. 13. Technical ^William Mehrens, Technical Report: The Fifth Report of the 1973-74 Michigan Educational Assessment Program. Michigan State Department of Education, Lansing, Michigan, 1975, p. 16. 67 68 and Pearlstein, Rubinstein and Nassif-Royer, and Roudabush 69 suggest the item pool comes from one of two sources. Either totally new items are generated by item writers or items could be obtained from existing item pools. Authoring original items offers the probability of a higher degree of precision in the match between item and objective. A constraint placed on this approach is cost: cost in terms of paying for the writers' time to develop the items themselves. Drawing items from an existing item pool saves time and money; however, a decrease in the precision of correspondence between the objective and the item may cause a mismatch between the objective and the item and require a modification of the original objective. Once a pool of items has been established, one of two processes may be observed in selecting which items will be included in the test. Items may be included through empirical item sampling or random sampling from the item pool. 
One empirical item sampling method represents 6 7Swezey and Pearlstein, Guidebook for Developing Criterion-Referenced Tests, p. 1:10. 68 Rubinstein and Nassif-Royer, The Outcomes of State wide Assessment: Implications for Curriculum Evaluation, p . 14. 69 Roudabush, Item Selection for Criterion-Referenced Tests, p. 3. 55 selecting items that show the greatest difference in item difficulty computed from uninstructed-instructed samples. The uninstructed-instructed sample consisted of two-hundredfifty-eight dental students who were administered two forms of a 100-item test. samples: The data were analyzed on two types of 1) a post-instruction sample representing instructed students, and 2) a pre and post-instruction sample representing the full range of attainment in the achievement domain. The test contained both knowledge of basic dental anatomy and a collection of items defined by objectives of the text. The conclusion was that tests which are created by random sampling seem to provide the smallest errors of measurement.^ Smith, on the other hand, suggests the use of item selection procedures does not necessarily affect the content validity of the instrument because the developer could select only the most highly discriminating items and remain with the original test plan, retaining the same category proportions as the original item pool. The empirical approach to item selection may enhance the test characteristics by alleviating some of the subjective 70 Tom Haladyna and Gale Roid, A Theoretical and Empirical Comparison of Three Approaches to Achievement Testing, tNew York; ERIC Document Reproduction Service, Education 148903, May, 1978), pp. 10-i8. 56 judgments in the item writing process. 71 However, because of the particular significance in content-referenced measurement of the relationship between the instructional objectives and the test content, it is necessary that the test development procedure be designed and executed with greater care and higher standards for consensus judgment than are usually thought to be necessary for norm-referenced measurement.72 Item Analysis A characteristic of a behavioral objective has already been identified as an observable performance in which the learner will be engaged during the evaluation process. The prupose, then, of the items in a criterion-referenced test is to measure behavior in relation to the instrumental objective. Item analysis is a procedure designed to express the degree or relationship between the intent of each item and the responses of the students to each item. Nineteen different statistics have been identified as having the ability to provide quanitiative evidence of item validity. 73 71Smith, The Effects of Various Item Selection Methods on the Classification Accuracy and Classification Consistency of Criterion-Referenced Instruments, p. 3. 72Crambert, Estimation of Validity for CriterionReferenced Test, p. 9. 72Berk, A Consumers' Guide to Criterion-Referenced Item Statistics, p. 2. 57 Before item analysis can be performed, the responses 74 of the students to each item must be tabulated. Tabulation of student response to the various items yields a variety of information. 
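Although the tabulation is ordinarily presented by hand, a minimal sketch can show the same bookkeeping. The response matrix below is hypothetical and the variable names are invented conveniences; the sketch simply counts, for each item, how many students answered correctly and, for each student, how many items were answered correctly, which is the raw material for the statistics that follow.

```python
# Hypothetical scored responses: 1 = correct, 0 = incorrect.
# Rows are students, columns are items; the values are invented for illustration.
responses = [
    [1, 1, 0, 1, 1],   # student 1
    [1, 0, 1, 1, 0],   # student 2
    [1, 1, 1, 1, 1],   # student 3
    [0, 1, 0, 1, 0],   # student 4
    [1, 0, 0, 1, 1],   # student 5
    [0, 0, 1, 0, 0],   # student 6
]

# Tabulate the number of correct responses to each item.
items_correct = [sum(student[i] for student in responses)
                 for i in range(len(responses[0]))]

# Tabulate the number of items each student answered correctly.
students_correct = [sum(student) for student in responses]

print("correct responses per item:", items_correct)               # [4, 3, 3, 5, 3]
print("items answered correctly per student:", students_correct)  # [4, 3, 5, 2, 3, 1]
```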
An Index of Item Difficulty can be computed by calculating the proportion of students who responded to the item correctly:

Diff = X/N

where X = the number of students responding correctly and N = the total number of students tested.75

The result, the level of difficulty, is what has been referred to as a proportion score ("P" score), an expression representing the frequency of correct responses to an item, giving the proportion of the total number of examinees tested who answered the item correctly. An increase in the score indicates an easier item with a lower degree of discriminating power. The maximum level of item discrimination occurs with a "P" score of 0.50. As the "P" score approaches a perfect 1.00 or 0.00, the item becomes useless because the level of difficulty is extreme and the frequency of correct responses is either all or none.76

Estes, Colvin, and Goodwin77 validated the items in their criterion-referenced test by using Truman Kelley's discrimination method of extreme groups. Kelley has demonstrated that using extreme groups, each formed by approximately 27 percent of the total group, the ratio of the difference in average abilities of the groups to the standard error of their difference is maximum.78 In so doing, Estes, et al., used the following

D = (H - L)/N

where H = the number of students in the top 27 percent who responded correctly, L = the number of students in the lower 27 percent who responded correctly, and N = the number of students 27 percent represents,

and selected items whose discrimination value was at least 0.20 and whose difficulty value fell between 0.40 and 0.80.79 There is a degree of variation in the field of measurement concerning the range of values. Nunnally establishes the range of difficulty values as 0.20 to 0.80,80 while Ebel establishes the discriminating level for test items as 0.30 and up as reasonably good and 0.40 and up as very good; items below 0.29 are marginal to poor items.81

A very useful and frequently used statistic in item analysis is the one which Magnusson82 referred to as a "short-cut" method, which investigates differences between extreme groups on the test and the criterion distributions respectively. It is the Phi Coefficient, symbolized by "φ".

74 James E. Wert, Charles O. Neidt and J. Stanley Ahman, Statistical Methods in Educational and Psychological Research, New York: Appleton-Century-Crofts, Inc., 1954, p. 338.

76 David Magnusson, Test Theory, trans. by Hunter Mabon, Reading, Mass.: Addison-Wesley Publishing Company, 1967, p. 219.

77 Gary Estes, Lloyd W. Colvin and Coleen Goodwin, A Criterion-Referenced Basic Skills Assessment Program in a Large City School System, Paper Presented at the Annual Meeting of the American Educational Research Association (60th, San Francisco, California), 1976, p. 7.

78 Truman Kelley, "The Selection of Upper and Lower Groups for the Validation of Test Items," Journal of Educational Psychology, Vol. 30, (1939), pp. 17-24, in Robert L. Ebel, Essentials of Educational Measurement, 2nd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972, p. 386.

79 Estes, Colvin and Goodwin, A Criterion-Referenced Basic Skills Assessment Program in a Large City School System, p. 8.

80 Nunnally, Educational Measurement and Evaluation, p. 188.

81 Ebel, Essentials of Educational Measurement, 1979, p. 267.

82 Magnusson, Test Theory, p. 198.
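The two indices just described, the upper-lower 27 percent discrimination index and the Phi coefficient, can be sketched in a few lines of code. The sketch below only restates the formulas given above; the pass/fail counts are hypothetical, the function names are invented, and the screening values (0.20 discrimination, 0.40-0.80 difficulty, 0.30 for Phi) are those cited from Estes, Colvin, and Goodwin and from Swezey and Pearlstein.

```python
import math

def kelley_discrimination(upper_correct, lower_correct, group_size):
    """Kelley extreme-groups index D = (H - L) / N, where H and L are the numbers of
    correct responses in the top and bottom 27 percent groups and N is the number of
    students that 27 percent represents."""
    return (upper_correct - lower_correct) / group_size

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 master/non-master table:
    a = masters who pass, b = masters who fail,
    c = non-masters who pass, d = non-masters who fail."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

# Hypothetical item: 3 masters pass and 0 fail; 1 non-master passes and 2 fail
# (the same counts used in the worked illustration that follows).
print(round(phi_coefficient(3, 0, 1, 2), 2))   # 0.71, above the 0.30 cutoff

# Hypothetical upper and lower 27 percent groups of 10 students each.
print(kelley_discrimination(upper_correct=9, lower_correct=5, group_size=10))  # 0.40
```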
The Phi Coefficient is written

φ = (ad - bc) / √[(a + b)(c + d)(a + c)(b + d)]

where, for a given item, a = the number of masters who pass the item, b = the number of masters who fail the item, c = the number of non-masters who pass the item, and d = the number of non-masters who fail the item. To validate the item, one needs to know 1) the masters and non-masters who pass the item, and 2) the masters and non-masters who did not pass the item.83 The following illustration demonstrates how the item can be validated and the value of the information received through the process. The illustration is a summarization of work by Swezey and Pearlstein84 and Edmonston and Randall.85

83 Swezey and Pearlstein, Guidebook for Developing Criterion-Referenced Tests, p. 5:11.

84 Ibid., pp. 5:8-9.

85 Leon P. Edmonston and Robert S. Randall, A Model for Estimating the Reliability and Validity of Criterion-Referenced Measures, Paper Presented at the Annual Meeting of the American Educational Research Association (56th, Chicago, Ill.), 1972, pp. 16-20.

[Illustration: a pass/fail (P/F) record for six students on eight items. Students 1-3 are masters (M) and students 4-6 are non-masters (NM); the record shows the number of masters and the number of non-masters passing each item, the total number passing each item, and the number of items each student passed. For Item 4, three masters and one non-master passed.]

To compute the Phi coefficient for Item 4, the following grid will be used:

Item 4          Pass      Fail
Masters         a = 3     b = 0
Non-Masters     c = 1     d = 2

Substituting the values in the grid into the preceding formula,

φ = [(3)(2) - (0)(1)] / √[(3 + 0)(1 + 2)(3 + 1)(0 + 2)]
  = 6 / √72
  = 6 / 8.485
  = .71

The total range of the Phi coefficient is from -1.00 through zero to +1.00. An item has acceptable discriminating power if its score falls between 0.30 and 1.00.86 It could be concluded that the sample item above would be acceptable for inclusion in the test. The same computations should be completed for each item.

While there is a lack of consensus of opinion as to whether or not conventional methods of test construction should be used in the construction of objective-referenced tests, there appears to be a sufficient body of information where the conventional methods have been used successfully. The use of conventional methods of test construction tends to identify items capable of discriminating in such a manner as to satisfy a purpose of criterion-referenced measurement, that is, classifying individuals into mutually exclusive categories.

86 Swezey and Pearlstein, Guidebook for Developing Criterion-Referenced Tests, p. 5:12.

Test Validity

Types of Validity

A survey of the rather extensive amount of literature pertaining to test validity yields discussions of many varieties of validity. As the varieties increase, some minor changes in interpretation begin to appear. Lists have been compiled which provide definitions for these varied forms of validity. One of these lists contains ten different varieties of validity.87 However, the American Psychological Association delimits only three kinds of validity: 1) construct validity, 2) criterion-related validity, and 3) content validity.88

Construct Validity

A construct has been described as an attribute of people assumed to be reflected in test performance.89

87 Ebel, Essentials of Educational Measurement, 1972, pp. 436-437.

88 Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 110.

89 Haladyna and Roid, A Theoretical and Empirical Comparison of Three Approaches to Achievement Testing, p. 2.
Construct validity is the measurement of a psychological trait, not of the trait itself but of the presence of the trait. 90 The items in a test designed to test logic measure a person's tendency to think logically in a given situation. Personnel specialists have the option of either administering a written test or require an applicant to perform the acutual job for which the application has been made. For reasons of health and safety, it might not be practical to "perform" the actual job. In this situation the written test would be preferable. The test is assumed to contain the constructs to measure the necessary attributes required 91 to perform the 3 0 b. Criterion-Related Validity Criterion-related validity applies to the relationship between the scores on a test and an independent external 92 measure. If the personnel specialist, from the above illustration, decided on the basis of the test scores to 90Nunnally, Educational Measurement and Evaluation, p. 31. 91Gavin, Guide to the Development of Written Tests for Selection and Promotion: The Content Validity Model. Technical Memorandum 77-6. p. 2. 65 employ the applicant, the personnel specialist could determine the criterion-related validity of the test accord­ ing to the degree of success or failure of the applicant's job performance (external criterion). What criterion- related validity permits the test user to do is predict. In criterion-related validity, the aim is to determine how well one can generalize from one score to another. 93 If the comparison of test results is with data gathered at the same time as the time of test administration, it is said to have concurrent validity. However, if the comparison of test results is with data collected at some future date, it becomes predictive validity. 94 In either case (predictive validity or concurrent validity) they are both concerned with prediction. 95 In education, measurement is primarily concerned with achievement. The measurement may concern itself with assessment of student knowledge across a broad, general 93 Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, pT 112. 94 Swezey and Pearlstein, Guidebook to Developing Criterion-Referenced Tests, p. 7:6. 95Mehrens and Lehmann, Measurement and Evaluation in Education and Psychology, p. 112. 66 area of study or it may concern itself with assessment of student mastery of the goals and objectives of the course of instruction. In either situation, the relationship of test content to the course content is of prime importance. In terms of validity, this relationship is referred to as content validity. The American Psychological Association has stated in its Standards for Educational and Psychological Tests that to demonstrate the content validity of a set of test scores, it must be shown that the behaviors demonstrated in testing constitute a representative sample of the behaviors to 96 be exhibited in a desired performance domain. Therefore, there are three components to the content validity of a test: 1) the behavior to be exhibited in the performance domain, 2) the behavior to be demonstrated in testing, and 97 3) the strength of the relationship between the two. The establishment of content validity is essentially an inference of the adequacy of the sampling process. The inference of content validity requires a judgment that the specified content domain has been adequately sampled by 9 6American Psychological Association, Standards for Educational and Psychological Tests, p. 28. 97Mark D. 
Spool, Performing a Content Validity Study, Paper Presented at the Annual Meeting of the Southeastern Psychological Association {21st, Atlanta, Ga.) 1975, p. 3. 67 the test. The issue is one of reasonable (not statistical) representativeness. The term "representativeness" refers to both the types of behaviors assessed and the proportional coverage of the different knowledge, skills, and abilities. 9 8 The establishment of content validity is an inference, but not an ideal inference. It is a careful judgment, based on the test's apparent relevance to the behaviors which are legitimately inferable from those delimited by the criterion. 9 9 The establishement of content validity through careful judgment requires that specific procedures be followed to assure the accuracy of the validation process. One model for those procedures is 1) a thorough and accurate analysis of the content domain, 2) a review and evaluation of the test by experts, 3) a comparison between the test content and the instructional content to assess the extent of the relationship between the two, and 4) document each 9 8Gavin, Guide to Development of Written Tests for Selection and Promotion: The Content Validity Model. Technical Memorandum 77-6, p. 4. 9 9W. James Popham and T. R. Husek, "Implications of Criterion-Referenced Measurement,” Journal of Educational Measurement, 1969, 6, 1-9, in Ronald K. Hambelton and William P.Gorth, Criterion-Referenced Testing; Issues and Implications, Paper Presented at the Annual Meeting of the Northeastern Educational Research Association (Liberty, New York), 1970, p. 14. 68 procedure of the study.’*'®® Although not specifically stated, there have been several studies conducted regarding content validity which have approximated this model. Related Studies Tallmadge and Horst'*'®^' conducted a study related to the validity of achievement tests and the instructional programs used by local school districts involved in Title I federal programs. Their hypothesis was that not all achievement tests are sensitive to achievement gains. The purpose of their study was to argue against Title I policy allowing only one standardized test to be used as a measure of achievement gains due to the effect of Title I assistance to children with reading difficulties. The study analyzed the instructional programs of Houghton-Mifflin, Ginn and Company, and Economy. The standardized tests were the California Achievement Test and the Metropolitan Achievement Test. The report indicated that a poor correlation was found to exist between the instructional programs and the tests. 100Spool, Performing a Content Validity Study, p. 3. Kasten Tallmadge and Donald P. Horst, Different Achievement Tests in the ESEA Title I System, Paper Presented at the Annual Meeting of American Educational Research Association (62nd, Ontarion, Canada), 1978, pp. 4-8. The Use of Evaluation the Toronto, 69 The conclusion is, it seemed highly probable that when the content of a test shows a low correlation with the content of a curriculum, the test will be insensitive to whatever achievement gains the curriculum might produce. The conclusion further emphasized that the only valid way to assess the effects of an instructional treatment is to use a test that measures what is taught, a test in which the items are samples from the same domains as the teaching-learning exercises. 
While the results of the study are founded on the procedures to be followed in a content validation procedure, the basic issue, and therefore, the major weakness of the study, is the usage of conventional instructional programs in an unconventional fashion which results in an inappropri­ ate application of the standardized tests. The conclusion reached, probably would have been the same had they addressed the basic issue rather than their hypothesis. Only the means of achieving the conclusion "might" have been different. The Tallmadge and Horst study reflects an attempt to evaluate the behaviors required in the performance domain, the behaviors to be demonstrated in testing, and the interrelationship between the two. It is not an easy task. There are some features which may add to the strength of such a study. 70 Tanenbaum and Miller 102 formulated rules to in­ corporate into their procedure to compensate for what they felt to be deficiencies in instructional material outlines, tests, and teaching strategies. They devised two files: 1) showed curricula taught, and 2) showed curricula keyed to the test. These files were devised as a result of finding the outlines provided by the publishers were, in their opinion, not sufficiently precise. These files formed their own description of the content and the criterion for each item. A strategy of "near transfer" was adopted. All features had to be represented in the curricula exactly as they were found in the test format. They established the level of readability on the Dale-Chall formula. To compensate for the fact that not all teachers teach to the same degree, a word was considered taught if a pupil was exposed twice to curricula that contained the word in a well marked exercise. Using these guidelines, they conducted an evaluation of Project Information Packages (PIPS). A content analysis was performed to detect the congruence between the Metropolitan Achievement Test and six exemplary compensatory education program curricula. 102 Arlene B. Tanenbaum and Christine A. Miller, The Use of Congruence Between the Items in a Norm-Referenced Test and the Content m Compensatory Education Curricula in the Evaluation of Achievement Gains, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977, pp. 1-10. 71 Fall-spring testing patterns (fail-pass; pass-pass; pass-fail; fail-fail) were tallied to compare performance on congruent and non-congruent items. Eventually, a model factorial design was devised to incorporate the variables which appear to influence the patterns of achievement. The results of the study appear very small. The degree of congruence appears to fall between 5 percent and 20 percent and decreases with an increase in grade level from grade four to grade eight. The results show that the amount of congruence was too small to make strong inferences about the quality of the PIP education programs. The merit of this study lie in its attempt to define the domain and to compensate for the differences in teaching strategies. However, the addition of factor analysis appears to have altered the results markedly. The work of Jenkins and Pany1®"* underscores the need for a high correlation of relationship between the behaviors in the performance domain and those to be demonstrated in testing. Their research was directed toward detecting bias in achievement tests. 
To detect the extent of bias, Jenkins and Pany studied five standardized tests and seven first and second grade commercial reading series.

103 Joseph R. Jenkins and Darlene Pany, "Curriculum Biases in Reading Achievement Tests," Journal of Reading Behavior, Vol. X, No. 4 (Winter, 1978), pp. 345-357.

The procedure which they used was to consult publishers' guides to determine which books were used at the first and second grade levels and teachers' manuals to compile alphabetical word lists for each book in the series. Next, alphabetized lists of all words in the standardized tests of word recognition were prepared. By comparing the two lists, the extent of overlap could be established by determining the total number of word matches per grade level. The results of their study indicate that expected annual growth would vary according to which test was administered in conjunction with which curriculum was in use. They concluded that it is doubtful that the use of conventional achievement tests can provide an unbiased estimate of a curriculum's effect, at least with regard to the early grades. The significance of their work is that the combination of curriculum being used and the tests which are administered can be manipulated to affect the achievement gain scores. While this is an issue concerning the misuse of tests and test results, it holds a high degree of relationship to content validity. The level of bias was directly proportional to the degree of congruence between the tests and the curricula. One aspect of the work of Jenkins and Pany is the item-by-treatment interaction. Their word lists were created from the instructional materials.

Freeman et al.104 have completed a study of four commercial achievement tests in elementary school mathematics. For their analysis they devised a taxonomy of mathematics. The taxonomy consisted of a classification matrix which had three dimensions: 1) mode of presentation, 2) nature of material, and 3) operation, which specified the process which was required. They concluded that there are striking differences between the content covered by the four most commonly used standardized tests of elementary school mathematics. They also concluded that significant discrepancies are likely to exist between the content a teacher presents to students and the content which is being tested on the standardized tests administered. These mismatches have a negative effect on the use of standardized tests for instructional purposes. In order to diagnose student strengths or weaknesses or to diagnose program strengths and weaknesses, either the program must be modified or the test must be selected with extreme care to insure a proper match.

104 Freeman, Kuhs, Knappen, and Porter, A Closer Look at Achievement Tests, pp. 1-10.

Summary

The pressure of accountability is being applied with more intensity today than it has been in several decades. One type of response to the pressure has been for state Departments of Education to implement accountability programs. The programs have, as a major component, a mandated assessment test. To mandate an assessment test means that an evaluation of someone or something will occur. Therefore, there needs to be a definition of the purpose of evaluation. The definition of the purpose of the evaluation process, as presented in this chapter, is to describe or represent a person. The function of the evaluation process is to aid in the decision-making process.
If the evaluation process does not accomplish that function, the process is considered useless, a waste of time for both the evaluatee and the evaluator. The evaluation process, as it relates to education, consists mainly of paper-and-pencil tests. There are two categories of tests: 1) an essay form, that is, a written narrative, and 2) an objective form, that is, a short-answer variety which does not require the student to provide the answer completely on his own. Within the category of objective tests, a variety of types are identified. The basic distinctions, however, were between the more global standardized tests and the more constricted tailor-made tests, and whether the interpretation was to be norm-referenced or criterion-referenced. All good achievement tests are objective-referenced.

Of particular interest is the behavioral objective and the characteristics which make up the objective. To write a behavioral objective, certain attributes must be included if the behavioral objective is to be useful and capable of being assessed. An objective must be relevant and feasible. An objective must be written with enough specificity to limit the objective to a single task and to describe and define its intent. An objective describes what observable performance will be taking place during the evaluation process. A behavioral objective contains three parts: 1) the performance, 2) the condition, and 3) the standard.

Although opinions differ concerning the use of conventional methods of test construction, there appears to be a sufficient body of information where the conventional methods have been used successfully. The use of conventional methods of test construction tends to identify items capable of discriminating in such a manner as to satisfy a purpose of criterion-referenced measurement, that is, classifying individuals into mutually exclusive categories.

While many varieties of validity appear in the literature, the American Psychological Association delimits only three: 1) construct validity, 2) criterion-related validity, and 3) content validity. Of these three, educational assessment is primarily concerned with content validity. From the definition, it can be said content validity is composed of three components: 1) the behavior to be exhibited in the performance domain, 2) the behavior to be demonstrated in testing, and 3) the strength of the relationship between the two. The establishment of content validity is based on careful judgment of the test's apparent relevance, using a thorough and accurate analysis of the content domain and the content of the test.

Several studies have been identified which indicate that the relationship between the content of several widely used instructional programs and the content of several of the more popular standardized achievement tests is suspect. The studies have revealed that the degree of match between a program and a test will vary depending on which program is matched with which test. The significance of the related studies to the study currently under investigation is that the present study is attempting to establish the degree of concurrence between the Michigan Educational Assessment Program Experimental Reading Test (a criterion-referenced test) and five reading instructional programs.
CHAPTER III METHODOLOGY OF THE STUDY The present study is based on a design that makes possible the determination and analysis of the concepts presented in the five reading instructional programs and the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven as measured by the Reading Concepts Checklist, (RCC) .^ Development of the Instrument and Its Use The Instrument The Reading Concepts Checklist, (RCC) , was developed as a means of describing, within a common framework, the concepts presented in the instructional materials and the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven. It was recognized at the beginning of this study that terminology and definitions would vary to some degree across ■'"Appendix A 77 78 specialists. The goal, therefore, was to develop the Reading Concepts Checklist, (RCC), on the basis of conceptual consensus of agreement to insure that its terms and definitions would have a high degree of meaning and similarity of meaning across reading specialists and test constructors. The construction of the Reading Concepts Checklist, (RCC), was based on the work of recognized authorities in 2 3 the field of reading. Cohen and Hyman, Barbe, and 4 Ekwall agree, generally, upon the major divisions of the Reading Concepts Checklist, (RCC). Duffy and Sherman^ use terminology which is different, but have basically the same g divisions as the Reading Concepts Checklist, (RCC). Reid's 2 Alan S. Cohen and Joan S. Hyman, Instructional Objectives in Reading, New York: Random House, Inc., 1977, pp. 1-8, 15-19. ^Walter B. Barbe, Personalized Reading Instruction, 9th Printing, Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1967, pp. 142-143, 152-153, 160-161, 168-169, 182-183, 192-193, 204-205. 4 Eldon E. Ekwall, Diagnosis and Remediation of the Disabled Reader, 2nd Printing, Boston, Mass.: Allyn-Bacon, Inc., 1976, pp. 59-61. 5 Gerald G. Duffy and George B. Sherman, Systematic Reading Instruction, 2nd ed., New York: Harper and Row, 1977, p. 82. ^Ethna R. Reid, Teaching Literal and Inferential Comprehension, Salt Lake City, Utah: Cove Publishers, 1978, pp. 10-11. 79 overall structure agrees with the Reading Concepts Checklist, (RCC); however, Reid subdivides the categories into greater detail than that contained in the Reading Concepts Checklist, (RCC). The six major divisions of the Reading Concepts Checklist, (RCC), are 1. 2. 3. 4. 5. 6. Auditory Discrimination Visual Discrimination Phonic Analysis Structural Analysis Comprehension, and Study Skills. Each major category was subdivided into its predominant categories and numerically coded for later use with a computer program. The Reading Concepts Checklist, (RCC), formed the basis for two matrices: 1) the classification of concepts presented in the instructional materials in grades K-6, and 2) the classification of the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven. The Use of the Instrument The matrix developed for use with the instructional materials consisted of the Reading Concepts Checklist, (RCC) , being placed down the left side and the K-6 grade levels being placed across the top. The five instructional pro­ grams were alphabetically ordered and chronologically numbered. 
If a given program presented a concept at any or all grade levels, the code number representing the 80 program was placed in the cell formed by the intersection of the concept and the appropriate grade level. To determine which concepts were presented at the various grade levels, each teacher's manual for each grade level was examined in its entirety. The process was repeated for each of the five instructional programs. The form of the matrix for the classification of the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test differed from that used for the instructional programs in that only two categories were placed across the top of the matrix. They were "Grade 4" and "Grade 7." The classification of the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test was conducted independently by the researcher and three reading experts. The materials used to implement the classification of the test's concepts consisted of the matrix, a copy of the draft copy of the Michigan Department of Education's 7 Communication Skills Objectives: Reading, response keys for the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven, and a copy of the Experimental Test for grades four and seven. 7 Appendix B 81 Each of the two levels of the test consists of onehundred-forty-one items which measure six major categories of reading skills. The sixth category, "Positive Responses to Reading," and the test items 126-141, are attitudinal in nature and have no "correct" response. Therefore, the sixth category, and its accompanying items, were not classified into the matrix. 1. 2. 3. 4. 5. The other five categories are Vocabulary Meaning Literal Comprehension Inferential Comprehension Critical Reading Skills, and Related Study Skills The instructions given each judge were to match each item with its stated objective, read each testitem and determine the nature of the task being required ofthe examinee. Finally, based on the above determination, the judges were instructed to list the category, objective and the test item number in the appropriate cell of the matrix according to the grade level and the concept. Each item of the test was treated in the same manner until all 125 items had been included in the matrix by the judges. The process was followed for each level of the test. Selection of Instructional Materials The process of selecting comparison materials involves such questions as "What are the predominant instructional programs in use in Michigan's public schools?" and "What combinations of those programs are used by a majority of Michigan's Kindergarten through sixth grade students?" In defining the term "majority", several aspects were taken into consideration. First, a majority should be a clearly definitive number, not simply "more than half." Next, a majority should be large enough so as to insure a popu­ lation of students large enough to be exposed to the de­ fined content domain, a domain from which a representative test sample could conceivably be taken. Finally, a majority should be large enough that it represents a reasonable cross-section of Michigan's rural, suburban, urban, and large-city school children. Therefore, based on these considerations, the lower acceptable limit which defined a majority of students using the reading instruct­ ional materials to be included in the present study was established at seventy-five percent. 
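Before turning to the survey data, it is worth noting that the seventy-five percent criterion amounts to a simple cumulative check once usage figures are available. The sketch below uses entirely hypothetical program names and usage shares; the actual figures come from the survey described next.

def meets_majority(shares, lower_limit=0.75):
    """Return True if the combined share of K-6 students using the selected
    programs reaches the lower acceptable limit defined above."""
    return sum(shares.values()) >= lower_limit

# Hypothetical shares of K-6 students using each candidate program.
shares = {"Program A": 0.22, "Program B": 0.20, "Program C": 0.16,
          "Program D": 0.12, "Program E": 0.08}
print(f"{sum(shares.values()):.2f}", meets_majority(shares))  # 0.78 True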
The basis for answering these questions and selecting the reading instructional materials for this study is the result of a 1977 national survey of reading instructors and reading supervisors. Market Data Retrieval, Inc. mailed 11,889 questionnaires to reading instructors and reading supervisors. Of that number, 2,052 valid responses were used, which made the response rate 17.3 percent.8 Although the agency was under contract to a particular publishing company, the questionnaire appears to be free from bias toward any publisher.

8 Market Data Retrieval, Inc., HMCo. Market Research Report No. 17, Reading K-8 Survey, (New York: Market Data Retrieval, Inc., 1977), p. 97.

The survey results provided statistics for both national and regional levels of the market share captured by the several publishing companies. The survey revealed that the predominant reading instructional materials used in the region which included Michigan are: 1) Ginn and Company, 2) Harcourt, Brace and Jovanovich, 3) Holt, Rinehart and Winston, 4) Houghton-Mifflin Company, and 5) Scott, Foresman Company.9 The survey also provided data which satisfied the lower acceptable limit definition of seventy-five percent of Michigan's K-6 students using the reading instructional materials. The survey indicates the percentage to be VS.Se.10 The following table indicates the distribution of students using the reading instructional materials by area. The table does not indicate the market share of the publishers. It illustrates the concentration of the publications according to the three types of areas nationally.

9 Ibid., p. 5.

DISTRIBUTION OF STUDENTS USING TEXTS OF THE FIVE MAJOR PUBLISHERS BY AREA11

Publisher                          Urban    Suburban    Rural
Ginn and Company                    24.9      37.9       36.2
Harcourt, Brace and Jovanovich      24.0      34.5       41.4
Holt, Rinehart and Winston          38.4      29.5       31.8
Houghton-Mifflin Company            23.4      26.2       49.9
Scott, Foresman Company             11.4      39.6       47.8

11 Ibid., p. 7.

The national and regional levels of information which this survey provided permit a high degree of confidence to be placed in the assumption that the five reading instructional programs selected for this study do, indeed, constitute those programs which are the predominant programs in use in Michigan's K-6 grades and are used by at least seventy-five percent of Michigan's K-6 students.

Treatment of the Data

Due to the nature of the Michigan Educational Assessment Program Experimental Reading Test, the data which had been compiled in the instructional materials classification matrix were grouped into a K-3 category to be compared with the Grade 4 Test and a 4-6 category to be compared with the Grade 7 Test. A concept was considered presented if it appeared in the K-3 or 4-6 category. The matrix was then reduced to dichotomous data in either of the K-3 or 4-6 instructional levels. Concepts which were presented were assigned a numerical value of "1" while concepts which were not presented were assigned the value of "0". The data were then punched and verified for IBM and computer tabulation. A separate set of data cards was prepared for the K-3 and 4-6 levels. The IBM card layout used nine columns, providing for the identification of each individual concept (3 columns) and individual instructional program concept data (5 columns). The final column was reserved for data pertinent to the test. A printed IBM listing from the card data was completed to facilitate computations for further statistical tests and to recheck completeness and accuracy.
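The reduction to dichotomous data described above can be expressed compactly. The sketch below uses hypothetical concept codes and abbreviated program names; it simply collapses the grade-by-grade presentations into a "1" (presented somewhere in the level) or "0" (not presented) for each of the K-3 and 4-6 levels.

K3_GRADES = {"K", "1", "2", "3"}
G46_GRADES = {"4", "5", "6"}

def dichotomize(presentations, grades):
    """presentations maps (concept_code, program) to the set of grade levels at
    which the program presents the concept; returns a 1/0 indicator per key."""
    return {key: 1 if levels & grades else 0
            for key, levels in presentations.items()}

# Hypothetical entries; the concept codes stand in for the RCC's numeric coding.
presentations = {
    ("101", "Ginn"): {"K", "1"},
    ("101", "Scott, Foresman"): {"4"},
    ("215", "Ginn"): set(),
}

print(dichotomize(presentations, K3_GRADES))   # values 1, 0, 0
print(dichotomize(presentations, G46_GRADES))  # values 0, 1, 0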
The compiled data from the test classification matrix were also converted to dichotomous data. A concept was considered tested if one or more test items were identified by the judges as measuring that concept. Concepts which were tested were assigned a numerical value of "1" while concepts which were not tested were assigned the value of "0". Value assignment was based on majority agreement among three of the four judges. The data were then punched and verified for IBM and computer tabulation. A separate set of data cards was prepared for the Grade 4 and Grade 7 tests. The IBM card layout used seven columns, providing for the identification of each individual concept (3 columns) and individual judges' responses (4 columns).

Statistical Methodology and Research Design

Research Design

A statistical test may be termed nonparametric if it does not test a hypothesis which characterizes one of the parameters of the parent variable of interest. Or, a statistical test may be termed distribution-free if the sampling distribution of the statistic on which the test is based is completely independent of the parent distribution of the variable. The two terms are imperfect synonyms and tend to be blurred frequently. Therefore, many statisticians tend to use them interchangeably.12

12 Leonard A. Marascuilo and Maryellen McSweeney, Nonparametric and Distribution-Free Methods for the Social Sciences, Monterey, California: Brooks/Cole Publishing Company, p. 5.

The research design chosen for this study falls into the category of the nonparametric, distribution-free statistical test model. It is Cochran's Q test. Cochran's Q test is an extension of the McNemar two-sample test and is considered appropriate in an experiment involving repeated observations or matched groups where the dependent variable can take only two values: 1) X_ik = 1 if the observation for subject "i" under condition "k" can be termed a "success," or 2) X_ik = 0 if the observation for subject "i" under condition "k" is a "failure." The term "success" is arbitrarily applied to the outcome of interest. The role of the numerical score is to assign individuals to one of two categories.13, 14

Cochran's Q test has a distribution that is approximately χ² with ν = K-1 degrees of freedom. The statistic for the test is15

    Q = [K(K-1) Σ C_j² - (K-1) N²] / [KN - Σ R_i²]

where
    C_j = the sum of the values in column j
    R_i = the sum of the values in row i
    K   = the number of columns, that is, the number of treatments or conditions
    N   = either the sum of the column totals or the sum of the row totals, as they are equal values.

13 William L. Hays, Statistics for the Social Sciences, New York: Holt, Rinehart and Winston, Inc., 1973, pp. 773, 775.

14 Marascuilo and McSweeney, Nonparametric and Distribution-Free Methods for the Social Sciences, p. 177.

15 Ibid., p. 178.

A test of the hypothesis that the proportions of success are the same for all treatments, or that treatment effects are absent, can be made by rejecting H₀ if Q > χ²(K-1; 1-α).16 If H₀ is rejected on the basis of the hypothesis test, it is not possible to determine the magnitude or the direction of the difference in treatments. Post hoc multiple comparisons of the treatment means can be used to examine the differences among treatments more carefully. Multiple comparisons of the treatment means can be conducted through the use of the Dunn-Bonferroni inequality test. The use of the Dunn-Bonferroni test provides a narrower confidence interval than the Scheffé technique.17
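For concreteness, the following is a minimal sketch, under hypothetical presence/absence data, of the Q statistic defined above and of the Dunn-Bonferroni interval half-width used for the post hoc comparisons. It is not the study's actual SPSS routine; scipy is assumed to be available for the χ² and normal quantiles.

import math
from scipy.stats import chi2, norm

def cochran_q(table):
    """table: one row per concept, one 0/1 column per condition (program or test).
    Returns the Q statistic and its degrees of freedom (K - 1)."""
    k = len(table[0])                               # K = number of conditions
    col_totals = [sum(col) for col in zip(*table)]  # C_j
    row_totals = [sum(row) for row in table]        # R_i
    n = sum(col_totals)                             # N = sum of column (or row) totals
    num = k * (k - 1) * sum(c * c for c in col_totals) - (k - 1) * n * n
    den = k * n - sum(r * r for r in row_totals)
    return num / den, k - 1

def dunn_bonferroni_halfwidth(variance, n_comparisons, alpha=0.05):
    """Half-width of the simultaneous Dunn-Bonferroni interval for a family
    of pairwise contrasts between proportions."""
    z = norm.ppf(1 - alpha / (2 * n_comparisons))
    return z * math.sqrt(variance)

if __name__ == "__main__":
    # Hypothetical presence/absence data: rows are concepts, columns are conditions.
    table = [
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 1, 0],
    ]
    q, df = cochran_q(table)
    critical = chi2.ppf(0.95, df)
    print(f"Q = {q:.3f}, df = {df}, reject H0: {q > critical}")

    # With the variance of .0036 and the fifteen pairwise comparisons reported
    # later in Table 9, the half-width comes out near the +/-.1764 used in the
    # tables of Chapter IV.
    print(f"half-width = {dunn_bonferroni_halfwidth(0.0036, 15):.4f}")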
The research design was applied to the study under investigation in that the reading instructional programs were considered the treatments and the reading concepts were considered the subjects. If an instructional program presented a given concept, the value of "1" was assigned. Presentation of a concept was equated with "success". The lack of a program's presentation of a given concept was considered a "failure" and the value of "0" was assigned. To be considered a "success", the concept 16Ibid. 17Ibid., p. 180. 89 had to have been presented in any of the grades K-3 to be compared with the Experimental Reading Test Grade 4 or in any of the grades 4-6 to be compared with the Experimental Reading Test Grade 7. A "failure" was the total absence of the presentation of a concept by an instructional program in either of the appropriate levels K-3 or 4-6. A "success" was the presentation of a concept by an instructional program at any grade level within the appropriate levels of K-3 or 4-6 to be compared with the appropriate level of the Experimental Reading Test. The Cochran Q test was used to obtain inter-rater reliability scores between the independent rating of the judges. The reliability was computed from the proportions of the individuals1 ratings of which concepts the test items measured. Statistical Methodology Statistical treatments of the data in this study were conducted through the use of the facilities of the Computer Laboratory, Michigan State University. The statistical package for the Social Sciences (SPSS) routines were used to compute the proportions data. The calculations of the computer were randomly checked by performing the statistical treatments on a mechanical calculator. The Dunn-Bonefrroni pairwise comparisons were conducted using a mechanical calculator to perform the statistical treatments to examine the differences between proportions 90 scores of the instructional materials and the Experimental Reading Test. Summary The Reading Concepts Checklist, (RCC), was developed as a means of describing, within a common framework, the concepts presented in the instructional materials and the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test. The Reading Concepts Checklist, (RCC), was developed on the basis of conceptual consensus of agreement obtained from the work of several recognized authorities in the field of reading. It was formed into two matrices for the purpose of classifying the instructional materials' presented concepts and the Experimental Reading Tests' tested concepts. The data were coded for IBM tabulation. Statistical treatments required for tests of inter-rater reliability and the significance of the difference between the proportions were processed through the use of the facilities of the Computer Laboratory, Michigan State University. The Cochran Q test was used to compute the significance of the difference between proportions. The Cochran Q test was used to obtain inter-rater reliability scores to determine the significance of difference between judges. The Dunn- Bonferroni pairwise comparisons were performed to examine the differences in significance of the proportions. CHAPTER IV ANALYSIS OF RELATIONSHIPS BETWEEN VARIOUS READING PROGRAMS AND THE MICHIGAN EDUCATIONAL ASSESSMENT PROGRAM EXPERIMENTAL READING TEST This chapter contains a restatement of the major hypotheses tested, a summary of the findings, a description and interpretation of the statistical treatment of the data, and an evaluation of each hypothesis. 
The hypotheses which are being tested are stated in the null form and are designated by the symbol Hq . of significance used is .05. The level If the probability of the occurrence of the data is smaller than the level of signif­ icance, the data are considered contradictory to the hypothesis and a decision is made to reject the null hypothesis. Rejection of the null hypothesis is regarded as a decision to accept the research hypothesis. A non­ rejection of the null hypothesis indicates there is no statistical difference and signifies a rejection of the cor­ responding research hypothesis. This chapter contains an analysis of the degree of concurrence between the five reading instructional programs surveyed and the relationship between each of the five reading programs in the Michigan Educational Assessment 91 92 Program Experimental Reading Test, Grades Four and Seven, as measured by the Reading Concepts Checklist, (RCC). Analysis General Hypothesis I The first general hypothesis and fifteen operational null hypotheses are as follows: There will be no difference between the five reading instructional programs in grades K-3 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades K-3 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as shown in the Reading Concepts Checklist, (RCC) . Operational Hla: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Harcourt, Brace, and Jovanovich according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlb: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlc: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 93 Operational Hid: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hie: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . Operational Hlf: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . 
Operational Hlg: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlh: There will be no difference between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . 94 Operational Hli: There will be no difference between the concepts presented in the K-3 reading instructional orogram published by Holt, Rinehart and Winston and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlj: There will be no difference between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlk: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hll: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Him: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart and Winston and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 95 Operational Hln: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assess­ ment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlo: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Summary of Hypothesis I Results 1. 
The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between each K-3 reading instructional program and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 (Table 1).

2. The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between the K-3 reading instructional programs (Table 1).

3. Pairwise comparisons using mean scores of the proportions of matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between each reading instructional program and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 (Table 2).

4. Pairwise comparisons using mean scores of the proportions of matches and mismatches across the Reading Concepts Checklist, (RCC), show no statistical difference between Ginn and Company and 1) Harcourt, Brace and Jovanovich, 2) Holt, Rinehart, and Winston, and 3) Houghton-Mifflin Company; show no statistical difference between Harcourt, Brace and Jovanovich and Holt, Rinehart, and Winston; show no statistical difference between Holt, Rinehart, and Winston and Houghton-Mifflin Company; and show no statistical difference between Houghton-Mifflin Company and Scott, Foresman Company at the K-3 reading instructional program level (Table 2).

5. Pairwise comparisons using mean scores of the proportions of matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between Ginn and Company and Scott, Foresman Company; show a significant degree of mismatch between Harcourt, Brace and Jovanovich and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company; and show a significant degree of mismatch between Holt, Rinehart, and Winston and Scott, Foresman Company at the K-3 reading instructional program level (Table 2).

6. An analysis of the findings of this study indicates a strong lack of concurrence between each reading instructional program and the Michigan Educational Assessment Program Experimental Reading Test Grade 4. Differences are apparent between the reading instructional programs in the total category score but are less apparent when pairwise comparisons are performed.

7. The overall findings related to the degree of concurrence between the K-3 reading programs surveyed and each K-3 reading program and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, as measured by the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC), indicate a lack of concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grade 4 and each of the K-3 reading programs. The relationship of the Reading Concepts Checklist, (RCC), to the Michigan Educational Assessment Program Experimental Reading Test Grade 4 will be analyzed in detail following the results of Hypothesis II.

Statistical Tests and Treatments

The Cochran Q test, utilizing a chi-square (χ²) distribution, was used to test the significance of the observed differences between the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). The limits within which the hypotheses will be accepted and outside of which they will be rejected are predicated on the .05 level of significance.
The χ² values which cut off 2.5 percent of the area in each tail of the χ² distribution provide the measure of the difference between the proportion scores. The Q statistic will be numerically larger than the χ² value when the null hypotheses are not true. The null hypothesis will not be rejected if the probability of the χ² value is greater than the .05 level of significance (p > .05). The region of rejection for the null hypothesis is defined by the confidence limits, (.025, .975). When very strong rejections of the null hypotheses occur, higher probability levels for rejecting the null hypotheses are given, for example: p < .01 or p < .001.

Table 1. Summary of the total proportion scores of the matches and mismatches of the K-3 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as measured by the 103 concepts contained in the Reading Concepts Checklist, (RCC).1

Program                            Matches    Mismatches    Proportion
Ginn and Company                      89          14           .8641
Harcourt, Brace and Jovanovich        90          13           .8932
Holt, Rinehart and Winston            89          14           .8641
Houghton-Mifflin Company              72          31           .6990
Scott, Foresman Company               70          33           .6796
Test-Grade 4                          25          78           .2427

1 See Appendices C and D for additional statistical data.

Summaries of the results of the statistical treatments are presented in the following sections. Additional data are included in the appendices and referred to as necessary in the analysis of the results. The determination of whether observed differences in the total proportion scores indicate the degree of concurrence is of major interest. Additional examination and analysis is concerned with the degree of concurrence between the K-3 reading programs surveyed and each of the K-3 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test, Grade 4.

Table 2. Interval estimate of the multiple comparison of proportion scores for the K-3 reading programs and the Experimental Reading Test, Grade 4.

            2          3          4          5          T4
1        -.0291         0       .1651      .1845      .6214
2                    .0291      .1942      .2136      .6505
3                               .1651      .1845      .6214
4                                          .0194      .4563
5                                                     .4368

C.I.a  ±.1764

Key: 1 = Ginn and Company; 2 = Harcourt, Brace and Jovanovich; 3 = Holt, Rinehart, and Winston; 4 = Houghton-Mifflin Company; 5 = Scott, Foresman Company; T4 = Michigan Educational Assessment Program Experimental Reading Test Grade 4
a Confidence Interval

Results and Evaluation of Statistical Treatment

Total Proportion Scores

In order to determine the degree of concurrence between the K-3 reading programs surveyed and between each of the K-3 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, the total proportion scores which appear in Table 3 between each of the K-3 reading programs and the Experimental Test Grade 4 were compared by means of the Cochran Q test. Based on the significant difference in total proportion scores, the null hypothesis:

There will be no difference between the five reading instructional programs in grades K-3 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as shown in the Reading Concepts Checklist, (RCC).
is rejected; therefore, the research hypothesis that there is a significant statistical difference between the K-3 reading instructional programs and between each of the K-3 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, as shown in the Reading Concepts Checklist, (RCC), is accepted. This difference indicates a significant lack of concurrence between each of the K-3 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, and a lack of concurrence 102 between the K-3 instructional programs. The difference does not indicate the magnitude nor the direction of the difference. Table 3. Score Total Matches Differences in total proportion scores of the K-3 instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4.2 1 2 3 4 5 T. Programs and Test 89 90 89 72 70 25 Programs Only 89 90 89 72 Key: 1 2 3 4 5 70 Q D.P. P 162.4435 5 p < .001 - S 31.9865 4 p < .001 - S S indicates a level of significance between proportion scores at a minimum of P< .05. P <.001 represents higher levels of significance than minimum required. = Ginn and Company = Harcourt, Brace and Jovanovich = Holt, Rinehart, and Winston = Houghton-Mifflin Company = Scott, Foresman Company 2 See Appendix D for additional statistical data. 103 Pairwise Comparison Scores Table 4 contains the values of the pairwise comparison of the means of the proportion scores between Ginn and Company and 1) Harcourt, Brace and Jovanovich, 2) Holt, Rinehart, and Winston, 3) Houghton-Mifflin Company, 4) Scott, Foresman Company K-3 reading instructional programs, and 5) the Michigan Educational Assessment Program Experimental Reading Test Grade 4. On the basis of the lack of a significant statistical difference between the means of the proportions scores, the following null hypotheses are accepted. Operational Hla: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading Instructional program published by Harcourt, Brace and Jovanovich according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . Operational Hlb: There will be no difference between the concepts presented in the K-3 reading instructional progarm published by Ginn and Company and the K-3 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlc: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . 104 The corresponding research hypotheses that a significant statistical difference exists are rejected. A significant statistical difference between the means of the proportion scores is evident and the following null hypotheses are rejected: Operational Hid: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 
Operational Hlk: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assess­ ment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). The corresponding research hypotheses, then, are accepted. The values in Table 5 of the pairwise comparison of the means of the proportion scores between Harcourt, Brace and Jovanovich and 1) Holt, Rinehart, and Winston, 2) HoughtonMif flin Company, 3) Scott, Foresman Company, and 4) the Michigan Educational Assessment Program Experimental Reading Test Grade 4, yield a non-significant statistical difference between the means of the proportion scores. following null hypothesis is accepted. Thus, the 105 Operational Hie: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). and the corresponding research hypothesis that significant statistical difference exists is rejected. Interval estimate of the multiple comparison of proportion scores for the K-3 reading programs and the Experimental Reading Test Grade 4. Harcourt, Brace and Jovanovich Holt, Rinehart, and Winston Ginn and Company -.0291 o Publisher • H • Table 4. NS 0 NS Houghton-Mi ff1in Company .1651 NS Scott, Foresman Company .1845 S Experimental Reading Test, Grade 4 .6214 S NS S ±.1764 indicates a non-significant statistical difference between the means of the proportion scores. indicates statistically significant difference between the means of the proportion scores at a minimum of p < .05. 106 The occurrence of a significant statistical difference in the means of the proportion scores forms the basis for rejecting the following null hypotheses: Operational Hlf: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlg: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hll: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mis­ matches across the Reading Concepts Checklist, (RCC) . and, conversely, the basis for accepting the corresponding research hypotheses that significant statistical differences do exist. The pairwise comparison values in Table 6 of the means of the proportion scores between Holt, Rinehart, and Winston and Houghton-Mifflin Company fail to illustrate a significant statistical difference. 
Therefore, the following null hypothesis is accepted and its research hypothesis stating the existence of a significant statistical difference is re­ jected. 107 Operational Hlh: There will be no difference between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart and Winston and the K-3 reading instruct­ ional program published by Houghton-Mifflin Company accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Table 5. Interval estimate of the pairwise comparion of proportion scores between Harcourt, Brace and Jovanovich and three K-3 reading programs and the Experimental Reading Test Grade 4. Publisher Harcourt Brace and Jovanovich C .I . Holt, Rinehart, and Winston .0291 NS Houghton-Mifflin Company .1942 S Scott, Foresman Company .2136 S Experimental Reading Test Grade 4 .6505 S NS S ±.1764 indicates a non-significant statistical difference between the means of the proportion scores. indicates statistically significant difference between the means of the proportion scores at the minimum of p < .05. However, the means of the proportion scores in Table 6 exhibit a significant statistical difference between Holt, Rinehart, and Winston and 1) Scott, Foresman Company and 2) the Michigan Educational Assessment Program Experimental Reading Test Grade 4. As a result of these significant 108 differences, the research hypotheses that significant statistical differences exist are accepted and the following null hypotheses are rejected: Operational Hli: There will be no difference between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Him: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michigan Educat­ ional Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Table 6. Interval estimate of the pairwise comparison of proportion scores between Holt, Rinehart, and Winston and the two K-3 reading programs and the Experimental Reading Test Grade 4. Holt, Rinehart, and Winston Publisher Houghton-Mifflin Company Scott, Foresman Company Experimental Reading Test Grade 4 NS S indicates the means indicates the means p < .05. C.I. .1651 NS .1845 S .6214 S ±.1764 a non-significant statistical difference between of the proportion scores. statistically significant difference between of the proportion scores at a minimum of 109 Table 7 presents the results of the pairwise comparison of the means of the proportion scores between HoughtonMif flin Company and Scott, Foresman Company K-3 reading programs. The values reveal the lack of a significant statistical difference. Based on the results of the comparison score, the following hypothesis is accepted: Operational HIj: There will be no difference between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 
Because the above hypothesis is accepted, the corresponding research hypothesis advocating the existance of a significant statistical difference is rejected. The relationship between the K-3 reading program published by Houghton-Mifflin Company and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, is also presented in Table 7 in the form of the means of the proportion scores. The values of the means of the proportion scores indicate a significant statistical difference exists. Therefore, the research hypothesis declaring the existence of a significant statistical difference is accepted and the following null hypothesis is rejected: 110 Operational Hln: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the concepts Tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4, according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Table 7. Interval estimate of the pairwise comparison of proportion scores between Houghton-Mifflin Company and Scott, Foresman Company K-3 reading programs and the Experimental Reading Test Grade 4. Publisher Houghton-Mifflin Company C .I . Scott, Foresman Company .0194 NS Experimental Reading Test Grade 4 .4563 S NS S ±.1764 indicates a non-significant statistical difference between the means of the proportion scores. indicates statistically significant difference between the means of the proportion scores at a minimum of p < .05. On the basis of a significant statistical difference, Table 8, between Scott, Foresman Company K-3 reading program and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, the null hypothesis: Ill Operational Hlo: . There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program experimental Reading Test Grade 4, according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). is rejected and the research hypothesis that a significant statistical difference exists is accepted. Table 8. Interval estimate of of proportion scores program published by and the Experimental Publisher Experimental Reading Test Grade 4 S the pairwise comparison between the K-3 reading Scott, Foresman Company Reading Test Grade 4. Scott, Foresman Company .4369 C.I. S ±.1764 indicates statistically significant difference between the means of the proportion scores at a minimum of p < .05. Table 9 contains a summary of the values of the pair­ wise comparisons of the means of the proportion scores between the K-3 reading instructional programs and between each of the K-3 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4. The table contains information indicating the level of significance regarding whether or not the value is statistically significant, the variance between the proportion 112 scores, and the "psi" value which indicates the confidence limits beyond which rejection of the null hypothesis occurs. Table 9. 1 1 Summary of the interval estimate of the pairwise comparisons of the means of the proportion scores between the K-3 reading programs and each of the K-3 reading programs and the Experimental Reading Test Grade 4. 
2 -.0291a 2 3 0a .0291a 3 4 5 T.4 .1651a .1845 .6214 .1942 .2136 .6505 .1651a .1845 .6214 4 .0194a .4563 5 .4369 Key: 1 = Ginn and Company C.I. ±.1764 Var. = .0036 p < .05 2 = Harcourt, Brace and Jovanovich 3 = Holt, Rinehart, and Winston 4 = Houghton-Mifflin Company 5 = Scott, Foresman Company T 4 = Michigan Educational Assessment Program Experimental Reading Test Grade 4 aNon-significant Statistical Difference 1. The data contained in Table 9 clearly support the research hypotheses that significant statistical difference exists between each of the K-3 reading programs and the Michigan Educational Assessment Program Experimental Reading 113 test Grade 4, according to the proportion of matches and 3 mismatches across the Reading Concepts Checklist, (RCC). 2. The data contained in Table 9 indicate a non­ significant statistical difference exits between the K-3 reading programs published by Ginn and Company and Harcourt, Brace and Jovanovich; Ginn and Company and Holt, Rinehart, and Winston? Holt, Rinehart, and Winston and HoughtonMif flin Company; and Houghton-Mifflin Company and Scott, Foresman Company. Therefore, the null hypotheses are accepted and the corresponding research hypotheses that such a statistical difference exists are rejected. 3. The data contained in Table 9 support the research hypotheses that significant statistical difference exists between the K-3 reading program published by Ginn and Company and Scott, Foresman Company; Harcourt, Brace and Jovanovich and Houghton-Mifflin Company; Harcourt, Brace and Jovanovich and Scott, Foresman Company; and Holt, Rinehart, and Winston and Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC); therefore, the null hypotheses that there will be no differences between the concepts presented in the K-3 reading instructional programs are rejected. 3See Appendices D and E for additional statistical data. 114 The data which have been analyzed have been concerned with the proportion of matches and mismatches between the K-3 reading programs surveyed and between each of the K-3 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4. The proportion scores have involved the total proportion scores based on the 103 concepts contained in the Reading Concepts Checklist, (RCC). From the data contained in Table 9, additional analysis of data which was statistically non-significant was deemed unnecessary. Additional analysis of the statistically significant data was conducted. The additional analysis was conducted to determine the areas in which the K-3 instructional programs differed from each other and the Grade 4 test. To determine the areas of difference, the data contained in the Reading Concepts Checklist, (RCC), were analyzed according to the major categories. The data presented in Table 10 add additional support that the null hypotheses: Operational Hid: There will be no difference between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 
115 Operational Hlk: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlf: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlg: There will be no difference between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hll: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . Operational Hli: There will be no difference between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the K-3 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC), 116 Operational Him: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michigan Educat­ ional Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hln: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assess­ ment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational Hlo: There will be no difference in the degree of concurrence between the concepts presented in the K-3 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). are rejected and the research hypotheses that significant statistical difference exists between the K-3 reading instructional programs and each of K-3 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, are accepted. 117 Table 10 1 Interval estimate of the multiple comparison of proportion scores for the K-3 reading programs and the Experimental Reading Test Grade 4 by individual categories in the Reading Concepts Checklist, (RCC).4 2 3 Category: 1 2 3 4 5 0a 0a 0a 1 2 3 4 5 T4 C.I . 
Vocabulary Development .1667 .1667 .1667 Category: 1 2 3 4 5 5 4 .6667 .6667 .6667 .50 .3334 .3334 .3334 .1667 -.3333 ±.0882 Inferential Comprehension -.0583a .0583a .2353 0a .2941 .2941 .3530 .4118 .4118 .1177 .4118 .4707 .4706 .1765 .0588a Category: Study Skills .2727 .5455 .1818 .0909 .3637 .0909a .3637 -.0909a .1891 .2728 .1818 .4546 -, Q910a .1818 .2728 +.1017 ±.1211 Continued 4 See Appendices D and E for additional statistical data. 118 The null hypotheses are rejected at the 0.05 level. Higher levels are indicated. aNon-significant Statistical Difference. Key: 1 = Ginn and Company 2 = Harcourt, Brace and Jovanovich 3 = Holt, Rinehart, and Winston 4 = Houghton-Mifflin Company 5 = Scott, Foresman Company T 4 = Michigan Educational Assessment Program Experimental Reading Test Grade 4 Analysis General Hypothesis II The second general hypothesis and fifteen operational null hypotheses are as follows: There will be no difference between the five reading instructional programs in grades 4-6 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades 4-6 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as shown in the Reading Concepts Checklist, (RCC) . Opeational H2a: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . 119 Operational H2b: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Holt/ Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2c: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2d: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2e: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2f: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 
Operational H2g: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 120 Operational H2h: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2i: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2j: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2k: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 read­ ing instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H21: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mis­ matches across the Reading Concepts Checklist, (RCC) . 121 Operational H2m: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michigan Educa­ tional Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2n: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2o: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Summary of Hypothesis II Results 1. 
The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between each of the 4-6 reading programs surveyed and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 (Table 11).

2. The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between the 4-6 reading programs (Table 11).

3. Pairwise comparisons, using mean scores of the proportions of matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between each of the reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 (Table 12).

4. Pairwise comparisons, using mean scores of the proportions of matches and mismatches across the Reading Concepts Checklist, (RCC), show no statistical difference between Ginn and Company and Harcourt, Brace and Jovanovich; Ginn and Company and Holt, Rinehart, and Winston; Harcourt, Brace and Jovanovich and Holt, Rinehart, and Winston; and Houghton-Mifflin Company and Scott, Foresman Company (Table 12).

5. Pairwise comparisons, using mean scores of the proportions of matches and mismatches across the Reading Concepts Checklist, (RCC), show a significant degree of mismatch between Ginn and Company and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company; a significant degree of mismatch between Harcourt, Brace and Jovanovich and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company; and a significant degree of mismatch between Holt, Rinehart, and Winston and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company (Table 12).

6. An analysis of the findings of this study indicates a strong lack of concurrence between each of the reading programs surveyed and the Michigan Educational Assessment Program Experimental Reading Test Grade 7. Differences are apparent between the reading programs in the total category score but are less apparent when pairwise comparisons are performed.

7. The overall findings related to the degree of concurrence between the 4-6 reading instructional programs surveyed and each of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, as measured by the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC), indicate a lack of concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grade 7 and each of the 4-6 reading programs. The relationship of the Reading Concepts Checklist, (RCC), to the Michigan Educational Assessment Program Experimental Reading Test Grade 7 will be analyzed in detail following the results of Hypothesis II.

Table 11. Summary of the total proportion scores of the matches and mismatches of the 4-6 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, as measured by the 103 concepts contained in the Reading Concepts Checklist, (RCC).5

Program                              Matches    Mismatches    Proportion
Ginn and Company                        85          18           .8252
Harcourt, Brace and Jovanovich          85          18           .8252
Holt, Rinehart, and Winston             87          16           .8447
Houghton-Mifflin Company                53          50           .5147
Scott, Foresman Company                 67          36           .6505
Test - Grade 7                          28          75           .2718

5See Appendices F and G for additional statistical data.
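The total proportion scores in Table 11 are simple ratios of matched concepts to the 103 concepts in the checklist. The short sketch below (written in Python for illustration, not the original IBM tabulation routines) shows how those proportions can be recomputed from the match counts; the counts are taken directly from Table 11.

```python
# A minimal sketch of the total proportion scores reported in Table 11.
# "Matches" are the RCC concepts coded "1" for a given program or test;
# the proportion is the match count divided by the 103 RCC concepts.

N_CONCEPTS = 103  # concepts contained in the Reading Concepts Checklist

matches = {
    "Ginn and Company": 85,
    "Harcourt, Brace and Jovanovich": 85,
    "Holt, Rinehart, and Winston": 87,
    "Houghton-Mifflin Company": 53,
    "Scott, Foresman Company": 67,
    "Experimental Reading Test Grade 7": 28,
}

for program, m in matches.items():
    proportion = m / N_CONCEPTS
    print(f"{program:36s} matches={m:3d}  mismatches={N_CONCEPTS - m:3d}  "
          f"proportion={proportion:.4f}")
```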
Table 12. Interval estimate of the multiple comparison of proportion scores for the 4-6 reading programs and the Experimental Reading Test Grade 7.

          2          3          4          5          T7
1         0        -.0195     .3105      .1747      .5534
2                  -.0195     .3105      .1747      .5534
3                             .3300      .1942      .5729
4                                       -.1358      .2429
5                                                   .3787

C.I. ±.1714

Key: 1 = Ginn and Company
     2 = Harcourt, Brace and Jovanovich
     3 = Holt, Rinehart, and Winston
     4 = Houghton-Mifflin Company
     5 = Scott, Foresman Company
     T7 = Michigan Educational Assessment Program Experimental Reading Test Grade 7

Statistical Test and Treatment

The Cochran Q test, utilizing a Chi-square distribution, was used to test the significance of the observed differences between the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). The level of significance used to determine whether the null hypotheses were rejected or not rejected was the .05 level. The null hypotheses will be accepted if the Chi-square value is greater than the .05 level of significance (p > .05), indicating concurrence between the 4-6 reading programs surveyed and each of the reading programs and the Michigan Experimental Reading Test Grade 7. The Q statistic will be numerically larger than the Chi-square distribution when the null hypotheses are not true, indicating a lack of concurrence between the 4-6 reading programs and each of the reading programs and the Michigan Experimental Reading Test Grade 7. The tests and techniques described and used in analyzing Hypothesis I are also used to analyze Hypothesis II.

Results and Evaluation of Statistical Treatment

Total Proportion Scores

In order to assess the degree of concurrence between the 4-6 reading instructional programs surveyed and each of the reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, the total proportion scores of the 4-6 reading programs and the Experimental Test Grade 7 were compared by means of the Cochran Q test. Based on the significant difference in total proportion scores, Table 13, the null hypothesis:

There will be no difference between the five reading instructional programs in grades 4-6 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades 4-6 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7, as shown by the Reading Concepts Checklist, (RCC).

is rejected; therefore, the research hypothesis that there is a significant statistical difference between the 4-6 reading instructional programs surveyed and each of the reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, as shown in the Reading Concepts Checklist, (RCC), is accepted. This difference indicates a significant lack of concurrence between the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, and a lack of concurrence between the 4-6 reading instructional programs. The difference is not indicative of the magnitude nor the direction of the difference.

Pairwise Comparison Scores

The magnitude and the direction of the difference in total proportion scores between the 4-6 reading programs and each of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 were determined through the use of the Dunn-Bonferroni pairwise comparisons technique.
Table 13. Differences in total proportion scores of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7.6

                            Total Matches
Score                 1     2     3     4     5    T7        Q      D.F.      P
Programs and Test    85    85    87    53    67    28     153.224     5    p < .001 S
Programs Only        85    85    87    53    67            64.579     4    p < .001 S

p < .001 represents a higher level of significance than the minimum.
S indicates a level of significance between proportion scores at a minimum of p < .05.

Key: 1 = Ginn and Company
     2 = Harcourt, Brace and Jovanovich
     3 = Holt, Rinehart, and Winston
     4 = Houghton-Mifflin Company
     5 = Scott, Foresman Company
     T7 = Michigan Educational Assessment Program Experimental Reading Test Grade 7

6See Appendices F and G for additional statistical data.

Table 14 contains the values of the pairwise comparison of the means of the proportion scores between Ginn and Company and 1) Harcourt, Brace and Jovanovich, 2) Holt, Rinehart, and Winston, 3) Houghton-Mifflin Company, 4) Scott, Foresman Company, and 5) the Michigan Educational Assessment Program Experimental Reading Test Grade 7. The lack of a significant statistical difference between the means of the proportion scores results in the following null hypotheses being accepted:

Operational H2a: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC).

Operational H2b: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC).

and the corresponding research hypotheses that a significant statistical difference exists are rejected. However, based on the significant statistical difference between the means of the proportion scores, the following null hypotheses:

Operational H2c: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC).

Operational H2d: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC).

Operational H2k: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7, according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC),

are rejected and the corresponding research hypotheses stating a difference exists between the concepts presented by the reading program published by Ginn and Company and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company and a difference exists between the concepts presented by the Ginn and Company 4-6 reading program and the concepts tested by the Michigan Experimental Reading Test Grade 7, are accepted.
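For readers who wish to see how a Q statistic of the kind reported in Table 13 is obtained, the following sketch implements the standard Cochran Q formula for a binary concepts-by-programs matrix (rows are RCC concepts, columns are the five programs and the Grade 7 test). It is an illustration only: the 103-row coding matrices appear in the appendices rather than in this chapter, so the small example matrix below is hypothetical and its Q value will not equal 153.224.

```python
# A minimal sketch, not the original analysis code, of the Cochran Q test
# applied to a 0/1 concepts-by-programs matrix.
from scipy.stats import chi2

def cochran_q(matrix):
    """matrix: list of rows (concepts), each row a list of 0/1 values (one per column)."""
    k = len(matrix[0])                               # programs/tests being compared
    col_totals = [sum(col) for col in zip(*matrix)]  # concepts matched per column
    row_totals = [sum(row) for row in matrix]        # columns matching each concept
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    q = numerator / denominator
    df = k - 1
    p = chi2.sf(q, df)                               # upper-tail probability
    return q, df, p

# Hypothetical 0/1 coding for a handful of concepts
# (columns: five reading programs followed by the Grade 7 test).
example = [
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
]
q, df, p = cochran_q(example)
print(f"Q = {q:.3f}, df = {df}, p = {p:.4f}")
```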
The pairwise comparison of the means of the proportion scores, Table 15, between Harcourt, Brace and Jovanovich and Holt, Rinehart, and Winston 4-6 reading programs failed to indicate a significant statistical difference. The non-sigifnicant statistical difference indicates the value is within the confidence interval. Therefore, the research hypothesis stipulating a significant statistical difference exists is rejected and the following null hypothesis is accepted: 131 Operational H2e: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Holt, Rinehart, and Winston according to the proportion of matches and mis­ matches across the Reading Concepts Checklist, (RCC). Table 14. Interval estimate of pairwise comparison of proportion scores between Ginn and Company and four 4-6 reading programs and the Experimental Reading Test Grade 7. Publisher Ginn and Company C.I. Harcourt. Brace and Jovanovich 0 NS Holt, Rinehart, and Winston -.0195 NS Houghton-Mif f1in Company .3105 S Scott, Foresman Company .1747 S Experimental Reading Test Grade 7 .5534 S NS S ±.1714 indicates a non- significant statistical difference indicates statistically significant difference between the means of the proportion scores at a minimum of p < .05. However, differences in the means of the proportion scores, Table 15, between the 4-6 reading programs published by Harcourt, Brace and Jovanovich and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company exceeded the level of 132 probability. Furthermore, the differences in the means of the proportions between the concepts presented in the 4-6 reading program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michgian Educational Assessment Program Experimental Reading Test Grade 7, are statistically significant and justify rejecting the following null hypotheses: Operational H2f: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2g: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . Operational H21: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7, according to the proportion of matches and mis­ matches across the Reading Concepts Checklist, (RCC). Therefore, the corresponding research hypotheses declaring the existence of significant statistical differences are accepted. 
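The pairwise values reported in Tables 14 through 19 coincide with simple differences between the total proportion scores of Table 11, judged against the Dunn-Bonferroni confidence interval of ±.1714 stated in the text. The brief sketch below reproduces those differences and the S/NS flags; the half-width is taken as reported rather than rederived here.

```python
# A minimal sketch of the pairwise comparisons summarized in Tables 14-19.
from itertools import combinations

proportions = {                       # total proportion scores from Table 11
    "Ginn": .8252,
    "Harcourt": .8252,
    "Holt": .8447,
    "Houghton-Mifflin": .5147,
    "Scott, Foresman": .6505,
    "Test Grade 7": .2718,
}
CI_HALF_WIDTH = .1714                 # Dunn-Bonferroni interval reported for Hypothesis II

for (name_a, p_a), (name_b, p_b) in combinations(proportions.items(), 2):
    diff = p_a - p_b
    flag = "S" if abs(diff) > CI_HALF_WIDTH else "NS"
    print(f"{name_a:18s} vs {name_b:18s} {diff:+.4f}  {flag}")
```

Running the sketch yields, for example, +.3105 (S) for Ginn versus Houghton-Mifflin and -.1358 (NS) for Houghton-Mifflin versus Scott, Foresman, matching the tabled values.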
The pairwise comparison values, Table 16, of the means of the proportions scores between the 4-6 reading programs 133 of Holt, Rinehart, and Winston and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company and 3) the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7, illustrate a significant statistical difference. Table 15. Therefore, the following null Interval estimate of the pairwise comparison of proportion scores between Harcourt, Brace and Jovanovich and three 4-6 reading programs and the Experimental Reading Test Grade 7. Publishers Holt, Rinehart and Winston Harcourt, Brace and Jovanovich C.I. -.0195 NS Houghton-Mifflin Company .3105 S Scott, Foresman Company .1747 S Experimental Reading Test Grade 7 .5534 S NS S ±.1714 indicates a non-significant statistical difference. indicates a statistically significant difference between the means of the proportion scores at a minimum of p < .05. hypotheses are rejected and their research hypotheses claiming a statistical difference exists are accepted: Operational H2h: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instruc­ tional program published by Houghton-Mifflin Company accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 134 Operational H2i: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instruc­ tional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2m: There will be no difference in the degree of concurrence between the concepts presented in 4-6 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michgian Educational Assessment Program Experimental Reading Test Grade 7, according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Table 16. Interval estimate of the pairwise comparison of proportion scores between Holt, Rinehart, and Winston and two 4-6 reading programs and the Experimental Reading Test Grade 7. Publisher Holt, Rinehart, and Winston C.I. Houghton-Mifflin Company .3300 S Scott, Foresman Company .1942 S Experimental Reading Test Grade 7 .5729 S S ±.1714 indicates statistically significant difference between the means of the proportion scores at a minimum of p < .05. Table 17 presents the results of the pairwise comparison of the means of the proportion scores between HoughtonMif flin Company and Scott, Foresman Company 4-6 reading 135 programs. The values reveal the lack of a significant statistical difference. Based on the results of the comparison score, the following hypothesis is accepted: Operational H2j: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Because the above hypothesis is accepted, the corresponding research hypothesis advocating the existence of a significant statistical difference is rejected. 
The relationship between the 4-6 reading program published by Houghton-Mifflin Company and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, is also presented in Table 17 in the form of the means of the proportion scores. The values of the means of the proportion scores indicate a significant statistical difference exists. Therefore, the research hypothesis declaring the existence of a significant statistical difference is accepted and the following null hypothesis is rejected: Operational H2n: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7, according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 136 Table 17. Interval estimate of the pairwise comparions of proportion scores between Houghton-Mifflin Company and Scott, Foresman Company 4-6 reading programs and the Experimental Reading Test Grade 7. Publisher Scott, Foresman Company Experimental Reading Test Grade 7 NS S Houghton-Mifflin Company C.I. -.1358 NS .2429 S ±.1714 indicates a non-significant statistical difference. indicates statistically significant difference between the means of the proportion scores at a minimum of p < .05. A significant statistical difference between the pairwise comparison of the means of the proportion scores shown in Table 18 negates the following null hypothesis: Operational H2o: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7, accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC), and justifies accepting the corresponding research hypothesis which states a difference exists in the degree of con­ currence between the concepts presented in the Scott, Foresman Company 4-6 reading program and the concepts tested by the 137 Michigan Educational Assessment Program Experimental Reading Test Grade 7. Table 18. Interval estimate of the pairwise comparison of proportion scores between the 4-6 reading program published by Scott, Foresman Company and the Experimental Reading Test Grade 7. Scott, Foresman Company Publishers Experimental Reading Test Grade 7 S .3787 C.I. S ±.1747 indicates statistically significant difference between the mean of the proportion scores as a minimum of p < .05. Table 19 contains a summary of the values of the pair­ wise comparisons of the means of the proportion scores be­ tween the 4-6 reading programs surveyed and each of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7. The table contains information indicating the significance level as to whether or not the value is statistically significant, the variance between proportion mean scores, and the “psi" value which indicates the confidence limits beyond which rejection of the null hypothesis occurs. 1. 
The data contained in Table 19 clearly support the research hypotheses that a significant statistical difference exists between each of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC).7

Table 19. Summary of the interval estimate of the pairwise comparisons of the means of the proportion scores between the 4-6 reading programs and each of the 4-6 reading programs and the Experimental Reading Test Grade 7.

          2          3          4          5          T7
1         0a       -.0195a     .3105      .1747      .5534
2                  -.0195a     .3105      .1747      .5534
3                              .3300      .1942      .5729
4                                        -.1358a     .2429
5                                                    .3787

C.I. ±.1714   Var. = .0036   p < .05

Key: 1 = Ginn and Company
     2 = Harcourt, Brace and Jovanovich
     3 = Holt, Rinehart, and Winston
     4 = Houghton-Mifflin Company
     5 = Scott, Foresman Company
     T7 = Michigan Educational Assessment Program Experimental Reading Test Grade 7

aNon-significant Statistical Difference.

7See Appendices G and H for additional statistical data.

2. The data contained in Table 19 indicate a non-significant statistical difference exists between the 4-6 reading programs published by Ginn and Company and 1) Harcourt, Brace and Jovanovich, and 2) Holt, Rinehart, and Winston; indicate a non-significant statistical difference exists between Harcourt, Brace and Jovanovich and Holt, Rinehart, and Winston; and indicate a non-significant statistical difference exists between Houghton-Mifflin Company and Scott, Foresman Company. Therefore, the null hypotheses indicating there would be no difference are accepted and the corresponding research hypotheses indicating a difference would exist are rejected.

3. The data contained in Table 19 support the research hypotheses that a significant statistical difference exists between the 4-6 reading programs published by Ginn and Company and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company; a significant statistical difference exists between Harcourt, Brace and Jovanovich and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company; and a significant statistical difference exists between Holt, Rinehart, and Winston and 1) Houghton-Mifflin Company and 2) Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC); therefore, the null hypotheses that there will be no difference between the concepts presented in the 4-6 reading instructional programs are rejected.

The data analyzed were concerned with the proportion of matches and mismatches between the 4-6 reading instructional programs surveyed and each of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7. The proportion scores have involved the total proportion scores based on the 103 concepts contained in the Reading Concepts Checklist, (RCC). From the data presented in Table 19, additional analysis of the statistically non-significant data was deemed unnecessary. Additional analysis of the statistically significant data was conducted to determine the areas in which the 4-6 reading programs differed from each other and from the Grade 7 Experimental Reading Test. To determine the areas of difference, the data contained in the Reading Concepts Checklist, (RCC), were analyzed according to the major categories.
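The category-level comparisons in Tables 10 and 20 apply the same proportion logic within each major RCC category rather than across all 103 concepts. The sketch below illustrates that step under stated assumptions: the category names, concept labels, and 0/1 codings are hypothetical placeholders, since the concept-by-concept data appear only in the appendices, and it assumes each category value is the difference of within-category proportions.

```python
# Illustrative sketch of the category-level analysis behind Tables 10 and 20.
# `categories` maps an RCC category to its member concepts; `coding` maps each
# program (or test) to a concept -> 0/1 dictionary.  Both are hypothetical.
from itertools import combinations

categories = {
    "Phonic Analysis": ["consonant blends", "vowel digraphs"],
    "Inferential Comprehension": ["predicting outcomes", "drawing conclusions"],
}
coding = {
    "Program A": {"consonant blends": 1, "vowel digraphs": 1,
                  "predicting outcomes": 1, "drawing conclusions": 0},
    "Program B": {"consonant blends": 0, "vowel digraphs": 1,
                  "predicting outcomes": 1, "drawing conclusions": 1},
    "Test":      {"consonant blends": 0, "vowel digraphs": 0,
                  "predicting outcomes": 1, "drawing conclusions": 1},
}

for category, concepts in categories.items():
    # proportion of this category's concepts coded "1" for each program/test
    props = {name: sum(codes[c] for c in concepts) / len(concepts)
             for name, codes in coding.items()}
    print(category)
    for (a, pa), (b, pb) in combinations(props.items(), 2):
        print(f"  {a} vs {b}: difference = {pa - pb:+.4f}")
```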
The data presented in Table 20 add additional support that the null phyotheses: Operational H2c: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 141 Operational H2d: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concept Checklist, (RCC). Operational H2k: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Ginn and Company and the concepts tested by the Michigan Educational Assess­ ment Program Experimental Reading Test Grade 7 according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2f: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2g: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the 4-6 reading instructional program published by Scott, Foresman Company according to the proportion of matches and mismatches across the Reading Concept Checklist, (RCC). Operational H21: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Harcourt, Brace and Jovanovich and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the proportion of matches and mis­ matches across the Reading Concepts Checklist, (RCC). 142 Operational H2h: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instruc­ tional program published by Houghton-Mifflin Company according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2i: There will be no difference between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the 4-6 reading instruc­ tional program published by Scott, Foresman Company accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2m: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Holt, Rinehart, and Winston and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 according to the Proportion of matches and mis­ matches across the Reading Concepfcs Checklist, (RCC) . 
Operational H2n: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Houghton-Mifflin Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Operational H2o: There will be no difference in the degree of concurrence between the concepts presented in the 4-6 reading instructional program published by Scott, Foresman Company and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 accord­ ing to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). 143 are rejected and the corresponding research hypotheses that significant statistical difference exists between the 4-6 reading instructional programs and each of the 4-6 reading programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7, are accepted. Table 20. 1 Interval estimate of the multiple comparison of proportion scores for the 4-6 reading pro­ grams and the Experimental Reading Test Grade 7, by individual categories in the Reading Concepts Checklist/ (RCC).® 2 3 4 Category: 1 -.1250a -.1875 .750 2 .0625a .8750 3 .9375 4 5 t 7 c .i . Phonic Analysis -.1250a .8125 0.00a .9375 ±.1441 .0625a 1.00 -.8750 5 .0625a .9375 Category: 1 o.ooa 2 3 4 Structural Analysis 0.00a .3636 .2727 .8182 0.00a .3636 .2727 .8182 .3636 .2727 .8182 -.0909a .4546 5 ±.1247 .5455 aNon-Significant Statistical Difference. Continued O See Appendices G and H for additional statistical data. 144 Table 20. 1 Continued 2 3 Category: 1 .0769a 0.00a 4 5 T? C.I. Literal Comprehension .4616 .3077 .6154 2 -.07693 .3847 .2308 .5385 3 .4616 .3077 .6154 -.1539 .1538 4 5 ±.1176 .3077 Category: 1 0.00a Inferential Comprehension -.0588a .1765 .3530 .5294 2 -.0588a .1765 .3530 .5294 3 .2353 .4118 .5882 .1765 .3529 4 5 +.0929 .1764 The null hypotheses are rejected at the 0.05 level. levels are indicated. aNon-Significant Statistical Difference Key: 1 = Ginn and Company 2 = Harcourt, Brace and Jovanovich 3 = Holt, Rinehart, and Winston 4 = Houghton-Mifflin Company 5 = Scott, Foresman Company T 7 = Michigan Educational Assessment Program Experimental Reading Test Grade 7. Higher 145 INTER-RATER RELIABILITY CLASSIFICATION OF TESTED CONCEPTS The validity model upon which this study is based called for a review and an evaluation of the test by a panel of experts. The purpose of the review and evaluation by the experts was to determine the relationship of the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven to the Reading Concepts Checklist/ (RCC). What concepts contained in the RCC were being measured by the Michigan Educational Assessment Program Experimental Reading Test? The establishment of this relationship provided the basis for the comparison of the Michigan Experimental Reading Test to the five reading programs. The review and evaluation was conducted independently by a panel of three reading experts and the researcher. An inter-rater reliability study was performed to establish the strength of the relationship between the judges' classifications of the test items. Summary of Inter-Rater Reliability Tests 1. 
The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a higher degree of agreement among the judges for the Grade 7 Test than the Grade 4 Test (Table 21).

2. The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a strong positive relationship among the judges' classification of the items of the Michigan Experimental Reading Test Grade 4 (Table 21).

3. The total proportion scores of the matches and mismatches across the Reading Concepts Checklist, (RCC), show a strong positive relationship among the judges' classification of the items of the Michigan Experimental Reading Test Grade 7 (Table 21).

4. The findings of the judges' ratings indicate the fourth grade Michigan Educational Assessment Program Experimental Reading Test failed to measure any portion of the Reading Concepts Checklist, (RCC), subcategories of "Auditory Discrimination," "Visual Discrimination," and "Phonic Analysis," and the seventh grade test completely omitted measuring the subcategory of "Phonic Analysis."

5. An analysis of the findings of the inter-rater reliability study indicates a strong positive agreement among the judges. The non-significant statistical difference between the ratings of the judges eliminated the need for further analysis.

6. The overall findings related to the inter-rater reliability study indicate the judgments related to the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven can be validly compared with the concepts presented by the five reading instructional programs according to the Reading Concepts Checklist, (RCC).

Statistical Tests and Treatments

The Cochran Q Test, compared to a Chi-square distribution, was used to test the significance of agreement among the judges between the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). The limits within which the significance of agreement will be accepted, and beyond which it will be unacceptable, are based on the .05 level of significance. The Q statistic will be numerically large when the level of agreement is low. The level of inter-rater reliability will be accepted when the Chi-square value is greater than the .05 level of significance (p > .05). The region of rejection is defined by the confidence limits, (.025, .975).

Results and Evaluation of Statistical Treatment

In order to determine the relationship of the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven to the Reading Concepts Checklist, (RCC), the total proportion scores of the judges were compared by means of the Cochran Q Test (Table 21). Based on the lack of a significant statistical difference in total proportion scores, it is accepted that there is strong positive agreement among the independent ratings of the judges and that their judgments may be compared to the five reading instructional programs according to the Reading Concepts Checklist, (RCC).

Table 21. Inter-rater reliability total proportion scores for the Experimental Test Grades 4 and 7.

                        1      2      3      4        Q       D.F.      P
Grade 4   Matches      28     26     25     24     2.8378       3     p > .05
          Mismatches   75     77     78     79
Grade 7   Matches      27     29     29     27     1.3333       3     p > .05
          Mismatches   76     74     74     76
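The same Cochran Q computation used for the program comparisons applies to the inter-rater data in Table 21, with rows as RCC concepts and columns as the four raters; a cell is coded 1 when that rater classified at least one test item to the concept. The fragment below is illustrative only: the full rater-by-concept matrices are not reproduced in this chapter, so it does not recompute the tabled Q values of 2.8378 and 1.3333, and the example ratings are hypothetical.

```python
# Illustrative inter-rater check in the spirit of Table 21.  Only the marginal
# totals (e.g., Grade 4 matches of 28, 26, 25, and 24 out of 103) are reported
# in the chapter, so the matrix below is a made-up stand-in.

ratings = [            # one row per RCC concept, one column per rater
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
k = len(ratings[0])                                   # number of raters
col = [sum(c) for c in zip(*ratings)]                 # matches per rater
row = [sum(r) for r in ratings]                       # raters matching each concept
n = sum(row)
q = (k - 1) * (k * sum(c * c for c in col) - n * n) / (k * n - sum(r * r for r in row))
CRITICAL_05 = 7.815                                   # Chi-square critical value, df = 3, alpha = .05
verdict = "agreement accepted (p > .05)" if q < CRITICAL_05 else "agreement rejected"
print(f"Q = {q:.4f}, df = {k - 1}: {verdict}")
```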
The findings of the test indicate the ratings of the judges show a greater proportion of the concepts contained in the Reading Concepts Checklist, (RCC), are not measured by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven than the proportion of concepts which are measured by the Michigan Experimental Reading Test Grades Four and Seven.

CHAPTER V

SUMMARY, CONCLUSIONS, IMPLICATIONS AND RECOMMENDATIONS

This chapter contains a brief summary of the study's purpose, procedures, limitations, major findings, and conclusions. Implications of the study and recommendations specifically associated with the data presented are also included.

Summary

Purpose and Major Hypotheses

This study is an attempt to establish the degree of concurrence between the concepts measured by the Michigan Educational Assessment Program Experimental Reading Test for grades four and seven and the concepts presented in the most widely used reading instructional programs in Michigan. This study is designed to analyze and compare the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test according to the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC). Also included in the purpose of this study is the degree of concurrence between each of the reading instructional programs. Achieving the purpose of this study also requires a review and evaluation of the Michigan Educational Assessment Program Experimental Reading Test by a panel of reading experts.

Two major hypotheses were formulated concerning the degree of concurrence between the reading instructional programs and between each of the reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test. The major hypotheses are:

1. There will be no difference between the five reading instructional programs in grades K-3 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades K-3 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as shown in the Reading Concepts Checklist, (RCC).

2. There will be no difference between the five reading instructional programs in grades 4-6 in the concepts they present or between the degree of concurrence between the concepts presented in each of the five reading instructional programs in grades 4-6 and the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as shown in the Reading Concepts Checklist, (RCC).

Selection of Instructional Materials

A statistical analysis comparing the concepts presented by the reading instructional programs to the concepts tested by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven requires data from all levels of the reading instructional programs. The reading instructional programs used in this study provided 1) data from grades K-3 to be compared with the Michigan Educational Assessment Program Experimental Reading Test Grade 4; 2) data from grades 4-6 to be compared with the Michigan Educational Assessment Program Experimental Reading Test Grade 7; 3) reading concepts to which a majority of Michigan's K-6 students are exposed; and 4) assurance that the K-6 students using these programs represent a reasonable cross-section of Michigan's rural, suburban, urban, and large-city school children.
The reading instructional programs selected for this study were chosen on the basis of a national survey of K-8 reading teachers and supervisors by an independent research organization. Instrumentation and Data Collection The Reading Concepts Checklist, (RCC), was developed as a means of describing, within a common framework, the con­ cepts presented in the reading instructional materials and the concepts tested in the Michigan Educational Assessment 152 Program Experimental Reading Test. Its six major divisions, subdivided into nine major categories, contain 103 concepts. The Reading Concepts Checklist, (RCC), was developed on the basis of conceptual consensus of agreement to insure a high degree of meaning and similarity of meaning across reading specialists and test constructors. The Reading Concepts Checklist, (RCC), formed the basis of two matrices: 1) the classification of concepts presented in the reading instructional materials in kindergarten through grade six, and 2) the classification of the concepts tested in the Michigan Educational Assessment Program experimental Reading Test Grades Four and Seven. The data from the reading instructional materials were collected through surveying the sixty-five teachers' manuals of the five reading instructional programs. Each concept presented in the manual by a specific program was recorded in the matrix for the classification of instruc­ tional materials in the cell connecting the appropriate grade level and Reading Concepts Checklist, (RCC), concept. The data from the Michigan Educational Assessment Program Experimental Test Grades Four and Seven were collected through a review and evaluation of the test by a panel of reading experts. The panel matched the test items with their stated objectives, published by the Michigan Department of Education, and recorded the items in the Reading Concepts Checklist, (RCC), matrix for classification 153 of tested concepts in the cell connecting the appropriate grade level of the test and the Reading Concepts Checklist, (RCC), concept. Concepts which were identified as being in either the reading instructional programs or the Michigan Experimental Reading Test were assigned the value of "1", while the missing concepts were assigned the value of "0". Treatment of the Data and Analysis Achievement of the objectives set forth in this study required the determination of the significance between the observed differences between the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . The nonparametric Cochran Q Test, compared to a Chi-square distribution, was used to test the significance between the observed differences between the proportion of matches and mismatches across the Reading Concepts Checklist, (RCC) . The second statistical step was the determination of the magnitude and direction of the significance of the difference between the proportion scores. Multiple comparisons of the means of proportion scores were conducted through the use of the Dunn-Bonferroni pairwise comparison technique. The Cochran Q Test was employed to determine the level of reliability and degree of inter-rater agreement of the panel of reading experts. 154 The data were scored and coded for IBM tabulation and processed on a high-speed computer. Statistical treatments of the data in this study were conducted through the use of the facilities of the Computer Laboratory, Michigan State University. Scope and Delimitations of the Study 1. 
The study is delimited to the degree of concurrence between the concepts presented in the reading instructional programs and between each of the programs and the concepts measured by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven as measured by the Reading Concepts Checklist, (RCC). 2. The study treats the concepts contained in the Reading Concepts Checklist, (RCC), as the defined content domain of the domain of reading concepts. The concepts are not intended to be inclusive. 3. The study treats the concepts presented in the selected reading instructional programs as those concepts to which a majority of Michigan K-6 students are exposed and are not interpreted as having been taught. 4. The conclusions and implications of this study regarding instructional programs are not interpreted to indicate the quality of the programs, merely their differences. 155 Major Findings 1. The Reading Concepts Checklist, (RCC), findings indicate that, according to pairwise comparison scores for four K-3 reading instructional programs (Ginn and Company; Harcourt, Brace, and Jovanovich; Holt, Rinehart, and Winston; Houghton-Mifflin Company), concurrence between each of the K-3 reading instructional programs and the Michigan Educa­ tional Assessment Program Experimental Reading Test Grade 4, is lacking in a significant degree in all nine subcate­ gories of the Reading Concepts Checklist, (RCC), (see Appendix E). 2. The Reading Concepts Checklist, (RCC), findings indicate concurrence between Scott, Foresman Company K-3 reading program and the Michigan Educational Assessment Program Experimental Reading Test Grade 4, is lacking in a significant degree in eight subcategories of the Reading Concepts Checklist, (RCC), while concurrence is present in a significant great degree in subcategory VII: Inferential Comprehension. 3. According to pairwise comparison scores , the Reading Concepts Checklist, (RCC), findings indicate that concurrence between the K-3 reading instructional programs is present in a significant degree between Ginn and Company and Harcourt, Brace and Jovanovich; between Ginn and Company and Holt, Rinehart, and Winston; between Holt, Rinehart, and 156 Winston and Houghton-Mifflin Company; and between Houghton-Mifflin Company and Scott, Foresman Company (see Table 9). 4. The findings, however, indicate a lack of con­ currence in a significant degree between Ginn and Company and Scott, Foresman Company; between Harcourt, Brace and Jovanovich and Houghton-Mifflin Company; between Harcourt, Brace and Jovanovich and Scott, Foresman Company; and between Holt, Rinehart, and Winston and Scott, Foresman Company K-3 reading programs 5. (see Table 9). The Reading Concepts Checklist, (RCC), findings indicate that according to pairwise comparison scores for six subcategories, (I; "Auditory Discrimination," II: "Visual Discrimination," III: "Phonic Analysis," IV: "Structural Analysis," VII: VI: "Literal Comprehension," "Critical Comprehension"), concurrence between the K-3 reading instructional program is present in a significant degree in all five reading instructional programs (see Appendix D ) . 6. 
The Reading Concepts Checklist, (RCC) , findings indicate that according to pairwise comparison scores for three subcategories, (V: "Vocabulary Development," VII: "Inferential Comprehension," and IX: "Study Skills"), concurrence between the K-3 reading instructional programs is lacking between Ginn and Company and Houghton-Mifflin Company; between Ginn and Company and Scott, Foresman Company; 157 between Harcourt, Brace and Jovanovich and Houghton-Mifflin Company; between Harcourt, Brace and Jovanovich and Scott, Foresman Company; between Holt, Rinehart, and Winston and Houghton-Mifflin Company; between Holt, Rinehart, and Winston and Scott, Foresman Company; between HoughtonMif flin Company and Scott, Foresman Company. The findings further indicate, according to pairwise comparison scores, concurrence is lacking in a significant degree for the subcategory IX: "Study Skills" between Ginn and Company and Harcourt, Brace and Jovanovich K-3 reading programs (see Table 10). 7. The Reading Concepts Checklist, (RCC), findings indicate that two major divisions, I: "Auditory Discrimina­ tion" and II: "Visual Discrimination," were neither taught in the 4-6 reading instructional programs nor tested in the Michigan Educational Assessment Program Experimental Reading Test Grade 7, leaving four major divisions with seven subcategories in the Reading Concepts Checklist, (RCC) . 8. The Reading Concepts Checklist, (RCC), findings indicate that according to pairwise comparison scores for four 4-6 reading instructional programs, Ginn and Company; Harcourt, Brace and Jovanovich; Holt, Rinehart, and Winston; Scott, Foresman Company, concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grade 7 and the reading programs is lacking to a significant 158 degree in all seven subcategories of the Reading Concepts Checklist, (RCC), (see Appendix H). 9. The findings of the Reading Concepts Checklist, (RCC), indicate that# according to pairwise comparions scores for Houghton-Mifflin Company's 4-6 reading program, concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grade 7 and the reading program is lacking in five subcategories of the Reading Concepts Checklist, (RCC), while concurrence is present in a significantly greater degree in the subcategories III: "Phonic Analysis" and V: "Vocabulary Development" (see Appendix H ) . 10. According to pairwise comparison scores, the Reading Concepts Checklist, (RCC), findings indicate that concurrence between each of the 4-6 reading instructional programs is present in a significantly greater degree between Ginn and Company and Harcourt, Brace and Jovanovich; between Ginn and Company and Holt, Rinehart, and Winston; between Harcourt,Brace and Jovanovich and Holt, and Winston; Rinehart, and between Houghton-Mifflin Company and Scott, Foresman Company (see Table 19). 11. The findings also indicate a lack of concurrence in a significant degree between Ginn and Company and Houghton-Mifflin Company; between Ginn and Company and Scott, Foresman Company; between Harcourt, Brace and Jovanovich and Houghton-Mifflin Company; between Harcourt, 159 Brace and Jovanovich and Scott, Foresman Company; between Holt, Rinehart, and Winston and Houghton-Mifflin Company; and between Holt, Rinehart, and Winston and Scott, Foresman Company (see Table 19). 12. 
The findings of the pairwise comparison scores in three subcategories, V: VII: "Vocabulary Development," "Critical Comprehension," and IX: "Study Skills," indicate concurrence between the 4-6 reading instructional programs is present in a significant degree in all five reading programs (see Appendix G ) . 13. The Reading Concepts Checklist, (RCC), findings indicate that scores in three subcategories, IV: Analysis," VI: "Literal Comprehension," "Structural and VII: "Inferential Comprehension," concurrence between the 4-6 reading instructional programs is lacking in a significant degree between Ginn and Company and Houghton-Mifflin Company; between Ginn and Company and Scott, Foresman Company; between Harcourt, Brace and Jovanovich and Houghton-Mifflin Company; between Harcourt, Brace and Jovanovich and Scott, Foresman Company; between Holt, Rinehart, and Winston and HoughtonMif f lin Company; and between Holt, Rinehart, and Winston and Scott, Foresman Company. The findings further indicate that in two subcategory scores, VI: and VII: "Literal Comprehension," "Inferential Comprehension," there is a lack of concurrence between Houghton-Mifflin Company and Scott, Foresman Company 4-6 reading programs (see Table 19 and Appendix G ) . 160 14. The findings of the inter-rater reliability study indicate a strong positive relationship among the judges regarding the relationship of the concepts being tested by the Michigan Educational Assessment Program Experimental Reading Test Grades 4 and 7, as measured by the Reading Concepts Checklist, (RCC). Conclusions The findings of the empirical study of the degree of concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven and the selected K-6 reading instructional programs as measured by the Reading Concepts Checklist, (RCC ) , can be evaluated from several perspectives. A major concern of the analysis was to test the total proportional measurement of the content domain. A second major concern of this study was the investigation of the relationships between the K-6 reading instructional programs. A final component of this study involved the use of a panel of reading experts to review and evaluate the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven and perform an inter-rater reliability test to measure the strength of the relationship of their judgments. All three components of this study are interrelated and will be evaluated in terms of their significant interrelationships. 161 Relationships Between Michigan Experimental Reading Test and K-6 Reading Instructional Programs 1. The predominant aspect of the results is the lack of concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven and the selected K-6 reading instructional programs as shown by the total pairwise comparison scores, and the pairwise comparison scores of the individual categories contained in the Reading Concepts Checklist, (RCC). This lack of congruence between the concepts presented in the K-6 reading instructional programs and the concepts tested in the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven show the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven not to be content valid. Relationship Between Inter-rater Reliability Study to the Michigan Educational Assessment Program Experimental Reading Test Grades 4 and 7 2. 
Relationship of the Inter-rater Reliability Study to the Michigan Educational Assessment Program Experimental Reading Test Grades 4 and 7

2. There is agreement among the independent judges concerning which concepts contained in the Reading Concepts Checklist (RCC) are being measured by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven (a schematic sketch of one way such agreement can be computed appears after these conclusions). This demonstrates the reliability of the data which were recorded in the Reading Concepts Checklist (RCC) and compared with the reading instructional programs. The reliability study shows that more concepts contained in the Reading Concepts Checklist (RCC) were not measured proportionally by the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven than were measured. Therefore, the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven does not fulfill the requirement of constituting a representative sample of the behaviors to be exhibited in the desired performance domain.

3. The agreement among the independent judges that the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven leaves large portions of the Reading Concepts Checklist (RCC) unmeasured shows that the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven is insensitive to instruction based upon those reading programs reviewed in this study.

Relationships Between the K-6 Reading Instructional Programs

4. The major feature of the results of all of the statistical tests concerning the concepts presented in the K-6 reading instructional programs is that the five instructional programs may be classified as belonging to one of two groups: (1) Ginn and Company; Harcourt, Brace and Jovanovich; and Holt, Rinehart, and Winston, and (2) Houghton-Mifflin Company and Scott, Foresman Company. The differences between reading instructional programs apparently reflect differences in program emphasis or in philosophical approaches to the teaching of reading.

5. The major differences between the two groups of K-3 reading instructional programs are in categories V (Vocabulary Development), VII (Inferential Comprehension), and IX (Study Skills). These differences may indicate a high degree of variation in student performance on the Michigan Educational Assessment Program Experimental Reading Test Grade Four.

6. The major differences between the two groups of 4-6 reading instructional programs are in categories III (Phonic Analysis), IV (Structural Analysis), VI (Literal Comprehension), and VII (Inferential Comprehension). A result of these differences may be a high degree of variation in student performance on the Michigan Educational Assessment Program Experimental Reading Test Grade Seven.

7. The results of the analyses provide confirmation of the expected relationship between the K-6 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven. The total proportion scores confirm the absence of congruence between the K-6 reading instructional programs and the Michigan Experimental Reading Test. The results of the inter-rater reliability study establish the relationship between the Reading Concepts Checklist (RCC) and the reading instructional programs.

8. The results indicate that, according to the scores of each of the six divisions and nine individual categories, concurrence between the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven and the selected K-6 reading instructional programs is absent to a significant degree.
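As referenced in Conclusion 2 above, the inter-rater reliability study asked independent judges to assign each test item to an RCC concept and then measured how strongly those assignments agreed. The sketch below is a minimal illustration of one way such agreement might be computed, pairwise percent agreement; it is not necessarily the reliability statistic reported in this study, and the judges, items, and concept codes shown are hypothetical.

from itertools import combinations

# Each judge's RCC concept code for the same five test items (hypothetical data).
ratings = {
    "judge_1": ["5.050", "5.065", "5.037", "6.098", "5.046"],
    "judge_2": ["5.050", "5.065", "5.040", "6.098", "5.046"],
    "judge_3": ["5.050", "5.066", "5.037", "6.098", "5.046"],
}

def percent_agreement(first, second):
    # Share of items on which two judges assigned the same RCC concept.
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)

# Agreement for every pair of judges, and the average across pairs.
pairwise = {(j1, j2): percent_agreement(ratings[j1], ratings[j2])
            for j1, j2 in combinations(ratings, 2)}
mean_agreement = sum(pairwise.values()) / len(pairwise)

for pair, score in pairwise.items():
    print(pair, f"{score:.2f}")
print(f"Mean pairwise agreement: {mean_agreement:.2f}")

Values near 1.0 correspond to the strong positive relationship among the judges reported in Finding 14; low values would signal that the item classifications, and therefore the tested-concepts matrix, could not be relied upon.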
Implications

The findings of the study are based on data collected through surveying the five reading instructional programs' sixty-five teachers' manuals. The five reading programs were selected on the basis of a 1977 national survey of reading instructors and reading specialists. The survey revealed that the predominant reading materials used in the region which includes Michigan are (1) Ginn and Company, (2) Harcourt, Brace and Jovanovich, (3) Holt, Rinehart, and Winston, (4) Houghton-Mifflin Company, and (5) Scott, Foresman Company. The survey also indicated that 75.86 percent of Michigan's K-6 students use these reading materials, a figure which satisfied the lower acceptable limit of seventy-five percent established for this study.

The findings indicate significant differences between what the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven presumes to test and the concepts presented in the selected K-6 reading instructional programs. Some explanations for these findings are given in the implications which follow.

1. Some may assume that the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven is a fourth- or seventh-grade test and tests the curriculum of those grades.¹ Since the tests are administered during the initial weeks of the school year for fourth- and seventh-grade students, the tests are in fact a measure of the preceding grades. The findings of the study indicate that large blocks of the Reading Concepts Checklist (RCC), such as Category III (Phonic Analysis), are not measured by the Michigan Educational Assessment Program Experimental Reading Test, while the reading instructional programs emphasize this decoding skill. A major consideration follows: if this area is not measured by the Michigan Experimental Test, does the failure of a student to achieve the goal established for successfully completing the test's tasks for inferential comprehension signal faulty comprehension skills? Or does the fault rest with the test for not measuring a representative sample of the concepts presented in the reading instructional programs? The first major implication is that the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven is not sensitive to the curriculum.

¹ The Michigan Department of Education has stressed, however, that the fourth- and seventh-grade tests are measures of learning in the preceding grades.

2. It was recognized early in this study that the accountability movement has brought public pressure to bear upon boards of education and educators at all levels and in various capacities of education. The Michigan Educational Assessment Program Experimental Reading Test is symbolic of one of the responses to that movement. It might be assumed by educators or boards of education that the results of the Michigan Educational Assessment Program Experimental Reading Test are a reflection of the quality of education within the local district. The findings of this study indicate that the Michigan Educational Assessment Program Experimental Reading Test is not an accurate measure of the effectiveness of the local curriculum. The implication is that before boards of education make decisions about curricular effectiveness on the basis of scores achieved on the Michigan Educational Assessment Program Experimental Reading Test, additional data concerning the effectiveness of the curriculum need to be assembled.
3. Building administrators and classroom teachers should exercise caution in attempting to assess the needs of the building or the individual classroom on the basis of the Michigan Educational Assessment Program Experimental Reading Test's results. The finding of this study that the Michigan Experimental Test is not sensitive to the curriculum indicates that the success or failure of a student on the Experimental Test is not an indication of the student's achievement. Reprogramming to meet the presumed needs of the student may well be inappropriate and unnecessary, if not potentially an impediment to student progress.

4. Some may suggest that the lack of congruence between the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven and each of the reading instructional programs results from the Experimental Reading Test's measurement of minimal performance objectives. The implication is that the reading instructional programs are so comprehensive in nature that the test cannot fit the reading programs. The measurement of minimal performance levels, however, neither eliminates the requirement that the test be a "representative sample" of the content domain nor removes its obligation to measure the essential elements of the content domain. If decoding is an essential reading skill, presented by the reading programs and not measured by the Experimental Reading Test, it cannot be stated with certainty that the Experimental Reading Test's measurement of minimal performance levels is a measurement of essential minimal performance levels.

Recommendations

Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven

1. It is recommended that the Michigan Department of Education undertake a complete revision of the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven. The Michigan Educational Assessment Program Experimental Reading Test should be redeveloped on the basis of item-by-treatment interaction, in which the items of the test are in direct proportion to the concepts presented in the reading instructional programs.

2. It is recommended that the Michigan Department of Education engage the services of a nationally known panel of reading experts to review and evaluate the revised version of the Michigan Educational Assessment Program Experimental Reading Test in order to establish the relationship between the K-8 reading instructional programs used throughout Michigan and the revised version of the Michigan Educational Assessment Program Experimental Reading Test Grades Four and Seven.

Development of a Communications Process and Favorable Attitudes

3. The present controversy surrounding the current Michigan Educational Assessment Program Experimental Reading Test has created a schism between those in support of the test and those who are its critics. The local Board of Education's or the individual educator's opportunity to influence mandated statewide educational policy is perceived to be greatly reduced. If the communications process is lacking or totally inadequate, the schism will increase. Therefore, it is recommended that the Michigan Department of Education and all educators recognize the challenge before them and use their ingenuity to develop new avenues of communicating with each other.

Revision, Continued Development and Use of the Reading Concepts Checklist (RCC)

4.
It is recommended that a revision of the categories having a relatively low correlation with total proportion scores and/or pairwise comparison scores between reading instructional programs should be made. The individual reading concepts within categories ill, (Phonic Analysis), IV, (Structural Analysis), V, (Vocabulary Development), VI, (Literal Comprehension), VII, (Inferential Comprehension), and IX, (Study Skills) should be revised with higher levels of specificity and studied further to identify the bases for lack of concurrence between the reading instructional pro­ grams . 5. It is recommended that periodic use of the Reading Concepts Checklist, (RCC), should include an investigation of the stability of the measures derived from the instrument 170 to determine the extent of fluctuations in the concepts presented in the K-6 reading instructional programs. Knowledge of these variations in the concepts presented in the reading instructional programs could effectively supplement improvements to the quality of Michigan Educ­ ational Assessment Program testing in Michigan. APPENDICES 171 APPENDIX A READING CONCEPTS CHECKLIST: CLASSIFICATION OF INSTRUCTIONAL CONCEPTS AND READING CONCEPTS CHECKLIST: CLASSIFICATION OF TESTED CONCEPTS 172 173 Reading Concepts Checklist: Classification of Instructional Concepts KEY: 1. 2. 3. Ginn and Company Harcourt, Brace and Jovanovich Holt, Rinehart, and Winston Concept 1.0 Auditory Discrimination 1.001 Word Sounds 1.002 Words in Sentences 1.003 Beginning Consonants 1.004 Ending Consonants 1.005 Consonant Blends 1.006 Rhyming Words 2.0 Visual Discrimination 2.007 Upper Case Letter Names 2.008 Lower Case Letter Names 2.009 Words in Sentences 2.010 Words in Paragraph 3.0 Phonic Analysis 3.011 Beginning Consonants 3.012 Ending Consonants 3.013 Medial Consonants 3.014 Beginning Blends 3.015 Ending Blends 3.016 Beginning Consonant Digraphs 3.017 Beginning Blends and Digraphs 3.018 Ending Blends and Digraphs 3.019 Medial Consonants and Digraphs K 1 4. 5. 2 HoughtonMif f lin Company Scott, Foresman Company Grade 3 4 5 6 174 Appendix A Continued. Concept K 1 Grade 2 3 4 5 3.01 Vowel Sounds 3.020 Short Vowel Sounds 3.021 Long Vowel Sounds 3.022 Vowel Digraphs 3.023 Vowel Diphthongs 3.024 The Schwa Sound 3.025 Context Clues 3.026 "R" Controlled Vowel 4.0 Structural Analysis 4.027 Root Words 4.028 Word Endings 4.029 Word Families 4.030 Contractions 4.031 Compound Words 4.032 Possessives 4.033 Prefixes 4.034 Suffixes 4.035 Syllabication 4.036 Accent Clues 5.0 Comprehension 5.01 Vocabulary Development 5.037 Synonyms 5.038 Antonyms 5.039 Homonyms 5.040 Context Clues 5.02 Literal Comprehension 5.041 Multiple Meaning of words 5.042 Word Recognition 5.043 Likenesses and Differences Continued 6 175 Appendix A Continued. Concepts K Grade 1 2 3 4 5 5.044 5.045 5.046 5.047 5.048 5.049 5.050 5.051 5.052 Syntax Word Meaning Sentence Meaning Paragraph Meanings Punctuation Character Development Main Idea Details Place Events in Proper Sequence 5.053 Plot and Setting 5.054 Cause and Effect 5.055 Gathering Information from Pictures 5.056 Classifying 5.02 Inferential Comprehension 5.057 Idiom 5.058 Similie 5.059 Metaphor 5.060 Alliteration 5.061 Onomatopoeia 5.062 Personification 5.063 Author's Style 5.064 Mood or Tone 5.065 Draw Logical Conclusions 5.066 Predict Outcomes 5.067 Character Development 5.068 Main Idea Continued 6 176 Appendix A. Continued. 
Concept K 1 Grade 2 3 4 5 5.069 Details 5.070 Place Events in Proper Sequence 5.071 Plot and Setting 5.072 Cause and Effect 5.073 Analogies 5.04 Critical Comprehension 5.074 Judge Accuracy 5.075 Judge Validity 5.076 Distinguish Fact from Opinion 5.077 Author's Purpose 5.078 Author's Point of View 5.079 Distinguish Realims From Fantasy 5.080 Detect Propaganda, Persuasion, Bias 5.081 Verify Conclusions 6.0 Study Skills 6.082 Use Table of Contents 6.083 Use Index 6.084 Use Glossary 6.085 Use Encyclopedia 6.086 Use Index Volume 6.087 Find a Topic 6.088 Cross Reference 6.089 Read Maps 6.090 Read Charts, Graphs, Diagrams 6.091 Dictionary Skills 6.092 Alphabetize 1st, 2nd, 3rd Letter etc. Continued 6 177 Appendix A Continued. Concept 6.093 Use Pronunciation Key 6.094 Locate Entry Word 6.095 Guide Words 6.096 Parts of Speech 6.097 Skimming and Scanning 6.098 Follow Written Directions 6.01 Organizational Study Skills 6.099 Topic Selection 6.100 Subtopic Selection 6.101 Outlining 6.102 Summarizing Selection 6.103 Reading Newspapers and Magazines K 1 Grade 2 3 4 5 6 178 Reading Concepts Checklist: Classification of Tested Concepts Concept Grade Level Tested Grade 4 Grade 7 1.0 Auditory Discrimination 1.001 Word Sounds 1.002 Words in Sentences 1.003 Beginning Consonants 1.004 Ending Consonants 1.005 Consonant Blends 1.006 Rhyming Words 2.0 Visual Discrimination 2.007 Upper Case Letter Names 2.008 Lower Case Letter Names 2.009 Words in Sentences 2.010 Words in Paragraph 3.0 Phonic Analysis 3.011 Beginning Consonants 3.012 Ending Consonants 3.013 Medial Consonants 3.014 Beginning Blends 3.015 Ending Blends 3.016 Beginning Consonant Digraphs 3.017 Beginning Blends and Digraphs 3.018 Ending Blends and Digraphs 3.019 Medial Consonants and Digraphs 3.01 Vowel Sounds 3.020 Short Vowel Sounds Continued 179 Appendix A Continued. Concept Grade Level Tested Grade 4 Grade 7 3.021 Long Vowel Sounds 3.022 Vowel Digraphs 3.023 Vowel Diphthongs 3.024 The Schwa Sound 3.025 Context Clues 3.026 "R" Controlled Vowel 4.0 Structural Analysis 4.027 Root Words 4.028 Word Endings 4.029 Word Families 4.039 Contractions 4.031 Compound Words 4.032 Possessives 4.033 Prefixes 4.034 Suffixes 4.035 Syllabication 4.036 Accent Clues 5.0 Comprehension 5.01 Vocabulary Development 5.037 Synonyms 5.038 Antonyms 5.039 Homonyms 5.040 Context Clues 5.041 Multiple Meaning of Words 5.02 Literal Comprehension 5.042 Word Recognition 5.043 Likenesses and Differences 5.044 Syntax Continued 180 Appendix A Continued. 
Concept Grade Level Tested Grade 4 Grade 7 5.045 5.046 5.047 5.048 5.049 5.050 5.051 5.052 Word Meaning Sentence Meaning Paragraph Meaning Punctuation Character Development Main Idea Details Place Events in Proper Sequence 5.053 Plot and Setting 5.054 Cause and Effect 5.055 Gathering Information From Pictures 5.056 Classifying 5.03 Inferential Comprehension 5.027 Idiom 5.058 Similie 5.059 Metaphor 5.060 Alliteration 5.061 Onomatopoeia 5.062 Personification 5.063 Author's Style 5.064 Mood or Tone 5.065 Draw Logical Conclusions 5.066 Predict Outcomes 5.067 Character Development 5.068 Main Idea 5.069 Details 5.070 Place Events in Proper Sequence Continued 181 Appendix A Continued Concept Grade Level Tested Grade 4 Grade 7 5.071 Plot and Setting 5.072 Cause and Effect 5.073 Analogies 5.04 Critical Comprehension 5.074 Judge Accuracy 5.075 Judge Validity 5.076 Distinguish Fact From Opinion 5.077 Author's Purpose 5.078 Author's Point of View 5.079 Distinguish Realism From Fantasy 5.080 Detect Proaganda, Persuasion, Bias 5.081 Verify Conclusions 6.0 Study Skills 6.082 Use Table of Context 6.083 Use Index 6.084 Use Glossary 6.085 Use Encyclopedia 6.086 Use Index Volume 6.087 Find a Topic 6.088 Cross Reference 6.089 Read Maps 6.090 Read Charts, Graphs, Diagrams 6.091 Dictionary Skills 6.092 Alphabetize 1st, 2nd, 3rd Letter etc. 6.093 Use Pronunciation Key 6.094 Locate Entry Word Continued 182 Appendix A Continued. Concept 6.095 Guide Words 6.096 Parts of Speech 6.097 Skimming and Scanning 6.098 Follow Written Directions 6.01 Organizational Study Skills 6.099 Topic Selection 6.100 Subtopic Selection 6.101 Outlining 6.102 Summarizing Selection 6.103 Reading Newspapers and Magazines Grade Level Tested Grade 4 Grade 7 APPENDIX B COMMUNICATION SKILLS OBJECTIVES 183 COMMUNICATION SKILLS OBJECTIVES — Reading — Speaking/Listening — Writing Michigan Department of Education January, 1979 184 With Examples of Experiences and Activities and Suggested Measurement Approaches READING PROPOSED READING OBJECTIVES Competency Measureable Behavior (3rd Grade) Measurable Behavior (6th Grade) Measurable Behavior (9th Grade) I. Vocabulary Meaning By the end of the third grade, the student will be able to: By the end of the sixth grade, the student will be able to: By the end of the ninth grade, the student will be able to: A. A. A. Determine the meaning of a word in a sentence whose meaning has been affected by prefixes. Determine the meaning of a word in a sentence whose meaning has been affected by prefixes. Determine the meaning of a word in a sentence whose meaning has been affected by comnon prefixes. Example Experiences and/or Activity Measurement 1. Give students words whose meanings can be affected 'by prefise8. Also, give them lists of prefixes to use with the words, or have them think of their own prefixes to use. Discuss in what way the words have changed in meaning and what the various pre­ fixes must, therefore, mean. 1. Give students words with prefixes and have them choose from four or more choices the meaning of the prefix. For example, given the word "reorganize," the student should choose the response "to organize again." 2. Compile lists of prefixes. Have students discuss or verify their meanings in the dictionary. Have them use the prefixes in their own writing. 2. Have students write sentences or a selection using a given list of pre­ fixes correctly. 3. Have students locate prefixes in their textbooks and keep a list of these prefixes. B. 
Determine the meaning of a word in a sentence whose meaning has been affected by suffixes. B. Determine the meaning of a word in a sentence whose meaning has been affected by suffixes. B. Determine the meaning of a word in a sentence whose meaning has been affected by comnon suffixes. Example Experiences and/or Activity Measurement 1. 1. - Give the students words with suffixes and have them choose from four or more choices the meaning of a suffix. For example, given the word "careless," the student should choose the response, "careless means without care." Give students words whose meanings can be affected by suffixes. Also, give them lists of suffixes to use with the words, or have them think of their own suffixes to use. Discuss In what ways the words have changed in meaning and what various suffixes must,therefore, mean. 2. Compile lists of suffixes. Have students' dls£&is: or verify their meanings in thev-dictionary. Have ffir' them use the suffixes in their own writing. * 2. Have students writesentences or a selection *using a* given list of suffixes correctly. 3. At upper grade levels, students may learn that suffixes often affect the way a word is used in a sentence; i.e., the part of speech. For example, "careless" is an adjective; "carelessly" is an adverb. .. C. Determine the meaning of a - - word, that has multiple mean­ ings', depending on its use in a sentence. D. Determine the meaning of a word that has multiple mean­ ings, depending on its use in a sentence. C. V,. Determine the meaning of a word that has mutliple mean­ ings, depending on its use in a sentence. Example Experiences and/or Activity Measurement 1. KriLe a word that has multiple meanings on the board. Ask the students to think of as many meanings for the word as possible and use the word in these various ways in sentences. For example, the word "circle" may mean to walk around something, to draw a round line, a ring, or a private group of people. Thus, "The cat circled the wounded bird," "Cicle the right answer." "We sat in a circle." "Do you belong to the inner circle?" 1. Give the students a sentence with an underlined word that can have multiple meanings. Ask them to choose from a list of four or more meanings the one that is appropriate to its use in the sentence. 2. Give the students a word that has multiple meanings. Ask them to write sentences using the word according to its various meanings. 3. Give the students a sentence containing an underlined word that may have multiple meanings. Also, give the students a list of dictionary definitions of that word. Ask them to check the definition most appropriate to its use in the sentence. 2. Have students look up a word that has multiple mean­ ings in the dictionary. Discuss the various meanings and use in sentences. More complex words may have many slightly different meanings. 3. Use library books, such as Amelia Bedelia, The King Who Rained, and Jake, which make humorous use of the multiple meaning of words, to illustrate this principle. Have students write similar selections, either as individuals or as grours. 4. At the more advanced levels, discuss how words differ in connotations as well as denotations. Also discuss how words may differ In various subject areas; such as "culture" in social studies , "culture" in science, and "cultured" in the arts. D. Identify a word that has a similar meaning to another word (identifying synonyms). D. Identify a word that has a similar meaning to another word (Identifying synonyms). D. 
Identify a word that has a similar meaning to another word (identifying synonyms). Measurement 1. Present the students with a word and ask them to think of as many synonyms as possible. Students may use dictionaries, thesauruses, and so on to locate additional synonyms. 1. Give students a sentence with an under­ lined word. Also give them a choice of four or more words from which to select a synonym for the underlined word. 2. Have students read poetry in which synonyms are used for artistic purposes, such as "The Cataract of Lodore." Discuss how even synonyms have fine dif­ ferences in meaning. 2. Give students a word that has many synonyms. Ask them to list at least three (or some other number) synonyms for the word. 3. Have students re-write their own selections, using synonyms for the words they originally used. E. Identify a word that has an opposite meaning to another word (identifying antonyms). E. Identify a word that has an opposite meaning to another word (identifying antonyms). E. Identify a word that has an opposite meaning to another word (identifying antonyms). Example Experiences and/or Activity Measurement 1. Present the students with a word and ask them to think of as many antonyms as possible. Students may use dictionaries, thesauruses, and so on to locate additional antonyms. 1. Give students a sentence with lined word. Also give them a four or more words from which an antonym for the underlined 2, Arrange students In groups and have them compete to find as many antonyms for a given nuaber of words as possible. 2. Give students a word that has many antonyms. Ask them to list at least three (or some other number) antonyms for the word. -3- an under­ choice of to select word. 187 Example Experiences and/or Activity F. Determine the meaning of a word on the basis of the context of a sentence. F. Determine the meaning of a word on the basis of the context of a sentence. F. Determine the meaning of a word on the basis of the context of a sentence. Example Experiences and/or Activity Measurement 1. When listening to students read, if they have difficulty decoding a word, encourage them to consider context clues. 1. 2. Present students with sentences containing words they may not know the meaning of. Have them dis­ cuss what they think the word might mean on the basis of its use in the sentence. Have students verify their guesses in the dictionary. Give the students a sentence contain­ ing a word they probably will not know. Also, give them a list of possible definitions for the word. Ask them to select the most appropriate definition, given its use in the sentence. 2. Give the students a sentence contain­ ing a nonsense word. Also, give them a list of possible definitions for the word. Ask them to select the most appropriate definition, given its use in the sentence. 3. Prior to having the students read a section of one of their textbooks, such as a social studies, science, health textbook, list the words that they may not know on the board. Have them discuss what they think the words may mean; then have them read the words in the context of the passage and continue the discussion. If necessary, verify their guesses in the dictionary or the glossary of the textbook. Then have them proceed to read the assigned chap­ ter or selection. Competency Measurable Behavior (3rd Grade) Measurable Behavior (6th Grade) Measurable Behavior (9th Grade) 11 . 
Literal Comprehen­ sion By the end of the third grade, the student vill be able to: By the end of the sixth grade, the student will be able to: By the end of the ninth grade, the student will be able to: A. A. A. Read a selection using a knowledge of structure of the language including syn­ tactic and semantic clues (cloze procedure). Read a selection using a knowledge of the structure of the language including syn­ tactic and semantic clues (cloze procedure). Example Experiences and/or Activity Measurement The cloze procedure may be used to determine the stu­ dent’s approximate reading level and to match her or his reading level and needs to the materials being used. It is probably one of the simplest ways to determine a student’s literal comprehension level. Procedures are as follows: 1. 1. 2. Select a paragraph. Perhaps it may be from the reading material the student is to read for the class. The length of the paragraph may vary, depend­ ing upon the level of d i f f i c u l t y . F o r pupils in the third grade and above, passages should be at least 25 words long. Delete every fifth word in the selection and re­ place each omitted word with a blank of standard length. Do not delete a word in the first or last sentence. 3. Ask the student to read the selection and fill in the missing words. 4. Score the test by counting the nuriber of words correctly supplied by the student. Do not penalize students for incorrect spellings. If a student supplies a word that makes as much sense to the meaning as the original word (such as supplying the word "blue" for the phrase "...the ball"), it may be counted as acceptable. -5- Read a selection using a knowledge of the structure of the language including syn­ tactic and semantic clues (cloze procedure). Student Instructions: In this exercise you are to read several paragraphs. Every fifth word in each paragraph has been left.out. As you read the paragraphs, figure out which word was taken out of each space and write it in. Only one word goes in each blank. If you are not sure of the word, you may guess. IT«r*i 1« ■ TODAV'S c a tt l e ranchers (Third grade level) John's father is a rancher who' owns many cattle. Once each year John his father take their to market to sell . Many years ago ranchers to market by herding take their cattle them horses. do not do this. John and his load their cattle in ____ cars owned by the his company. Then the engineer to the cattle cars train and hooks hauls them to market. John and his father ride on the train with the cattle. 5. There is no standard way to "score" a cloze procedure. The teacher should use her or his own judgment as to "levels" of difficulty. Below is a suggested standard for making a judgment: If a student supplies 70 to 100% of the missing words correctly, he/she is reading the passage at an indepen­ dent level; that is, he/she can read it quite easily. If a student supplies 40 to 69% of the missing words correctly, he/she is reading the passage at an instruc­ tional level; that is he/she can read it with some effort and perhaps assistance from the teacher. If a student supplies 39% or less of the missing words • correctly, he/she is reading the passage at a frustra­ tion level; that is, the material is probably too difficult to use even for instructional purposes. B. Identify the stated main idea within a selection. B. Identify the stated main idea within a selection. B. Example Experiences and/or Activity Measurement 1. Present students with a selection in which the main idea is clearly stated. 
Go through each sentence and have students discuss which one seems to best describe what the whole selection is about. Discuss what the term "main idea" means. 1. 2. Give students a sentence that can serve as a "main idea" for a selection. Have them write a selection using the sentence as the main idea. Have other students locate the sentence in the selection . Or have students think of their own main-ides sentences and then have them develop selections using these sentences. 3. Before having students read a chapter or a section in one of their textbooks, such as a science or social studies textbook, go through the chapter or section as a whole and attempt to locate sentences that may represent what the main idea of the whole chapter is likely to be. -6- Identify the stated main idea within a selection. Have students read a paragraph in which one sentence or phrase represents the main idea. Ask them to identify that sentence or phrase. Do che same wlch parts of Che chapter or paragraphs within the chapter. After they have read the selection, discuss whether the sentences were indeed the main ideas. 4. Have students find examples in newspapers and magazines of sentences or phrases that state the main idea of the selection. C. Identify details that support the main idea of a selection. C. Identify details that support the main idea of a selection. C. Example Experiences and/or Activity Measurement 1. Give students a selection in which the main idea is stated. Have them find statements within the passage that support the main idea. 1. 2. Give students a sentence that can serve as a main idea, such as "Australia has a lot of unusual animals." Have them write a paragraph that justifies this statement. The justifying statements thus support the main idea. Have students locate main ideas in their textbooks and in magazines and newspapers. Have them point out the sentences that support the main idea. 3. D. Identify information within a selection on the basis if recall. D. Have students read a selection in which the main idea is stated. List four or more choices that support the main idea and have students select the appropriate choice. For example, the selection may be "Family Life on the Prairie." The main idea is that all members of the family had work to do. The question might be: "What did little girls do to help?" The correct choice, on the basis of the selection, might be "...they helped pre­ pare the meals." Identify Information within a selection on the basis of recall. D. Example Experiences and/or Activity Measurement 1. 1. Have students read a selection. After they have finished, ask them about specific information con­ tained within the passage without referring back to the selection. Or allow varying lengths of time to lapse before asking them to recall the information. Identify details that support the main idea of a selection. Identify information within a selection on the basis of recall. Have students read a selection. Without having them refer back to the selection, ask them to identify information pre­ sented in the selection, perhaps through multiple choice questions. Practice this regularly. Over a period of time, stu­ dents who may have difficulty recalling information will acquire more of a facility to do so. The activity can be made into a game, the winners being those who can recall the most information. This can be done in groups as well as with individuals. 2. Have students read a selction. 
Present them with lists of information that may have been presented in the selection. Have them check those items that are factually correct. 2. Have students read a selection. Then ask them to make a list of all the information they can remember from their reading. On the basis of their lists, have them re-write the selection without referring to the original. Then have them compare their versions to the original. Discuss in what ways the re-written paragraphs are better than or not as good as the original. 3. Have students discuss mental techniques they may use to recall information. Discuss various factors that seem to affect the ability to recall. Is the time lapse between the reading and the recall important? Is the content itself a factor? Do those who understand the whole passage more fully recall the details better? E. Identify the sequence with­ in a selection. E. Identify the sequence with­ in a selection. E. Example Experiences and/or Activity Measurement 1. 1. Have students read a selection in which sequenitality is clearly stated, especially by such as words as "first," "second," "thirdly," "then," "later," "next," "soon," "finally," "before," and so on. Following their reading, have students discuss or list the elements in the selection as they were pre­ sented. Include selections in which actual events are not related in the specific order they occurred. (For example, when a character in a story is walking home from school, he may be thinking back to how he got into trouble in school— and that trouble all started with something that happened yesterday. He dreads -8- Identify the sequence with­ in a selection. Have students read a selection in which sequentiality is clearly stated. Ask questions to determine if they under­ stand what followed what. For example, "What did Fred do as soon as knew he was lost?" Correct answer: "... he climbed a tree." (The selection says he built a fire later.) getting home, because he knows the teacher has called his father. When he gets home, his father meets him at the door, and the concluding events are related.) 2. Give students a list of events. Have them write a narrative about the events relating them in various orders. Some may tell the story in the order of the events, some may start in the middle, some at the end. 3. Have students read books and stories, such as mystery stories, in which the sequence of events as they actually occurred (and not where they were actually related in the story) is a key factor. 4. Have students write expository selections that require a step by step treatment. Encourage them to use words that guide the reader through the exposition clearly. F. Identify stated cause and effect relationships within a selection. F. Identify stated cause and effect relationships within a selection* F. Example Experiences and/or Activity Measurement 1. Have students read selections in which cause and effect relationships are clearly stated. In discussions or through individual work, have them identify the stated causes and effects. 1. 2. Have students list words and phrases that denote cause and effect relationships. Such words and phrases may Include "because," "as a result,” "therefore." 
Sentence structure may also suggest cause and effect relationship; as In "The Civil War, brought on by the slavery Issue, occurred in the 1860's" and "The War contributed much to the North's industrial development.” Have students locate examples of cause and effect relationships that are clearly stated, but not through the use of words typically used to denote these relationships. a Identify stated cause and effect relationships within a selection. Give students a selection in which cause and effect relationships are clearly stated. Ask questions to determine if they comprehend these relationships. For example, "Why did Mary start crying?" Correct response"...because her friend left without her." 3. Have students locate examples of cause and effect relationships in their textbooks. C. Identify stated likenesses and differences within a selection. G. Identify stated likenesses and differences within a selection. Example Experiences and/or Activity G. Identify stated likenesses and differences within a selection. Measurement 1. Present students with selections in which likenesses and differences are clearly stated. For example, "Wolves are like dogs in many ways, but they're also different from dogs." (The selection goes on to explain these likenesses and differences.) Have students list or discuss the stated likenesses and differences. 2. Have students find examples of stated likenesses and differences in their textbooks and other reading material. 3. Have students discuss ways in which their school build­ ing is like other school buildings and ways it differs. List as many likenesses and differences on the boatd. Do this with various words, rangina from words that denote concrete objects ("How are a basketball and a baseball alike and different?") to words that denote abstractions ("How are nations and states alike and different?") • 4. Have students group various objects and words together (formulate concepts) according to their likenesses. Have them justify their groupings (concepts). ("I put "doll, "ball," and "blocks" together because they're all toys.") Have students discuss how things can be alike in some ways; different in others. -10- Give students a passage in which like­ nesses and differences are clearly stated, Have them identify these likenesses and differences. Question: "Who did Jane look like?" Answer: "Jane looked like Mary." H. Identify the meaning of a sentence based on punctua­ tion— periods , commas, ques­ tion marks, exclamation marks, and quotation marks. H. Identify the meaning of a sentence based on punctua­ tion— periods, commas, ques­ tion marks, exclamation marks, and quotation marks. 4Example Experiences and/or Activities Measurement 1. 1. Give the students various versions of a selection — one that is punctuated according to common usage, one that is poorly, or even ludicrously, punctuated, and one that is not punctuated at all. Have students read the passages either aloud or silently and discuss vhat effect the punctuation or lack of punctuation had on their ability to read the selection easily. Give students a selection with punctuation omitted. Have them supply correct punc­ tuation according to meaning. For example, "Was the house painted white ( )" (.) (,) (?) (;) 2. Have students read aloud sentences that are punc­ tuated in various ways to show thatpunctuation may affect the way they would read thesentence. For example: "Mary, will you come here!" "Mary, will you come here?" "Mary, come here." 3. 
Have students think of sentences in which the actual meaning is affected by punctuation marks. For example: "Kill Godzilla." "Kill, Godzilla." "Kill Godzilla?" 4. Have students re-write a story containing dialogue as a play. 2. In the following selection, who is speak­ ing? "John," said Phil, "where is Mary going?" John _Phil Mary We don't know Competency Measurable Behavior (3rd Grade) Measurable Behavior (6th Grade) Measurable Behavior (9th Crade) III. Inferential Compre­ hension By the end of the third grade, the student will be able to: By the end of the sixth grade, the student will be able to: By the end of the ninth grade, the student will be able to: A. A. A. Infer the main idea of a selection. Infer the main idea of a selection. Infer the main idea of a selection. Example Experiences and/or Activity Measurement 1. Have students read a selection in which the main idea is not explicitly stated, but is to be in­ ferred. Go through the sentences contained in the selection and have the students discover for them­ selves that no one sentence alone states the main idea of the whole. Ask them to state as clearly and succinctly as possible what they think the main idea is. 1. Have students read a selection in which the main idea is not actually stated. Have them choose from a list of possible main ideas the one that most clearly states the main idea of the selection. 2. Have students read a selection. Have them list as many ideas contained in the selection as possible. Then have them decide which of the ideas are more important to the whole selection, which less im­ portant. Have them select from the more important ideas the one they think best states the main idea. Then discuss the concept of "main idea." 3. Give students a sentence that can be used as a main idea. Ask them to write a selection about that idea without actually stating it in the selection. Have other students state what they think the main idea is. 4. As a matter of course when students read, ask them what they think the main idea was. Accustom them in various kinds of reading activities, both reading for pleasure and in instructional material, to distinguish between major (main) ideas and subordinate or minor ideas in that same selection. -12- 2. Have students read a selection. Present them with a list of ideas to be inferred from, but not stated in, the selection, ranging from the more Important ideas to the lesser ideas. Have them check the most important ideas and the least important to be inferred from the selec­ tion. Have students justify their choices. 5. Ask students to consider the question, "Is the main idea more often stated or inferred?" in regard to a variety of reading materials; i.e., stories, fables, science materials, social studies materials, news­ papers, plays, novels, short stories, and so on. In what kinds of reading materials does one tend to find the main idea stated and in what kind of materials is it likely to be inferred? Why? B. Infer the cause and effect relationships within a selection. B. Infer the cause and effect relationships within a selection. B. Measurement 1. Have students read selections in which cause and effect relationships are not actually stated. In discussions and through individual work have students state the inferred cause and effect relationships. Have them justify the causes and effects they state. For example, "Joe is very good at carpentry. His father is a carpenter." 
Though not stated explicitly, one might justifiably infer that Joe learned something about carpentry from his father. 1. 2. Have students locate Inferred cause and effect relationships- in their textbooks. Discuss the con­ cept of multiple causes and multiple effects, especially in regard to science and the social sciences. In dis­ cussing stories, novels, and plays, make a point of ask­ ing students to discuss what they think caused the characters to act as they did and what effect these actions had on other characters. Having students discuss in­ ferred causes of human behavior and inferred effects are as appropriate to discussing "Peter Rabbit" as to "Hamlet." -13- Give students a selection in which cause and effect relationships may be inferred, Ask them to identify the appropriate in­ ferred causes and effects. 192 Example Experiences and/or Activity Infer the cause and effect relationships within a selection. C. Predict the probable out­ come of a selection. C. Predict the probable out­ come of a selection. C. Example Experiences and/or Activity Measurement 1. Have students read a selection from which the ending has been eliminated. Have them speculate how, on the basis of everything else in the story, it will in all probability end. Have them justify why they say the story will end that way. Then have them read the actual ending. 1. 2. Have students read stories and trade books (library books) of their own choosing. Have them speculate what events may occur or what may happen to the characters in the years following the end of the story or novel. Have then justify their ideas. 3. Discuss the idea of "probable outcomes" in relation to such literary devices as surprise endings, ironic twists, unforeseeable outcomes, and so on. Assist them to understand the difference between "probable outcomes" and more creative and literary outcomes. Have students complete a story in the most probable way and then in a less predictable way. Have them discuss which outcome is better. Why? D. Infer details that support the main idea of a selection. D. Give the students a select ion from which the ending has been eliminated. List some probable outcomes. On the basis of what the reader is told in the selec­ tion, which of the listed outcomes is the most probable? Infer details that support the main idea of a selection, Example Experiences and/or Activity Have students read a selection and then make in­ ferences about details not explicitly stated in the selection. Have them list all inferences they can think of about details not stated. For example, if the story describes a gradually darkening, brilliant red skv that makes the sea "look blood red," the reader can infer the story occurs at sunset. Much of what we Predict the Drobable out­ come of a selection. D. Infer details that support the main idea of a selection. Measurement 1. Give the students a selection followed by a list of details about the selection that were not explicitly stated. Have the students choose the details that may be justifiably inferred. read in a selection is inferred by the reader, rather than is explicitly stated— and appropriately so. But some inferences are more justifiable than others. 2. Have students make a list of details they want to in­ clude in a story, such as the day the story occurs will be an extremely hot one. It will occur in July in the 1860's, and the setting will be Pennsylvania. The M i n character will be a thirteen-year-old deaf girl who is the youngest of four children. And so on. 
Now have them write the story without explicitly stating these details. Then have other children read the selec­ tion and make inferences about details. Have them verify their inferences on the basis of the original list E. Infer the sequence within a selection. E. E. Infer the sequence a selection. Example Experiences and/or Activity Measurement 1. Have students read a selection in which sequen­ tiality is inferred, though not stated specifically. Following their reading, have students list the elements in the story as they were presented and as they actually occurred. Include examples that show that the order of presentation within the selection may not be the order of the actual event. For example,in the following selection, the events are not presented in actual sequence: "I took the cake out of the oven and was so pleased, I decided to frost it with extra deluxe frosting. As I was making the frosting and then putting it on the cake, I thought back to the difficulty I had getting the batter just right.” "Getting the batter right" is presented last, but actually occurred before anything else mentioned in the selection. 1. 2. Have students do a "time line" on the basis of a story or book they have read. A section of a history book might be particularly appropriate for the activity. -15- Infer the sequence within a selection. Have students read a selection in which sequentiality is to be inferred. Ask questions to determine if they understand the inferred sequentiality. For example, if the student were asked what occurred first in the cake-baking selection (see opposite), he/she should choose "...tried to get the batter right" and not "made the frosting." 3. Have students write a selection involving sequentiality. Have other students read the selection and list the elements sequentially. J F. Infer likenesses and differences within a selection. F. Infer likenesses and differences within a selection. F. Example Experiences and/or Activity Measurement 1. Present students with selections in which like­ nesses and differences are to be inferred rather than actually stated. Ask them to identify these likenesses and differences. For example, a selec­ tion may be about animals that rely on speed to escape their enemies. Two animals so discussed may be antelopes and rabbits. The student would infer that antelopes and rabbits are alike in that they both can run fast. 1. 2. (See also identifying stated likenesses and differences.) G. Draw conclusions from given information. G. Infer likenesses and differ­ ences within a selection. Give the students a selection in which likenesses and differences are to be inferred. Have students identify these likenesses and differences. Draw conclusions from given information. G. ■■ — Example Experiences and/or Activity Measurement 1. Present students with selections and passages on the basis of which conclusions may be drawn. Have them reach various conclusions as individuals or as members of groups. List the various con­ clusions drawn on the board and discuss which ones are the most justifiable. Discuss what constitutes "a safe conclusion." 1. 2. Make conclusions on the basis of material presented in a wide range of written material, such as text­ books, poems, novels, stories, advertisements, news- Draw conclusions from given information. — ■- | Give the students a selection upon which a conclusion may be drawn. From a list of possible conclusions, have them select the most justifiable one. 
For example, if the selection states that dikes have been built around a city, we can conclude "that the city is located close to the sea," hut not necessarily "that it is a city that dates back to the Middle Ages." I -16- paper and magazine articles, research studies, and so on. What kinds of material are easiest and safest to draw conclusions on the basis of? Are some conclusions more justifiable than others? Why? 3. Make it a practice to ask the students "What do you think we can conclude from what you've read?" Encourage students to present various conclusions and to justify them. H. Identify relationships of words (analogies). t H. Identify relationships of words (analogies). H. Example Experiences and/or Activity Measurement 1. Have students practice word analogies of various degrees of difficulty. For example, "Shoe is to foot as glove is to ." Have students make up their own analogies to give other students. Leave various parts of the analogies blank. (" is to foot as glove is to hand.") Use for vocabulary builders as well: "Mauve is to purple as gray is to ." Student may have to look up "mauve." 1. 2. Organize "spelling-bee" type games and other group games, using word analogies as the vehicle. 1. Make inferences about characters in a story. Students choose from a list the appropriate word to complete an analogy. 194 III-l was inadvertently left out. Identify relationships of words (analogies). It should read as follows: I. Make inferences about characters in a story. -17- I. Make inferences about characters in a story. Competency Measurable Behavior (3rd Grade) Measurable Behavior (6th Grade) Measurable Behavior (9th Grade) 1V„ Critical Reading Ski Us By the end of the third grade, the student will be able to: By the end of the sixth grade, the student will be able to: By the end of the ninth grade, the student will be able to: A. A. A. Determine the author's purpose for a selection. Determine the author's purposes for a selection. Determine the author's major purposes for a selection. Example Experiences and/or Activity Measurement 1. 1. Give students a brief selection and list four or more possible purposes. Have them choose the "best purpose" or the "main purpose." For example, the selection may be on the gradual decline of elephants because of hunters. The intended author's purpose is to "prevent the extinction of elephants." 2. Given a selection, the students will identify major purposes and possible minor purposes. Have students read selections that have an obvious purpose, such as Aesop's Fables. Have students discuss the purpose of the selection. Encourage various ideas. Then have students decide the best statement of the purpose. At higher levels, discuss the author's purpose in terms of materials in which the purpose is not as clear cut or where there may be a number of purposes. Also, at the higher levels, have students discuss the author's purpose in regard to a wide variety of materials; i.e., fiction, non­ fiction, expository writing, newspaper and magazine articles, advertisements, and so on. 2 . Have students select "a purpose'* for writing something, such as to tell a moral, to convince people they should give to Community Chest, to entertain, or to inform; and then write a selection based on that purpose. Have other students read the selections and guess the intended purposes. 3. Have students read brief selections. List possible purposes on the board. Discuss why one particular purpose is the best choice. 8. Distinguish between fantasy and reality. B. 
Distinguish between fact and opinion. B. Example Experiences and/or Activity Measurement 1. 1. Discuss with students how some stories are "real" (could actually happen), while others are fantasies -18- Distinguish between fact and opinion. Under such phrases as "Which of the follow­ ing could really happen?" "Which of the (could not actually happen). Have them read selections and discuss which ones "could really happen" and which ones "are make believe." Have them discuss the reasons for distinguishing between fact and fantasy. 2. 3. 4. At the higher levels, discuss how stories may present "truths," even though they are not actaully true or real. Thus, although fables are not real, they present truths about life. Also, in much writing, fact and fiction seem to merge. Give students reading material containing both fact and opinion. Ask them to identify the facts and the opinions and tell why they have identified these parts as such. At the upper levels, con­ sider material in which fact and opinion are less distinguishable— for example, in cases where facts arc arranged and presented so as to convey the author's opinion. following is make-believe?** "Which of the following could a person really do" list various choices and ask the student to choose the appropriate choice. For example, given "Which could not really happen?" the student would choose "The airplane laughed and laughed." 2. Give students various selections and have them decide if they are fact or fantasy. 3. Give the students a questions such as: "Which of the following are statements of fact?" and list choices. Students will select the factual statement. Do the same for opinion. C. 195 Have students locate examples of writing that contain facts and opinions, especially in newspapers and magazines. Determine the author's viewpoint from a selection. C. Example Experiences and/or Activity Measurement 1. Have the students read a selection and discuss what they believe the author's point of view to be; i.e., what opinion does the author have regarding the topic. For example, if the selec­ tion is on crime, does the author believe it is hopeless to do anything about it, everyone should try to do something, it's the governor's job, or it's the natural result of social ills. 1. 2. Have students read a wide variety of material and discuss what they believe to be the author's viewpoint. -19- Determine the author's viewpoint from a selection. Give students a brief selection and U s t four or five points of view on the topic. Have students select the point of view expressed by the author of the selection. 3. Have students read selections on the same topic, but written from various viewpoints. 4. Have students write on topics from various view­ points. For example, have them write about the American Revolution for an English history book, a Canadian, a French, and a Russion textbook. 3. Discuss the topic of bias and point of view. Can any writing be free of bias or a point of view? Especially discuss the question in relation to the various subject areas: history, the social science, science, health education, literature, the arts, and so on. * D. Identify examples of propaganda techniques. Example Experiences and/or Activity Measurement 1. Discuss various propaganda techniques and have students read examples of these techniques. Have students find examples of their own in a variety of written material, including advertisements. 1. Give students a brief selection using a particular propaganda technique. 
D. Identify examples of propaganda techniques.

Example Experiences and/or Activities:
1. Discuss various propaganda techniques and have students read examples of these techniques. Have students find examples of their own in a variety of written material, including advertisements.
2. List various types of propaganda techniques and have students write selections using these techniques. Have other students read the selections and discuss the techniques used.
3. Have students construct a montage of sections of advertisements that use various propaganda techniques.
4. Discuss the various uses of propaganda, both in contemporary society and from a historical perspective.

Measurement:
1. Give students a brief selection using a particular propaganda technique. Ask the student to identify the technique used.
2. Ask students to identify selections that are heavily propagandized and selections that are relatively free of propaganda.

Competency V: Related Study Skills
By the end of the third grade (sixth grade, ninth grade), the student will be able to:

A. (3rd Grade) Identify the major use of dictionaries, tables of contents, and glossaries. (6th Grade) Identify the major uses of dictionaries, encyclopedias, atlases, newspapers, magazines, telephone books, tables of contents, glossaries, indexes, maps, graphs, charts, and tables. (9th Grade) Identify the major uses of dictionaries, encyclopedias, atlases, newspapers, magazines, telephone books, thesauruses, almanacs, card catalogues, periodical guides, tables of contents, glossaries, indexes, maps, graphs, charts, tables, appendixes, footnotes, and bibliographies.

Example Experiences and/or Activities:
1. Give students instruction in the use of the various reference materials listed in the objectives. Discuss the various situations in which the materials would be used and how they are used.

Measurement:
1. Give the student a type of information to be located. Have the student identify the appropriate reference material to locate the information.

B. (3rd Grade) Locate information within reference materials using dictionaries, tables of contents, and glossaries. (6th Grade) Locate information within reference materials using dictionaries, encyclopedias, atlases, newspapers, magazines, telephone books, tables of contents, glossaries, indexes, maps, graphs, charts, and tables. (9th Grade) Locate information within reference materials using dictionaries, encyclopedias, atlases, newspapers, magazines, telephone books, thesauruses, almanacs, card catalogues, periodical guides, tables of contents, glossaries, indexes, maps, charts, graphs, tables, appendixes, footnotes, and bibliographies.

Example Experiences and/or Activities:
1. Have the students use the various reference materials listed in the objective in their everyday work, especially in a variety of subject areas.

Measurement:
1. Given a kind of information to locate, the student will locate the information in the appropriate reference material.

C. Follow written directions. (All three grade levels.)

Example Experiences and/or Activities:
1. Give students various sets of directions for a wide variety of tasks, such as how to construct something, how to get from one place to another, and how to fill out a form, and ask them to follow the directions exactly.

Measurement:
1. Give the student a form with written directions. Ask the student to complete the form accurately.

D. Summarize a selection. (All three grade levels.)

Example Experiences and/or Activities:
1. Have students read various kinds of selections and present them with summaries of the selections. Discuss which summaries are the best and why.
2. Have students write summaries of a variety of selections.

Measurement:
1. Give students a brief selection and four or five summaries. Have the student select the best summary.
E. Organize information in an outline form. (All three grade levels.)

Example Experiences and/or Activities:
1. Have students at the lower levels construct rudimentary outlines of written material. At the upper levels have them construct more complete outlines. Discuss various types of outlines, the logic behind outline forms, and the various uses of outlines.
2. Given a completed outline, have the student write a selection or give a speech using the outline.
3. Have students outline material before writing a selection.

Measurement:
1. On the basis of a set of material, the student will identify the best outline for a given purpose.

F. (3rd Grade) Alphabetize words correctly through the second letter. (6th and 9th Grades) Use alphabetizing skills to locate information in common references.

Example Experiences and/or Activities:
1. Have students use alphabetizing skills in locating information in reference materials.
2. Give students lists of words and ask them to alphabetize them. The activity may be done individually or in groups, and may be carried out as a game.
3. Have students locate a group of words in a dictionary as rapidly as possible. Conduct the activity as a race, the winners being those students who find all the words first.
4. Stress the use of guide words in using dictionaries, telephone books, and so on.

Measurement:
1. Given a word, the student will choose from a list of words the one that would come next.
2. Given a list of words, the student will alphabetize them.
3. Given two guide words (as are found on a dictionary page), the student will identify words that would fall between those words. For example, "mind" falls between "mill" and "minor," but "mock," "minority," and "mug" will not.
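The guide-word task in Measurement 3 reduces to a simple alphabetical comparison: a word belongs on a dictionary page if it is no earlier than the first guide word and no later than the second. The short sketch below, written in Python and using the words from the example above (not items drawn from the Experimental Reading Test), illustrates the comparison that underlies the objective; lower-case spelling is assumed so that the comparison matches dictionary order.

    # A minimal sketch of the guide-word check described in Measurement 3.
    # The words come from the example in the text; they are illustrative,
    # not test items.

    def falls_between(word, first_guide, second_guide):
        # A word belongs on the page if it is alphabetically no earlier
        # than the first guide word and no later than the second.
        return first_guide <= word <= second_guide

    for word in ["mind", "mock", "minority", "mug"]:
        print(word, falls_between(word, "mill", "minor"))
    # Only "mind" falls between "mill" and "minor".

The same comparison, restricted to the first two letters of each word, corresponds to the third-grade behavior of alphabetizing correctly through the second letter.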
Competency VI: Positive Responses to Reading
By the end of the third grade (sixth grade, ninth grade), the student will demonstrate her/his enjoyment of reading by:

A. Reading materials of her/his choice during free time, both in school and at home. (All three grade levels.)

Example Experiences and/or Activities:
1. Allow time in school for students to read for their own pleasure. Students should be free to read the kind of materials they themselves select.
2. Have the students and/or their parents keep a log of what they (the students) are reading at home. Any kind of reading material should be considered allowable.

Measurement:
1. Given the opportunity to do so, students will freely select and read books, magazines, or whatever appeals to them. The observer will set his or her own objective. It may be: "Given the opportunity to do so, 90% of the students will read of their own choice for at least ___ minutes."

B. Going frequently to places where reading materials are available, such as libraries, reading rooms, book sales, and book exchanges. (All three grade levels.)

Example Experiences and/or Activities:
1. Provide time for students to go to the school library or other places where they can select reading materials. Especially allow individual students to go to the library as the need arises, or as they wish to do so.

Measurement:
1. The teacher's objective may be: "Given the opportunity to go to the library, 90% of the students will choose to go and select a book or some other reading material."

C. Requesting reading materials in addition to those assigned by the teacher. (All three grade levels.)

Example Experiences and/or Activities:
1. Teachers can encourage students to ask for additional reading materials by making attractive, high-interest materials readily available.

Measurement:
1. The teacher's objective might be: "Over the course of ___ weeks, 90% of the students will at least once ask for or seek out additional reading materials that are not 'required'."

D. Responding to the opportunity to talk about and/or discuss what he/she has read. (All three grade levels.)

Example Experiences and/or Activities:
1. Give students ample opportunity to talk about what they are reading with other students or with adults. Conversations and discussions may be conducted class-wide or in small groups. Informal and open-ended discussions are particularly appropriate.

Measurement:
1. The classroom objective might read: "Given the opportunity to do so, 90% of the students will, during the course of a week, choose to talk with someone else about what they have read."

E. Taking part in creative activities related to reading, such as puppet shows, dramatizations, creative dramatics, art/music activities, creative writing activities, investigative activities, and so on. (All three grade levels.)

Example Experiences and/or Activities:
1. Give the students opportunities to relate their reading activities to a variety of creative activities.
2. See the Speaking and Listening objectives (especially Creative Dramatics).

Measurement:
1. A classroom objective might read: "Given the opportunity to do so, 90% of the students will, sometime during the course of a three-week period, choose to take part in a creative activity related to reading."

APPENDIX C

Appendix C

Proportion scores of the reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as measured by the Reading Concepts Checklist (RCC).
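The scores in the appendices that follow can be read as simple proportions: for each category of the Reading Concepts Checklist, a program's (or the test's) score is the number of concepts it presents (or measures) divided by the number of concepts in that category; for example, 89 of the 103 Total Score concepts yields .8641. The Q values, degrees of freedom, and chi-square probabilities in Appendices D and G accompany counts of concepts recorded as present (1) or absent (0) for each program and for the test, a layout consistent with Cochran's Q test for related dichotomous measures, and the ± values in Appendices E and H appear to be critical ranges for the corresponding pairwise comparisons; the identification of Cochran's Q is an assumption made here for illustration, not a statement taken from the tables. The sketch below, written in Python with a small hypothetical concept matrix rather than the checklist data, shows how a proportion score and a Q statistic of this form would be computed.

    # Illustrative computation of proportion scores and a Cochran's Q
    # statistic from a concept matrix.  The matrix is hypothetical
    # (rows = checklist concepts, columns = the five programs plus the
    # test; 1 = concept presented or tested, 0 = not); it is not data
    # from the Reading Concepts Checklist.

    from scipy.stats import chi2

    matrix = [
        [1, 1, 1, 1, 1, 0],
        [1, 1, 1, 0, 1, 0],
        [1, 1, 1, 1, 0, 1],
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 1, 1, 0],
    ]
    n_concepts = len(matrix)
    k = len(matrix[0])                                  # columns compared

    column_totals = [sum(row[j] for row in matrix) for j in range(k)]
    row_totals = [sum(row) for row in matrix]
    grand_total = sum(row_totals)

    # Proportion score: concepts present divided by concepts in the category.
    proportion_scores = [g / n_concepts for g in column_totals]

    # Cochran's Q, referred to a chi-square distribution with k - 1 d.f.
    numerator = (k - 1) * (k * sum(g * g for g in column_totals)
                           - grand_total ** 2)
    denominator = k * grand_total - sum(r * r for r in row_totals)
    q = numerator / denominator
    p_value = chi2.sf(q, k - 1)

    print(proportion_scores)
    print(round(q, 4), round(p_value, 4))

With six columns (the five programs and the test) the degrees of freedom are five, and with the five programs alone they are four, which matches the D.F. column reported in Appendices D and G.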
P4 P '5 T4 .8641 .6990 .6796 .2427 1.00 .3333 .8333 .75 1.00 .75 .50 Phonic Analysis (16 Concepts) .9375 1.00 1.00 Structural Analysis (11 Concepts) .8181 1.00 1.00 .9090 .7272 1.00 1.00 .8333 .3333 P1 P2 Total Score (103 Concepts) .8641 .8932 Auditory Discrimination (6 Concepts) .6666 Visual Discrimination (4 Concepts) Comprehension: Vocabulary Development (6 Concepts 1.00 1.00 1.00 .75 1.00 0.00 0.00 0.00 0.00 .6666 200 P3 Category Appendix C Continued Category P1 Literal Comprehens ion (13 Concepts) 1.00 P2 .9231 P3 1.00 P4 P5 T4 .8462 .8462 .3077 Inferential Comprehens ion (17 Concepts) .8824 .9412 .9412 .6471 .5294 .4707 Critical Comprehension (8 Concepts) .875 .75 .875 .375 .50 .1250 Study Skills (22 Concepts) .7727 .5909 .6818 .4090 .50 .2272 Key: = Ginn and Company 1?2 = Harcourt, Brace and Jovanovich = Holt, Rinehart and Winston P 4 = Houghton-Mifflin Company P 5 = Scott Foresman Company = Michigan Educational Assessment Program Experimental Reading Test Grade 4 APPENDIX D Appendix D Differences between proportion scores between the reading instructional programs and between the Michigan Educational Assessment Program Experimental Reading Test Grade 4 as measured by the Reading Concepts Checklist, (RCC). Value G-C HBJ Category: 1 0 89 14 HRW HMC SFC Q T4 D.F. P H o Total Score Difference in Proportions Between Instructional Programs and the Experimental Reading Test (103 Concepts) 90 13 89 14 72 31 70 33 25 78 162.4435 5 p <.001 Reject 1 0 89 14 90 13 Category I: 1 0 89 14 72 31 70 33 31.9685 4 p <.001 Reject Auditory Discrimination - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (6 Concepts) 4 2 6 0 2 4 5 1 6 0 0 6 18.4043 5 p<.001 Reject Auditory Discrimination - Differences in Proportions Between Instructional Programs (6 Concepts) 1 0 4 2 6 0 2 4 5 1 6 9.33 0 The null hypotheses are rejected at the 0.05 level. 4 p>.05 Not Rejected Higher levels are indicated. Con't. 203 Total Score Differences in Proportions Between Instructional Programs (103 Concepts) Appendix D. Continued Value G-C HBJ Category II: 1 3 0 1 4 0 HRW HMC SFC T. 4 D.F. 0 P H o Visual Discrimination - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (4 Concepts) 3 1 2 2 3 1 0 12.3913 5 p <.05 Reject 4 Visual Discrimination - Differences in Proportions Between Instructional Programs (4 Concepts) 3 1 4 0 3 1 2 2 3 5.00 4 p > .05 1 Not Rejected Category III; Phonic Analysis - Differences in Proportions Between Instruc­ tional Programs and the Experimental Reading Test (16 Concepts) 1 0 15 16 16 16 16 1 0 0 0 0 0 16 75.4819 5 p < .001 Reject Phonic Analysis - Differences in Proportions Between Instructional Programs (16 Concepts) 1 0 15 1 16 0 16 0 16 0 16 0 4.00 The null hypotheses are rejected at the 0.05 level. 4 p > .05 Not Rejected Higher levels are indicated. Con’t. 204 1 0 Appendix D. Continued Value G-C HBJ HRW HMC SFC T4 D.F. 
0 P H o Category IV: Structural Analysis - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (11 Concepts) 1 8 0 3 10 1 10 1 2 9 8 3 2 9 27.4490 5 p < .001 Reject Structural Analysis - Differences in Proportions Between Instructional Programs (11 Concepts) 8 3 10 1 Category V: 1 6 0 0 6 0 10 1 2 9 5.7143 8 4 p>.05 3 Not Rejected Comprehension: Vocabulary Development - Differences in Proportaions Between Instructional Programs and the Experimental Reading Test (6 Concepts) 6 0 5 1 2 4 4 16.1111 5 p< .01 Reject 2 Comprehension: Vocabulary Development - Differences in Proporations Between Instructioanl Programs (6 Concepts) 1 0 6 6 6 0 0 0 5 1 2 15.6667 4 p< .01 Reject 4 The null hypotheses are rejected at the 0.05 level. Higher levels are indicated. Con't. 205 1 0 Appendix D. Continued. Value G-C HBJ HMC HRW SFC T4 Q D.F. P H O Category VI: Literal Comprehension - Differences in Proportions Between Instructional Programs and the Experimental Reading Tests (13 Concepts) 1 0 13 0 12 1 13 0 11 2 11 2 4 9 30.7143 5 p < .001 Reject Literal Comprehension - Differences in Proportions Between Instructional Programs (13 Concepts) 13 0 12 1 13 11 2 0 11 2 5.7143 4 p > .05 Not Rejected Category VIII: Inferential Comprehension - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (17 Concepts) 1 0 15 2 16 1 16 1 11 9 8 6 8 9 19.4554 5 p < .01 Reject Inferential Comprehension - Differences in Proportions Between Instructional Programs (17 Concepts) 1 15 16 0 2 1 16 1 11 6 9 13.2903 4 P<-01 Reject 8 The null hypotheses are rejected at the 0.05 level. Higher levels are indicated. Con't. 206 1 0 Appendix D . Continued Value HBJ G-C HRW HMC SFC T,4 0 D.F. P H O Category VIII: Critical Comprehension - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (8 Concepts) 1 0 7 6 7 3 4 1 15.7143 5 P<-01 1 2 1 5 4 7 Critical Comprehension - Differences in Proportions Between Instructional Programs (8 Concepts) Reject 1 0 7 1 Not Rejected 6 2 7 1 3 5 4 4 8.80 4 p>.05 1 0 17 5 13 15 9 11 5 22.5806 5 p<.001 Reject 9 7 13 11 17 Study Skills - Differences in Proportions Between Instructional Programs (22 Concepts) 1 17 13 15 9 11 10.8108 4 p<.05 Reject 0 5 9 7 13 11 The null hypotheses are rejected at the 0.05 level. Higher Levels are indicated. KEY: G-C - Ginn and Company HMC - Houghton-Mifflin Company HBJ - Harcourt Brace and Jovanovich SFC - Scott Foresman Company HRW - Holt, Rinehart, and Winston T. - Michigan Educational Assessment Program Experimental Reading Test Grade Four 207 Category IX: Study Skills - Differences in Proportions Between Instructional Programs and the Experimenatl Reading Test (22 Concepts) APPENDIX E 208 209 Appendix E Summary of the values of the pairwise comparions of the means of the proportion scores between the K-3 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 4 by individual category scores within the Reading Concepts Checklist/ (RCC). Experimental Reading Test Program Category: Auditory Discrimination Ginn and Company .6 6 6 6 Harcourt, Brace and Jovanovich 1.00 Holt, Rinehart, and Winston .3333 Houghton-Mifflin Company .8333 Scott, Foresman Company Category: Ginn and Company Harcourt, Brace and Jovanovich ±.2258 1.00 Visual Dis trimination .750 ±.2868 1.00 Holt, Rinehart, and Winston .750 Houghton-Mi fflin Company .500 Scott, Foresman Company .750 Continued 210 Appendix E. 
Continued Program Category: Ginn and Company Experimental Reading Test Phonic Analysis .9375 Harcourt, Brace and Jovanovich 1.00 Holt, Rinehart, and Winston 1.00 Houghton-Mifflin Company 1.00 Scott, Foresman Company 1.00 Cateqory: * ±.2772 Structural Analysis Ginn and Company .5454 Harcourt, Brace and Jovanovich .7273 Holt, Rinehart, and Winston .7273 Houghton-Mifflin Company .6363 Scott, Foresman Company .4545 ±.1017 Continued 211 Appendix E Continued Program Category: Experimental Reading Test Comprehension - Vocabulary Development Ginn and Company .3334 Harcourt, Brace and Jovanovich .3334 Holt, Rinehart, and Winston .3334 Houghton-Mif f1in Company .1667 Scott, Foresman Company Category: ±.1470 -.3333 Literal Comprehension Ginn and Company .6923 Harcourt, Brace and Jovanovich .6154 Holt, Rinehart, and Winston .6923 Houghton-Mi ff1in Company .5385 Scott, Foresman Company .5385 ±.0882 Continued 212 Appendix E. Continued Program Category: Experimental Reading Test Inferential Comprehension Ginn and Company .4118 Harcourt, Brace and Jovanovich .4706 Holt, Rinehart, and Winston .4706 Houghton-Mifflin Company .1765 Scott, Foresman Company .0588a Category: ±.1017 Critical Comprehension Ginn and Company .7500 Harcourt, Brace and Jovanovich .6250 Holt, Rinehart, and Winston .7500 Houghton-Mifflin Company .2500 Scott, Foresman Company .3750 aNon-Significant Difference +.2037 Continued 213 Appendix E. Continued Program Category: Experimental Reading Test ip Study Skills Ginn and Company .5455 Harcourt, Brace and Jovanovich .3637 Holt, Rinehart, and Winston .4546 Houghton-Mifflin Company .1818 Scott, Foresman Company .2728 ±.1211 APPENDIX F 214 Appendix F Proportion scores of the reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as measured by the Reading Concepts Checklist, (RCC). Category P3 P4 P5 T7 .8252 .8447 .5147 .6505 .2718 .00 .00 .00 .00 .00 .00 Visual Di scrimination (4 Concepts) .00 .00 .00 .00 .00 .00 Phonic Analysis (16 Concepts) .8152 .9375 1.00 .0625 .9375 .00 1.00 .6364 .7273 .1818 .6777 .500 .6777 P1 P2 Total Score (103 Concepts) .8252 Auditory Discrimination (6 Concepts) Structural Analysis (11 Concepts) Comprehension Vocabulary Development (13 Concepts) 1.00 .8333 1.00 1.00 .8333 Continued Appendix F Continued Category P,X P2 P3 P4 P5 T7 Literal Comprehension (13 Concepts) .9231 .8462 .9231 .4615 .6154 .3077 Inferential Comprehension (17 Concepts) .9412 .9412 .7647 .5882 .4118 Critical Comprehension (8 Concepts) .750 .8750 .6250 .6250 .3750 .8636 .7727 .7273 .3636 Study Skills (22 Concepts) Key: 1.00 1.00 1.00 .8182 = Ginn and Company ?2 = Harcourt, Brace and Jovanovich P^ = Holt, Rinehart, and Winston P^ = Houghton-Mifflin Company P,. = Scott, Foresman Company = Michigan Educational Assessment Program Experimental Reading Test Grade 7 APPENDIX G 217 Appendix G Differences between proportion scores between the reading instructional programs and between the Michigan Educational Assessment Program Experimental Reading Test Grade 7 as measured by the Reading Concepts Checklist, (RCC). Value HBJ G-C HRW HMC SPC T7 D.F. 
Q P H O Category: Total Score Difference in Proportions Between Instructional Programs and the Experimental Reading Test (103 Concepts) 1 0 85 18 85 18 87 16 53 50 67 36 28 75 153.2440 5 p < .001 Reject Total Score Difference in Proportions Between Instructional Programs 1 0 85 18 85 18 87 16 53 50 67 36 64.5797 4 p < .001 Reject Category I: Auditory Discrimination - Differences in Proportions Between Instructional Programs and the Experimental Reading Test Grade 7 (6 Concepts) Not Considered: Neither Taught nor Tested. Category II: Visual Discrimination - Differences in Proportions Between Instructional Programs and the Experimental Reading Test Grade 7 (4 Concepts) Not Considered: Neither Taught nor Tested. The null hypotheses are rejected at the 0.05 level. Higher levels are indicated. Con't Appendix G . Continued Value G-C HBJ HRW HMC SFC T? D.F. Q P H o Category III: Phonic Analysis - Differences in Proportions Between Instructional Programs and the Experimental Reading Test Grade 7 (16 Concepts) 1 0 13 3 15 1 16 0 1 15 15 1 0 16 63.6923 5 p < .001 Reject Phonic Analysis - Differences in Proportions Between Instructional Programs (16 Concepts) 1 0 13 3 15 1 16 0 1 15 15 1 44.5714 4 p < .001 Reject Category IV: Structural Analysis - Differences in Proportions Between Instructional Programs and the Experimental Reading Test Grade 7 (11 Concepts) 1 11 11 0 0 0 11 0 7 4 8 3 2 9 29.5875 5 p < .001 Reject Structural Analysis - Differences in Proportions Between Instructional Programs (11 Concepts) 1 11 11 0 0 0 11 0 7 4 8 12.6667 4 p < .05 Reject 3 The null hypotheses are rejected at the 0.05 level. Higher levels are indicated. Con't Appendix G. Continued Value G-C HBJ HRW HMC SFC T7 D.F. Q P H o Category V: Comprehension; Vocabulary Development - Differences in Proportions Between Instructional Programs and the Experimental Reading Test Grade 7 (6 Concepts) 1 0 5 1 6 0 5 1 4 2 3 3 4 2 7.1739 5 p > .05 Not Rejected Comprehension; Vocabulary Development - Differences in Proportions Between Instructional Programs (6 Concepts) 5 1 0 6 5 1 2 4 3 3 6.5000 4 p > .05 Not Rejected Category VI; Literal Comprehension - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (13 Concepts) 1 0 12 1 11 2 12 1 6 7 10 3 4 9 23.3562 5 p < .05 Reject Literal Comprehension - Differences in Proportions Between Instructional Programs (13 Concepts) 1 0 12 1 11 2 12 1 6 7 10 3 12.4000 The null hypotheses are rejected at the 0.05 level. 4 p < .05 Reject Higher levels are indicated. Con't 220 1 0 Appendix G. Continued Value G-C HBJ HRW HMC SFC T7 D.F. Q P H o Category VIIi Inferential Comprehension - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (17 Concepts) 1 0 16 1 16 1 17 0 13 4 10 7 7 10 25.4301 5 p < .001 Reject Inferential Comprehension - Differences in Proportions Between Instructional Programs (17 Concepts) 1 0 16 1 16 1 17 0 13 4 10 7 13.8333 4 o < .01 Reject Category VIII: Critical Comprehension - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (8 Concepts) 1 0 6 8 2 0 7 1 5 3 5 3 3 5 11.5000 5 p < .05 Reject Critical Comprehension - Differences in Proportions Between Instructional Programs (8 Concepts) 1 0 6 2 8 0 7 1 3 5 5 3 6.8000 The null hypotheses are rejected at the 0.05 level. 4 p > .05 Not Rejected Higher levels are indicated. Con1 Appendix G. Continued Value G-C HBJ HRW HMC SFC T? 0 D.F. 
P H o Category IX: Study Skills - Differences in Proportions Between Instructional Programs and the Experimental Reading Test (22 Concepts) 22 0 1 0 18 4 19 3 17 5 16 8 6 14 28.3051 5 p < .01 Reject Study Skills - Differences in Proportions Between Instructional Proqrams (22 Concepts) 22 0 1 0 18 4 19 3 17 5 16 7.3103 6 Key: G-C HBJ HRW HMC SFC T7 = = = = = = p > .05 Not Rejected Higher Levels are indicated. Ginn and Company Harcourt, Brace and Jovanovich Holt, Rinehart, and Winston Houghton-Mifflin Company Scott, Foresman Company Michigan Educational Assessment Program Experimental Reading Test Grade 7 222 The null hypotheses are rejected at the 0.05 level. 4 APPENDIX H 223 224 Appendix H Summary of the values of the pairwise comparisons of the means of the proportion scores between the 4-6 reading instructional programs and the Michigan Educational Assessment Program Experimental Reading Test Grade 7 by individual category scores within the Reading Concepts Checklist, (RCC). Program Category: Not Considered: Experimental Reading Test Auditory Discrimination Neither Taught nor Tested. Category: Visual Discrimination Not Considered: Category: Neither Taught nor Tested. Phonic Analysis Ginn and Company .8125 Harcourt, Brace and Jovanovich .9375 Holt, Rinehart, and Winston ±.1441 1.00 Houghton-Mi fflin Company .0625a Scott, Foresman Company .9375 aNon-Significant Statistical Difference. Continued 225 Appendix H. Continued Program Category: Experimental Reading Test Structural Analysis Ginn and Company .8182 Harcourt, Brace and Jovanovich .8182 Holt, Rinehart, and Winston .8182 Houghton-Mi f flin Company .4546 Scott, Foresman Company .5455 Category: .1666 Harcourt, Brace and Jovanovich .3333 Holt, Rinehart, and Winston .1666 Scott, Foresman Company ±.1247 Comprehension - Vocabulary Development Ginn and Company Houghton-Mifflin Company ip ±.1347 o.ooa .1667 aNon-Significant Statistical Difference. Continued 226 Appendix H. Continued Program Category: Experimental Reading Test Literal Comprehension Ginn and Company .6154 Harcourt, Brace and Jovanovich .5385 Holt, Rinehart, and Winston .6154 Houghton-Mi ff1in Company .1538 Scott, Foresman Company .3077 Category: ±.1176 Inferential Comprehension Ginn and Company .5294 Harcourt, Brace and Jovanovich .5294 Holt, Rinehart, and Winston .5882 Houghton-Mifflin Company .3529 Scott, Foresman Company .1764 +.0929 Continuted 227 Appendix H. Continued Program Category s Experimental Reading Test Critical Comprehension Ginn and Company .3750 Harcourt, Brace and Jovanovich .6250 Holt, Rinehart, and Winston .5000 Houghton-Mifflin Company .2500 Scott, Foresman Company .2500 Category: * ±.1411 Study Skills Ginn and Company .6364 Harcourt, Brace and Jovanovich .4546 Holt, Rinehart, and Winston .5000 Houghton-Mi ff1in Company .4019 Scott, Foresman Company .3637 ±.0779 BIBLIOGRAPHY 228 BIBLIOGRAPHY Aaron, Ira E. and Carter, Sylvia, Step Right Up, Glenview, Illinois: Scott, Foresman and Company, 1978. Aaron, Ira E.; Davis, Charles and Schelly, Joan, Flying Hoofs, Glenview, Illinois: Scott, Foresman and Company, 1978. Aaron, Ira E.; Jackson, Dauris; Riggs, Carole; Smith, Richard G. and Tierney, Robert, Racing Stripes, Glenview, Illinois: Scott, Foresman and Company, 1978. Aaron, Ira E. and Koke, Rena, Ride A Rainbow, Glenview, Illinois: Scott, Foresman and Company, 1978. Airasian, Peter W., "The Role of Evaluation in Mastery Learning," in Mastery Learning Theory and Practice, James H. Block, ed. New York: Holt, Rinehart and Winston, Inc., 1971. 
American Psychological Association, Standards for Educational and Psychological Tests, Washington, D. C.: American Psychological Association, 1974. American Psychological Association, Inc., Technical Recommendations for Psychological Test and Diagnostic Techniques. Washington, D. C.: APA 1954 in Ebel, Robert L. Essentials of Educational Measurement. 2nd. ed., Englewood Cliffs, New Jersey: PrenticeHall, Inc., 1972. Barbe, Walter B., Personalized Reading Instruction, 9th Printing, Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1967. Berk, Ronald A., A Consumers1 Guide to CriterionReferenced Test Item Statistics, Paper Presented at the Annual Meeting of the National Council on Measurement in Education (Toronto, Ontario, 'Canada), 1978. 229 230 Botko, Louise Gorgos; Kerbs, JoAnn; Manning, John and Klassen, Verla Krocker, Dragon Wings, Glenview, Illinois: Scott, Foresman and Company, 1978. >Cairns, Joanna; Galloway, Elizabeth and Tierney, Robert, Daisy Days, Glenview, Illinois: Scott, Foresman and Company, 1978. Cairns, Joanna; Galloway, Elizabeth and Tierney, Robert, Hootenanny, Glenview, Illinois: Scott, Foresman and Company, 197 8 . Clymer, Theodore and Barrett, Thomas C . , A Pocketfull of Sunshine, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore and Bissett, Donald J., A Lizard to Start With, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore; Bissett, Donald J. and VJulfing, Gretchen, Inside Out, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore and Christenson, Bernice M . , Ready for Rainbows, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore and Fenn, Priscilla Holton, How It Is Nowadays, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore and Martin, Patricia Miles, The Dog Next Door, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore; Martin, Patricia Miles and Gates, Doris, May I Come In, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore; McCracken, Blair and McCullough, Constance M., Measure Me, Sky, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore; Parr, Billie; Gates, Doris and Robinson, Eleanor G., A Duck is a Duck, Lexington, Mass.: Ginn and>Company, 1979. Clymer, Theodore; Parr, Billie; Gates, Doris and Robinson, Eleanor G., Helicopters and Gingerbread, Lexington, Mass.: Ginn and Company, 1979. Clymer, Theodore; Stein, Ruth Meyerson; Gates, Doris and McCullough, Constance M . , Tell Me How the Sun Rose, Lexington, Mass.: Ginn and Company, 1979. 231 Clymer, Theodore; Wong, Olive and Benedict, Virginia Jones, One to Grow On, Lexington, -Mass.: Ginn and Company, 1979. Cohen, S. Alan and Hyman, Joan S., Instructional Objectives in Reading, New York: Random House, Inc., 1977. Crambert, Albert C., Estimation of Validity for CriterionReferenced Tests, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977. Cronbach, Lee J . , Educational Psychology, 2nd ed.. New York: Harcourt, Brace and World, Inc., 1963. Duffy, Gerald G. and Sherman, George B., Systematic Reading Instruction, 2nd. ed., New York: Harper and Row, 1977. Durr, William K.; LePere, Jean M. and Alsin, Mary Lou, Footprints, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; LePere, Jean M. and Alsin, Mary Lou, Rockets, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; LePere, Jean M. and Alsin, Mary Lou, Surprises, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K . ; LePere, Jean M . 
; Alsin, Mary Lou; Bunyan, Ruth Patterson and Shaw, Susan, Cloverleaf, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; LePere, Jean M.; Alsin, Mary Lou; Bunyan, Ruth Patterson and Shaw, Susan, Honeycomb, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; LePere, Jean M. and Brown, Ruth Hayek, Passports, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; LePere, Jean M. and Brown, Ruth Hayek, Windchimes, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; LePere, Jean M . ; Niehaus, Bess and York, Barbara, Sunburst, Boston, Mass.: HoughtonMiff lin Company, 1979. Durr, William K.; LePere, Jean M . ; Niehaus, Bess and York, Barbara, Tapestry, Boston, Mass.: HoughtonMiff lin, Company, 1979. 232 Durr, William K . ; Windley, Vivian O. and Earnhardt, Kay S., Impressions, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K.; Windley, Vivian O. and Yates, Mildred C . , Keystone, Boston, Mass.: Houghton-Mifflin Company, 1979. Durr, William K . ; Windley, Vivian 0. and McCourt, Anne A . , Medley, Boston, Mass.: Houghton-Mifflin Company, 1979 . Early, Margaret, Look, Listen, and Learn, Harcourt Brace and Jovanovich, 1979. New York: Early, Margaret; Canfield, Robert; Karlin, Robert and Schottman, Thomas A . , Building Bridges, New York: Harcourt Brace and Jovanovich, 19 79. Early, Margaret; Canfield, Robert; Karlin, Robert and Schottman, Thomas A., Moving Forward, New York: Harcourt Brace and Jovanovxch, 1979. Early, Margaret; Canfield, Robert; Karlin, Robert and Schottman, Thomas A., Reaching Out, New York: Harcourt Brace and Jovanovich, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, Happy Morning Magic Afternoon and Reading Skills 2/3, New York: Harcourt Brace and Jovanovich, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, People and Places and Reading Skills 7 , New York: Harcourt Brace and Jovanovxch, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, Ring Around the World and Reading Skills 9, New York: Harcourt Brace and Jovanovxch, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, Sun and Shadow and Reading Skills 4, New York: Harcourt Brace and Jovanovich, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, Sun Up and Reading Skills, New York: Harcourt Brace and Jovanovich, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, Together We Go and Reading Skills 5 , New York: Harcourt Brace and Jovanovich, 1979. 233 Early, Margaret; Cooper, Elizabeth K, and Santeusanio, Nancy, WideningtCircles and Reading Skills 8 , New York: Harcourt Brace and Jovanovich, 1979. Early, Margaret; Cooper, Elizabeth K. and Santeusanio, Nancy, World of Surprises and Reading Skills 6 , New York: Harcourt Brace and Jovanovich, 1979. Ebel, Robert L., Essentials of Educational Measurement, 2nd ed., Englewood Cliffs, New Jersey: PrenticeHall, Inc., 1972. Ebel, Robert L., Essentials of Educational Measurement, 3rd ed., Englewood Cliffs, New Jersey: PrenticeHall, Inc., 1979. Ebel, Robert L. , "The Case for Minimum Competency Testing, " Phi Delta Kappan (April, 1978) . Edmonston, Leon P. and Randall, Robert S., A Model for Estimating the Reliability and Validity of CriterionReferenced Measures, Paper Presented at the Annual Meeting of the American Educational Research Association 56th, Chicago, Illinois) 1972. Ekwall, Eldon E., Diagnosis and Remediation of the Disabled Reader, 2nd. Printing, Boston, Mass.: Allyn and Bacon, Inc., 1976. Emrick, John A . 
, The Experimental Validation of an Evaluation Model for Mastery Testing, Final Report, Office of Education, Washington, D. C., November, 1971. Estes, Gary; Colvin, Lloyd W. and Goodwin, Coleen, A Criterion-Referenced Basic Skills Assessment Program in a Large City School System, Paper Presented at the Annual Meeting of the American Educational Research Association (60th, San Francisco, California), 1976. Evertts, Eldonna L. and Weiss, Bernard J., Never Give U p , New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L. and Weiss, Bernard J., People Need People, New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L. and Weiss, Bernard J., Special Happenings, New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L. and Weiss, Bernard J., The Way of the World, New York: Holt, Rinehart and Winston, 1977. 234 Evertts, Eldonna L.; Weiss, Bernard J. and Cruikshank, Susan B . , A Place For M e , New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L.; Weiss, Bernard J. and Cruikshank, Susan B . , A Time for Friends, New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L.; Weiss, Bernard J. and Cruikshank, Susan B., Books and Games, New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L.; Weiss, Bernard J. and Cruikshank, Susan B., Can You Imagine, New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L.; Weiss, Bernard J. and Cruikshank, Susan B., Hear, Say, See, Write, New York: Holt, Rinehart and Winston, 1977. Evertts, Eldonna L.; Weiss, Bernard J. and Cruikshank, Susan B . , Pets and People, New York: Holt, Rinehart and Winston, 1977. Everetts, Eldonna L . ; Weiss, Bernard J. and Cruikshank, Susan B., Rhymes and Tales, New York: Holt, Rinehart and Winston, 1977. Farr, Roger, Reading: What Can Be Measured? Newark, Delaware: International Reading Association, 1969. Freeman, Donald? Kuhs, Therese? Knappen, Lucy, and Porter, Andrew, A Closer Look at Standardized Tests, Institute for Research on Teaching, East Lansing, Michigan, November, 1978. Gavin, Anne T . , Guide to the Development of Written Tests for Selection and Promotion: The Content Validity Model. Technical Memorandum 77-6, Civil Service Commission, Washington, D.C.: Personnel Measurement Research and Development Center, 1977. Glasser, R . , and Nitko, A. J . , Measurement in Learning and Instruction. In R. L. Thorndike ed. Educational Measurement, Washington: American Council on Education, 1971. In Ronald K. Hambleton and William P. Gorth, Criterion-Referenced Testing: Issues and Applications, Paper Presented at the Annual Meeting of the North­ eastern Educational Research Association (Liberty, New York), 1970. 235 Haladyna, Tom and Roid, Gale, A Theoretical and Empirical Comparison of Three Approaches to Achievement Testing^ (New York: ERIC Document Reproduction Service, Education 148903, May, 1978). Hambleton, Ronald K. arid Norick, M. R. , "Toward an Integration of Theory and Method for CriterionReferenced Tests," Journal of Educational Measurement. In Sherry Ann Rubinstein and Paula Nassif-Royer, The Outcomes of Statewide Assessment; Implications for Curriculum Evaluation, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977. Hays, William L., Statistics for the Social Sciences, 2nd ed., New York: Holt, Rinehart and Winston, Inc., 1973. Jackson, Dauris, Jumping Jamboree, Glenview, Illinois: Scott, Foresman and Company, 1978. Jackson, Dauris, No Cages Please, Glenview, Illinois: Scott, Foresman and Company, 1978. 
Jackson, Dauris, Puppy Paws, Glenview, Illinois: Scott, Foresman and Company, 1978. Jenkins, Joseph R. and Pany, Darlene, "Curriculum Biases in Reading Achievement Tests," Journal of Reading Behavior, Vol. X, No. 4, (Winter, 1978). Jennings, Robert E. and Prince, Dorothy E., Calico Caper, Glenview, Illinois: Scott, Foresman and Company, 1978. Kearney, Philip; Donovan, David L.; and Fisher, Thomas H., "In Defense of Michigan's Accountability Program," Phi Delta Kappa 56 (September, 1974). Kelley, Truman, "The Selection of Upper and Lower Groups for the Validation of Test Items," Journal of Educational Psychology, Vol. 30, (1939), in Robert L. Ebel, Essentials of Educational Measurement, 2nd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972. Lennon, Roger T., "Assumptions Underlying the Use of Content Validity," In Readings in Measurement and Evaluation in Education and Psychology, Edited by William A. Mehrens, New York: Holt, Rinehart, and Winston, 1976. 236 Lewis, Juanita; Harrison, M. Lucile; Durr, William K. and McKee, Paul, Getting Ready to Read, Boston, Mass.: Houghton-Mifflin Company, 1979. Linehart, Marsha M., Content Validity in Behavioral Assessment, Paper Presented at the Annual Meeting of the American Psychological Association (84th, Washington, D.C.), 1976. Magnusson, David, Test Theory, Trans. By Hunter, Mabon, Reading, Mass.: Addison-Wesley Publishing Company, 1967. Marascuilo, Leonard A. and McSweeney, Maryellen, Nonparametric and Distribution-Free Methods for the Social Sciences, Monterey, California: Brooks/ Cole Publishing Company, 1977. Market Data Retrieval, Inc., HM Co. Market Research Report No. 17, Reading K - 8 Survey, New York: Market Data Retrieval, Inc., 1977. McCormick, Dean Richard, "The Controversial Development of the Michigan Educational Assessment Program 1969-1977" (unpublished Ph.D. dissertation, Michigan State University, 1978). Mehrens, William A., Technical Report: The Fifth Report of the 1973-74 Michigan Educational Assessment Program. Michigan State Department of Education, Lansing, Michigan 1975. Mehrens, William A. and Ebel, Robert L., Some Comments on Criterion-Referenced and Norm-Referenced Achievement Tests, NCME Measurement in Education, Vol. 10", No. 1, Washington, D.C.: National Council on Measurement in Education, Winter, 1979. Mehrens, William A. and Lehmann, Irvin J., Measurement and Evaluation in Education and Psychology, 2nd ed., New York: Holt, Rinehart and Winston, 1978. Michigan Department of Education, 1967-77, Lansing, Michigan: Education, undated. Michigan Accountability Michigan Department of Michigan Educational Assessment Program, First Report of the 1977-78 Michigan Educational Assessment Program, Interpretive Manual, Lansing, Michigan, 1978. 237 Michigan Educational Assessment Program, Technical Report, Lansing, Michigan; Michigan Department of Education, 1977. Nunnally, Jum C., Educational Measurement and Evaluation, 2nd ed., New York: McGraw Hill Book Company, 1972. Pipho, Chris, State Activity Minimal Competency Testing, Denver, Colorado: Education Commission of the States, October 5, 1978. Popham, W. James and Husek, T. R. , "Implications of Criterion-Referenced Measurement," Journal of Educational Measurement, 1969. In Ronald K. Hambleton and William P. Gorth, Criterion-Referenced Testing: Issues and Implications, Paper Presented at the Annual Meeting of the Northeastern Educational Research Association (Liberty, New York), 1970. 
Reid, Ethna R., Teaching Literal and Inferential Comprehension, Salt Lake City, Utah: Cove Publishers, 1978. Riggs, Carole, First Feathers, Glenview, Illinois: Scott, Foresman and Company, 1978. Riggs, Carole, Hello, Sunshine, Glenview, Illinois: Scott, Foresman and Company, 1978. Ross, C. C. and Stanley, Julian C., Measurement in Today's Schools, Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1954 in Ebel, Robert L., Essentials of Educational Measurement, 2nd ed., Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1972. Roudabush, Glen E . , Item Selection for CriterionReferenced Tests. Paper Presented at the Annual Meeting of the American Educational Research Association, (57th, New Orleans, La.) 1973. Rubinstein, Sherry Ann and Nassif-Royer, Paula, The Outcomes of Statewide Assessment: Implications for Curriculum Evaluation, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977. 238 Smith, Douglas A . , The Effects of Various Item Selection Methods on the Classification Accuracy and Classification Consistency of Criterion-Referenced Instruments, Paper Presented at the Annual Meeting of the American Educational Research Association (62nd, Toronto, Ontario, Canada), 1978. Smith, Richard G. and Tierney, Fins and Tales, Glenview, Illinois: Scott, Foresman and Company, T978. Spool, Mark D., Performing a Content Validity Study, Paper Presented at the Annual Meeting of the Southeastern Psychological Association (21st, Atlanta, Ga.), 1975. Swezey, Robert W. and Pearlstein, Richard B., Guidebook for Developing Criterion-Referenced Tests, Army Research Institute for the Behavioral and Social Sciences, Arlington, Va., 1975. Tallmadge, G. Hasten and Horst, Donald P., The Use of Different Achievement Tests in the ESEA Title I Evaluation System, Paper Presented at the Annual Meeting of the American Educational Research Association (62nd, Toronto, Ontario, Canada), 1978. Tanenbaum, Arlene B . , and Miller, Christine A. The Use of Congruence Between the Items in a Norm-Referenced Test and the Content in Compensatory Education Curricula in the Evaluation of Achievement Gains, Paper Presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York), 1977. Weiss, Bernard J . , Freedom's Ground, Rinehart and Winston, 1977. New York: Weiss, Bernard J., Riders on the Earth, Rinehart and Winston, 1977. Holt, New York: Holt, Weiss, Bernard J. and Stener, Loreli Olson, Time To Wonder, New York: Holt, Rinehart and Winston, 1977. Wert, James E.; Neidt, Charles 0. and Ahman, J. Stanley, Statistical Methods in Educational and Psychological Research, New York: Appleton-Century-Crofts, Inc., 1954. Wight, Albert R., "Beyond Behavioral Objectives," Readings in Measurement and Evaluation in Education and Psychology, William A. Mehrens, ed., New York: Holt, Rxnehart and Winston, 1976.