This is to certify that the dissertation entitled A STUDY OF AVERAGE THIRD GRADE READERS' ORAL READING PERFORMANCE IN MATERIAL OF VARYING FRY DETERMINED READABILITIES presented by Janet Sue Dixon has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

Major professor

Date: May 13, 1987
A STUDY OF AVERAGE THIRD GRADE READERS' ORAL READING PERFORMANCE IN MATERIAL OF VARYING FRY DETERMINED READABILITIES

By

Janet Sue Dixon

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Teacher Education

1987

Copyright by JANET SUE DIXON 1987

ABSTRACT

A STUDY OF AVERAGE THIRD GRADE READERS' ORAL READING PERFORMANCE IN MATERIAL OF VARYING FRY DETERMINED READABILITIES

By Janet Sue Dixon

The Problem

In order to match readers of given ability with materials of suitable difficulty, practitioners will frequently ask the reader to read aloud from the material in question. The time involved with such a procedure, however, makes it impractical when large numbers of readers or materials are under consideration. The question raised by this study is whether or not standardized test scores and Fry Readability Graph data can be used to effectively accomplish the same purpose.

Method

The subjects in this study were 50 third grade students with grade equivalency scores on the Reading Test of the California Achievement Tests within three months above or below their grade placement at the time of testing. Each subject read aloud from a set of five selections, one each having Fry determined readabilities of first, second, third, fourth and fifth grade. Thus the subjects' reading achievement was held relatively constant while the readability of the selections varied. Traditional oral reading assessment procedures were used to evaluate the readings. It was expected that the readers would make more miscues and read more slowly as the readability of the selections increased and that the first, third and fifth grade paragraphs would be at the readers' independent, instructional and frustrational reading levels respectively.

Findings

Generally speaking, the readability scores did not appear to discriminate well. When only the quantity of miscues was considered, performance on all paragraphs tended to be virtually the same and at the readers' instructional reading levels. In terms of rate, unacceptable miscues and fluency, the second grade paragraph appeared easiest and the fifth grade selection the most difficult. Additional data analysis, however, found miscues were highly predictable and factors triggering them could be identified. These factors were not related to those traditionally associated with readability formulae, but were virtually identical to factors reported in miscue research conducted more than ten years earlier.

DEDICATION

To the memory of my mother, Katherine Boley Dixon, because she would be proud of this.

ACKNOWLEDGEMENTS

My sincere thanks to my committee chairman, Dr. George Sherman, for his patience, understanding and support throughout this long study. My thanks also to the members of my committee, Dr. John Baldwin, Dr. Perry Lanier and Dr. Lonnie McIntyre, for their encouragement and patience as well.

I am also deeply indebted to the following people in the Bay City Public School System, without whose cooperation this study could not have been done:

Dr. Charles Link, Assistant Superintendent for Curriculum and Instruction, for his support and internal coordination of the study,

Dr. Douglas MacPherson, Director of Research and Evaluation, for his assistance during data analysis,

Mr. Wesley Garner, Director of Chapter I Services, for cooperation during the testing phase,

The Principals in the participating buildings: Mr. Leon Katzinger, Mr. Clem Kaye, Mr. Warren Liken, Mr. Pete Mayo and Mr.
Ron Stachowiak, for their help in communicating with parents and arranging for data collection,

The third grade teachers in the participating buildings: Mrs. Beverly Ballor, Mrs. Gloria Brooks Garcia, Mrs. Janice Harbour, Mrs. Edith Hinkley, Mrs. Nancy Lusher, Mrs. Nancy Maier, Mrs. Janet Moll, Mrs. Rita Narlock, Mrs. Sandra Remensnyder, Mrs. Sandra Stachowiak, Mrs. Susan Tanner, Mrs. Irene Tobias, Mrs. Shirley Wegener and Mrs. Joan Wilson, for so graciously adjusting their schedules in order to allow subjects in their rooms to participate in the data collection.

But most of all I am indebted to the children who participated in the study, to their parents who so trustingly gave their permission and, in particular, to the fifty children who read the research passages with such eagerness and enthusiasm. They will always be remembered with great delight.

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER

I. THE PROBLEM
   Overview
   Background of the Problem
   Introduction to the Problem
      Measuring Reading Ability
      Measuring Readability
      Concerns in Test-Formula Matching
   Statement of the Problem
   Purpose of the Study
   Questions Directing the Study
   Need for the Research
   Definition of Terms

II. REVIEW OF THE LITERATURE
   Overview
   Part I: Determining Readability
      Historical Background
      Development of Readability Formulae
         Historic Trends
         Methodology
      Limitations of Readability Formulae
         Limitations in Factors Studied
         Limitations in Criterion
      Validity of Readability Formulae
         Original Presentation of the Readability Method
         Original Criterion Prediction
         Correlation with Other Readability Formulae
         Experimental Validation Studies
         Validation Against Outside Criteria
      Oral Reading Criteria in Readability Research
      Development of the Fry Graph
      Recent Trends in Readability Prediction
         Ease and Speed of Use
         Criterion Developments
         Readability in the Early Elementary Grades
      Oral Reading in Readability Measurement and Prediction
   Part II: Determining Reading Ability
      Introduction
      Standardized Tests
         The California Achievement Tests
      Oral Reading Assessment
         Development of Traditional Practices
         Traditional Versus Psycholinguistic Diagnosis
      Summary of the Literature Review
         Formula Limitations
         The Fry Graph
         The Assessment of Reading Ability

III. DESIGN OF THE STUDY
   Overview
   Questions Guiding the Study
   Hypotheses
   Population
   Sample Selection
   Measurement of Student Reading Ability
   Instrument Selection and Construction
      Passage Selection
      Determination of Readability
   Data Collection
   Data Recording
   Data Analysis

IV. PRESENTATION AND ANALYSIS OF RESULTS
   Introduction
   Presentation of Results
   Additional Data Analysis
   Presentation of Additional Data Analysis
   Descriptive Miscue Analysis
   Summary of Results
      Summary of Results from Four Measures of Difficulty
      Summary of Results from Functional Reading Levels
      Summary of Results from Miscue Frequency Data

V. SUMMARY AND CONCLUSIONS
   Introduction
   Summary
   Conclusions
   Discussion
   Implications
   Recommendations

APPENDICES
   APPENDIX A: Letter from Principals to Parents; Parental Permission Slip
   APPENDIX B: The Research Passages
   APPENDIX C: The Fry Readability Graph
   APPENDIX D: Formulae and Computational Procedures
   APPENDIX E: Summary of Computations

REFERENCES

LIST OF TABLES

IV-1   MEANS OF WORD RECOGNITION ACCURACY SCORES BASED ON TOTAL NUMBER OF MISCUES
IV-2   MEANS OF READING RATE SCORES
IV-3   PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON PARAGRAPH #1
IV-4   PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON PARAGRAPH #3
IV-5   PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON PARAGRAPH #5
IV-6   MEANS OF WORD RECOGNITION ACCURACY SCORES WHEN ONLY UNACCEPTABLE MISCUES WERE COUNTED
IV-7   MEANS OF GENERAL IMPRESSION OF FLUENCY SCORES
IV-8   FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #1
IV-9   FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #2
IV-10  FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #3
IV-11  FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #4
IV-12  FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #5
IV-13  SUMMARY OF DIFFERENCES FOUND BETWEEN PARAGRAPHS ON FOUR MEASURES OF DIFFICULTY
IV-14  PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON EACH PARAGRAPH

CHAPTER I
THE PROBLEM

Overview

In this chapter the problem will be introduced, background information will be presented and the importance of the problem will be established. The questions directing the research will be given and terminology pertinent to the study will be defined.

Background of the Problem

About 1840 the McGuffey Readers introduced the concept of graded difficulty to American schools.
Forerunners of the modern basal series, the McGuffey Readers were based by their author, William Holmes McGuffey, on two important premises which still govern the way most reading is taught today: (a) the difficulty of reading material (readability) can be controlled and (b) controlling readability facilitates learning to read. While McGuffey's method for controlling readability would be debatable, the relationship between task difficulty and learning which he recognized has since been well supported in the research and by successful instructional practices. It would eventually affect not only the teaching of reading but the development of instructional theory and the structure of curriculums. Taxonomies (Bloom, 1956), hierarchies (Gagne, 1968; Gagne, 1969), task analysis (Anderson and Faust, 1973, Chapter 3; DeCecco, 1968, Chapter 2), programmed instruction (Glaser, 1965; DeCecco, 1968, Chapter 12; Lumsdaine, 1960, 1964), and mastery learning (Carroll, 1963; Block, 1971; Block and Anderson, 1975; Bloom, 1976; Smith, 1977) would be among the terms and methods made familiar by educators and educational psychologists as they described the process of breaking complex learnings into simpler underlying tasks, usually arranged in some hierarchical form. Ideally the learner begins at a point in this sequence where he can succeed and master tasks of gradually increasing difficulty until the complex learning has been accomplished.

Maximizing success is central to this process of controlling task difficulty, for the facilitating effect of success on learning has long been recognized by virtually every learning theorist. On the other hand, while failure experiences may contribute positively to the learning process under some conditions (Gage and Berliner, 1984, p. 396-397; Weiner, 1972), such experiences can also be devastating, and the undesirable consequences to the learner who repeatedly fails have been frequently documented. Such learners typically have shown increased anxiety, less persistence, lowered aspirations, and an increased tendency to repeat inappropriate responses or to use fantasy or superstitious behaviors rather than realistic problem solving strategies (Sears, 1940; Barker, 1941; Barker, Dembo and Lewin, 1941; Lantz, 1945).

While failure situations can become self-defeating and are generally to be avoided, tasks that are too easy are undesirable also, for they will not produce the desired growth. As David Ausubel (1968) has observed:

   If the material is too difficult, the learner accomplishes disproportionately little for the degree of effort he expends; if it is too easy, his accomplishments are disappointingly meager in terms of what he could have achieved were greater effort demanded of him. (p. 325)

In addition Ausubel notes:

   Inappropriately easy material...fails to stimulate and challenge the learner adequately, fostering boredom and disinterest. (p. 326)

Ideally then the teacher seeks to find that place in the learning sequence where the tasks are of appropriate difficulty for the learner. Sometimes called the student's "instructional level", it is that point where the material offers some challenge but where the student is capable of handling that challenge without undue anxiety or frustration. For some kinds of learnings the hierarchy involved can be arranged in a relatively linear progression, each subskill more or less prerequisite to the next. Finding the instructional level is largely a matter of testing for mastery of the underlying skills.
Learning to read, however, tends to be a developmental process, characterized by stages of increasing complexity, involving many skills, abilities and understandings which the reader must combine more or less simultaneously and appropriately in order to read a given selection. Finding the "instructional level", then, is not simply a matter of testing for specific skills, but depends on an evaluation of the reader's entire general level of functioning in relationship to the difficulty of the material being read. In reading, this is commonly done by assessing the learner's oral reading performance directly in the material under consideration. This performance is typically evaluated using some variation of procedures and criteria popularized by Emmett Betts (1946) about 40 years ago. Betts distinguished at least three different reading levels: the instructional level, the independent level (material that is easily read) and the frustration level (material which is too difficult). From observation of the student's oral reading "errors", the teacher makes a determination of the level of difficulty of this material for this student. This information might then be combined with other knowledge, such as the student's interest in the subject, the length of the selection and consequently the persistence needed to finish it, or the format of the book, in deciding if the student will be able to successfully read the selection.

While finding a reader's instructional level may not be a simple, precise or expedient matter, it is important in the teaching of reading in order to eliminate the task avoidance responses commonly associated with frustration and failure. While task avoidance is certainly a hindrance in any kind of learning, it is particularly detrimental to reading progress since reading, like many other complex performances such as playing the piano, driving a car, or learning a sport, seems highly affected by practice. Not only does practice affect reading by reinforcing and automating previously learned skills, but it is also necessary for integrating those skills into meaningful and fluent reading behaviors. In addition, we know that many people learn to read with little if any apparent formal instruction. Evidently what one needs to know to become a better reader can often be learned intuitively while reading, with little assistance from the teacher, if the teacher can only find material motivating enough so the learner will read it and easy enough so the learner can read it. Failure experiences, on the other hand, can lead to avoidance of further reading, cutting off perhaps the most important means by which the failing learner could improve. Finding materials of suitable difficulty for the learner then becomes an integral part of developing reading proficiency.

Introduction to the Problem

If a practitioner wanted to know quickly whether a student could read a particular book, the most logical procedure would be to have the student read a few sample passages aloud. Based on this observation, the practitioner could then make a judgment concerning the student's ability to handle the material. This procedure, in fact, is frequently used when the question concerns one reader and one book, but when the problem involves many children and many books, the time required for listening to each child read makes such a procedure impractical.
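To make the kind of single-reader judgment just described concrete, the sketch below shows how an oral reading sample is commonly scored: word recognition accuracy is computed from the number of counted miscues, and the result is compared against Betts-style cutoffs of the sort adopted later in this study (99% or better for the independent level, 95% to 99% for the instructional level, 90% or less for the frustration level). This is only a minimal illustration in Python; the function names, the rounding, and the label given to accuracies falling between 90% and 95% are assumptions of the sketch, not part of Betts' procedure or of this study's instruments.

    def word_recognition_accuracy(total_words, miscues):
        """Percentage of running words read without a counted miscue
        (omissions, insertions, substitutions, mispronunciations, words aided)."""
        return 100.0 * (total_words - miscues) / total_words

    def functional_reading_level(accuracy):
        """Classify a passage for a reader using Betts-style word recognition cutoffs.

        Cutoffs follow the definitions used later in this study:
        independent >= 99%, instructional 95% to 99%, frustration <= 90%.
        Accuracies between 90% and 95% are not labeled by those criteria,
        so the sketch reports them as 'borderline'.
        """
        if accuracy >= 99.0:
            return "independent"
        if accuracy >= 95.0:
            return "instructional"
        if accuracy <= 90.0:
            return "frustration"
        return "borderline"

    # Example: a 120-word passage read with 4 counted miscues.
    acc = word_recognition_accuracy(120, 4)
    print(round(acc, 1), functional_reading_level(acc))  # about 96.7, instructional

Scaling this judgment beyond one reader and one book is exactly the problem taken up next.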
Obviously if there were some effective means of measuring student reading ability and some corresponding method for measuring passage difficulty (readability), the process of matching readers with materials would be greatly facilitated. Measures of both reading ability and readability do exist, and they are frequently used together to make decisions of this nature in research studies, textbook selection and development of new materials. However, an examination of these measures poses important questions and suggests serious limitations concerning their use in this way.

Measuring Reading Ability

Determination of a student's reading achievement is most frequently made based on results of either an Informal Reading Inventory (IRI) or a standardized reading achievement test. With both methods the results are popularly reported using some form of grade level norm. The Informal Reading Inventory is more apt to be used by teachers in special reading programs since it is a time consuming, individually administered test which does not lend itself to the structure of most classroom settings. It can be teacher constructed from materials being used by the student, or the teacher may choose to use one of the commercially published tests such as the Durrell Analysis of Reading Difficulty (Durrell, 1937, 1955), the Diagnostic Reading Scales (Spache, 1963, 1972) or the Classroom Reading Inventory (Silvaroli, 1965). When giving an IRI, deviations from the text made by the reader while reading aloud are recorded. Then, usually using some variation of criteria first popularized by Emmett Betts (1946), the instructor decides if this material is at the student's independent, instructional or frustrational reading level. As previously noted, practitioners often use this procedure by itself to determine directly if a particular book is of suitable difficulty for a given student. In the IRI, however, the selections are presumably graded in difficulty corresponding to the grade levels of basal texts. In this respect it then becomes a prediction device. The assumption is that once the student's reading levels are established in terms of the grade level difficulties of the IRI passages, this information can automatically be transferred to other material, and when the student is reading another selection intended for that grade they can be expected to perform in a similar fashion.

Using an IRI to establish a student's instructional reading level assumes that all reading materials intended for a given grade are of the same level of difficulty. The publishers of basal texts, however, use individual standards and standardization procedures in developing their books, and these do not conform to any universal standard applicable to all basal publishers. Moreover, the standardization methods they use are not typically well described in an easily accessible manner, as they are in the documentation prepared by standardized test publishers. Thus it is difficult to compare norming procedures from series to series or even to know what procedures were used. Teachers, however, will frequently refer to one series as being "more difficult" than another, and formula determined readabilities of basal selections can differ greatly from series to series for materials intended for use by the same grade, and may even differ from selection to selection in the same book (Bradley and Ames, 1976, 1977; Eberwein, 1979). This would suggest that the difficulty of materials for a given grade may differ considerably.
It would seem then that it cannot be assumed that a reader's performance based on one basal series will automatically indicate performance in another, nor can the passages from one IRI, and their grade level indicators, necessarily be used as a meaningful standard for judging the difficulty a reader may encounter in other materials.

Standardized tests are most frequently used in classrooms to assess reading achievement since they are fast, convenient and highly reliable. They are excellent for comparing the performances of readers, but they pose serious problems when used to determine reading levels. They are primarily tests of comprehension and offer no opportunity to observe reading behaviors directly, or to compare the reader's performance to a set of criterion tasks. The scores are based on comparisons of students with a standardization group and are not necessarily related to the difficulty level of reading selections. A grade equivalency score of 2.0 on a standardized test, therefore, does not mean the test taker was able to comprehend material with a beginning second grade readability, but that s/he was able to answer as many questions correctly over the entire test as did the average beginning second grader in the standardization group.

There is empirical evidence that the grade equivalency scores from standardized tests cannot be used to place children at their instructional level, for when they are compared to IRI results they usually yield significantly higher grade placement scores. Using them for this purpose will probably result in students being placed at their frustration level (Sipay, 1964; Glaser, 1964). Also, most norming procedures use one administration of the test during the school year, and the between-grade norms for each month of the year are interpolated from these results. This practice assumes that reading growth proceeds at an even rate, an assumption that is not supported by research studies (Bernard, 1966; Lennon, 1951). Finally, grade equivalency scores imply that students in different grades with the same scores have the same reading achievement. However, students scoring above their grade placement and students scoring below their grade placement may perform quite differently on an IRI even though their scores from a standardized test indicated the same grade level in reading achievement (Glaser, 1964; Farr and Carey, 1986, p. 153-154).

Measuring Readability

The development of objective methods for measuring the difficulty of reading passages has also presented serious problems. While it is relatively easy to observe that some materials are more difficult to read than others, it is not so easy to identify or measure the factors which account for that difference. Early in this century, using improved statistical procedures in factor analysis, researchers began to systematically investigate aspects of writing which appear to influence the ease with which material is read and understood. Interest generated by these early studies, along with increasing demands in society to understand and control reading difficulty, eventually led to the development of numerous formulae for calculating what is now termed "readability". Readability formulae attempt to give an objective measure of factors within text which may affect the reading accomplishment needed to handle the passage or the ease with which the material can be read and understood.
By their nature, these devices must be based on only a limited set of factors which can influence the difficulty level of a passage, for while many factors have been studied, invariably only a few emerge as significant enough (or measurable enough) to be included in the final formula. These factors usually include some measure of word difficulty and some measure of sentence complexity. Formulae cannot measure conceptual complexity, reader interest, reader motivation, topic organization, figurative language or such physical factors as format, illustrations, or size of print, all of which may also contribute to the difficulty one encounters when reading a given passage. Moreover, results of studies concerning the validity of readability prediction methods have been conflicting, and those studies concerning the ability of the devices to go beyond prediction of relative difficulty to prediction of difficulty for students in given grade levels have been generally negative. Because of these limitations, readability formulae have met severe criticism from many leaders in the field of reading. At best, these authorities, and even the authors of the formulae themselves, caution that these devices should be used with great care and only as rough estimates of relative difficulty. In spite of such warnings, however, the grade level indexes yielded by these formulae are still frequently combined with the grade equivalency scores yielded from standardized test data to make decisions concerning the appropriateness of difficulty of certain materials for given readers.

Concerns in Test-Formula Matching

Even if we were assured of the validity of the tests and formulae involved to measure reading ability and passage difficulty respectively, the test-formula matching procedure assumes that the two measures are congruent. The evidence would suggest that probably they are not. Standardized tests and readability formulae were not developed using the same criterion measures, nor were they designed to be used together specifically for matching readers with materials of appropriate difficulty. The test-makers' prime concern has not been readability but rather comparison of performances. Therefore, when readability formulae are used to assess standardized test passages they do not reveal an orderly progression of gradually increasing difficulty as one might expect, and it is possible for a student to receive a grade equivalency score of 2.0 on a standardized test without any passage on the test having a readability of 2.0. Moreover, standardized tests are typically measures of comprehension. Readability formulae, on the other hand, do not measure comprehension directly, but rather deal with factors in the text which may affect comprehension. It is also evident that some authors never meant their formulae to be indicators of the level of accomplishment associated with developmental reading achievement, but rather measures of "clear writing" style which increases the ease of comprehension for adult readers (McElroy, 1953; Flesch, 1948, 1949, 1954, 1958).

In the literature review for her study concerning "Easy to Read" books for children, Margaret Paolo (1977) found that the use of oral reading in readability research has received little attention.
Validity studies which have attempted to compare formula predictions with readers' performance have typically used silent reading comprehension, rather than word recognition, as the measure of that performance, even though oral reading would seem a more logical choice since it, like the formulae involved, does not assess comprehension directly but rather deals with word and sentence factors in the text which may affect comprehension. Because comprehension has been used so exclusively in such validity studies, practitioners have been left with little information concerning the usefulness of the various readability formulae. If a reader's achievement test scores and the formula's data suggest a given reader should be able to read a given selection, but we find his comprehension in the material to be low, the results do not tell us if the reader was unable to handle the text at all, or if he could read the text but found the situations or concepts presented to be too complex or unfamiliar for his understanding. If his comprehension of the material is good, it still does not assure us that this material is at or below the student's instructional reading level, for it is possible for a reader to maintain an acceptable level of comprehension even though experiencing frustration due to word recognition difficulties. This might especially be true if the topic involved is familiar or the selection is short.

Statement of the Problem

Observation of oral reading performance directly in the material under consideration is frequently used to assess a single reader's ability to read a given selection. The time involved in such a procedure, however, makes it impractical when large numbers of students are involved. This has led to the practice of combining standardized test scores, as a measure of student reading ability, and readability formula data, as a measure of passage difficulty, to determine if certain readers will be able to read certain materials. It would appear that direct observation of the reader's performance in the material provides a more acceptable means for matching readers with materials. The question raised by this study is whether or not standardized test scores and readability formula data can be used together to effectively accomplish the same purpose. If they can, then we would expect a great deal of consistency between and among oral reading, standardized test scores, and readability measures. However, as the preceding text has noted, this is often not the case.

Do oral reading assessment procedures, then, which are primarily measures of word recognition, and standardized tests, which are primarily measures of comprehension, and readability formulae, which attempt to measure characteristics in the text which may affect both comprehension and word recognition, all sample enough of the same reading factors to allow a reader's performance on a standardized test to predict that reader's oral reading performance in material measured by a readability formula? In greater detail, to what extent are a reader's grade equivalency scores as measured by a standardized test predictive of his functional reading levels as established by his oral word recognition abilities when reading material of a formula determined readability? And is this test-formula relationship strong enough to make it a useful tool for practitioners and justify its use as a basis for making judgments and decisions in research studies, text selection, and the development of new instructional materials?
Purpose of the Study

The purpose of this study is to investigate the relationship between grade equivalency scores from the California Achievement Tests and Fry Readability Graph (1968) data. Specifically, it examines how effectively grade equivalency scores from the Reading Subtest of the California Achievement Tests, when used to identify a group of "average" readers, and Fry Readability Graph estimates of material difficulty, will predict the degree of difficulty a reader will encounter when reading orally from material of varying Fry determined readabilities. Subsequently, the study will also investigate the relationship between the Readability Graph scores of these selections and (a) the number of oral reading errors (miscues) made by the readers, (b) the readers' reading rate and (c) the readers' functional reading levels.

Questions Directing the Study

If the grade equivalency scores from the California Achievement Tests and Fry readability data provide an effective means for matching readers with materials of appropriate difficulty, then we would expect the readers to make more word recognition errors (miscues) and to read more slowly as the readability of the passages increases. We would also expect the readers to read the passage with first grade readability at their independent reading level, the passage with third grade readability at their instructional reading level and the passage with fifth grade readability at their frustrational reading level. Based on these expectations, the following questions were posed to be answered by this study. When average third grade readers, as determined by the Reading Test of the California Achievement Tests, are reading selections with varying Fry determined readabilities:

1. Will the readers' word recognition accuracy, based on their oral reading errors (word miscues), decrease as the grade level readability scores of the selections increase?

2. Will the readers' reading rate (number of words read per minute) decrease as the grade level readability scores of the selections increase?

3. Will the readers read material with a first grade readability at their independent reading level?

4. Will the readers read material with a third grade readability at their instructional reading level?

5. Will the readers read material with a fifth grade readability at their frustrational reading level?

Need for the Research

In spite of continual criticism, the use of readability estimates appears to be rising. Publishers increasingly list estimates of difficulty of their materials with the names of the formula (or frequently formulae) used to make those determinations. Increased demand for "High Interest, Low Vocabulary" and "Easy to Read" books places continual pressure on authors to control readability in their writing. It is probably only the time involved in using the formulae that has kept their use from becoming more prevalent. As microcomputers become commonplace, however, the development of more complex but faster and easier to use computerized formulae promises to remove this restriction and further increase their use. The widespread acceptance of the readability concept and the demand for readability information and control underscores the serious need teachers and others have for some indication of the suitability of a given material for a given reader, even if that information may be questionable and unproven.
It is important, therefore, that studies be conducted that either help practitioners define readability scores operationally, discredit their use, or provide estimates of how much confidence can be placed in them. Such studies might also indicate how more predictive reader ability-readability indexes could be developed.

Definition of Terms

Readability: Refers in general to any factor that affects the ease with which a selection can be read and understood. More specifically it has come to be associated with the factors measured by readability formulae. In this study it will refer to the scores from the Fry Readability Graph as computed by the text analysis computer program School Utilities Volume 2, available from the Minnesota Educational Computing Consortium.

Fry Readability Graph: A nomograph developed by Edward Fry, Rutgers University. It estimates readability using sentences per 100 words and syllables per 100 words. For books and longer selections, the final estimate is based on an average of three samples. Because the selections in this study are short, the Fry estimate will be based on the actual text involved.

Functional Reading Level: A term used to refer collectively to a reader's independent, instructional and frustrational reading levels.

Independent Reading Level: Refers to material which a reader can read easily. In this study it will refer to material a reader can read with 99% or better word recognition accuracy.

Instructional Reading Level: Refers to material a reader is capable of reading with some help. It is the level of difficulty which, ideally, should be used for instruction. In this study it will refer to material a reader can read with 95% to 99% word recognition accuracy.

Frustrational Reading Level: Refers to material that is too difficult for a reader to read under any conditions. In this study it will refer to material a reader reads with 90% or less word recognition accuracy.

Miscue: A deviation from text which a reader makes when reading orally. The term miscue is generally preferred to the terms "mistake" or "error" because it more accurately suggests what is occurring during the reading process, suggesting that such deviations from text are not random errors but, in fact, are cued by the thought and language of the reader in his encounter with the written material (Goodman and Burke, 1972).

Oral Reading Errors: Refers to miscues made by a reader when reading orally. The following types of miscues will be counted as oral reading errors in this study: (a) omissions, (b) insertions, (c) substitutions, (d) partial or gross mispronunciations (not caused by dialect or speech difficulties) and (e) words aided.

Betts' Criteria: Criteria, developed and popularized by Emmett Betts (1946), and used widely in oral reading assessment procedures to determine a reader's functional reading levels. In this study the Betts word recognition criterion of 99% word recognition accuracy will be used to designate a selection as being at a reader's independent reading level, 95% to 99% word recognition accuracy will be used to designate a selection as being at a reader's instructional reading level and less than 90% word recognition accuracy will be used to designate a selection as being at a reader's frustrational reading level.

Reading Rate: Refers to the speed with which material is read. Researchers have used reading rate as an index of speed of response, which they in turn consider an indicator of automaticity (Samuels, 1979).
In this study reading rate will be given in terms of words read per minute and will be determined by dividing the number of words in the selection by the number of seconds taken to read the selection, multiplied by 60.

CHAPTER II
REVIEW OF THE LITERATURE

Overview

In this chapter a review and synthesis of selected literature relevant to the study will be presented. The review will be divided into two parts. Part I, Determining Readability, will concentrate on (a) the development of the readability concept, its measurement, and prediction and (b) the use of oral reading in readability prediction and validation. Part II, Determining Reading Ability, will concentrate on the development and use of (c) standardized tests as a measure of reading ability, and (d) the Informal Reading Inventory and oral reading assessment procedures.

Part I: Determining Readability

Historical Background

The awareness that reading material can differ in difficulty and the search for ways to control that difficulty are probably as old as writing itself. Klare and Buck (1954, p. 42) have noted, for instance, that much of early literary criticism was concerned with comparing "ornate" and "plain" styles among writers, and Klare (1963, p. 29) cites a quotation from I Corinthians 14:9 as a favorite among advocates of clear language: "Except ye utter by the tongue words easy to be understood, how shall it be known what is spoken? For ye shall speak into the air."

While awareness of style and admonishments to writers may be evident early in the history of writing, the idea that readability can be consciously and systematically controlled seems to be a much more recent historical development. In the early years of American education, for instance, there was evidently no attempt to prepare books specifically to meet the needs of beginning readers. Colonial children learned to read by struggling as best they could with whatever books were available. Usually those books were of a religious nature intended for adults rather than children. Chief among them, for instance, was the New England Primer, which was so named, not because it was the child's first book, or because it contained easy to read material appropriate for beginning readers as the term "primer" implies today, but because it contained religious teachings which were considered "primary" for the child's spiritual existence (Smith, 1986, p. 18-25; Ford, 1952). It should be noted that in colonial times education was primarily for the few, the wealthy and those with facility for learning, and the primary purpose for reading was religious. Once public school education became established by law, however, and as concern grew for creating an educated electorate, the situation began to change. As Klare and Buck (1954) have noted:

   Saving Everyman's child from illiteracy was a different job from teaching the sons of merchants to read the Scriptures. It required different tools. (p. 40)

Klare and Buck (1954, p. 41) observe that, when compared with texts previously offered to children, the basic differences which appeared in the books of McGuffey and his contemporaries were their secular content and the fact that they were "graded" in vocabulary and reading difficulty. It appears that these authors were developing a concept of readability similar to that generally used today. They believed readability could be consciously controlled, and several decades before any scientific investigations of readability were begun, they were already identifying and manipulating factors which they felt affected it. McGuffey and his contemporaries seemed to view vocabulary as the primary determinant of reading difficulty, for as Spache and Spache (1977) have observed:

   This author (McGuffey) controlled the difficulty of his books, he believed, by the length of words in the stories. The opening book used only two- or three-letter words and longer words were gradually introduced in later books. (p. 42)

Klare (1963, p. 30) notes that this relationship between vocabulary and reading difficulty seems to have been generally agreed upon during this period, with much early work focusing on it, and Chall (1958, p. 17) contends that vocabulary has probably always been associated with reading difficulty.

Interest in the relationship between vocabulary and reading difficulty eventually led to the publication in 1921 of The Teacher's Word Book by E. L. Thorndike. This work, which listed words with tabulations of their frequencies in print, was intended to provide estimates of the commonness of words and therefore their relative importance. The list would influence the teaching of vocabulary in schools for generations and would also be a significant event in readability development since it would be used as the basis for many later readability formulae.

Klare (1963, p. 32) cites two additional events for their significant contribution to the development of modern readability theory. One was the formation in 1935 of the Sub-committee on Readable Books of the Commission on the Library and Adult Education. This committee consolidated the efforts of scattered individuals and gave recognition to the problem of readability in general. The second event was the publication, during that same year, of W. A. McCall and Lelah Mae Crabbs' Standard Test Lessons in Reading. This set of graded reading passages would later become the most often used criteria for the construction of readability formulae (Klare, 1984, p. 685).

Development of Readability Formulae

Early in this century interest in readability mounted dramatically. Literacy had become commonplace and the purposes for reading had expanded beyond religion into information and pleasure. Readability was no longer simply a matter of importance to educators. Publishers of newspapers, magazines and best-sellers, and authors of government bulletins, industrial communications and military manuals were forced to write for a much larger group of readers more diverse in their reading abilities. At the same time as the need to understand and control readability was expanding, improved statistical procedures gave researchers better tools with which to work, and some of these methods, especially those in factor analysis and multiple correlation techniques, were particularly suited to readability study. By 1920 researchers were conducting systematic and scientific investigations of readability, and Chall (1958, p. 17) credits Bertha A. Lively and S. L. Pressey, in 1923, with developing the first procedure which approached the modern concept of a formula. Their work as well as that in other early studies generated much enthusiasm, motivated other researchers, and eventually led to the development of a host of formulae and other techniques which claim to predict the reading difficulty of a passage. This abundance of measures in turn produced an even greater proliferation of literature regarding the validity of such devices and controversies surrounding their use. To review all of the studies on readability would be a formidable task.
Fortunately, two notable authors, Jeanne Chall and George Klare, have provided comprehensive reviews of the most significant early research concerning formula development and validity. Chall's book, Readability: An Appraisal of Research and Application (1958), and Klare's book, The Measurement of Readability (1963), are cited in nearly every article, book or dissertation concerning readability. They have become virtual classics in the field.

Exactly how many readability formulae have been developed is somewhat controversial. As Klare (1963, p. 33) has explained, the term formula has been used loosely to include both true formulae based on regression equations as well as other devices for measuring readability. For this reason authors have defined the term differently and have therefore reported varying numbers of formulae as having been developed. No matter what definition is used, however, the number seems more than substantial. Chall (1958), for instance, tallied 29 up to 1954, Klare (1963) estimated 39, while one of Klare's students, Carolyn Dunlap (1954), listed 56 (Klare, 1963).

Historic Trends

The general trend in formula development has been first one toward greater and greater complexity and then a sharp reversal toward increasing efficiency and simplicity. Chall (1958, p. 27) credits Irving Lorge (1939) with beginning this trend of simplification in 1938, while Klare (1963, p. 37-80) notes the same pattern but distinguished four historical periods. "Early Formulas", 1921-1934, used vocabulary primarily as the predicting factor, and there was great dependency on Thorndike's Teacher's Word Book (1921). The criteria used were relatively crude. The next period, "Detailed Formulas", 1935-1938, saw an ever increasing tendency to use more and different predicting factors with less emphasis on Thorndike's work. There was also an increased concern for adequate criterion. The following period, 1938-1953, is termed "Efficient Formulas" by Klare, since the emphasis shifted during that time to increased efficiency and simplicity of use. The period from 1953-1959, a period following the publication of Chall's work, Klare labels "Specialized Formulas", since the tendency was to develop formulas based on a particular aspect of readability or a special audience level rather than wide applicability. Forbes and Cottle's formula (1953), for instance, was designed for use with psychological tests, while Bloomer (1959) was interested in measuring "the level of abstraction as a function of modifier load" and Spache (1953), Stone (1957) and Wheeler and Smith (1954) all authored formulae intended specifically for materials at the early elementary grade levels.

Methodology

While individual formulae have varied, both Chall (1958) and Klare (1963) agree that the basic methodology by which most have been developed has been virtually the same, and generally proceeds according to the following steps: (a) A list of possible elements which could be responsible for differences in readability is compiled. This list is usually based on some survey of reader and/or expert opinion or some analysis of content. (b) A set of criterion passages, representing a range of difficulty, is selected or developed. Methods used to establish the relative difficulty of the passages have varied and include the results of comprehension tests, ratings by readers or experts, publishers' grade level recommendations and even other readability formula scores. (c) Once the relative difficulties are established, counts are made of the frequencies with which the identified elements occur in the criterion passages. (d) The frequency counts are correlated with the difficulty indices of the criterion materials. (e) The correlational information is combined in a regression equation which ultimately becomes the final formula. While differences have occurred in the criterion used and the factors studied, most formulae have used the correlational method and virtually all have followed the same developmental procedure.

Limitations of Readability Formulae

Chall (1958, p. 34-56) distinguished the following five components of readability formulae which are useful for evaluation and comparison: (a) the criterion on which the formula is based, (b) the range of difficulty of the criterion materials, (c) the method used for determining that difficulty, (d) the internal factors studied, and (e) the method used to compare the occurrences of the factors studied with the difficulty indexes of the criterion materials. While a few early formulae used an inspection method to compare the occurrences of the factors studied with the difficulty indexes of the criterion materials, generally all others have used the correlational method. Aside from this, however, formulae have differed greatly in the criterion used and in the factors studied. Both areas have posed serious limitations for readability prediction.

Limitations in Factors Studied

By their very nature readability formulae must be based on an extremely limited set of factors which can affect reading difficulty. Most restrictive is the fact that they can only utilize those aspects of writing which can be quantitatively measured, and generally only stylistic factors have lent themselves to that kind of analysis. While some formulae have attempted to include content factors such as abstractness of words or analysis of ideas, Klare points out that they only touch on content in a very indirect way. Chall (1958, p. 12) and Klare (1963, p. 24) both caution, however, that content, an aspect of writing that is difficult to measure quantitatively, is frequently thought to be as important as, or even more important than, style in determining the ease with which a selection can be read and understood. In fact, a classic study by Gray and Leary, reported in their book
54) explains that although other factors have been found to be significantly related to the criterion, they are also highly related to other factors in the formula and consequently add little by themselves to the final prediction. Their contribution is so meager that it is not worthwhile to include them. "The law of diminishing returns,‘ she notes, 'sets in early in readability prediction.' 32 Counts of words which appear (or do not appear) on various word lists of presumably "easy" or "hard" words has been a favorite means of assessing the vocabulary element. The general premise has been that the frequency with which a word appears in print, or its "commonness" is related to its difficulty. Thorndike's list (1921) has often been used for this purpose and was especially popular with early formula authors. Early authors also assessed vocabulary difficulty through some count of the number of different words in a selection. This method has sometimes been termed "word range" or "vocabulary diversity". Determining either frequency or diversity, however, required cumbersome, time consuming word counts, especially difficult to make in a pre-computer age. In attempting to find faster and easier to use methods, Lewernz (1922) and later Dale and Tyler (1934) used words beginning with certain letters. W, h, and b words were considered easy while words beginning with e and i were considered hard. Eventually the number of affixes, number of syllables in words and word length in terms of the number of letters were all found to be highly related to the commonness of words also. Since these factors provided simpler, faster and more reliable means for assessing vocabulary difficulty, they would become increasingly common elements in later formulae. While the very first readability measures reported by Chall (1958) concentrated primarily on vocabulary factors, measurements of sentence complexity were soon being 33 incorporated. Sentence structure and the number of clauses were generally considered related to sentence difficulty. These in turn, however, were found to be highly related to certain types of words. Counts of the number of prepositions and prepositional phrases, and content words (nouns and verbs) have all appeared in various formulae. Ultimately, however, there is an obvious relationship between these factors and sentence length in general. Since length is a factor which can be counted simply and reliably, it became a common element in later formulae, usually in terms of average number of words per sentence. Limitations in Criterion While at first it appears that formulae have varied widely in the internal factors studied, in reality the factors involved are all highly interrelated. All formulae have included some measure of vocabulary, most have included some measure of sentence complexity, and few have included much more than that. Therefore, in respect to the factors studied, the differences between formulae tend to be in the methods used to measure the factors and not in the factors themselves. Real differences have occurred, however, in the criterion used to construct various formula, the range of difficulty it represents and the method used to establish that difficulty. These differences in criterion are particularly important since they greatly limit the generalizability of any one formula, for as Chall (1958) has noted 34 Judged by strict scientific standard, each of the formulas is applicable only to material similar to the criterion on which it is based. 
Judged by strict scientific standard, each of the formulas is applicable only to material similar to the criterion on which it is based. Too often this is forgotten. ... This has led to criticism of the formulas when actually the fault lay in their application to a type of material for which they were not designed. (p. 35)

Some criterion materials have been highly specialized. Ojemann (1934), for instance, used only parent education materials, and Dale and Tyler (1934) used health brochures. Some authors have used general adult selections while others have concentrated on children's literature or textbooks used at particular grades. The McCall-Crabbs passages (1925) have been popular with several authors, including those of two of the most well known formulae, the Dale-Chall (1948) and the Flesch (1948).

The range of difficulty used for the criterion has also varied. In some formulae the range has included grades primer to adult, while others have been confined only to adult materials or to a limited number of grade levels such as primer through third.

The methods of establishing the difficulty of the passages have also been diverse. Some authors have used various measures of comprehension of the text, while others have favored more informal means such as ratings based on "expert" judgment. The grade level recommendations of publishers, and later even other readability formulae, have also been used.

Initially it would appear that tested comprehension on the material is the best possible method for establishing the relative difficulty of criterion passages, since ultimately this is what the user of a formula wants to predict. Using comprehension scores for this purpose, however, has presented particular problems. As Chall (1958) notes:

The major weakness ... lies in the fact that the difficulty of the passage can be changed by the ease or difficulty of the question asked. ... Easy questions based on hard passages will result in underestimates of passage difficulty. (p. 40)

A study by Irving Lorge (1949) emphasized this point. Lorge applied the Gray-Leary formula (1935) to both the McCall-Crabbs passages (1925) and the questions on the passages. He found the correlation coefficient to be .6156, suggesting "there are factors in the passage which are unrelated to factors in the structure of the questions."

Determinations of passage difficulty become even more controversial when they go beyond providing indexes of relative difficulty to providing a grade level score. The latter implies not only a comparison of the passages with each other, but a comparison of the passages to the performance of readers of a given ability. Typically, determinations of that ability have been derived in an indirect manner, often using the standardized test scores of persons who have performed in a prescribed way on the criterion passages. Ojemann (1934), for instance, in developing his criterion, used the reading grade equivalent on a standardized test of the readers who were able to answer correctly 50% of the comprehension questions on the criterion passages. Washburne and Vogel (1925) used children's books and the median score on the Stanford Paragraph Meaning Test of children who "read and liked" the book. Later authors, such as Spache (1953), however, simply accepted the grade level placement of materials as recommended by the publisher.
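This kind of indirect calibration is easy to picture with a small sketch. The reader data, the 50% threshold, and the use of the median are modeled loosely on the Ojemann and Washburne-Vogel approaches summarized above; all of the figures are hypothetical.

```python
# Hypothetical sketch of assigning a grade-level index to one criterion passage
# from reader performance: take the standardized-test grade equivalents of the
# readers who answered at least 50% of the comprehension questions correctly,
# and summarize them (here with the median, as Washburne and Vogel did).
from statistics import median

# (reader's standardized-test grade equivalent, proportion of questions correct)
readers = [(2.8, 0.30), (3.1, 0.45), (3.4, 0.55), (3.9, 0.60),
           (4.2, 0.70), (4.8, 0.85), (5.5, 0.90)]

qualifying = [grade for grade, proportion in readers if proportion >= 0.50]
print(f"criterion difficulty assigned to the passage: grade {median(qualifying):.1f}")
```

Under these invented figures the passage would be indexed at grade 4.2; the point is only that the "difficulty" of a criterion passage is itself an inference from a particular group of readers and a particular cut-off.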
Validity data on grade level indexes, when compared with external criteria and other formulae, have been contradictory, leading Chall (1958) to conclude:

...it is questionable whether the grade placement arrived at by the application of any one of these formulas can be used to make a definitive statement about the suitability of a particular piece of reading matter for a specific level of reading ability, even if only in terms of expressional difficulty. (p. 96)

Klare (1963) has reached a similar conclusion and writes:

The various formulas do not necessarily give comparable grade-level results even though they frequently show high intercorrelations. This indicates that attempting to place materials within a grade level by means of formula score is certainly questionable. (p. 120)

Validity of Readability Formulas

Nearly all readability validity studies can be classified using the following five categories: (a) original presentation of the readability method, (b) original criterion prediction, (c) correlation with other readability formulas, (d) experimental validation studies, and (e) validation against outside criteria. Chall (1958) and Klare (1963) both reviewed studies in all of these categories, but their classification systems and category titles differed somewhat.

Original Presentation of the Readability Method

Studies in this category involve evaluation of a formula's validity based on logical grounds or on the evidence provided by its author. As Chall (1958, p. 70) points out, some investigators merely assume the validity of their techniques; however, two-thirds of the authors she studied provided some kind of empirical evidence such as correlation with test scores, basal reading series, or other formulae. On logical grounds the validity of a formula can be assessed based on such considerations as the way it was developed, the materials that were used, or the factors involved. For instance, a formula based on factors which have previously shown a strong relationship to reading difficulty would be considered more valid than one based on factors for which a relationship to reading ease has not been determined.

Original Criterion Prediction

Studies in this category consider how successfully a formula will predict the scores of the original criterion passages from which it was developed. Klare (1963, p. 111) has characterized this as being "almost analogous to pulling oneself up by the bootstraps," and cautions that, while it is an important consideration, it is not sufficient by itself. Particular factors used in a formula are usually selected because they are the most highly related to the criterion. Klare (1963, p. 113) found that in original criterion prediction, the criterion coefficient for most recent formulas in 1963 was about .70. He explains that this in turn means roughly one-half of the variance in readability can be accounted for by the formula, a level of validity somewhat higher than the relationship usually found between psychological test scores and college course grades. Thus, he concludes, "these readability formulas can be considered of relatively high validity in a general sense."

Correlation with Other Readability Formulae

Studies in this category examine the amount of agreement which exists between and among formulae. The assumption is that if readability formulae are all measuring the same thing, then there should be a great deal of agreement in their results. Although a large number of comparative studies have been done,
Klare (1963, p. 119) found the data difficult to interpret for the following reasons: different investigators have used different materials and different formulae; some formulae have yielded grade level scores while others have required corrections; different studies have used different criteria, with some studies based on the level at which 50% of the questions on a given passage could be answered, while others have used 75%, etc.; and some studies have used a rank order correlation while others have used product-moment correlations.

In spite of the numerous disagreements in the data, the following are among the conclusions Klare (1963, p. 120) felt could be justifiably drawn from the comparative studies he examined: (a) the Dale-Chall (1948) and Flesch Reading Ease (1948) formulae have provided the most consistently comparable results in terms of both correlational and grade-placement data, (b) more of the high intercorrelations have involved Dale-Chall scores than those of any other formula, and (c) the various formulae do not necessarily give comparable grade-level results even though they frequently show high intercorrelations.

Earlier, Chall (1958, p. 96) had also concluded that "at all ranges of difficulty the Flesch and Dale-Chall formulas tend to assign similar grade-levels." She also noted inconsistencies in grade level designations and expressed a need for additional comparative studies in specific subject area fields in order to interpret the meaning of the grade placements of one formula in relation to those of another. These conclusions by Chall and Klare probably had considerable impact on later formula development, since they led to a generalized belief that the Dale-Chall is the "best" formula. Many later formula authors would justify the validity of their device by how well it correlated with the Dale-Chall.

Experimental Validation Studies

Studies in this category involve rewriting material in easier and harder versions. These versions are then read by groups of readers presumed to be equivalent in reading ability. Most often comprehension (or learning, or retention) has been used as the criterion in such studies, although readership and reading speed (or efficiency) have also appeared. Even though experimental studies offer the best opportunities for controlling variables, results from those using comprehension and readership criteria have been contradictory (Chall, 1958, p. 111; Klare, 1963, p. 133). Among other factors, Chall (1958, pp. 111-112) speculates that the differences in effect may be related to the magnitude of the difference in readability between the two versions. Using a version that is greatly simplified is more likely to show a difference in comprehension than one that is only slightly easier. Moreover, the relationship of the difficulty of the passages to the reader's ability may be very important. If both versions are above or below the reader's ability, there may be little difference in comprehension, but if the original is beyond the reader's ability and simplifying brings the difficulty of the passage down to the reader's level, the effects might be considerable. Differences may also depend on the importance of the factor being manipulated. Vocabulary and sentence length, for instance, have shown more effect than human interest factors (Allen, 1952). The number of factors involved may also be important, since later studies using multifactor formulae tended to show more positive results than earlier studies based on vocabulary changes only.
Reading speed as a criterion appears relatively late in readability research, with Rudolf Flesch (1949) credited as the first to suggest a relationship between the two (Klare, 1963, p. 135). Of seven studies reported by Klare (1963, p. 137), all were judged positive in results when speed was measured in terms of words per minute. Klare (1963, p. 137) concluded that "the general results indicate clearly that readability and reading speed are related. This measure appears to be both a sensitive and consistent criterion." More recent studies (Miller and Coleman, 1971; Coke, 1974), however, have indicated that reading rate for both oral and silent reading remains constant over a wide range of readability when rate is measured in units smaller than words per minute, namely syllables per minute. The word rate in these studies decreased with passage difficulty, but the syllable rate remained constant. Coke (1974) explains:

Since subjects read at a constant syllable rate, words containing more syllables took longer to read than words with fewer syllables. Therefore, the harder passages, which had a larger proportion of longer words, were read more slowly when rate was measured in word units. (p. 407)

Coke cautions that "the almost universal practice of measuring rate in words can lead to spurious conclusions about the relationship between reading rate and readability."

Validation Against Outside Criteria

Studies in this category involve comparing formula results with results obtained from other sources. Judgments and reading performance have been the most common types of outside criteria used. In judgment studies, readers, librarians, teachers, publishers or other experts are asked to rank or assign grade level designations to the research passages. These are then compared with the formula results. In studies reviewed by Klare (1963, p. 155), 12 of those using judgments showed positive results, 2 were negative and 3 were considered indeterminate. Klare concluded that material measured as more readable by formulae can be judged more readable by experts and readers.

Comprehension, also referred to as learning or retention, was one of the first and most frequently used criteria for comparing formula results with reading performance. However, most of the studies using comprehension criteria were of the experimental type described earlier. Klare (1963, p. 133) did review five studies which gave some indication of a formula's ability to predict performance for a particular grade level. These studies, by Stadtlander (1938), Miller (1946), Latimer (1948), Dunlap (1954) and Peterson (1956), generally involved estimating the subjects' reading ability either by test scores or by their grade placement, administering comprehension questions to them after they had read the research passages, and then comparing the results with the formula scores of the passages. All of these studies were considered positive in results, although many questions have been posed concerning their validity.

Oral Reading Criteria in Readability Research

It should be noted that when reading performance has been used as the criterion for constructing or validating readability formulae, the type of performance used has been almost exclusively silent reading comprehension, the only exceptions being those few studies using reading speed. There is no evidence of oral reading ever having been used as a criterion, either in formula development or in validation studies (Klare, 1963; Klare, 1984, pp. 688-699).
This is probably not accidental. At the time early readability theory was developing, oral reading, either as a testing or a teaching procedure, was very much out of favor in education (Allington, 1986, pp. 830-831, 835; Smith, 1986, pp. 158-195). This helps to explain the early researchers' preoccupation with readability only as it relates to silent reading. Oral reading as an evaluation tool did not appear until the late 1930's and 1940's (Durrell, 1937; Betts, 1946) and no doubt did not become well established in practice until much later. Likewise, formulae specifically intended for primary level materials (Spache, 1953; Stone, 1957; Wheeler and Smith, 1954) did not appear until much later in the development of readability prediction. Even then, the earlier traditions seem to have prevailed, for the use of oral reading, either in the development of formulae or in studies of their validity, has been almost totally ignored even to the present day.

Development of the Fry Graph

By the time Klare's book was published in 1963, research in readability had fairly well run its course. There seemed to be very little additional information that a new formula could add to the existing body of knowledge of readability, nor did there seem to be much need for developing another readability measuring device considering the abundance of procedures already available. Yet in 1968, Edward Fry, a professor at Rutgers University, published still one more method for predicting the ease with which a selection could be read and understood. Surprisingly, it would become one of the most popular methods ever developed.

Kistulentz (1967) found Fry's procedure to show high correlations with other readability methods. Its added appeal, however, lies in its simplicity and ease of use, and it was on this basis that Fry (1968) justified its publication. It is not a formula based on a regression equation as such; rather, it utilizes a nomograph. It therefore does not require the user to make any mathematical calculations, a definite advantage in the days before inexpensive calculators were commonplace. Instead, the user simply makes counts of two variables, one of word length and one of sentence length, and then plots these on the Fry Graph. Fry published the graph with a specific statement that it was not copyrighted, thus assuring its continued easy and widespread availability.

Simplicity is not only evident to the user of Fry's procedure but also appears to be a keynote in its development. The author chose to capitalize on previous research which suggested that only two factors, word difficulty and sentence complexity, would consistently emerge as the most significant elements in the prediction of readability. He uses a count of the total number of syllables and a count of the total number of sentences (estimated to the nearest tenth) in one-hundred-word samples as measures of these factors. Fry (1969) cites research by Stolurow and Newman (1959) and Brinton and Danielson (1958) as support for his use of sentence length as the measure of syntactic complexity and a syllable count as the measure of word difficulty. The former study found a high correlation (.90) between reading difficulty and polysyllables, difficult words and the percentage of different difficult words; a high correlation (.90) between reading ease and easy words and monosyllables; and a relatively high correlation (.86) between average sentence length and difficulty.
The researchers concluded that "any yardstick which gave primary weight to the so-called word factor and a lesser but almost equal weight to the sentence factor would account for a good deal of the variance in readability." Similarly, after studying twenty language elements, Brinton and Danielson concluded that their investigation "confirms the importance of word length and sentence length" in readability measurement.

Some researchers (Bormuth, 1966, 1969, 1975; Stolurow and Newman, 1959, p. 250) have suggested that a curvilinear relationship may exist among readability factors, with sentence length having a greater effect on readability at lower reading levels and word difficulty being more significant in upper grades. The values on the Fry graph support this contention (Entin and Klare, 1979, p. 288). Because of the way the graph was constructed, it may automatically take such a curvilinear relationship into account.

When the syllable count and sentence count from a sample are plotted on the Fry graph, they fall into various areas which indicate the difficulty of the passage in terms of a grade level score. Fry (1968) explains how he arrived at these scores as follows:

Grade level designations were determined by simply plotting lots of books which publishers said were 3rd grade readers, 5th grade readers, etc. I then looked for clusters and "smoothed" the curve. After some use of correlational studies the grade level areas were adjusted. (p. 515)

There are indications in the literature that the Fry method assigns higher grade levels to primary level materials than other methods do, leading some to regard his formula as being too easy. Harris and Jacobson (1980), for instance, measured numerous samples from the Economy (1980) and Houghton Mifflin (1981) reading series for levels pre-primer through third grade using the Fry graph (1968), the Spache formula (1974), and a computer version of the Harris-Jacobson formula (1974). A comparison of the results found that the Fry method assigned much higher grade levels to materials beyond the second grade than did either of the other two formulae or the publishers' designations. This led Harris and Jacobson to conclude that the Fry graph seriously overestimates the difficulty of second and third grade reading materials (Harris and Jacobson, 1980). Fry (1980) has not considered this overestimation a serious problem, since it would mean assignment of easier-to-read books, a better alternative than assigning books that would be too difficult for a reader. Fry also contends that the differences between his results and those from other procedures are not as great as the Harris and Jacobson study indicates. He cites a study by Britton and Lumpkin (1977), who found good correlations and good grade level agreement among publishers' designations and Fry, Spache and Harris-Jacobson results. He also notes a second study by Fox (1979), who used the Fry measure on "almost every basal reader in America being sold in 1978." Fry (1980) reports that the Fox study found his formula "correlates quite well at grades 1 and 2 but that it is a little high in third grade, but not as far off as Harris and Jacobson found in the average of the two series that they analyzed."
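The two counts the graph requires are simple enough to sketch. In the fragment below, the syllable counter is a crude vowel-group heuristic standing in for Fry's hand count, the sample text is invented, and the final step of reading a grade level off the published graph is deliberately omitted, since the grade-level zones exist only on the nomograph itself.

```python
# Rough sketch of the two counts made for each 100-word sample in Fry's
# procedure: total syllables, and number of sentences estimated to the
# nearest tenth. The syllable heuristic below is an approximation only.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as groups of consecutive vowels (minimum of one).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fry_counts(text: str, sample_size: int = 100):
    """Return (syllables, sentences) for the first `sample_size` words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words_taken, syllables, sentence_count = 0, 0, 0.0
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence)
        take = min(len(words), sample_size - words_taken)
        syllables += sum(count_syllables(w) for w in words[:take])
        sentence_count += take / len(words) if words else 0.0
        words_taken += take
        if words_taken >= sample_size:
            break
    return syllables, round(sentence_count, 1)

sample = "The cat sat on the mat. " * 20        # toy text, not a real passage
print(fry_counts(sample))                       # (100, 16.7) for this toy text
```

With real material the two values would be averaged over three samples and then located on the graph; here only the counting step is illustrated.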
Recent Trends in Readability Prediction

In spite of their inherent weaknesses, and contradictory evidence as to their general usefulness, methods for estimating readability continue to receive a great deal of attention from publishers and educators today. Most recently published pedagogical texts, especially those intended for secondary and content area fields, at least mention readability, if only to warn readers of the limitations involved, and many such books devote entire chapters to the subject, along with descriptions and instructions for using several readability techniques (Singer and Donlan, 1980; Vacca, 1981, 1986; Criscoe and Gee, 1984). Articles on readability also continue to appear in professional journals, and new formulae continue to be developed regularly.

Two trends are noticeable in most recently published readability literature. First, the need for increasing ease and speed in making readability estimates continues to be a primary consideration. Second, there appears to be a trend toward finding more valid methods for developing criterion passages. These include methods which do not ask comprehension questions, and which thereby circumvent the problems of relative difficulty inherent in the questions themselves, as well as the use of specialized criteria for formulae to be used in specialized areas. In addition, Klare (1984), in a recent review of current readability research, identifies two other trends: (a) use of new approaches and (b) work in languages other than English. Of these, only the new approaches seem to be pertinent to this study. Most of these new approaches, however, are closely related to the trends previously mentioned, since they appear to be developed out of dissatisfaction with the ease of use and the adequacy of criteria in existing formulae. They will be included, therefore, as a part of the following discussion, which will concentrate on efforts to improve ease and speed of use and to develop better criterion procedures.

Ease and Speed of Use

Most readability prediction procedures currently popular are similar to the Fry method, but claim to improve ease and speed of use even further, primarily through easier counting of factors or the use of graphs, charts and manual or machine aids. The Flesch procedure (Flesch, 1949), for instance, uses a scale instead of a graph. The Raygor method (Raygor, 1977) employs a graph similar to the Fry, but utilizes counts of long words (six letters or more) rather than syllables. The SMOG method (McLaughlin, 1969) involves counting only words of three or more syllables, estimating the square root of this number and adding a constant of 3. No graph is necessary.
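Of the three methods just mentioned, SMOG is the easiest to reduce to a few lines. The sketch below follows the description above; the syllable heuristic, the example sentence, and the decision to count polysyllables over whatever text is passed in (McLaughlin's full procedure samples 30 sentences) are all simplifying assumptions.

```python
# Sketch of the SMOG estimate described above: count the words of three or
# more syllables, take the square root of that count, and add a constant of 3.
import math
import re

def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic standing in for a hand count.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return math.sqrt(polysyllables) + 3

print(round(smog_grade("Considerable information accompanies every publication."), 1))
```

No table or graph is consulted; the estimate is the computation itself, which is the method's principal appeal.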
In addition to these modifications of current methods, the quest for increasing ease and speed in calculating readability has taken at least two new directions: (a) the appearance of several new subjective, rather than objective, methods, and (b) the publication of numerous computer-based formulae.

The trend toward developing faster, easier to use subjective techniques, rather than objective measures, is noticeable in the guidelines offered by Vacca (1986, p. 40-41), in the Irwin and Davis checklist (1980), and in the attention that has been given recently to the SEER technique (Singer Eyeball Estimate of Readability; Singer, 1975) and the Rauding Scale (Carver, 1974, 1975, 1976). These latter two techniques are based on the ratings of trained judges. They involve taking a passage of unknown readability level and comparing it with a set of scaled passages, the reading levels of which have already been determined. While at first such subjective techniques may appear to have little merit, their authors claim they are faster than, and just as accurate as, other procedures.

The positive findings of the early studies using judgmental criteria reported by Klare (1963) suggest that human raters can be as accurate as mechanical devices. More recently, however, Klare (1984, p. 702) has questioned the purported speed advantages of judgmental techniques, especially when the time to train and qualify the raters is considered.

While some researchers have been pursuing subjective techniques, others have concentrated on the development of computer-based formulae, and as inexpensive desktop computers became available, they quickly capitalized on this tool as a means for making readability prediction more efficient. A host of readability estimation programs, both commercial and non-commercial, have recently appeared (Danielson and Bryan, 1963; Carlson, 1980; Gerbens, 1978; Goodman and Schwab, 1980; Irving and Arnold, 1979; Keller, 1982; Schuyler, 1982). Some of these programs use new formulae, while others are simply automated versions of present, well known readability procedures. Even the new methods, however, still seem to be based on measures of word and sentence difficulty similar to those used in existing formulae. For this reason Geoffrion and Geoffrion (1983), in reviewing such programs, have concluded that "existing computerized readability software make inefficient use of a computer's capabilities." They note that current formulae are limited to measures such as sentence length and syllable counts because these are easy for human evaluators to judge quickly. "More complex aspects of a passage are ignored because they are too tedious for rapid manual calculation. Yet the computer's speed and accuracy make feasible much more complex calculations" (pp. 104-105).

As yet there seems to be little research aimed at producing computer programs designed to take full advantage of the computer's capabilities in producing more accurate readability estimates, although the authors do note work in this direction underway at Bell Laboratories (Geoffrion and Geoffrion, 1983, p. 105). Klare (1984), however, questions whether or not more complex formulae can really add any greater precision. In a study by Bormuth (1969), "unrestricted" formulae with up to 20 variables gained slightly in predictive power over simpler formulae in the validation process but dropped considerably in cross-validation. Klare (1984) concludes that:

This yielded an unexpected answer to those who felt that the availability of computers would lead to more complex and, therefore, necessarily more powerful predictors. (p. 687)

Geoffrion and Geoffrion (1983, pp. 105-106) also caution that practical problems greatly limit the usefulness of presently available readability measurement software. They note that current formula programs lack a convenient means for entering text samples, and the less sophisticated ones also lack an easy way to correct typing errors. The MECCA program used in this study is typical. Once text is entered, this program can give readability estimates based on several popular procedures, as well as syllable counts, word and sentence length and other such information. Text, however, must be entered line by line. Corrections can be made by backspacing only within the line currently being typed for entry. Once the line is entered, corrections can only be made by calling up another program for text editing.
Correcting even the simplest one-letter error entails designating which line is to be changed, indicating the type of change desired (add a line, delete a line or edit a line), and then actually replacing the line by retyping and reentering it. This is a tedious and time consuming process.

Computers capable of recognizing written text in a variety of print fonts, styles and layouts have already been developed for use by the visually impaired in oral reading machines. While this capability is still too expensive for general use, Geoffrion and Geoffrion (1983, p. 106) see it as having great potential for future readability programs. The development of computers that can recognize human speech and voice commands also holds promise for facilitating text entry. Until such machines are available, however, computerized readability measurement is not as easy as it may first appear.

Criterion Development

Accompanying recent efforts to increase ease and speed of use in readability prediction are developments aimed at improving the criteria on which the procedures are based. Klare (1984, p. 691) notes the following trends in this regard: (a) improvements in existing criteria, primarily through renorming of the McCall-Crabbs passages (Harris and Jacobson, 1976; Jacobson, Kirkland and Selden, 1978), (b) specialized criteria for use in special areas (Caylor, Sticht, Fox and Ford, 1973; Kincaid, Fishburne, Rogers and Chissom, 1975), and (c) use of the Cloze procedure in criteria development (Coleman, 1965; Miller and Coleman, 1967; Bormuth, 1969, 1975). Of these, the latter is particularly significant and will therefore be examined in more depth.

In order to understand the role of the Cloze procedure as it relates to readability, it is necessary to understand the distinction between readability prediction and readability measurement. As Klare (1984, p. 701) and Vacca (1986, p. 53) point out, formulae are predictive techniques. They hypothesize about text difficulty based on an analysis using selected variables that have been statistically found to correlate with comprehension difficulty. The reader is not a variable. In contrast, the Cloze technique, like oral reading assessment, is a readability measurement procedure. It measures readability by using actual reader performance in the material, without making any predictions concerning that reader's performance in any other material.

The Cloze technique (Taylor, 1953) involves the systematic deletion of words from a passage, usually every 5th or 7th word. The reader is then asked to fill in the blanks with the words he or she thinks appeared in the original text. Using criteria established by Bormuth (1966), identifying between 40% and 60% of the words correctly would indicate that the passage is at the reader's instructional reading level. The Maze procedure (Guthrie, 1974) is similar to the Cloze test, except that it uses a multiple-choice format instead of blanks and, since it is an easier task, it uses more stringent criteria of 60% to 85%.

The Cloze and Maze procedures appear to have more validity than readability prediction procedures because they, like oral reading assessment, use actual reader performance. However, they also have some of the same drawbacks. They do not assign a readability index number or grade level to the passage as such, but simply indicate if the text in question is of suitable difficulty for the particular student or group of students involved. Every time a new text or a new group of students is encountered, the procedure must be repeated. Unlike oral reading assessment, however, which is a one-to-one process, the Cloze and Maze tests can measure readability for many readers at the same time.
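The mechanics of the Cloze procedure are easily sketched. The passage, the every-fifth-word deletion, the exact-replacement scoring, and the reader's answers below are all invented; the 40%-60% instructional band follows the Bormuth criteria cited above.

```python
# Sketch of constructing and scoring a cloze passage: delete every nth word,
# collect the reader's replacements, and compare the percentage of exact
# matches with the criterion bands cited above. All data are invented.
def make_cloze(text: str, n: int = 5):
    words = text.split()
    deleted = {i: words[i] for i in range(n - 1, len(words), n)}
    display = [("_____" if i in deleted else w) for i, w in enumerate(words)]
    return " ".join(display), deleted

def score_cloze(deleted: dict, answers: dict) -> str:
    correct = sum(1 for i, word in deleted.items()
                  if answers.get(i, "").lower() == word.strip(".,").lower())
    pct = correct / len(deleted)
    if pct > 0.60:
        return f"{pct:.0%} correct - independent level"
    if pct >= 0.40:
        return f"{pct:.0%} correct - instructional level"
    return f"{pct:.0%} correct - frustration level"

passage = ("The small boat drifted slowly toward the rocky shore while the "
           "tired fisherman watched the gathering clouds and wondered about rain.")
cloze_text, key = make_cloze(passage)
reader_answers = {4: "slowly", 9: "while", 14: "a"}   # a hypothetical reader
print(cloze_text)
print(score_cloze(key, reader_answers))               # 50% correct - instructional level
```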
Recently the cloze format has gained popularity as the means for establishing the relative difficulty of the criterion passages on which new readability formulae are based (Bormuth, 1969, 1975; Coleman, 1965; Miller and Coleman, 1967). In this manner it goes beyond readability measurement and becomes a prediction device. In determining the difficulty of criterion passages, the Cloze procedure has a distinct advantage over traditional methods of assessing comprehension through questioning: it bypasses the problem of variations in difficulty inherent not in the passages but in the questions themselves. Because of this, it holds particular promise for formula authors, and Klare (1984, p. 687) has found the use of the Cloze measure to be one of the important new directions in criteria development.

Readability in the Early Elementary Grades

Although several formulae have been designed for use at the lower grade levels (Spache, 1953; Stone, 1957; Wheeler and Smith, 1954), basal programs generally prevail in these grades, and the demands of content area reading are far less than those at the intermediate and secondary levels. Therefore, it is not surprising to find relatively less emphasis on readability theory and measurement in training materials intended for teachers at these levels. There are two notable exceptions, however. One is in the preparation of materials for use in constructing Informal Reading Inventories (IRI), and the second is in the development of materials for use in individualized reading instruction.

An Informal Reading Inventory estimates the level of a student's reading ability by listening to the student read in materials of increasing difficulty and noting the errors that are made. Although many commercially prepared inventories are available, the procedure is considered most valid when the paragraphs to be read are prepared by the teacher using materials the student will be reading in the classroom (Betts, 1946, p. 454; Ekwall, 1976, p. 271; Bader, 1980, p. 206). Authors usually suggest that the teacher use basal text passages for this purpose, but since basal materials can vary considerably in difficulty, many also suggest using a readability formula to check the difficulty level as well. Some, like Harris and Sipay (1980, p. 58), even give detailed, step-by-step instructions for doing so.

In addition to Informal Reading Inventories, individualized reading instruction programs have created a particular need for determining the readability of materials for elementary students. Such programs typically do not use a basal text series through which all students in a class proceed together. Instead, they are usually centered around an extensive collection of children's literature and trade books. Since self-selection of reading materials is usually emphasized, a large collection is necessary in order to provide for the varying interests and reading abilities of a classroom. A readability formula could be used to determine the difficulty of the selections. However, the great number of books involved usually makes using even the simplest procedure impractical.
Therefore, helping students find books they both want to read and are capable of reading has been a major problem in implementing this type of program. As a result, advocates of individualized reading instruction have suggested some interesting solutions.

One of the most popular is the "Rule of Thumb" (also called the Five Finger Exercise) developed by Jeannette Veatch (1966, 1978), whose leadership has been foremost in popularizing the concept of individualized reading instruction. The "Rule of Thumb" is meant to be taught to children, who are then to use it by themselves to decide if a particular book is easy enough for them to read. When the student has found a book he wants to read, Veatch (1978) instructs the teacher to say the following to the student:

Riffle the pages and stop on one page in the middle of the book. Start to read it to yourself. If you come to a word that you don't know, put your thumb on the table. If you come to another word you don't know, put down your first finger. Another unknown word, another finger, and so on. If you use up all your fingers, then the book is too hard for you. Put it down and find another. If you find a book that has no unknown words, it is probably too easy for you. Save it for a free time, and choose another book to bring to me for your conference. (p. 55)

Veatch does not indicate how many words the student should read before deciding if the book is of suitable difficulty. Cunningham (1977, p. 191), however, modifies the procedure by having the student choose a paragraph of about 100 words. Five unknown words in 100 would then suggest 95% word recognition accuracy. This corresponds roughly to the criteria Betts (1946) established for determining a reader's instructional reading level.

The reliability of the Rule of Thumb method appears to be questionable. As Cunningham (1977, pp. 191-192) warns, some children may not be able to handle this procedure, either because they cannot admit, even to themselves, that they do not know a word, or because they may be unaware that they have made an error. Validity data to support the practice are not offered, and Veatch herself (1978, p. 55) calls the method a "rough measure, to be sure, but the only one in which the choice of material is the pupil's."

ARRF (Average Reader Readability Formula), proposed by Patricia Cunningham (1976), appears to be another rather crude method for making a quick assessment of the difficulty of a large number of books to be used with a particular class. It involves identifying a student whose reading ability is considered about "average" for the classroom in question. This student is then asked to spend a couple of hours with the teacher reading short passages from each of the books to be used in the program, and deciding if each selection is too easy, too hard or just about right for the average readers in this class, of whom this reader is supposedly typical. The books are then codified as being "easy," "average" or "hard." Below average readers in the room may choose books from the "easy" group, while above average readers may choose from any group.

The validity of a procedure such as ARRF is also obviously questionable. The method assumes that an "average" reader can be identified and that this reader's performance can be generalized to other readers. With the large amount of reading involved, variations in the reader's performance while classifying the books must also be considered.
It is possible that books read later in the classification process may appear easier due to a practice effect or more difficult due to reader fatigue. Cunningham does not offer validity data.

Oral Reading in Readability Measurement and Prediction

Although oral reading is currently a commonly used method for measuring readability, it seems to have been virtually ignored in both the development of readability formulae and studies of their validity. As previously noted, oral reading, either for instructional or for assessment purposes, was very much out of favor at the time that early readability research and formula development were occurring (Allington, 1985, pp. 829-835; Smith, 1986, pp. 158-195). It is not surprising to find, therefore, that the early readability researchers' focus was on silent reading comprehension, and that oral reading as a criterion measure or in validation studies was apparently disregarded. That disregard seems to have persisted to the present time, for this literature search revealed no evidence of oral reading ever having been used in the development of any formula, and only two validity studies using oral reading could be found.

Fry (1969) used oral reading errors along with cloze procedure errors and the Fry and Spache readability procedures to make rank order comparisons of the readability of seven selections. Fry found that all four methods ranked the difficulty level of the passages quite well, but the cloze procedure seemed to be the most accurate and made the finest distinctions. The oral reading scores were not as accurate or as fine-grained as the cloze scores. However, Fry views oral reading as an interesting method for judging readability and one not often used. It has the advantage of being objective, independent and a different validation procedure. Fry (1969) notes that:

Readability formulas are often validated on such non-objective criteria as subjective judgment or publishers' recommendations. Or they are validated by comparing them with other formulas. These methods are not wrong, but we must continually keep in mind the real basis for readability is whether a child can read the material. Therefore, validity measures that use children should receive high priority. For this reason cloze and oral reading errors should be used increasingly in research to validate readability formulas, although the time factor limits their use as practical methods of determining readability. (p. 536)

In another study, Paolo (1977) compared Fry readability scores and reading errors in "Easy-to-Read" trade books for children. She found that eight of the ten books she studied were at frustration level for her first and second grade subjects, and that a positive and significant correlation (.78) existed between the oral reading errors and the readability scores. Paolo's study was well designed; however, it involved only five subjects, which seriously limits the impact of its findings.

Part II
Determining Reading Ability

Introduction

While the concept of readability was being developed in the late nineteenth and early twentieth centuries, methods for measuring the readers' ability were also under investigation. Ultimately these efforts led first to the development and widespread use of standardized tests, and later to the appearance of the Informal Reading Inventory and oral reading assessment procedures. Each of these methods will be examined in turn.

Standardized Tests

Standardized tests are probably the most frequently used means of measuring student reading performance.
They are designed to be administered and scored in a uniform manner so that any variation in test scores can be attributed to differences in the students taking the test and not to the conditions of testing. Generally such tests are put out by major publishing companies, with items written by professional test specialists and revised through many tryouts and item analyses. This process has resulted in tests of exceptionally high reliability. The general range is from .80 to .95, and standardized tests with reliabilities over .90 are not uncommon (Borg and Gall, 1979, p. 218).

Scores on standardized tests are generally based on relative performance. An individual score has meaning only in relationship to the scores obtained by others who have taken the same test. Norms, or scores which indicate "average" or "normal" performance, are developed by administering the test to a standardization sample. The sample itself is chosen from persons who are representative of the students for whom the test is intended. Usually this sample is large, with 1,000 or more subjects. Thus developing the norms for a standardized test is an expensive procedure.

Publishers of standardized tests typically supply detailed information concerning the norming procedures used and descriptions of the social, educational, economic, ethnic and racial characteristics of the standardization sample. This information is important to persons using the test, since the test has its greatest validity for students with backgrounds most similar to those of the persons in the sample. Various tables and instructions for converting and interpreting raw score data are also included.

Standardized tests can be individual measures and can assess achievement or abilities by sampling many behaviors in diverse ways. In practice, however, the standardized tests of reading achievement being used today are almost exclusively objective tests of silent reading comprehension designed for group administration. Johnston (1984) notes that this group focus and silent reading emphasis are not accidental, but are rather related to the historical climate prevalent during the development of such testing procedures.

Standardized reading tests are a direct outgrowth of the turn-of-the-century psychological testing movement in general. Johnston identifies two driving forces of this movement. The first was the intention of making psychology worthy of the term "science," which seemed to indicate quantification and "objectivity." The second was the press for educational accountability that accompanied a dramatic rise in school enrollments brought on by immigration and population growth, child labor and compulsory education laws, and increased literacy expectations in society. These forces, along with the almost universal emphasis on silent reading during the period, produced a climate in which, Johnston (1984) concludes, only a certain kind of test could survive:

Thus, while diverse approaches were developed initially, the fittest in terms of efficiency soon surfaced. Reading tests came to consist of the silent reading of a passage, followed by the solving of brief, generally text-related, problems; usually questions. (p. 149)

The efficiency of administering and scoring standardized tests, along with their high reliability, has made them popular measuring devices in educational research and program evaluation.
However, while these qualities of efficiency and reliability are generally accepted, the question of test validity, or the ability of such tests to measure what they claim to measure, remains controversial, along with questions concerning their proper use.

Farr (1969, p. 85) lists two valid uses of standardized tests. First, they are reliable for comparing students in terms of general reading achievement. Second, the tests are useful as screening devices in determining if further assessment through individual reading tests and informal testing procedures is needed.

The major weaknesses in using standardized tests seem to center on the tests' inability to identify specific areas of reading strength or weakness, and on the use, or misuse, of grade equivalency scores. While most standardized tests are made up of subtests such as "phonetic analysis," "vocabulary," and "comprehension," Farr finds that such tests are unable to measure distinct skills or abilities (Farr, 1969, p. 82). Because of this lack of discriminant validity, the tests are of little value in reading diagnosis or for planning specific instructional programs.

The inclusion of grade equivalency scores, which are provided by most test publishers in addition to other norming information such as percentile ranks and standard scores, raises further questions concerning the valid use of standardized testing. Grade equivalency scores are popular with both teachers and the general public because they provide an easy point of reference. However, the term "equivalent" is probably very misleading, and such scores should be interpreted with great care.

The term "equivalent" implies that, regardless of their grade placement, students receiving the same grade equivalency scores have comparable reading abilities. Glaser (1964), however, found that while a group of seventh graders and a group of third graders had the same scores on the Gates Survey (between 5.0 and 5.9), their performances on an informal reading inventory differed considerably. He concluded that this was because the standardized test compared individual performances to those of other students, while the informal inventory compared individual performances to a set of criterion tasks.

Another problem with grade level norms lies in the between-grade scores. Usually such scores are reported with a decimal. The number before the decimal indicates the grade level, while the number after the decimal indicates the month in that grade, with 0 to 9 standing for the months of September through June. No credit for progress is given for the months of July and August. Usually, in the course of being normed, a test is administered only once during a year. Between-grade norms are interpolated from these "empirical norms." Using scores derived in this manner assumes that learning within a year proceeds at a uniform pace; however, studies by Bernard (1966), Lennon (1951) and Traxler (1950) suggest that this is not the case. For these reasons, grade level scores are considered to have their greatest validity when the time of year of testing corresponds as closely as possible to the time of norming.
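A small sketch makes the interpolation assumption concrete. The raw scores, the October testing dates, and the straight-line interpolation below are hypothetical; they are meant only to show that a mid-year grade equivalent is read off a line drawn between two empirical points rather than observed directly.

```python
# Hypothetical sketch of deriving a between-grade norm by linear interpolation
# between two empirical (once-a-year) norm points. The straight line assumes
# uniform month-to-month growth, which the studies cited above call into question.
empirical_norms = {3.1: 38, 4.1: 47}   # grade equivalent (grade.month) -> raw score in October

def interpolated_grade_equivalent(raw_score: float) -> float:
    (g1, s1), (g2, s2) = sorted(empirical_norms.items())
    return g1 + (raw_score - s1) * (g2 - g1) / (s2 - s1)

# A raw score of 42 falls between the two October points and is reported as a
# mid-year grade equivalent even though no mid-year norming ever took place.
print(round(interpolated_grade_equivalent(42), 1))   # 3.5 under these assumptions
```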
In addition, scores considerably above or below the student's grade placement should be interpreted only as above or below the norm for that grade. The student should not be considered to have the same reading ability as the average student in the grade indicated by the grade level score.

Finally, several studies (Betts, 1940; Killgallon, 1942; Sipay, 1964; Glaser, 1964; McCracken, 1964; Leibert, 1965) have compared standardized test results with results from Informal Reading Inventories. Generally these studies found that standardized tests gave higher grade level scores than the IRI, indicating that they cannot be used to place students in materials at their functional reading levels. Moreover, Farr (1966, p. 108) notes that most publishers of standardized tests do not suggest that the grade score norms be used as indicators of the levels at which reading instruction should be provided.

The California Achievement Tests

The California Achievement Tests, 1977 edition, published by McGraw-Hill, have been a well known, widely used and highly regarded series of test batteries designed to measure achievement in basic skills. Ten different levels of the tests are available for children in grades kindergarten through twelve. The upper seven levels also have alternate forms for use in pre- and post-testing situations or when multiple administrations of a level are necessary. The test is nationally normed and adheres to the standards of the American Psychological Association to assure that the standardization group is a representative national sample. Derived scores are provided in the form of percentile ranks, normal curve equivalents, stanine scores, grade equivalents and scale scores.

Unlike many standardized tests, which are administered to the standardization sample only once during the norming process, the California Achievement Tests have had two administrations, and therefore empirical norms are available for both spring and fall testing. This improves the validity of the test results and is probably a major reason why the tests enjoy high regard.

The Reading Test of the California Achievement Tests consists of four subtests at Levels 13 and below: Phonic Analysis, Structural Analysis, Reading Vocabulary and Reading Comprehension. Beginning with Level 14, which is the test usually used with fourth grade students, there are only two subtests: Reading Vocabulary and Reading Comprehension. Thus the tests are heavily dependent on silent reading comprehension at the primary grades and become almost totally a test of silent reading comprehension at grade four and beyond.

While grade equivalency scores are provided, the publisher also includes cautions for interpreting these scores (California Achievement Tests Norms Tables, 1977, p. 4). Among them, they warn that (a) grade equivalents do not mean that a student has mastered all of the objectives taught in the school district up to the grade corresponding to the grade equivalent score, (b) grade equivalents should not be used in placing students in school grades corresponding to the test score, and (c) because grade equivalent scores can be easily misinterpreted, it is strongly recommended by the publisher that they not be used in reporting a student's score to parents or other persons with little or no training in testing.

Oral Reading Assessment

In addition to standardized tests of reading achievement, procedures for assessing oral reading performance are frequently used to determine reading proficiency. The first formal assessment of reading through observation of oral reading performance probably appeared in 1915 with the publication of Gray's Standardized Oral Reading Paragraphs (Allington, 1984, p. 835).
These test passages, arranged in order of difficulty, were to be read aloud while the examiner recorded such errors as mispronunciations, omissions, additions, and repetitions. The test received very little attention at the time of its publication, however, probably because it coincided with widespread criticism of oral reading and vigorous expansion of silent reading practices in instruction, brought on by expanding literacy, changing needs in society, and research reports (Pintner, 1913; Thorndike, 1917; Judd and Buswell, 1922) which stressed the superiority of silent reading over oral reading in developing fluency and comprehension. These studies have been summarized by Huey (1908/1968).

Development of Traditional Practices

Moderation between the oral and silent reading positions eventually led to renewed interest in oral reading. This interest was no doubt prompted by growing dissatisfaction with standardized testing, which offered no opportunity to observe reading behaviors directly. During the 1930's several authors developed descriptions of oral reading errors (Duffy and Durrell, 1935; Daw, 1938) and oral error classification systems (Payne, 1930; Monroe, 1932). It is Emmett Betts, however, who is generally credited with defining and popularizing the practices of oral reading assessment. The principles underlying the Informal Reading Inventory, and the concepts of independent (basal), instructional, frustrational and "capacity" reading levels, with the criteria for establishing them, were presented by Betts in one chapter of his book, Foundations of Reading Instruction, published in 1946. This work had a profound effect on the development of modern reading diagnostic theory and practice, and while the Betts criteria are often challenged, they remain widely used and commonly accepted by practitioners today.

In determining placement of students in materials, Betts identified four levels of functioning for a reader in relationship to the readability of materials at various grade levels. The first level he called the basal level. This is generally referred to as the independent level today, since the basal level "approximates the level at which 'free,' supplementary, independent, or extensive reading can be done successfully" (Betts, 1946, p. 446). The second level, the instructional level, is the place "where learning begins." It represents that level where the learner is "challenged but not frustrated" by the material (Betts, 1946, p. 447). The third level, the frustration level, is "the lowest level of readability at which the pupil is unable to comprehend printed symbols to a reasonable degree ... the individual is inadequate to deal with the reading matter" (Betts, 1946, p. 451). Betts also identified a fourth level, the capacity level, which is sometimes called the listening comprehension level today (Durrell, 1955; Kress & Johnson, 1965; Ekwall, 1976, 1979). Betts (1946, p. 452) describes this level as "the highest level of readability of material which the learner can comprehend when the material is read to him."

Betts (1946, p. 446) included the following in his criteria for establishing a reader's basal level: accurate pronunciation of more than 99% of the words; freedom from tension and finger pointing; acceptable reading posture; oral reading characterized by proper phrasing; accurate interpretation of punctuation; and use of a conversational tone.
The criteria for oral reading at the instructional level (Betts, 1946, p. 449) included the following: accurate pronunciation of 95% of the running words; ability to anticipate meaning; freedom from tension, finger pointing and head movement; and acceptable reading posture. At the frustration level (Betts, 1946, p. 451), the criteria for oral reading included: inability to pronounce ten percent or more of the running words; frequent or continuous finger pointing; distracting tension, such as frowning, blinking, and excessive and erratic body movements; unwillingness to attempt the reading; attempts to distract the examiner's attention from the problem; word-by-word reading; failure to interpret punctuation; high-pitched voice; meaningless word substitution; repetition of words; insertion of words; partial and complete word reversals; omission of words; and practically no eye-voice span.

The criteria as presented by Betts contain several contradictions which have caused considerable controversy and variation in the way they have been interpreted in practice. First, Betts established word recognition scores of 95% for the instructional level and 90% for the frustration level, leaving a gap of 5 percentage points not designated as being at any level. Second, the way Betts originally presented the criteria left it unclear whether silent reading should precede the oral reading in an IRI, or whether the oral reading should be at sight, without the benefit of preparation. Finally, using the Betts criteria is further complicated because, although Betts gave definite percentages for judging word recognition at each level, he did not clearly define what deviations from text should be considered in determining these percentages. He simply refers to "accurate pronunciation" of a given percent "of the running words" (Betts, 1946, pp. 446, 449, 451). Consequently, what determines a mispronunciation has been left largely up to interpretation.

Most authors have dealt with the gap in percentages between levels by using other information gathered during the reading to decide if a score falling in the range of 95% to 90% should be designated at the reader's instructional level or frustration level. The question of silent reading preceding oral reading is a more serious one, since the number of errors made in oral reading falls dramatically when the reader is first allowed to prepare silently (Brecht, 1977). The confusion concerning silent reading preparation seems to stem from the fact that Betts introduced the principles of the informal reading inventory, a form of testing, simultaneously with the principles underlying a directed reading lesson, a form of instruction, in the same section of his text. In this regard, Betts (1946) writes:

There is general agreement on one basic principle regarding directed reading instruction ... namely, silent reading should precede oral reading. (p. 449, emphasis added)

A few pages later, in giving the principles underlying an Informal Reading Inventory, he states:

In general, the procedure for the administration of an informal reading inventory for the systematic observation of performance in controlled reading situations is based on the principles governing a directed reading activity. (p. 456, emphasis added)

As one of these principles, he notes that "silent reading should precede oral reading," but a few lines later he writes:

An exception to the principles basic to a directed reading activity is that of using oral reading at sight (i.e., without previous silent-reading preparation) as one means of appraising reading performance. (p. 456, emphasis added)
456) (Emphasis added) On the next page (p. 457), in giving a description of the "procedure for appraising reading achievement by means of an informal reading inventory", he lists "Oral Reading at Sight" as the first step and explains that this is done to "appraise reading behavior in a situation where the pupil is without benefit of preparation". It appears that Betts clearly intended oral reading at sight to be the first step in administering an IRI, a form of testing, and that prepared oral reading was to be used in a directed reading lesson, a form of instruction. But the criteria for the reading levels is listed with descriptions 75 of the directed reading lesson. Thus it is unclear if Betts meant this criteria to be used with unprepared oral reading in a testing situation or some kind of continuous evaluation of reading progress made during the directed reading lesson. Generally the criteria established by Betts has been used with unprepared oral reading. Several authors, however, debate this practice, especially since it appears that Betts based the criteria on the results of a study conducted for a doctoral dissertation by a student under his direction, Killgallon (1942). In the Killgallon study the subjects preread the research passages silently. Since Betts did not specify what deviations from text should be considered as errors when using his criteria, considerable variation has resulted in interpretation and practice. Generally, counting the following deviations has been widely agreed upon: omissions; substitutions; insertions; gross or partial mispronunciations; and words aided. This agreement, however, may be due more to the high interscorer reliability found on these items, rather than their demonstrated relationship to frustration. Most authors also consider hesitations and lack of regard for punctuation important as well, but do not count them in computing the percentages, probably because they are difficult to score objectively (Ekwall, 1976, p. 266). Whether or not to count repetitions as errors has been one of the more controversial issues (Ekwall, 1976, p. 267). Some writers feel repetitions should not be counted since 76 recent psycholinguistic research suggests the repetition or regression is frequently the student's means of reprocessing a selective bit of data necessary to the emerging story line (Guzak, 1970, p. 667). Other authors recommend counting only the first repetition but not subsequent repetitions of the same word or group of words. Some suggest counting only repetitions of more than one word, while others, like Ekwall, insist that all repetitions should be counted as errors. Ekwall bases his insistence on research studies (Ekwall and English, 1971; Ekwall, Solis and Solis, 1973; Ekwall, 1974) that not only give support for his position, but also provide physiological evidence that, as material becomes more difficult, readers really do experience the anxiety associated with frustration. Using polygraph and galvanic skin measurement devices, the researchers found the students actually became physiologically frustrated before they reached the percentage of errors normally recognized as being at the student's frustration level. As Ekwall (1967) explains ..students become so concerned about their reading performance that their hearts beat faster, they begin to perspire, etc. just as one does when he is frightened or extremely nervous. 
With this sort of empirical research available it seems that there should be no doubt that using the normally recognized criteria, all repetitions should be counted as errors. (p. 267) While instructions for preparing teacher made oral reading tests generally suggest use of the Betts' criteria, 1_‘ 77 the authors of commercial oral reading tests have usually developed their own standards (Powell and Dunkeld, 1971; Allington, 1984, p. 838). Criteria has differed from author to author but has generally allowed more errors at the lower grade levels. Other research studies have also suggested that the criteria for establishing reading levels should differ with the ability of the reader. Ekwall, Solis and Solis (1973), for instance, found that it seems to take fewer oral errors to frustrate good readers than poor ones. Studies by Cooper (1953) and Powell (1969) suggest that children in lower grades seem able to tolerate a greater percentage of oral errors while maintaining a given level of comprehension. In reviewing studies concerning the reading levels criteria, it should be noted that various researchers are actually defining the frustration level differently. Powell (1969), for instance, is viewing it as the point where comprehension breaks down, while others like Betts (1946) and Ekwall (1967), are considering it as the place where difficulty in reading begins to produce an anxiety reaction in the reader. Still other researchers (Cooper, 1952; Dunkeld, 1970) have been concerned with validating the instructional level in terms of the relationship of error rate to achievement. Studies by Gambrell, Wilson, and Gantt (1981), Berliner (1981) and Jorgenson (1977) have suggested that achievement improves when students are placed in materials which produce error rates of 5% or less, and that 78 readers placed in materials which produced error rates greater than this tended to spend more time off task. In the final analysis, while the traditional oral reading assessment practices described here appear to be very pervasive in both educational practice and the pedagogical literature, they remain a highly diverse and subjective matter, using varying standards and criteria, with amazingly little empirical evidence to support their widespread acceptance. On the other hand, traditional practices seem to prevail because, as yet, although efforts may be increasing, no one has presented conclusive evidence for anything better (Pikulski and Shanahan, 1982). Traditional Versus Psycholinguistic Diagnosis During the late 1960's and early 1970's, researchers at Wayne State University, under the leadership of Kenneth Goodman, conducted a series of investigations in which they studied the oral reading "miscues" of children and adults. This research has provided new insights into the reading process and has led to the development of new theories and models of reading as well as a new approach to reading diagnosis. The Reading Miscue Inventory (Goodman and Burke, 1970) was developed as a diagnostic procedure based on principles generated by miscue research. In the miscue analysis studies, a miscue was defined as the deviation between the oral response of the reader and the expected response of the text. Allen (1976) notes it 79 was a basic assumption of the studies that every response a reader makes is cued in some way by the reading situation and these responses will vary qualitatively. 
Goodman (1967) has characterized the reading process as a "psycholinguistic guessing game" in which the reader is constantly sampling cues from the material, predicting what will come next and verifying those predictions by sampling more cues. The Goodman model is based on three cue systems which the readers in the miscue studies seemed to be using: (a) Grapho-phonic (sound—symbol relationship) cues, (b) syntactic (grammar) cues, and (c) semantic (meaning) cues. A basic assertion made by Goodman is that readers rely as little as possible on grapho-phonic cues. Instead they tend to use higher order language and meaning cues, and their miscues are most often affected by semantic and, even more importantly, syntactic constraints. Authors have recently begun to characterize this type of model as "top- down" processing (DeBeaugrande, 1981). In contrast, traditional diagnosis has viewed reading as a "bottom-up" process, proceeding from letters to sounds to word recognition to meaning. Both top-down and bottom—up processing models have had problems explaining, from a theoretical position, apparent contradictions which have appeared in particular research studies, especially differences in strategies used by good and poor readers and differences between recognition of words in isolation as opposed to recognition in context. In 80 response, Stanovich (1980) has proposed an "interactive- compensatory model" which suggest readers use both types of processing. Samuels and Kamil (1984, p. 213) explain that a poor reader, who may be inaccurate or slow at word recognition but who has knowledge of the text topic, may use top-down processes to compensate for the weakness in decoding. On the other hand, if a reader is skilled at word recognition but does not know much about the text topic, he may find it easier to simply recognize the words on the page and rely on bottom-up processes. While many controversies still surround the interactive view of the reading process, Spiro and Myers (1984) have concluded that By most accounts, the dominant view of reading today is that of an interactive activity (Rumelhart, 1977). Processing goes on from the bottom-up and from the top-down (either simultaneously or alternatingly). (p. 483) Essential both traditional and psycholinguistic diagnosis consider the same reading behaviors as errors or miscues, but they have differed sharply in how those behaviors are viewed. Traditional diagnosis has treated all errors as undesirable behaviors to be eliminated. The purpose of error analysis is to determine the best instructional procedure to accomplish this. Psycholinguistic diagnosis considers miscues as a natural aspect of the reading process, and the term "miscue' is used instead of the word "error' to denote this distinction. Not 81 all miscues are considered undesirable. Qualitative rather than quantitative analysis of miscues is carried out to determine the seriousness of the miscue and to gain insight into the strategies being used by the reader. While the goal in traditional oral reading assessment has been both diagnosis and placement in materials, placement has not been a goal of miscue analysis. Rather the reader is purposely given difficult material in order to elicit a sufficient number of miscues for making the analysis. 
Many subsequent studies, however, have examined the relationship between miscues and material difficulty as well as reader's proficiency (Christie, 1981; Christie and Alonso, 1980; Kibby, 1979; Leslie and Osol, 1978; Schlieper, 1977), and many of the findings of miscue research have implications which challenge assumptions underlying current readability theory. Laura Smith (1976, p. 146), as a part of the reading miscue research project, was involved with testing materials being considered for inclusion in a new basal reading series. Based on the oral reading miscues and retellings by many children, she reported that the researchers found many factors that seemed to be ignored by readability scales. The factors could be categorized as either language related or concept related factors. Among language related factors, even though traditional readability theory asserts that short sentences are easier to read than long ones, the researchers found that very long sentences could be read 82 easily under the following conditions: 1. When the grammatical function of words and their meanings were familiar in a long sentence. The word brown, for instance, might be easily identified when used as a color word, but presented difficulty when used as someone's name. The word saddle was not a problem when it appeared as a noun, but was more difficult when used as a verb. 2. When the phrases in a long sentence were familiar. Phrases such as "she walks in such a way and "Charlie turned his attention" were difficult for many readers. 3. When the tense choices in a long sentence were familiar to the reader and predictable in the story. Subtle changes in tense made by an author, usually to emphasize a point, were difficult for the readers. Frequently they would change the tense to the one they expected. 4. When the word order in a long sentence was predictable. Questions and negative statements were consistently not anticipated and readers frequently changed the construction into positive statements. Sentences beginning with the words what, where and when usually suggested a question and if the sentence was not a question the readers would often change the structure to make it a question. Dialogue and dialogue carriers presented problems. Dialogue carriers appearing at the beginning of the sentences were the easiest to read and those in the middle were the most difficult. 83 Dialogues containing a name were even harder. "We must hurry, John," said Mother. "We will be late." was often read as "We must hurry." John said, "Mother, we will be late." Unusual dialogue carriers, such as shouted, cried or screamed and additions to carriers such as gloomily, anxiously and briskly presented problems for the readers. In addition, the word order in directions and descriptions of processes caused more miscues than stories with a plot. Three concept related factors were also found to be important: (a) The amount of specialized vocabulary, (b) the amount of vocabulary that was unfamiliar to the reader and (c) the complexity of the concept and how thoroughly it was developed. In spite of their differences, both traditional and psycholinguistic diagnosis share some common weaknesses. Both generally assume oral reading can indicate silent reading processes, an assumption not universally agreed upon. Both diagnostic procedures also suffer from lack of empirical evidence of their validity and both rely heavily on judgments made by the examiner. 
The assumption that oral and silent reading processes represent a unitary phenomenon is a position held by K. S. Goodman and implied in the Reading Miscue Inventory. Some studies (Fairbanks, 1937; Gillmore, 1947) have found a high correlation between silent and oral reading which would justify the use of oral reading performance to assess reading achievement in general. Other researchers (Wells, 84 1950; Mosenthal, 1976-77, 1978), however, found evidence supporting a contrary position. As an indicator of silent reading response, oral reading may have its greatest validity when used at the primary level or in the beginning stages of reading development. It appears that, at these levels oral and silent reading tend to be very similar processes, but they soon begin to diverge, until finally, in the mature reader, they may become two totally different aspects of language. This position has been supported by Gray and Reese (1957), who found that a student's reading rate for both types of reading was virtually the same at the first grade level, but by second grade, silent reading was becoming faster and it continued to do so every year thereafter. The problem of subjectivity continues to be a major concern in both traditional and psycholinguistic diagnosis since several studies have indicated that oral reading assessment can be a highly diverse matter with little, if any agreement among diagnosticians. Weinshank (1980), for instance, found agreement between any two practitioners (reading specialists, learning disabilities specialists and classroom teachers) concerning the reading diagnostic statements they made regarding the same case, was virtually nil (0.00). Moreover, she also found that when a clinician was presented a virtually identical replica of a case they had diagnosed at an earlier time, the mean agreement with their own previous statements was less than 0.23. Studies 85 by Sherman, Weinshank and Brown (1979), by Gill, Polin, Vinsonhaler and VanRoekel (1980) and by Polin (1981) have demonstrated, however, that practitioners can agree on what they find if first they agree on what they are looking for. These studies found diagnostic agreement could be improved drammatically through training, especially when decision aids were employed. Summary of the Literature Review Both the development of readability formulae and methods for assessing reading achievement through standardized testing occurred simultaneously but independently during the early part of this century. Both appear to have been influenced heavily by the almost universal emphasis on silent reading in instruction during the time, an emphasis which was prompted by the changing needs of society and supported by research studies indicating the superiority of silent reading in developing comprehension and fluency (Pitner, 1913; Thorndike, 1917; Judd and Buswell, 1922). It was perhaps because of this, that the use of oral reading in formula development or validation studies, seems to have been largely ignored, and oral reading as an assessment procedure didn't become popular until the middle of the century, prompted no doubt by dissatisfaction with standardized testing. 86 Formula Limitations While a vast number of readability formulae have been developed, virtually all have used the same methodology, and have encountered similar problems. These problems have centered on the factors studied and the criteria used in formula development. 
The factors studied have been seriously limited since only quantitative, rather than qualitative, elements can be used in the prediction, and generally only factors of style difficulty have lent themselves to that kind of analysis. Moreover, only two elements of style difficulty, some measure of vocabulary load and some measure of sentence complexity, have consistently emerged as significant enough, or measurable enough, to be included in the final formula. The criterion materials used in formula development have varied widely in content, the range of difficulty of the criterion passages and the methods used to establish that difficulty. This greatly limits the generalizability of any one formula, for in a strict scientific sense the formula is only applicable to materials similar to those on which the formula was based. Moreover, when comprehension questions are used for establishing passage difficulty, the apparent difficulty of a selection can be affected by asking more or less difficult questions. Finally, while formulae may be useful for establishing the relative difficulty of passages, their ability to relate this difficulty to the reading accomplishment needed by students in various grade levels is questionable.

The Fry Graph

Historically, early formulae, after a short period of increasing complexity, showed a sharp reversal toward greater simplicity and ease of use. The Fry procedure (1968) is directly related to this continuing trend. Only two elements, word length measured in syllables and sentence length measured in words, are used, since previous research has repeatedly found these two factors account for a great deal of the variability in reading difficulty. It appears that the criterion materials used in developing the method were taken directly from basal readers or other materials intended for children. Apparently Fry has simply accepted the publishers' grade level recommendations of the passages in establishing their relative difficulty. He has then circumvented the task of making tedious calculations by developing a nomograph, rather than a regression equation.

Assessment of Reading Ability

Efforts to assess reading ability have basically taken two directions: (a) The development and use of standardized tests and (b) the development of oral reading assessment procedures. Standardized tests have proven to be highly reliable measures useful for comparing students' reading performances and for screening students to determine if more extensive testing is needed. Such tests have been unsatisfactory, however, for providing direct observation of reading behaviors, for diagnosis of specific reading difficulties or for placing students in reading materials. Oral reading performance is frequently used as a readability measure and in informal reading assessment. Traditionally, some variation of the procedures and criteria described by Betts (1946) has been used for this purpose. While the Betts' criteria is frequently challenged, and contains contradictions resulting in much variation in practice, it remains widely accepted and has had great influence on traditional diagnostic theory. More recent psycholinguistic studies, however, are providing new insights into the reading process and a new approach to reading diagnosis. This approach views reading "miscues" as a natural reading phenomenon to be analyzed qualitatively rather than quantitatively.
The results of miscue research studies have also held some important implications for readability prediction since the miscues made by the readers in these studies frequently contradicted some of the basic assumptions of current readability theory. Especially challenged are those assumptions concerning the difficulty of reading long sentences. Oral reading has received very little attention in the development of readability prediction methods or in studies validating their use. This trend has continued to the present, with no indications of oral reading being used in developing criteria on which new formulae might be based, and while Fry (1969) and Paolo (1977) have used oral reading briefly in validation of the Fry procedure, and Fry 89 encourages the practice, any further use of oral reading for this purpose appears to be rare and obscure or unpublished if it exists at all. CHAPTER III DESIGN OF THE STUDY Overview This study was designed to use oral reading assessment procedures to evaluate the oral reading performance of a group of fifty (50) third grade students. The purpose of the study was to assess how effectively the readers' standardized test scores and Fry Readability Graph data would match readers with materials of appropriate difficulty. The subjects' grade equivalency scores from the Reading Test of the California Achievement Tests were within three months above or below their grade placement at the time of testing, thus suggesting a rather homogeneous group of students of average reading achievement. Each subject read the same set of five selections, one each with a readability of first, second, third, fourth and fifth grade, as determined by the Fry Readability Graph. The readability scores of the selections thus suggested gradually increasing difficulty from considerably below to considerably above the students' tested reading achievement. If the students' standardized test scores and the readability graph data provide an effective means for matching readers with materials of appropriate difficulty, then we would expect to observe the following when the students were reading the research passages aloud: (a) The 90 91 subjects will make more word recognition errors (miscues) and will read more slowly as the readability of the passages increases, and (b) the subjects will read the passage with first grade readability with ease (at their independent reading level), the passage with third grade readability with some difficulty (at their instructional reading level) and the passage with fifth grade readability with great difficulty (at their frustrational reading level). Based on these expectations, the following questions and hypotheses were developed to direct the research. Questions Guiding the Study The following questions were generated to be answered by this study when subjects from a group of average third grade readers, as determined by the Reading Test of the California Achievement Tests, are reading aloud from selections with varying Fry determined readabilities. 1. Will the readers' word recognition accuracy, based on their oral reading errors (word miscues) decrease as the readability scores of the selections increase? 2. Will the readers' reading rate, in terms of the number of words read per minute, decrease as the readability scores of the selections increase? 3. Will the readers read the selection with first grade readability at their independent reading level? 4. 
Will the readers read the selection with third grade readability at their instructional reading level? 5. Will the readers read the selection with fifth grade readability at their frustrational reading level? 92 Hypotheses Based on the questions guiding the research, the following hypotheses were constructed. When a group of average third grade readers are reading aloud from materials with varying Fry determined readabilities 1. The mean of the word recognition accuracy scores for any paragraph will be greater than the mean of the word recognition accuracy scores for any paragraph with a higher readability. 1a. The mean of the word recognition accuracy scores for any paragraph will not be greater than the mean of the word recognition accuracy scores for any paragraph with a higher readability. 2. The mean of the reading rate scores for any paragraph will be greater than the mean of the reading rate scores for any paragraph with a higher readability. 2a. The mean of the reading rate scores for any paragraph will not be greater than the mean of the reading rate scores for any paragraph with a higher readability. 3. On the passage with a first grade Fry determined readability, the greatest percentage of the readers will be reading at their independent reading level. 3a. On the passage with a first grade Fry determined readability, the greatest percentage of the readers will not be reading at their independent reading level. 4. On the passage with third grade readability, the greatest percentage of the readers will be reading at their instructional reading level. 4a. On the passage with third grade readability, the greatest percentage of the readers will not be reading at their instructional reading level. 5. 0n the passage with fifth grade readability, the greatest percentage of the readers will be reading at their frustrational reading level. 5a. On the passage with fifth grade readability, the greatest percentage of the readers will not be reading at their frustrational reading level. 93 Population The subjects for this investigation were selected from third grade students attending five Chapter I identified schools in the Bay City Public Schools System, Bay City, Michigan. The identification of a school for Chapter I services in this district is based on the percentage of students eligible for free or reduced lunches. Since this figure is determined by family income, it is considered for these purposes to be an index of social economic status for the school's population. A school is determined eligible for Chapter I services in this district if it has a greater percentage of students eligible for free or reduced lunches than does the district as a whole. Thus the subjects in this study were attending schools in the lower half of the district's social-economic scale. Sample Selection Third grade students in the Bay City Public Schools had taken the level 12 California Achievement Tests (CAT) as second graders in May of the preceding school year as part of the system's district-wide testing program. After securing permission and support from the district's central administration, the researcher asked principals in the participating schools to first ascertain that third grade teachers in their buildings were willing to cooperate in the data collection, and then to identify third grade students 94 who had taken the CAT the previous spring and who had grade equivalency scores ranging between 2.3 and 3.3 on the Reading Test from that battery. 
Parents of these students were then sent letters briefly acquainting them with the study and asking for their permission to have their child participate. Those children who had parental permission and were themselves willing to be involved were then administered the CAT Level 13 Reading Test. The group of fifty subjects was then chosen from those students with grade equivalency scores ranging from 3 months above to 3 months below their grade placement at the time of testing with the CAT Level 13 test.

Instrument Selection and Construction

Measurement of Student Reading Ability

The Reading Test of the California Achievement Tests (1977 edition) was used in this study as a measure of student reading achievement. This instrument was chosen because it is a widely used and highly regarded, nationally normed standardized test. It is also the test used by the subjects' school system for its district-wide testing program. Therefore it is probable that instructional decisions affecting the subjects are commonly made based on results from this test.

Passage Selection

The passages which were read by the subjects were taken from an SRA Reading Laboratory Ic, published by Science Research Associates, Inc., 1961. Materials were selected from this source because it provided access to many short selections, similar in style, specifically written for children, and easily available. Furthermore, the material involved was not part of the subjects' regular reading program, and the lab chosen was an older edition not currently being used by teachers in the district. These considerations reduced the likelihood that the subjects may have had previous exposure to the material either as part of their regular reading program or as supplemental reading material. Finally, although the publisher provided readabilities for the selections, these readabilities did not correspond with those determined by the Fry Readability Graph. The selections used in the study were of comparable length, ranging from 95 to 106 words, and were reproduced on plain white typing paper using the same type size and format and eliminating illustrations. Each selection was titled; however, the title was not included in determining the readability of the passages, and errors made by the subjects when reading the titles were not included in the error counts for the selections. Five selections were used, one selection each with a Fry determined readability score of first, second, third, fourth and fifth grade. An additional selection, one with a first grade Fry determined readability, was also prepared and read by all subjects as a practice passage.

Determination of Readability

Scores from the Fry Readability Graph were used as the measure of passage difficulty of the selections. The Fry procedure was chosen because of its speed, ease of use and great popularity. Readabilities of many selections were first computed by using the microcomputer text analysis program School Utilities Volume 2, published by the Minnesota Educational Computer Consortium. Three to five selections were then chosen at each grade level of readability. These selections were then plotted manually on the Fry Graph to verify the grade level designations obtained by the computer program. The passages used in the study were then chosen from those paragraphs which both the computer and the Fry graph designated as being at a given grade level.
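The Fry procedure reduces to two counts taken over 100-word samples: sentences per 100 words and syllables per 100 words, which are then located on the graph. As a rough illustration of how such counts could be automated, much as a text analysis program presumably does, the sketch below computes the two Fry coordinates for a passage. The syllable counter is a crude vowel-group heuristic and the sample text is invented, so this is only a sketch of the idea, not the published procedure or the MECC program.

import re

def count_syllables(word):
    """Very rough vowel-group heuristic; Fry counted syllables by hand,
    so treat this as an approximation only."""
    word = word.lower().strip(".,!?;:'\"")
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:  # crude silent-e adjustment
        n -= 1
    return max(n, 1)

def fry_coordinates(text):
    """Return (sentences per 100 words, syllables per 100 words),
    the two values that are plotted on the Fry Readability Graph."""
    words = text.split()
    n_words = len(words)
    n_sentences = len(re.findall(r"[.!?]+", text))
    n_syllables = sum(count_syllables(w) for w in words)
    scale = 100.0 / n_words
    return n_sentences * scale, n_syllables * scale

# Invented example passage, standing in for one 100-word Fry sample.
sample = ("The wind is moving air. You cannot see it. "
          "But you can feel it push against your face.")
print(fry_coordinates(sample))

The two numbers returned would then be read against the grade-level bands of the graph itself; the nomograph lookup is not reproduced here.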
Data Collection Subjects were given an orientation session by the researcher, familiarizing them with the location in which they would be reading, the recording equipment that would be used and the task the researcher would be asking them to do. With the cooperation of their classroom teachers, each subject was taken individually to a relatively secluded area in the school to do the readings. All subjects first read the practice paragraph. They were then administered the research passages in random order and asked to read each of these aloud. The readings were audio recorded for later analysis. 97 Data Recording Three types of data were recorded from the subjects' oral readings: 1. A word recognition accuracy score for each subject for each paragraph, based on the number of miscues made per 100 words. For instance, if a reader made two miscues per 100 words, the word recognition accuracy score would be 98%. 2. A reading rate score for each reader on each passage based on the number of words read per minute. 3. A reading level designation of "independent", "instructional" or "frustrational" for each passage for each reader based on the reader's word recognition accuracy score and using the Betts' criteria. From the recordings of this data, the following determinations were made: 1. The means of the word recognition accuracy scores for each paragraph. 2. The means of the reading rate scores for each paragraph. 3. The percentage of readers reading at their independent reading level on each passage. 4. The percentage of readers reading at their instructional reading level on each passage. 98 5. The percentage of readers reading at their frustrational reading level on each passage. Data Analysis This study was primarily descriptive in nature, however the following data analysis techniques were used to aid the researcher in the descriptive process: 1. A repeated measures design with each individual exposed to five treatments was employed. Subjects read the research passages in random order to control for the sustained effects which can occur when taking repeated measures. Analysis of variance was then used to determine (a) if there were differences between the means of the word recognition scores for each paragraph and (b) if there were differences between the means of the reading rate scores. The computational formula used was presented by Winer (1971, p. 261-308) for use in single factor experiments with repeated measures. The formula is given in Appendix D. When analysis of variance indicated that differences did exist, the Scheffe test for post-hoc comparisons was used to determine where differences occurred. The computational formula was taken from Hinkle, Wiersma and Jurs (1979, p. 276—280), and is also presented in Appendix D. In the post- hoc comparisons, each paragraph was contrasted individually with all paragraphs of a higher readability. In other words: 99 Pl vs. P2 P2 vs. P3 P3 vs. P4 P4 vs. P5 P1 vs. P3 P2 vs. P4 P3 vs. P5 P1 vs. P4 P2 vs. P5 P1 vs. P5 where P1 = the means of the word recognition scores (or the reading rate scores) for the paragraph with first grade readability, P2 = the means of the word recognition scores (or reading rate scores) for the paragraph with second grade readability, etc. A series of bar graphs was also constructed, to show the relative number of subjects reading at each of the three reading levels, independent, instructional and frustrational, for each passage. 
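The computational formulas themselves are left to Appendix D and are not reproduced here; the sketch below shows the standard single-factor repeated-measures layout that Winer's formula follows, together with a Scheffé pairwise comparison of two paragraph means, using invented score data. Array shapes, variable names and the simulated scores are all assumptions for illustration, not the study's actual data or computations.

import numpy as np
from scipy import stats

def repeated_measures_anova(scores):
    """scores: subjects x treatments array. Returns F, (df1, df2), MS_error
    for a single-factor design with repeated measures on every subject."""
    n, k = scores.shape
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_subjects = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_treatment = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_treatment
    df1, df2 = k - 1, (n - 1) * (k - 1)
    ms_treatment, ms_error = ss_treatment / df1, ss_error / df2
    return ms_treatment / ms_error, (df1, df2), ms_error

def scheffe_pair(means, i, j, n, k, ms_error, df_error, alpha=0.05):
    """Scheffé test of the pairwise contrast paragraph i vs. paragraph j."""
    f_contrast = (means[i] - means[j]) ** 2 / (ms_error * 2.0 / n)
    f_critical = (k - 1) * stats.f.ppf(1 - alpha, k - 1, df_error)
    return f_contrast > f_critical

# Invented data: 50 subjects by 5 paragraphs of word recognition scores.
rng = np.random.default_rng(1987)
scores = rng.normal(loc=94.0, scale=3.0, size=(50, 5))
F, (df1, df2), ms_error = repeated_measures_anova(scores)
means = scores.mean(axis=0)
print(F, scheffe_pair(means, 0, 4, n=50, k=5, ms_error=ms_error, df_error=df2))

Because each subject contributes a score to every cell of such a design, presenting the passages in random order is what guards the treatment comparison against sustained (order) effects.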
CHAPTER IV PRESENTATION AND ANALYSIS OF RESULTS Introduction In this chapter the results of the study and a descriptive analysis of the findings will be presented. The purpose of the analysis will be to determine (a) whether or not the frequency of miscue and the reading rates for each paragraph would suggest that the readers experienced increasing difficulty in the material as the grade level readability indexes of the passages increased, and (b) whether or not the readability grade level scores of a selection were predictive of the students' functional reading levels on that selection. Reviewing the hypotheses set forth in this study, we would expect to find that the means of the scores of word recognition accuracy and reading rate would both decrease as the readability of the paragraphs increased. We would also expect to find that the greater percentage of the readers would be reading at their independent level on the paragraph with first grade readability, at their instructional reading level on the paragraph with third grade readability and at their frustrational reading level on the paragraph with fifth grade readability. The results of the study will first be examined as they related to each of the questions originally developed to guide the research, and the hypotheses formulated in 100 101 association with each question. Presentation of Results Question 1. Will the readers' word recognition accuracy, based on their oral reading errors (word miscues) decrease as the readability scores of the selections increase? H The mean of the word recognition accuracy scores for any paragraph will be greater than the mean of the word recognition accuracy scores for any paragraph with a higher readability. Ho The mean of the word recognition accuracy scores for any paragraph will not be greater than the mean of the word recognition accuracy scores for any paragraph with a higher readability. Table IV-l presents the means of the word recognition accuracy scores for each paragraph when all miscues were counted. It should be noted that while this data is presented for each paragraph in sequential order, the subjects actually read the paragraphs in random order. Randomization was necessary in order to control for sustained effects. Table IV—l MEANS OF WORD RECOGNITION ACCURACY SCORES BASED ON TOTAL NUMBER OF MISCUES Paragraph 1 2 3 4 5 Means 93.92 94.4 94.36 93.8 92.92 As the table shows these means were almost the same for each passage. An analysis of variance for repeated measures (Winer, 1971, p. 266), as expected, indicated no significant 102 differences. (See Appendix D for formula and computational procedures used.) The decision was to accept the null hypothesis for all contrasts and to conclude that, when all miscues were considered, the readers' word recognition accuracy scores did not decrease as the readability scores of the selections increased. Question 2. Will the readers' reading rate, in terms of the number of words read per minute, decrease as the readability scores of the selections increase? H The mean of the reading rate scores for any paragraph will be greater than the mean of the reading rate scores for any paragraph with a higher readability. Ho The mean of the reading rate scores for any paragraph will not be greater than the mean of the reading rate scores for any paragraph with a higher readability. 
Table IV-2
MEANS OF READING RATE SCORES (WORDS READ PER MINUTE)
Paragraph:  1       2       3       4       5
Means:      93.976  110.93  99.998  98.172  94.358

Analysis of variance indicated significant differences among these scores. The Scheffe post-hoc comparison test (Hinkle, Wiersma and Jurs, 1979, p. 364-368) was used to make all possible pairwise comparisons between the mean of each paragraph and the mean of every paragraph with a higher readability. (See Appendix D for the formula used and computational procedures.) Significant differences were found between paragraphs 1 and 2, 1 and 3, 2 and 3, 2 and 4 and paragraphs 2 and 5. However, the differences between paragraphs 1 and 2 and paragraphs 1 and 3 were in a direction opposite of what would be expected. In other words, paragraph 1 was read more slowly than paragraphs 2 or 3. The decision was to accept the null hypothesis for 7 of the 10 contrasts and to reject the null hypothesis for the following contrasts: Paragraph 2 vs. paragraph 3, paragraph 2 vs. paragraph 4 and paragraph 2 vs. paragraph 5. It was concluded from these data that the number of words read per minute did decrease between paragraph 2 and paragraph 3, paragraph 2 and paragraph 4 and between paragraph 2 and paragraph 5, but that the number did not decrease between any other paragraphs. In fact, the number actually increased between paragraphs 1 and 2 and paragraphs 1 and 3.

Question 3. Will the readers read the selection with first grade readability at their independent reading level?

H On the passage with first grade readability, the greatest percentage of the readers will be reading at their independent reading level.

Ho On the passage with first grade readability, the greatest percentage of the readers will not be reading at their independent reading level.

Table IV-3 presents a graph showing the relative percentages of readers at each of the functional reading levels, independent, instructional and frustrational, on the paragraph with first grade readability. Because of the gap in the Betts' criteria, readers with word recognition accuracy scores from 91% to 94% did not fall into any of these categories. A fourth category labeled "Instructional-Frustrational" was created to accommodate data from these readings. The shaded bars on the graph represent the actual results obtained in the study. The unshaded bars represent the results that could be reasonably expected if the alternate hypothesis were true. The unshaded bars were included to provide a means of comparison between actual and expected findings.

Table IV-3
PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON PARAGRAPH #1 (First Grade Readability)
Independent: 6%    Instructional: 50%    Instructional-Frustrational: 28%    Frustrational: 16%

As the graph indicates, 6% of the readers read the first grade passage at their independent reading level, that is, with 99% or 100% word recognition accuracy. Fifty percent were reading at their instructional level (95% to 98% word recognition accuracy) and 16% were reading at their frustrational level (90% word recognition accuracy or less). Twenty-eight percent of the readers had scores between 94% and 91% and were placed in the "Instructional-Frustrational" category. The greatest percentage of readers were reading at their instructional level on paragraph 1, rather than at their independent reading level. Therefore, the decision was to accept the null hypothesis for question 3.
Question 4. Will the readers read the selection with third grade readability at their instructional reading level?

H On the passage with third grade readability, the greatest percentage of the readers will be reading at their instructional reading level.

Ho On the passage with third grade readability, the greatest percentage of the readers will not be reading at their instructional reading level.

Table IV-4 presents a graph showing the relative percentages of readers reading at each functional reading level on the third grade paragraph.

Table IV-4
PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON PARAGRAPH #3 (Third Grade Readability)
Independent: 4%    Instructional: 52%    Instructional-Frustrational: 36%    Frustrational: 8%

Four percent of the readers were reading at their independent reading level. Fifty-two percent were reading at their instructional reading level and 8% were reading at their frustrational reading level. The other 36% fell into the "Instructional-Frustrational" category. The greatest percentage of the readers were reading at their instructional level on this paragraph. The decision, therefore, was to reject the null hypothesis in favor of the alternative hypothesis for question 4.

Question 5. Will the readers read the selection with fifth grade readability at their frustrational reading level?

H On the passage with fifth grade readability, the greatest percentage of the readers will be reading at their frustrational reading level.

Ho On the passage with fifth grade readability, the greatest percentage of the readers will not be reading at their frustrational reading level.

Table IV-5 presents a graph showing the relative percentages of readers reading at each functional reading level on the fifth grade paragraph.

Table IV-5
PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON PARAGRAPH #5 (Fifth Grade Readability)
Independent: 6%    Instructional: 36%    Instructional-Frustrational: 38%    Frustrational: 20%

Six percent of the readers were reading at their independent reading level. Thirty-six percent were reading at their instructional reading level and 20% were reading at their frustrational reading level. The other 38% fell into the "Instructional-Frustrational" category. The greatest percentage of readers were reading at their instructional and instructional-frustrational levels, and not at their frustrational reading level, on paragraph 5. The decision, therefore, was to accept the null hypothesis for question 5.

Additional Data Analysis

As indicators of whether or not the passages in this study showed evidence of increasing difficulty, the initial results appeared contradictory. The results from the word recognition accuracy scores and the determination of reading levels seemed to indicate that there was no change in difficulty from paragraph to paragraph. The reading rate scores, however, indicated that, while the readability scores did not appear to discriminate well, there were increases in difficulty between paragraph 2 and paragraph 5, with 2 being the easiest and 5 being the most difficult. During the data collection process, however, the researcher made the following additional observations which had not been previously anticipated and which seemed to have implications for the study:

1.
Although the readers seemed to encounter difficulty on all of the passages, miscues of a more serious nature, such as words aided and gross mispronunciations, seemed to be occurring more frequently on the paragraphs with higher readability scores. This was particularly true on the fifth grade passage where there were 29 instances of "words aided". There were only 3 occurrences on paragraphs 1 and 4 each and no such occurrences on paragraph 2 or 3. 2. Fluency seemed to be more a reader characteristic, rather than a function of passage difficulty. 3. Miscues did not seem to occur randomly. Instead readers tended to miscue on the same words and at the same places in a passage. Moreover, they frequently made the same or a similar response. Because these observations had implications for readability research and oral reading assessment practices, and because reporting the original results without further exploration of the data might lead to erroneous conclusions, three additional questions were raised for study. 1. Would the readers' word recognition accuracy scores decrease as the readability of the paragraphs increased when only unacceptable miscues were considered in determining word recognition accuracy? 2. Did the readers read with less fluency as the readability scores of the passages increased? 3. Did the readers' miscues occur randomly or were there identifiable patterns? 112 In order to better answer these questions, additional data analysis was undertaken as follows: 1. Miscues were classified as either acceptable or unacceptable. An acceptable miscue was defined as any miscue that had no effect or negligible effect on meaning and any miscue that was corrected. Based on this classification, a percent of accuracy score was determined for each reader for each paragraph when only unacceptable miscues were considered. The means of these scores for each paragraph were then tested to see if there were statistically significant differences between means. The procedure was the same as that used with the original word recognition accuracy scores when all miscues were counted. 2. The reading of each paragraph was rated on a scale of 1 to 5, with 1 representing the greatest fluency. The ratings were based on the researcher's general impression of the fluency with which the passage was read. In order to make this determination, the researcher listened to each recorded reading without watching the script and judged the reading on the basis of how it would compare to a television or radio newscast. To receive a high rating the reading had to make sense, be read with appropriate phrasing and intonation and be free of hesitations and repetitions. Readings receiving low ratings had many instances of mispronunciations, omissions, substitutions or improper phrasing which rendered some portion of the reading 113 senseless or the reading was characterized by monotone, word by word reading, hesitations, stammering, long pauses or requests for aid. The fluency rating scores for each paragraph were then totaled and the means were tested for differences, again using the same procedure as that used for the word recognition scores. 3. A frequency count of the miscues that occurred on each word in each paragraph was made. Graphic representations were then constructed depicting the frequency of miscue for each word in each paragraph. 
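As a concrete sketch of the first and third of these steps, the fragment below recomputes a word recognition accuracy score from only the miscues marked unacceptable and tallies miscue frequencies by word position, the kind of count underlying Tables IV-8 through IV-12. The miscue records and their field names are invented stand-ins; in the study, acceptability was judged by the researcher from the recordings, not by a program.

from collections import Counter

def word_accuracy(n_words, miscues, unacceptable_only=False):
    """Word recognition accuracy as a percent of running words,
    counting either all miscues or only those marked unacceptable."""
    counted = [m for m in miscues if m["unacceptable"] or not unacceptable_only]
    return 100.0 * (n_words - len(counted)) / n_words

def per_word_tally(miscues):
    """Frequencies of acceptable and unacceptable miscues at each
    word position, as depicted in Tables IV-8 through IV-12."""
    acceptable, unacceptable = Counter(), Counter()
    for m in miscues:
        (unacceptable if m["unacceptable"] else acceptable)[m["position"]] += 1
    return acceptable, unacceptable

# Invented records for one reader on a 100-word passage.
miscues = [
    {"position": 1, "word": "Something", "unacceptable": True},
    {"position": 14, "word": "sometimes", "unacceptable": False},  # corrected, so acceptable
]
print(word_accuracy(100, miscues))        # all miscues counted -> 98.0
print(word_accuracy(100, miscues, True))  # unacceptable only   -> 99.0
print(per_word_tally(miscues))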
Additional hypotheses were developed for the first two questions, based on the results that would be expected if the readers were truly experiencing more difficulty as the readability of the passages increased. Hypotheses were not developed for the third question because the question did not lend itself to hypothesis testing. Rather, an inspection of the miscue frequency graphs was used to decide if discernible patterns of miscue clustering occurred and if a high incidence of miscue seemed to be occurring on specific words. Each of the additional questions, their accompanying hypotheses and the results of the data analysis associated with them will be presented in turn.

Presentation of Additional Data Analysis Results

Question 1a: Will the readers' word recognition accuracy scores decrease as the readability of the selections increases if only unacceptable miscues are considered in determining the word recognition accuracy scores?

H The mean of the word recognition accuracy scores for any paragraph, when based on the unacceptable miscues only, will be greater than the mean of the word recognition accuracy scores for any paragraph with a higher readability.

Ho The mean of the word recognition accuracy scores for any paragraph, when based on the unacceptable miscues only, will not be greater than the mean of the word recognition accuracy scores for any paragraph with a higher readability.

Table IV-6 presents the means of the word recognition accuracy scores for each paragraph when only unacceptable miscues were considered.

Table IV-6
MEANS OF WORD RECOGNITION ACCURACY SCORES WHEN ONLY UNACCEPTABLE MISCUES WERE COUNTED
Paragraph:  1      2      3      4      5
Means:      99.28  99.58  99.02  99.14  97.36

Analysis of variance indicated significant differences in these scores. The Scheffe post-hoc comparisons indicated significant differences between paragraphs 1 and 5, paragraphs 2 and 5, paragraphs 3 and 5, and paragraphs 4 and 5. It was decided, therefore, to accept the null hypothesis for six of the ten contrasts and to reject the null hypothesis in favor of the alternate hypothesis for the following contrasts: Paragraph 1 vs. paragraph 5; paragraph 2 vs. paragraph 5; paragraph 3 vs. paragraph 5; and paragraph 4 vs. paragraph 5. It was concluded that, when only unacceptable miscues were considered, word recognition accuracy scores did decrease between paragraph 5 and all other paragraphs, but not between any other paragraphs.

Question 2a: Will the readers' fluency decrease as the readability indexes of the passages increase?

H The mean of the General Impression of Fluency scores for any paragraph will be less (indicating greater fluency) than the mean of the General Impression of Fluency score for any paragraph with a higher readability.

Ho The mean of the General Impression of Fluency scores for any paragraph will not be less than the mean of the General Impression of Fluency score for any paragraph with a higher readability.

Table IV-7 presents the means for the "General Impression of Fluency" ratings.

Table IV-7
MEANS OF GENERAL IMPRESSION OF FLUENCY SCORES
Paragraph:  1     2     3     4     5
Means:      2.32  2.09  2.28  2.34  2.6

Analysis of variance indicated there were significant differences in the means of these scores. When the Scheffe test was used to make post-hoc pairwise comparisons, significant differences were found between paragraphs 1 and 5, 2 and 5, and 3 and 5.
The decision was to accept the null hypothesis for 7 of the 10 contrasts, and to reject the null hypothesis in favor of the alternate hypothesis for paragraphs 1 vs. 5, 2 vs. 5, and 3 vs. 5. It was concluded that a significant difference in the fluency scores did occur between paragraphs 1 and 5, 2 and 5, and 3 and 5, but not between any other paragraphs.

Question 3a: Did the readers' miscues occur randomly or were there predictable patterns?

Table IV-8 presents a graphic representation of the frequencies of acceptable and unacceptable miscues for the first grade paragraph. Likewise, Tables IV-9, IV-10, IV-11 and IV-12 present the same information for paragraphs 2, 3, 4 and 5 respectively. The circled numbers indicate the sentence within the paragraph. The other numbers indicate the position of each word within the paragraph, followed by the word. The solid squares indicate the number of readers who made unacceptable miscues on the word, while a square with an x in it represents the number making acceptable miscues. The first word in the first sentence in Paragraph #1, for instance, was "Something". Four readers made an unacceptable miscue on this word and three others made an acceptable miscue. Squares between words represent insertion miscues.

Table IV-8
FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #1
(Word-by-word frequency counts of unacceptable and acceptable miscues. Insertion miscues appear between words. Circled numbers indicate sentence number. Uncircled numbers followed by a word identify each word in the paragraph.)

Table IV-9
FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #2
(Same layout as Table IV-8.)

Table IV-10
FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #3
(Same layout as Table IV-8.)

Table IV-11
FREQUENCIES OF MISCUES OCCURRING ON EACH WORD IN PARAGRAPH #4
(Same layout as Table IV-8.)
As the graphs show, some words were never involved in a miscue, while other words found 20%, 30%, 40% and even 50% of the readers miscuing. As the graphs also show, miscues tended to cluster at certain places in certain sentences. Based on these observations it was concluded that miscues did not occur randomly, but rather tended to cluster on certain words and in certain parts of sentences in identifiable patterns.

Descriptive Miscue Analysis

It is not the purpose of this study to provide an in-depth analysis of the types of miscues made by the readers; however, some of the text conditions that were associated with their miscues, and that seemed to be triggering them, were strikingly similar to those reported by Laura Smith (1976) as a part of the miscue research project. Because they have implications for readability study, they could not be dismissed without comment.

Generally, the miscues observed in this study could be classified in three categories. In the first category were miscues of little or no consequence. For instance, readers would consistently substitute a contraction for the two words for which it stood. They omitted articles, added them, or substituted one for another, and the "s" at the end of a word was often disregarded, with negligible or no effect on meaning. In paragraph two, for example, the sentence "He made a fence of the rocks." was often read "He made a fence of rocks." or "He made the fence of rock." The sentence "He carried them to the sides of the field." was read "He carried them to the side of the field" or "to sides of the field." Category one miscues were always acceptable miscues, and while the reader's production was not the same as the text, the text could have just as well been written as the reader read it. In fact, in some instances, the reader's miscues actually seemed to produce a better flowing, easier to read version. For example, in the first grade paragraph, in the sentences "Do you know what it is? It is air!", the repetition of the words "it is" gave the reading an awkward and unnatural cadence.
Readers consistently substituted a contraction for the second "it is", which produced a smoother flowing text.

In the second category of miscues were those that occurred because, even though it seemed obvious that the reader had sufficient word recognition skills to identify all of the words in the passage, certain conditions in the text, or in the reader's ability to handle those conditions, seemed to repeatedly interfere with the reader's processing strategies. Generally, because they did have sufficient word recognition abilities, the readers were able to recover from these situations with little or no serious damage. These miscues, however, did affect the reader's speed and general fluency and, in some cases, when they were not corrected, they had implications for the reader's understanding of the text. Most of the conditions involved in these category 2 miscues were similar to those observed previously by Laura Smith (1976) in her work with the reading miscue research studies. Unfamiliar grammatical function or meaning of a word, unfamiliar phrases, and unfamiliar word order, especially the use of rhetorical questions, accounted for most of these miscues.

In this study there were several instances where a familiar word was used with an unfamiliar meaning, and while the readers did not miscue on the word itself, they made insertions or deletions to make the meaning conform to the one they knew. As Smith has noted, for instance, young readers seem to be more familiar with a word when it is used as a noun rather than a verb. This was evident when children were reading the latter part of sentence 3 in paragraph 4, "When the magnet got close, the paper clip would suddenly flip over and stick to the magnet". This sentence was frequently read "...the paper clip would suddenly flip over the stick to the magnet" or "the stick of the magnet" or, in one instance, "the stick of metal of magnet." Evidently these readers anticipated that "stick" would be a noun, and therefore inserted the word "the" in front of the word "stick" to make it a noun. This of course meant that the rest of the words in the sentence did not make sense, and the reader was forced to reevaluate the situation and decide how best to proceed.

The first sentence in paragraph 3 presented readers with the word "running". All of the readers read the word correctly, but, because the word was used with a meaning evidently unfamiliar to some, a preposition was consistently inserted to make the word conform to the meaning more common to the readers. Therefore, "Let's pretend you're running a zoo." was read as "Let's pretend you're running in a zoo" or "running to a zoo" or "running at a zoo", or even, in one case, "running on a zoo". In this situation there was nothing to alert the reader that a miscue had occurred, and the miscue was seldom corrected; however, it did have implications for the reader's understanding of the passage.

The unfamiliar phrase "you have trading in your blood" in the last sentence of paragraph 3 also caused problems for many readers. Several students read this phrase as "you have traded in your blood." Most readers seemed to understand the concept of "trading something in". Their families had no doubt traded in cars or appliances. But they did not understand what it meant to have something "in your blood", so they substituted the concept they did understand.
It was apparent, however, that they still could not understand why anyone would want to "trade in their blood". This prompted some to reprocess the phrase and sometimes correct the miscue. However, they still indicated a lack of understanding through their hesitancy and questioning tone.

Unfamiliar word order consistently caused readers difficulty. In paragraph 2, in the sentence "But nothing would grow where the rocks were.", one third of the readers read "where" as "there". Not only did this make perfect sense at the time the miscue was made, but the word "where" came at the end of a line of type, making it appear even more likely to be the end of the sentence. The physical position of the word on the page, its graphic similarity to the actual word and its perfect sense undoubtedly accounted for the high frequency of miscue. Once made, however, it left the reader trying to figure out what to do with the words "the rocks were". Most read these words with an intonation that would suggest they thought they were part of another sentence, "There the rocks were." Some tried to make "the rocks were" part of the next sentence, but, interestingly, very few went back to correct the miscue.

In the first grade paragraph, the sentences "Blow up a balloon. Then let go of its mouth." presented readers with an unfamiliar word order. The readers repeatedly read the second sentence "Then let it go." or "Then let go of it." This of course left remaining words which did not make sense, and the readers were forced to cope with the situation in various ways.

In paragraph 4, in the sentence "When the magnet got close, the paper clip would suddenly flip over and stick to the magnet...", readers repeatedly ignored the comma and inserted "to" at the end of the opening clause, so it read "When the magnet got close to the paper clip...". This again left the reader with words that made no sense. Many readers simply went on, their intonations suggesting that they may have made a covert correction, while others fumbled and tried to recover. One reader inserted an "it" to make the sentence read "...it would suddenly flip over". Another inserted "what" to make a question, "...what would suddenly flip over...", and one inserted "you", to make it read "...you would suddenly flip over and stick to a magnet."

Authors of children's texts frequently insert questions, presumably to increase the reader's involvement and thereby heighten their interest. This seemed to be the case with some of the passages used in this study. These questions, however, usually produced a high incidence of miscue. "Do you know what it is?" and "Can you feel the air as it comes out?" in paragraph 1, and "Have you ever tried that?" in paragraph 4 were often converted to statements. In the first sentence, the "do" was typically omitted, although this miscue was often corrected soon after it was made. In the second sentence, the words "Can you" were usually reversed to make a statement. This miscue was usually not corrected, probably because it made perfect sense as it was. In the third sentence the words "Have you" were usually reversed also to make a statement, but in this case the results did not make sense. Some readers simply went on, while others struggled to recover.

The most interesting question, however, was one that appeared in paragraph 4 and read "What would happen if you used a piece of paper instead of a clip?". This sentence was interesting because, unlike the other questions, it did not produce many miscues. Evidently, the word "what" at the beginning of the sentence provided the readers with a familiar signal of a question and they were better able to predict the text.
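The distinction just described, between questions that open with an auxiliary and questions that open with a wh-word, is the kind of text condition that could be screened for mechanically. The sketch below is only an illustrative heuristic suggested by these observations; the word lists and the idea of flagging sentences this way are assumptions, not a validated measure from this study.

# Illustrative screen for one text condition noted above: questions that open
# with an auxiliary ("Do", "Can", "Have", ...) rather than a wh-word.
# A rough heuristic for inspection only, not a validated predictor of miscues.
AUXILIARIES = {"do", "does", "did", "can", "could", "have", "has",
               "is", "are", "will", "would"}
WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}

def flag_auxiliary_questions(sentences):
    flagged = []
    for sentence in sentences:
        words = sentence.strip().split()
        if not words or not sentence.strip().endswith("?"):
            continue
        first = words[0].lower()
        if first in WH_WORDS:
            continue                      # wh-questions gave readers a familiar signal
        if first in AUXILIARIES:
            flagged.append(sentence)      # likely candidates for conversion to statements
    return flagged

passage = ["Do you know what it is?",
           "Can you feel the air as it comes out?",
           "Have you ever tried that?",
           "What would happen if you used a piece of paper instead of a clip?"]
print(flag_auxiliary_questions(passage))
# -> the first three sentences are flagged; the "What ..." question is not.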
In the third category of miscues, readers began to encounter situations where they no longer had the capacity to recover. This usually involved words that they did not know and did not have sufficient word analysis or contextual analysis skills to figure out. The reader either had to ask for help, stop until help was given, or make the best attempt possible even though the results generally made little or no sense. These situations became very frequent in paragraph 5. They usually occurred on multiple syllable and/or low frequency words and involved many of the subjects. Category three miscues were always unacceptable.

The two words that produced the most miscues, involving 50% of the readers, were the words "uniform" and "uniformed" in paragraph 5. The readers' strategy was almost universally to treat "un" as a prefix, and they either could not abandon this strategy, or they knew of none other. Therefore the "uniformed guard" typically became an "unformed" or "uninformed guard". Miscues in this category usually involved the insertion, deletion or transposition of a letter or letters to produce another word with a similar visual form, even though that word generally made little or no sense. "Facing" often became "facting", "solid" became "soiled", "includes" became "inclouds", "customers" became "costumers" and "armed" became "alarmed". This is not to imply that the subjects in this study had been taught by a sight word method, but simply that they reached a point where the word analysis skills they possessed were no longer adequate to deal with words at this level of complexity.

Summary of Results

The results of this study ultimately took three forms: (a) statistical analysis of four measures which would seem to be logically associated with passage difficulty - quantity of miscues, rate, quality of miscues, and fluency; (b) graphic analysis of the percentages of students reading at each of the functional reading levels; and (c) graphic analysis of miscue frequencies, with inspection and descriptive analysis of specific portions of text involved in high frequency of miscue.

Summary of Results from Four Measures of Difficulty

Table IV-13 summarizes the significant differences found between paragraphs for the means of (a) word accuracy when all miscues were counted, (b) rate, (c) word accuracy when only unacceptable miscues were counted, and (d) general impressions of fluency.

Table IV-13  SUMMARY OF DIFFERENCES FOUND BETWEEN PARAGRAPHS ON FOUR MEASURES OF DIFFICULTY
[Chart of pairwise paragraph differences on the four measures; *difference in direction opposite of that expected.]

As the table indicates, paragraph 5, when contrasted with other paragraphs, showed the greatest number of differences in general. The greatest number of differences in particular occurred between paragraphs 2 and 5. Rate, unacceptable miscues and fluency scores all suggested a definite increase in difficulty between these two paragraphs, with paragraph 5 being the most difficult to read and paragraph 2 being the easiest. Other than this, however, there seemed to be very little discrimination of difficulty among paragraphs 1, 3, and 4. There were no significant differences found between any paragraphs when word recognition accuracy was based on quantity of miscues only and the quality of miscues was not considered.
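The contrast between the two word-accuracy measures is simply a matter of which miscues are allowed to count against the reader. A minimal sketch of the two computations follows; the treatment of repetitions, self-corrections and other scoring conventions is an assumption here, not taken from the study's procedures.

# Minimal sketch of the two word-recognition accuracy scores discussed above:
# one counting every miscue, one counting only unacceptable miscues. How
# repetitions and self-corrections are scored is assumed for illustration.
def accuracy(words_in_passage, miscues):
    """miscues: list of dicts like {"acceptable": True}, one per miscue."""
    total = len(miscues)
    unacceptable = sum(1 for m in miscues if not m["acceptable"])
    all_counted = 100.0 * (words_in_passage - total) / words_in_passage
    unacceptable_only = 100.0 * (words_in_passage - unacceptable) / words_in_passage
    return round(all_counted, 2), round(unacceptable_only, 2)

# Example: a 100-word passage read with 6 miscues, 2 of them unacceptable.
miscues = [{"acceptable": True}] * 4 + [{"acceptable": False}] * 2
print(accuracy(100, miscues))   # (94.0, 98.0)

Two readings with the same raw miscue count can therefore receive quite different scores once quality is taken into account, which is exactly the shift reported for paragraph 5.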
Summary of Results from Functional Reading Levels

Table IV-14 presents a graph showing the percentages of students reading at each of the functional reading levels (independent, instructional, instructional-frustrational and frustrational) for each paragraph.

Table IV-14  PERCENTAGES OF SUBJECTS READING AT EACH FUNCTIONAL READING LEVEL ON EACH PARAGRAPH

Paragraph                       1     2     3     4     5
Independent                     6%   22%    4%    8%    6%
Instructional                  50%   40%   52%   36%   36%
Instructional-Frustrational    28%   20%   36%   34%   38%
Frustrational                  16%   18%    8%   22%   20%

The greatest percentage of students read at their instructional level on the third grade paragraph, as expected; however, this was also true of all other paragraphs. There seemed to be very little differentiation between paragraphs in terms of the levels at which students were reading. This finding was not surprising, however, when we consider that these levels were established using the traditional Betts' criteria, which in turn are based only on the quantity, and not the quality, of miscues. Since there were no differences between paragraphs in terms of quantity of miscues, it would not seem unusual, therefore, to also find no differences between paragraphs in terms of reading levels.

Summary of Results from Miscue Frequency Data

The data previously presented in this chapter suggested that miscues did not occur randomly, but rather tended to cluster on certain words and in certain places in a sentence. Further analysis suggested that these miscues could be categorized as either (a) miscues of no consequence, (b) miscues which seemed to be triggered by factors in the text which interfered with the reader's processing strategies, but which the reader had the capacity to correct, and (c) miscues for which the reader lacked adequate decoding strategies and from which the reader could not recover. Factors in the text which were associated with high frequency of miscue could be identified. They included unfamiliar grammatical function, unfamiliar word meanings, unfamiliar phrases and unfamiliar word order. These were not factors traditionally associated with readability procedures, but they were similar to those reported in previous studies of miscue analysis.

CHAPTER V
SUMMARY AND CONCLUSIONS

Introduction

In this chapter a summary of the purpose, the design of the study and the findings will be discussed. Conclusions based on the analysis, and focusing on the degree to which the study credits or discredits the test-formula matching practice, will be presented. Implications for (a) practitioners and (b) further research will be discussed, and recommendations for further research will be given.

Summary

The purpose of this study was to investigate how effectively Reading Test grade equivalency scores from the California Achievement Tests, as a measure of student reading achievement, and Fry Readability Graph (1968) scores, as a measure of passage difficulty, would predict the degree of difficulty a given group of students would encounter when reading orally from material of varying Fry determined readabilities. To accomplish this, the subjects selected for the study were all third grade students with Reading Test scores from the California Achievement Tests falling within a six month range, from three months above to three months below their grade level at the time of testing, while the passages selected for them to read had Fry determined readabilities ranging from first to fifth grades.
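The subject-selection rule amounts to a simple filter on grade-equivalent scores. The sketch below assumes grade equivalents expressed in tenths of a school year (31 = third grade, first month), which is the usual convention but is not spelled out here; the student identifiers and scores are hypothetical.

# Illustrative filter for the subject-selection rule described above. Scores
# are assumed to be grade equivalents in tenths of a year (31 = grade 3,
# month 1); the sample data are invented for demonstration.
def within_three_months(ge_tenths, placement_tenths, tolerance=3):
    return abs(ge_tenths - placement_tenths) <= tolerance

students = [("S01", 31), ("S02", 35), ("S03", 29), ("S04", 40)]
placement = 32   # assumed grade placement at the time of testing

eligible = [sid for sid, ge in students
            if within_three_months(ge, placement)]
print(eligible)   # ['S01', 'S02', 'S03']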
Thus, the study was designed to hold the reading achievement of the subjects, as indicated by their Reading Test scores, relatively stable, while the readability scores of the passages were allowed to vary. If the test scores and the readability data provide an effective means of matching readers with materials of suitable difficulty, we would expect to find very little variation in reading performance from student to student and considerable variation in performance from paragraph to paragraph.

Generally speaking, this was not the case. When only quantity of miscues was considered, performance on all paragraphs tended to be very similar. There were no statistically significant differences in the means of the word recognition accuracy scores for each paragraph, and similar percentages of students tended to fall in each of the reading level categories on all paragraphs. In each case, a small percentage of students, roughly 5%, were reading at their independent level. A slightly larger percentage, between 10% and 20%, were reading at their frustration level. At the instructional level there was a slight differentiation between the first three paragraphs and the last two, with approximately 60% of the readers able to read paragraphs 1, 2 and 3 at their instructional level and about 45% able to read paragraphs 4 and 5 at this level. Roughly 80% to 90% of all readers were able to read all paragraphs with 90% word recognition accuracy or better.

In terms of rate, unacceptable miscues and general impressions of fluency, there were indications that the second grade paragraph was the easiest to read, while the fifth grade paragraph was the most difficult. Differences were found between paragraphs 2 and 5 on all three of these measures. In addition, there were significant differences between paragraph 5 and all other paragraphs when comparing the means of word recognition accuracy scores when only unacceptable miscues were considered. There were significant differences between the means of the reading rate scores for paragraph 2 and paragraphs 1, 3, 4, and 5, with paragraph 2 being read faster in each case. Also, more readers were at their independent reading level when reading paragraph 2. The difficulty of the other three paragraphs, 1, 3, and 4, appeared to fall somewhere between paragraphs 2 and 5, but there was little to suggest any distinction of difficulty among paragraphs 1, 3, and 4.

Conclusions

In analyzing the findings in this study, the following conclusions were reached.

1. The readability graph seemed to identify material within the reader's general range of ability, but did not seem able to discriminate difficulty as precisely as one or two grade levels.

2. In terms of miscues, readers seemed to encounter similar amounts of difficulty (quantity of miscues) on all paragraphs. However, there was a decided shift in the type of difficulty (quality of miscues) they were experiencing, especially when they reached the fifth grade paragraph.

3. On paragraphs with lower readabilities, the type of difficulty readers seemed to be experiencing appeared to be due to factors in the text which interfered with their prediction strategies. This resulted in many miscues, but they were generally miscues of an acceptable nature, that is, miscues that had negligible effect on meaning or miscues from which the reader had the capacity to recover and thereby correct.
4. On paragraphs of higher readability, when readers began to encounter difficult words with which they were unfamiliar and for which they lacked adequate decoding strategies, they did not simply add these miscues to the types of miscues they had previously been making. Instead, the quantity of the miscues tended to stay the same but the quality of the miscues changed, resulting in a lower proportion of acceptable miscues and a greater proportion of requests for aid, gross mispronunciations and other miscues of an unacceptable nature.

5. Because reading levels were established using the traditional Betts' criteria, which are based on the quantity of miscues and not the quality, and because the quantity of miscues tended to remain the same while the quality of miscues changed, the students' reading levels did not provide a complete picture of the difficulty they experienced.

6. Sentence length did not appear to be associated with the difficulty the subjects encountered when reading the passages in this study. Word difficulty, however, did seem to have an effect. Whether this was a function of factors measured by the readability graph or a result of the vocabulary control used in developing the materials could not be determined.

7. The places within a passage where miscues occurred were highly predictable, with many readers miscuing at the same place in a sentence or on the same word, and frequently making the same or a similar response.

8. The factors in a sentence that seemed to be triggering a high number of miscues could be identified. These factors were not those traditionally associated with readability formulae, but they were virtually identical to factors reported by Laura Smith (1976) in previous miscue analysis studies. In fact, when factors reported by Smith were not identified as a cause of difficulty for readers in this study, it was simply because the text chosen did not provide an opportunity to observe them. For instance, Smith found that direct quotations caused many readers difficulty. There were no direct quotations in the material used here, so the readers' response to them could not be observed. However, when there was an opportunity to observe a factor causing difficulty reported by Smith, the similarity of response by readers in this study was uncanny. This finding becomes even more significant when the ten-year time gap between the Smith study and this study is considered.

Discussion

At least five major considerations seem to emerge from the conclusions of this study.

First, most authorities in the field of reading, including the authors of readability formulae themselves, have repeatedly stressed that such devices should only be used as rough estimates of relative difficulty. The results of this investigation amplify the importance of such admonishments. Furthermore, many readability prediction methods only attempt to assign difficulty in a very general way, such as "below 4th grade" or in terms of "elementary", "high school" or "college" levels. The results of this study would suggest that it may not be possible to predict difficulty with much more precision than these methods have attempted. In addition, it must be noted that the type of difficulty readers experienced on the fifth grade passage in this study was highly associated with vocabulary load.
Since the materials used in this study were specifically designed for instructional use, it cannot be determined if the difficulty was due to factors measured by the readability graph or a function of the vocabulary control used in developing the materials. Only a replication of this study using selections from children's literature and trade books, which do not use strict vocabulary controls, could ascertain whether the readability graph even predicted general areas of difficulty.

Secondly, the shift in quality of miscues, but not quantity of miscues, strongly suggests that readers changed their processing strategies when they encountered unknown words. When the words in the text were very familiar, the readers seemed to use top-down processing strategies, relying on language and meaning cues to direct their reading. Factors in the text, or in the reader's ability to deal with those factors, however, seemed to repeatedly interfere with their prediction strategies, often causing many miscues, although the reader was usually able to reprocess the material and recover. When the readers began to encounter unknown words, however, they were forced to use word analysis methods and thus shift to bottom-up strategies. In doing so, they had to attend more closely to the grapho-phonic cues in the writing. This would explain why there were fewer miscues on familiar words as requests for aid, gross mispronunciations and other unacceptable miscues, caused by unfamiliar words, increased. Such an explanation would be consistent with and supportive of an interactive-compensatory model of the reading process.

Third, this study raises serious questions concerning the traditional use of the Betts' criteria in establishing a student's functional reading levels. Such criteria assume that, as material becomes more difficult, readers will simply make more miscues, and they do not provide for a change in quality of miscue rather than quantity. In this connection, the effect of silent prereading on the quality and quantity of oral reading miscues needs to be examined further. It is possible that prereading allows the reader to work out miscues of an acceptable nature, so that only more serious miscues appear in the subsequent oral reading. Under these conditions quantity of miscues might then be more closely associated with the difficulty readers actually experienced. This might also be accomplished in unprepared oral reading by classifying miscues and giving them weighted scores based on their seriousness.

Fourth, the uncanny and totally unanticipated similarity between miscue analysis findings in this study and those reported ten years earlier by Laura Smith (1976) suggests that there may be some universal miscue patterns characteristic of various stages of reading development. Knowledge of these patterns could be of great use to those writing for readers of various ages and grade levels.

Finally, because miscues did not occur randomly, and because it was possible to identify factors in the text which seemed to be causing many miscues, it would also seem that, if oral reading were used in developing the criterion for the prediction method, it would be possible to develop a procedure that would better predict oral reading performance. The process of matching readers with material might be made even more reliable if oral reading were also used to measure the reading achievement of the student.
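The weighted scoring suggested in the third consideration above could take many forms. One possible shape is sketched below; the three categories mirror those described in Chapter IV, but the particular weights and the scoring function are invented here purely for illustration, not drawn from this study.

# Illustrative weighted miscue score along the lines suggested above. The three
# categories follow the ones described in Chapter IV; the weights (0, 1, 3)
# are invented for illustration only.
WEIGHTS = {
    "no_consequence": 0,   # category 1: contraction swaps, article changes, etc.
    "recoverable": 1,      # category 2: prediction interference the reader can repair
    "unrecoverable": 3,    # category 3: aid requests, gross mispronunciations
}

def weighted_accuracy(words_in_passage, miscue_categories):
    penalty = sum(WEIGHTS[c] for c in miscue_categories)
    return max(0.0, 100.0 * (words_in_passage - penalty) / words_in_passage)

# A reading with many harmless miscues scores higher than one with a few
# serious ones, a distinction a plain miscue count cannot make.
print(weighted_accuracy(100, ["no_consequence"] * 8))                        # 100.0
print(weighted_accuracy(100, ["recoverable"] * 3 + ["unrecoverable"] * 2))   # 91.0

Reading levels set from a score of this kind would reflect the shift in miscue quality that the Betts-style count misses.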
Implications

The results of this study should clearly demonstrate to reading practitioners, as well as authors and publishers, that the usefulness of current readability prediction methods is probably very limited. The study largely discredits the notion that such devices can be used to place students in materials within one, two or even three grade levels of their reading achievement, at least for readers of this age. At the very best, it appears that such procedures may only be able to identify a general area of difficulty such as "primary", "elementary", "high school" or "college" level. Furthermore, since the discernible difficulty in this study appeared to be closely associated with vocabulary load, and since the materials involved were specifically written for classroom use, it cannot be determined if the difficulty readers experienced was due to something measured by the readability graph or a function of the vocabulary control used in developing the materials. Therefore, in the final analysis, the use of readability prediction devices, either to place students in material or to check passage difficulty when writing for a specific audience, appears to remain a basically unsupported practice.

The results of this study do support previous research which has found identifiable and highly predictable factors in writing which seem to cause difficulty for young readers, although they are not factors generally measured by readability formulae. Knowledge of the effect of these factors, such as word order, unfamiliar word meanings and usage, and unfamiliar phrases, might aid practitioners and authors in the selection and writing of material for children.

In this regard, there are many questions raised by the readers' performance in this study which have implications for further research. First of all, at what point do readers develop the abilities necessary to read material of the type used in this study with speed and virtual perfection? Do readers at this stage of reading development even have an independent reading level, or are they still unable to read any material with the fluency the independent level implies? Are their miscues due to prior instructional practices, or to the inclusion of material in their basal reading series which does not expose them to the situations which caused them difficulty in this material, or are their miscues a natural part of any child's reading development? Research guided by these questions could provide valuable assistance to those selecting and writing materials for young readers, as well as providing further insights into the developmental stages involved in learning to read.

In addition, while the reading levels concept appears to be a useful one, the results of this study suggest that as reading material becomes more difficult, readers do not simply make more miscues, but instead make miscues of a more serious nature. Therefore, better criteria for determining functional reading levels, criteria which consider quality of miscues as well as quantity, need to be developed.

Recommendations

As this investigation progressed, many questions arose which suggest recommendations for further research. Such additional studies might answer questions which resulted from limitations in this study and might also extend the scope of this investigation further.

1. The study could be repeated using passages from children's literature or trade books which have not been developed specifically for instructional use.
This might help to determine if the difficulty encountered on the fifth grade paragraph was due to factors measured by the readability graph or a happenstance of the vocabulary control used in developing the materials.

2. The study could be repeated using new versions of the passages from this study, rewritten to eliminate the factors which appeared to be causing many miscues. This would help to determine if controlling these factors would, in fact, make the material easier for children of this age to read.

3. Repeating the study with older and younger children could provide valuable information concerning the development of children's reading proficiency.

4. Repeating the study with children who have been given specific instruction and practice with material containing features which seemed to cause a high incidence of miscues in this study might help determine if these miscues are the result of previous instructional practices and experiences, or if they are a natural part of reading development.

5. Repeating the study, but giving the readers the opportunity to preread the passages silently before oral reading, might provide valuable information concerning the effect of silent prereading on subsequent oral reading miscues and their relationship to passage difficulty.

6. Data from the present study could be reanalyzed using a classification and weighting system for miscues. This might help to determine if reading levels based on such a procedure would be more closely associated with the other indicators of passage difficulty (rate, unacceptable miscues and general fluency) observed in this investigation.

7. Finally, in a more general sense, it would appear that, since oral reading performance is frequently used as a measure of readability, more valid methods for predicting readability, especially for young readers, could be developed if oral reading were used to rank the difficulty of the criterion passages. The use of oral reading in the development of new readability prediction methods, therefore, is worthy of research attention.

APPENDIX A
Letter from Principals to Parents
Parental Permission Slip

Bay City Public Schools
910 N. Walnut Street, Bay City, Michigan 48706

Dear Parent:

Currently one of our Chapter I Reading Teachers, Janet Dixon, is working on a study concerning readability formulas as part of her doctoral program at Michigan State University. These formulas claim to predict the difficulty of reading materials; however, their usefulness is debatable. It is the purpose of Janet's study to listen to children read material which the formulas say varies in difficulty, and then see if the children will actually make more errors on the more difficult selections.

Your child has been selected as a possible subject in this study. In order to participate, your permission will be necessary. Hopefully, the following information will reassure you and make you feel more comfortable about giving that permission.

If your child participates, he will be asked to do two things. In the first session he will take the Reading Test of the California Achievement Tests. If he is involved in the second session, he will be asked to read aloud a list of words and five paragraphs, which will be tape recorded for later analysis. All participants will take the test but not all will read the paragraphs. It will take about 45 minutes to complete the test and about 15 minutes to read the paragraphs.

As a subject, your child will be given a code name.
Only the researcher (Janet) will have a list of the code names, and this list will be destroyed once data collection is completed. You may know your own child's code name, but you must ask for it at the time of data collection. Once the list is destroyed there will be no way for anyone, even the researcher, to identify your child in the study.

Your child will not be used in the study unless he is a willing participant. The task involved is not a difficult one and should not cause any undue distress. Your child will be given continuous support and encouragement by the researcher throughout the project and may discontinue at any time if he, or the researcher, feels the situation is too stressful. Such situations will be handled carefully to make sure the child feels positive about the experience even if he decides to decline or discontinue, and there will be no penalty for such a decision.

In order to help Janet complete the list of subjects, please return the attached permission slip to your child's teacher as soon as possible. Return the slip even if you decide not to have your child participate. This will make a follow-up letter unnecessary.

If you have any further questions, Janet or I will be more than happy to discuss them with you. You may contact us at the following numbers:

Janet Dixon          Home:                Elementary Center:
             , Principal                  Elementary School:

If we are not in, please leave your name with the secretary and we will return your call. Thank you for taking the time to read this letter, for giving this matter your consideration and for returning the permission slip. Of course, the most important thing in this study will be the children who participate. We are hoping your child will be among them.

Sincerely,

             , Principal
             Elementary School

To Whom It May Concern:

My child has my permission to participate in the study being conducted by Janet Dixon concerning formulas used to predict the difficulty of reading materials.

Parent's Signature

Do you wish to know your child's code name?   Yes   No

If you do not wish to have your child participate, please check here:
(A signature is not necessary in this case)

Please return this entire sheet to your child's teacher.

APPENDIX B
The Research Passages

HOW IS THE AIR TODAY?
Something is all around you at all times. You cannot see it. But sometimes you can feel it. Without it nothing can live. Do you know what it is? It is air. You know it is around you when it blows hard. You can feel it then. Sometimes you can feel air from a balloon. Blow up a balloon. Then let go of its mouth. Can you feel the air as it comes out? Many things use air. Tires, windmills and footballs use air. Sailboats are pushed by air. Kites are kept up by air. Air helps to keep airplanes in the sky too.
Orange 15, SRA Reading Lab Ic, Science Research Associates, Inc., 1981

A ROCK FENCE
Long ago many large rocks lay all over the ground. There was a farmer who wanted to grow things on the land. But nothing would grow where the rocks were. So he started picking up the rocks. He carried them to the sides of the field. He made a fence of the rocks. Then all the other farmers could see where his field was. Flowers grew along the rock fence. At first the rocks had been in the way. But soon they helped the farmer. And the farmer's rock fence made the field more beautiful.
Aqua 11, SRA Reading Lab Ic, Science Research Associates, Inc., 1961

WANT TO TRADE A TIGER?
Let's pretend you're running a zoo. In your zoo you have four tigers but only one polar bear.
You're lucky to have the tigers. Very few tigers are born in zoos but two were born in your zoo a year ago. But you're unlucky to have only one polar bear. It isn't much fun for people who come to your zoo to watch one lonely polar bear. If you know how to run your zoo, you'll look around for a zoo that wants a tiger. Maybe you can trade for a polar bear. Like every zoo man you have trading in your blood.
Brown 5, SRA Reading Lab Ic, Science Research Associates, Inc., 1961

FUN WITH MAGNETS
Maybe you already know how magnets work. If you were to hold a magnet near a paper clip on your desk, you'd know what to expect. When the magnet got close, the paper clip would suddenly flip over and stick to the magnet. This happens because paper clips are made of iron. And anything made of iron sticks to magnets. What would happen if you used a piece of paper instead of a clip? Have you ever tried that? Maybe not, but you know that there is no iron in paper. And because of this you feel sure that paper will not stick to the magnet.
Brown 14, SRA Reading Lab Ic, Science Research Associates, Inc., 1961

BANKS ARE INTERESTING PLACES
The next time you are going into a bank stop a minute before you push open the heavy door. Look at the building. If it is an old bank there will be only a few windows facing the street or maybe none at all. The front of the bank will seem a solid stone wall. Through the door you may see a bank guard whose uniform includes a gun. If your bank is in a new building, there may be huge windows. Through them you can see the bank's workers and customers as well as a uniformed guard who is not armed.
Green 12, SRA Reading Lab Ic, Science Research Associates, Inc., 1961

APPENDIX C
The Fry Readability Graph

GRAPH FOR ESTIMATING READABILITY - EXTENDED
[Graph not reproduced. Its horizontal axis is the average number of syllables per 100 words (108 to 182); its vertical axis is the average number of sentences per 100 words; curved bands across the graph mark approximate grade levels 1 through 17.]

DIRECTIONS: Randomly select three one-hundred-word passages from a book or an article. Plot the average number of syllables and the average number of sentences per 100 words on the graph to determine the grade level of the material. Choose more passages per book if great variability is observed. Count proper nouns, numerals and initializations as words. Count a syllable for each symbol. For example, "1945" is 1 word and 4 syllables and "IRA" is 1 word and 3 syllables.

EXAMPLE:                  SYLLABLES    SENTENCES
1st Hundred Words            124          6.6
2nd Hundred Words            141          5.5
3rd Hundred Words            158          6.8
AVERAGE                      141          6.3
READABILITY: 7th GRADE (see dot plotted on graph)

For further information and validity data, see Edward Fry, "Fry's Readability Graph: Clarifications, Validity, and Extension to Level 17," Journal of Reading (December 1977).
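The two inputs the graph requires can also be approximated mechanically. The sketch below is a rough illustration only: its vowel-group syllable counter is a stand-in for the hand count the directions assume, so its output is approximate, and the function and variable names are invented here.

# Rough computation of the two Fry graph inputs for one 100-word sample:
# syllables per 100 words and sentences per 100 words. The syllable counter is
# a crude vowel-group heuristic standing in for a hand count.
import re

def count_syllables(word):
    # crude heuristic: one syllable per vowel group, minimum one
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fry_inputs(sample_text):
    """sample_text is assumed to be a single 100-word sample."""
    words = re.findall(r"[A-Za-z0-9']+", sample_text)
    syllables = sum(count_syllables(w) for w in words)
    sentences = len(re.findall(r"[.!?]+", sample_text))
    scale = 100.0 / len(words)
    return syllables * scale, sentences * scale

# The grade level is then read from the graph using the average of three
# samples. With the worked example printed in this appendix:
samples = [(124, 6.6), (141, 5.5), (158, 6.8)]
avg_syllables = sum(s for s, _ in samples) / 3
avg_sentences = round(sum(n for _, n in samples) / 3, 1)
print(avg_syllables, avg_sentences)   # 141.0 6.3 -> about seventh grade on the graph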
APPENDIX D
Formulae and Computational Procedures
ANOVA
Scheffe Post-Hoc Comparisons

Formula and Computational Procedures: ANOVA

The following computational procedure for Analysis of Variance for Single Factor Experiments with Repeated Measures of the Same Elements was used in this study to determine if mean differences did exist. The procedure was taken from Winer (1971, p. 268).

K = number of treatments                        X = an individual score
n = number of subjects in a treatment group     P = the sum of scores of all treatments for one subject
T = the sum of all scores for one treatment     G = the sum of all scores for all treatments
subscript j = all treatment groups (1 to 5)     subscript i = all subjects (1 to 50)

I = G²/Kn        II = ΣΣX²        III = (ΣT²)/n        IV = (ΣP²)/K

Source of Variation     SS (Sum of Squares)         df (Degrees of Freedom)
Between People          SSB  = IV - I               n-1
Within People           SSW  = II - IV              n(K-1)
Treatments              SST  = III - I              K-1
Residual                SSR  = II - III - IV + I    (n-1)(K-1)
Total                   SSTO = II - I               Kn-1

MST = SST/df = SST/(K-1)
MSR = SSR/df = SSR/[(n-1)(K-1)]
F = MST/MSR

The critical value for the F ratio is taken from the tables for the F distribution for K-1 and (n-1)(K-1) degrees of freedom. A significance level of .05 was used in this study. If the computed F value exceeded the critical F value, the null hypothesis was rejected and it was assumed that there were differences in the means.

Formula and Computational Procedures: Scheffe Test for Post-Hoc Comparisons

When analysis of variance indicated mean differences did exist, the Scheffe Test for post-hoc comparisons was used to determine where differences occurred. The formula and computational procedures used were taken from Hinkle, Wiersma and Jurs (1979, pp. 276-280). When used with the ANOVA for repeated measures, MSR takes the place of MSW (Winer, 1971, p. 270). The formula used for each set of contrasts was

F = (M1 - M2)² / [MSR(1/n1 + 1/n2)]

where M1 = the mean of the first contrast, n1 = the number of scores in that mean, M2 = the mean of the second contrast, and n2 = the number of scores in the second contrast. The critical value for F used in the Scheffe test is the critical value used in the ANOVA multiplied by K-1, where K is the number of groups. Therefore the critical value for F used in the Scheffe tests in this study was (2.45)(4) = 9.8.

APPENDIX E
Summary of Computations
Word Recognition Accuracy Scores Based on Total Number of Miscues
Reading Rate Scores
Scheffe Post-Hoc Comparisons: Reading Rate Scores
Word Recognition Accuracy Scores When Only Unacceptable Miscues Were Counted
Scheffe Post-Hoc Comparisons: Word Recognition Accuracy Scores When Only Unacceptable Miscues Were Counted
General Impression of Fluency Scores
Scheffe Post-Hoc Comparisons: General Impression of Fluency Scores

Summary of Computation for Word Recognition Accuracy Scores Based on Total Number of Miscues

Totals   T1 = 4696   T2 = 4720   T3 = 4718   T4 = 4690   T5 = 4646
Means    93.92       94.4        94.36       93.8        92.92

G = 23470     ΣΣX² = 2207554     ΣP² = 11026292     K = 5     n = 50

I   = G²/Kn   = (23470)²/250   = 2203363.44
II  = ΣΣX²                     = 2207554
III = (ΣT²)/n = 110171755/50   = 2203435.1
IV  = (ΣP²)/K = 11026292/5     = 2205258.4

Source of Variation        SS (Sum of Squares)              df
SSB (people)     = IV - I            = 1894.96              n-1 = 49
SSW (people)     = II - IV           = 2295.6               n(K-1) = 200
SST (treatment)  = III - I           = 71.66                K-1 = 4
SSR (residual)   = II - III - IV + I = 2223.94              (n-1)(K-1) = 196
SSTO (total)     = II - I            = 4190.56              Kn-1 = 249

MST = SST/(K-1) = 71.66/4 = 17.915
MSR = SSR/[(n-1)(K-1)] = 2223.94/196 = 11.3466326
F = MST/MSR = 17.915/11.3466326 = 1.57888252

Critical .05 F (4,196) = 2.45. F < critical F; therefore accept the null hypothesis. Assume there are no differences.
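The arithmetic in these summaries can be reproduced directly from an n-by-K matrix of scores using the Winer terms I through IV defined above. The sketch below is an illustration of that procedure under those definitions, not the program actually used in the original analysis, and its small example data are fabricated.

# Sketch of the repeated-measures ANOVA defined above (Winer's terms I-IV),
# applied to an n-by-K matrix of scores (rows = subjects, columns = paragraphs).
# Illustrative only; the example data are fabricated.
def repeated_measures_anova(scores):
    n, K = len(scores), len(scores[0])
    G = sum(sum(row) for row in scores)
    term_I = G ** 2 / (K * n)
    term_II = sum(x ** 2 for row in scores for x in row)
    treatment_totals = [sum(row[j] for row in scores) for j in range(K)]
    term_III = sum(T ** 2 for T in treatment_totals) / n
    term_IV = sum(sum(row) ** 2 for row in scores) / K
    SST = term_III - term_I                        # treatments
    SSR = term_II - term_III - term_IV + term_I    # residual
    MST = SST / (K - 1)
    MSR = SSR / ((n - 1) * (K - 1))
    return MST / MSR, MST, MSR

# Tiny fabricated example (3 subjects, 2 conditions) just to show the call:
F, MST, MSR = repeated_measures_anova([[90, 95], [88, 93], [91, 97]])
print(round(F, 2))   # 256.0

The F value is then compared against the tabled critical value for K-1 and (n-1)(K-1) degrees of freedom, exactly as in the summaries above.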
Summary of Computation for Reading Rate Scores

Totals   T1 = 4648.8   T2 = 5546.5   T3 = 4999.9   T4 = 4908.6   T5 = 4717.9
Means    93.976        110.93        99.998        98.172        94.358

G = 24871.7     ΣΣX² = 2588508.22     ΣP² = 12808179.5     K = 5     n = 50

I   = G²/Kn   = (24871.7)²/250   = 2474405.82
II  = ΣΣX²                       = 2588508.22
III = (ΣT²)/n = 124194314/50     = 2483886.28
IV  = (ΣP²)/K = 12808179.5/5     = 2561635.91

Source of Variation        SS (Sum of Squares)              df
SSB (people)     = IV - I            = 87230.09             n-1 = 49
SSW (people)     = II - IV           = 26872.31             n(K-1) = 200
SST (treatment)  = III - I           = 9480.46              K-1 = 4
SSR (residual)   = II - III - IV + I = 17391.85             (n-1)(K-1) = 196
SSTO (total)     = II - I            = 114102.4             Kn-1 = 249

MST = SST/(K-1) = 9480.46/4 = 2370.115
MSR = SSR/[(n-1)(K-1)] = 17391.85/196 = 88.7339285
F = MST/MSR = 2370.115/88.7339285 = 26.710358

Critical .05 F (4,196) = 2.45. F > critical F; therefore reject the null hypothesis. Assume there are differences.

Summary of Computation for Scheffe Post-Hoc Comparisons: Reading Rate Scores

n1 = n2 = n3 = n4 = n5 = 50     MSR = 88.734     K = 5
M1 = first mean to be contrasted     M2 = second mean to be contrasted
Critical F (from ANOVA) = 2.45     Critical F for Scheffe test = 2.45(K-1) = 9.8

F = (M1 - M2)² / [MSR(1/50 + 1/50)]

Contrasts             Means               Computed F    Significance
Paragraph 1 vs 2      *93.976-110.93      80.991        *yes
          1 vs 3      *93.976-99.998      10.249        *yes
          1 vs 4      *93.976-98.172       4.961        no
          1 vs 5      *93.976-94.358        .041        no
          2 vs 3      110.93-99.998       33.674        yes
          2 vs 4      110.93-98.172       45.862        yes
          2 vs 5      110.93-94.358       77.383        yes
          3 vs 4      99.998-98.172         .939        no
          3 vs 5      99.998-94.358        8.963        no
          4 vs 5      98.172-94.358        4.0986       no

*indicates contrasts in which the first element is smaller than the second. If significance is found it suggests that differences existed between the means but in a direction opposite of that which would be expected.

Summary of Computation for Word Recognition Accuracy Scores When Only Unacceptable Miscues Were Counted

Totals   T1 = 4964   T2 = 4979   T3 = 4951   T4 = 4957   T5 = 4868
Means    99.28       99.58       99.02       99.14       97.36

G = 24719     ΣΣX² = 2444795     ΣP² = 12221529     K = 5     n = 50

I   = G²/Kn   = (24719)²/250   = 2444115.74
II  = ΣΣX²                     = 2444795
III = (ΣT²)/n = 122213408/50   = 2444268.16
IV  = (ΣP²)/K = 12221529/5     = 2444305.8

Source of Variation        SS (Sum of Squares)              df
SSB (people)     = IV - I            = 190.06               n-1 = 49
SSW (people)     = II - IV           = 489.2                n(K-1) = 200
SST (treatment)  = III - I           = 152.42               K-1 = 4
SSR (residual)   = II - III - IV + I = 336.78               (n-1)(K-1) = 196
SSTO (total)     = II - I            = 679.26               Kn-1 = 249

MST = SST/(K-1) = 152.42/4 = 38.105
MSR = SSR/[(n-1)(K-1)] = 336.78/196 = 1.7182653
F = MST/MSR = 38.105/1.7182653 = 22.1764357

Critical .05 F (4,196) = 2.45. F > critical F; therefore reject the null hypothesis. Assume there are differences.

Summary of Computation for Scheffe Post-Hoc Comparisons: Word Recognition Accuracy Scores When Only Unacceptable Miscues Were Counted

n1 = n2 = n3 = n4 = n5 = 50     MSR = 1.718     K = 5
M1 = first mean to be contrasted     M2 = second mean to be contrasted
Critical F (from ANOVA) = 2.45     Critical F for Scheffe test = 2.45(K-1) = 9.8

F = (M1 - M2)² / [MSR(1/50 + 1/50)]

Contrasts             Means             Computed F    Significance
Paragraph 1 vs 2      *99.28-99.58       1.304         no
          1 vs 3      99.28-99.02         .9797        no
          1 vs 4      99.28-99.14         .284         no
          1 vs 5      99.28-97.36       53.426         yes
          2 vs 3      99.58-99.02        4.545         no
          2 vs 4      99.58-99.14        2.8057        no
          2 vs 5      99.58-97.36       71.426         yes
          3 vs 4      *99.02-99.14        .2087        no
          3 vs 5      99.02-97.36       39.936         yes
          4 vs 5      99.14-97.36       45.918         yes

*indicates contrasts in which the first element is smaller than the second. If significance is found it suggests that differences existed between the means but in a direction opposite of that which would be expected.
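These pairwise contrasts can be generated mechanically from the treatment means and MSR using the Scheffe formula given in Appendix D. The sketch below reuses the means and MSR printed immediately above for the unacceptable-miscue comparison; its F values differ from the printed table by small amounts, presumably because the original computations carried more decimal places, but the significant/not-significant decisions come out the same.

# Sketch of the Scheffe contrasts using the formula from Appendix D, applied to
# the means and MSR printed above for word recognition accuracy (unacceptable
# miscues only). Illustrative reproduction, not the original computation.
from itertools import combinations

def scheffe_contrasts(means, msr, n, critical_anova_f):
    K = len(means)
    critical = critical_anova_f * (K - 1)
    results = []
    for i, j in combinations(range(K), 2):
        F = (means[i] - means[j]) ** 2 / (msr * (1.0 / n + 1.0 / n))
        results.append((i + 1, j + 1, round(F, 3), F > critical))
    return results

means = [99.28, 99.58, 99.02, 99.14, 97.36]   # paragraphs 1 through 5
for a, b, F, significant in scheffe_contrasts(means, msr=1.7182653, n=50,
                                              critical_anova_f=2.45):
    print(f"{a} vs {b}: F = {F:>7.3f}  significant: {significant}")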
Summary of Computation for General Impression of Fluency Scores

Totals   T1 = 116    T2 = 104.5   T3 = 114    T4 = 117    T5 = 130
Means    2.32        2.09         2.28        2.34        2.6

G = 581.5     ΣΣX² = 1485.2499     ΣP² = 7209.24987     K = 5     n = 50

I   = G²/Kn   = 1352.56699
II  = ΣΣX²    = 1485.2499
III = (ΣT²)/n = 1359.22499
IV  = (ΣP²)/K = 1441.84997

Source of Variation        SS (Sum of Squares)              df
SSB (people)     = IV - I            = 89.282977            n-1 = 49
SSW (people)     = II - IV           = 43.399926            n(K-1) = 200
SST (treatment)  = III - I           = 6.657997             K-1 = 4
SSR (residual)   = II - III - IV + I = 36.741929            (n-1)(K-1) = 196
SSTO (total)     = II - I            = 132.682903           Kn-1 = 249

MST = SST/(K-1) = 6.657997/4 = 1.66449925
MSR = SSR/[(n-1)(K-1)] = 36.741929/196 = .187458821
F = MST/MSR = 1.66449925/.187458821 = 8.87927939

Critical .05 F (4,196) = 2.45. F > critical F; therefore reject the null hypothesis. Assume there are differences.

Summary of Computation for Scheffe Post-Hoc Comparisons: General Impression of Fluency Scores

n1 = n2 = n3 = n4 = n5 = 50     MSR = .18746     K = 5
M1 = first mean to be contrasted     M2 = second mean to be contrasted
Critical F (from ANOVA) = 2.45     Critical F for Scheffe test = 2.45(K-1) = 9.8

F = (M1 - M2)² / [MSR(1/50 + 1/50)]

Contrasts             Means           Computed F     Significance
Paragraph 1 vs 2      *2.32-2.09       7.0721925     no
          1 vs 3      *2.32-2.28        .2139        no
          1 vs 4      2.32-2.34         .0535        no
          1 vs 5      2.32-2.6        10.481283      yes
          2 vs 3      2.09-2.28        4.826         no
          2 vs 4      2.09-2.34        8.3556        no
          2 vs 5      2.09-2.6        34.7727        yes
          3 vs 4      2.28-2.34         .48128       no
          3 vs 5      2.28-2.6        13.689         yes
          4 vs 5      2.34-2.6         9.037         no

*indicates contrasts in which the first element is larger than the second. If significance is found it suggests that differences existed between the means but in a direction opposite of that which would be expected.

REFERENCES

Allen, P.D. (1976). The miscue research studies. In P.D. Allen & D.J. Watson (Eds.), Findings of research in miscue analysis: Classroom implications. Urbana, IL: National Council of Teachers of English, ERIC Clearinghouse on Reading and Communication Skills.

Allington, R.L. (1984). Oral reading. In P.D. Pearson (Ed.), Handbook of reading research. New York: Longman.

Anderson, R.C., & Faust, G.W. (1973). Educational psychology: The science of instruction and learning. New York: Dodd, Mead & Co.

Ausubel, D.R. (1968). Educational psychology: A cognitive view. New York: Holt, Rinehart & Winston.

Bader, L.A. (1980). Reading diagnosis and remediation in classroom and clinic. New York: Macmillan.

Baker, R.G. (1942). Success and failure in the classroom. Progressive Education, 19, 221-224.

Baker, R.G., Dembo, T., & Lewin, K. (1941). Frustration and regression: An experiment with young children (University of Iowa Studies in Child Welfare, No. 1). Ames: University of Iowa.

Berliner, D.C. (1981). Academic learning time and reading achievement. In J.T. Guthrie (Ed.), Comprehension and teaching: Research reviews. Newark, DE: International Reading Association.

Bernard, H.W. (1965). Psychology of learning and teaching. New York: McGraw-Hill.

Bernard, J. (1966). Achievement test norms and time of year of testing. Psychology in the Schools, 3, 273-275.

Betts, E.A. (1940). Reading problems at the intermediate grade level. Elementary School Journal, 15, 737-746.

Betts, E.A. (1946). Foundations of reading instruction. New York: American Book Co.
Block, J.R., & Anderson, L.W. (1975). Mastery learning in classroom instruction. New York: Macmillan.

Block, J.H. (Ed.). (1971). Mastery learning: Theory and practice. New York: Holt, Rinehart & Winston.

Bloom, B.S. (1976). Human characteristics and school learning. New York: McGraw-Hill.

Bloom, B.S. (Ed.). (1956). Taxonomy of educational objectives, handbook I: Cognitive domain. New York: Longman.

Bloomer, R.H. (1959). Level of abstraction as a function of modifier load. Journal of Educational Research, 52, 269-272.

Borg, W.R., & Gall, M.D. (1979). Educational research: An introduction (3rd ed.). New York: Longman.

Bormuth, J.R. (1969). Development of readability analyses (Final Report, Project No. 7-0052, Contract No. 1 OEC-3-7-070052-0326). Washington, DC: USOE Bureau of Research, HEW.

Bormuth, R.C. (1966). Readability: A new approach. Reading Research Quarterly, 1(3), 79-132.

Bormuth, J.R. (1975). The cloze procedure: Literacy in the classroom. In W.D. Page (Ed.), Help for the reading teacher: New directions in research. Urbana, IL: National Conference on Research in English, ERIC Clearinghouse on Reading and Communication Skills, National Institute of Education.

Bradley, J.M., & Ames, W.S. (1976). The influence of intrabook readability variation on oral reading performance. Journal of Educational Research, 10, 101-105.

Bradley, J.M., & Ames, W.S. (1977). Readability parameters of basal readers. Journal of Reading Behavior, 9, 175-183.

Brecht, R.D. (1977). Testing format and instructional level with the informal reading inventory. Reading Teacher, 31, 57-59.

Britton, G., & Lumpkin, M. (1977). A consumer's guide on readability: Ginn and Company, Ginn Reading 720. Corvallis, OR: G. Britton & Associates.

Britton, J.E., & Danielson, W.A. (1958). A factor analysis of language elements affecting readability. Journalism Quarterly, 35, 420-426.

California achievement tests: Norm tables. (1977). Monterey, CA: McGraw-Hill.

Carlson, R. (1980, April). Reading level difficulty. Creative Computing, 60-61.

Carroll, J.B. (1963). A model of school learning. Teachers College Record, 64, 723-733.

Carver, R.P. (1975-1976). Measuring prose difficulty using the rauding scale. Reading Research Quarterly, 11, 660-685.

Carver, R.P. (1974). Improving reading comprehension: Measuring readability (Final Report, Contract No. N00014-72-C0240, Office of Naval Research). Silver Spring, MD: American Institute for Research.

Caylor, J.S., Sticht, T.G., Fox, L.C., & Ford, J.P. (1973). Methodologies for determining reading requirements of military occupational specialties (Tech. Rep. No. 73-75, HumRRO Western Division). Presidio of Monterey, CA: Human Resources Research Organization.

Chall, J.S. (1958). Readability: An appraisal of research and application. Columbus: The Bureau of Educational Research, Ohio State University.

Christie, J.F., & Alonso, P.A. (1980). Effects of passage difficulty on primary-grade children's oral reading error patterns. Educational Research Quarterly, 5, 41-49.

Christie, J.F. (1981). The effects of grade level and reading ability on children's miscue patterns. Journal of Educational Research, 14, 419-423.

Coke, E.U. (1974). The effects of readability on oral and silent reading rates. Journal of Educational Psychology, 66, 406-409.

Coleman, E.B. (1965). On understanding prose: Some determiners of its complexity (NSF Final Report GB-2604). Washington, DC: National Science Foundation.

Cooper, J.L. (1952). The effect of adjustment of basal reading materials on achievement. Unpublished doctoral dissertation, Boston University, Boston, MA.
Unpublished doctoral dissertation, Boston University, Boston, MA. 170 Criscoe, B.L., & Gee. T.C. (1984). Content reading: A diagnostic pgescriptive approach. Englewood Cliffs, NJ: Prentice-Hall. Cunningham, P. (1976). ARRF: A book that fits!. The Reading Teacher, 29, 206-207. Cunningham, P., Arthur, S., & Cunningham, J. (1977). Classroom reading instruction K-S: Alternative approaches. Lexington, MA: D.C. Heath & Co. Dale, E., & Chall, J.S. (1948). A formula for predicting readability. Educational Research Bulletin, 21, 11-20, 37-54. Dale, E., & Tyler, R.W. (1934). A study of the factors influencing the difficulty of reading materials for adults of limited reading ability. Library Quarterly, A, 384-412. Danielson, W.A., & Bryan, S.D. (1963). Computer automation of two readability formulas. Journalism Quarterly, 32, 201-206. Daw, S.E. (1938). The persistence of errors in oral reading in grades four and five. Journal pf Educational Research, 22, 81-90. DeBeaugrande, R. (1981). Design criteria for process models of reading. Reading Research Quarterly, lg, 261-315. DeCecco, J.P. (1968). The psychology pf learning and instruction: Educational psychology. nglewood Cliffs: NJ: Prentice-Hall. Duffy, G.B., & Durrell, D.D. (1935). Third grade difficulties in oral reading. Education, 2g, 37-40. Dunkeld, C.G. (1970). TEA validity pi RES informal reading inventory for the designation pf instructional reading levels: A study pf Ehg relationship between children's gains 13 reading achievement 332 the difficulty pf instructional materials. Unpublished doctoral dissertation, University of Illinois, Champaign. Dunlap, C.G. (1954). Readability measurement: A review and comparison. Unpublished doctoral dissertation, University of Maryland, College Park. Durrell, D.D. (1937, revised 1955). Durrell analysis pf reading difficulty. New York: Harcourt, Brace & World. 171 Eberwein, L.D. (1979). The variability of readability of basal reader textbooks and how much teachers know about it. Reading World, 18, 259-272. Ekwall, E.E. (1974). Should repetitions be counted as errors?. The Reading Teacher, 21, 365-367. Ekwall, E.E., & English, J. (1971). Use pf the polygraph pp determine elementary school students' frustration level. (Final Report, Project No. 0G078). Washington, DC: United States Department of Health, Education & Welfare. Ekwall, E.E. (1976). Diagnosis and remediation pf the disabled reader. Boston, MA: Allyn & Bacon. Ekwall, E.E. (1979). Ekwall reading inventory. Boston: Allyn & Bacon. Ekwall, E.E., Solis, J., & Solis, E. (1973). Investigating informal reading inventory scoring criteria. Elementary English, 52, 271-274. Entin, E.B., & Klare, G.R. (1978). Factor analyses of three correlation matrices of readability variables. Journal pf Reading Behavior, l9, 279-290. Fairbanks, G. (1937). The relation between eye—movements and voice in the oral reading of good and poor silent readers. Psychological Monographs, A8, 78-107. Farr, R. (1969). Reading: What can pg measured? Newark, DE: International Reading Association. Farr, R. (Ed.). (1970). Measurement and evaluation pf reading. New York: Harcourt, Brace & World. Farr, R., & Carey, R.F. (1986). Reading: What can pg measured? (2nd Edition). Newark, DE: International Reading Association. Flesch, R.F. (1948). A new readability yardstick. Journal 3: Applied Psychology, 32, 221-233. Flesch, R.F. (1958). A new way to better English. New York: Harper & Brothers. Flesch, R.F. (1954). How £3 make sense. New York: Harper & Brothers. Flesch, R.F. (1949). 
Forbes, T.W., & Cottle, W.C. (1953). A new method for determining readability of standardized tests. Journal of Applied Psychology, 22, 185-190.
Ford, P.L. (Ed.). (1962). The New England primer: A history of its origin and development. Teachers College, Columbia University.
Fox, A.C. (1979). Foxies comparative chart (A study of readability results on stories in 22 basal series). Coeur d'Alene, ID: Fox Reading Research Company.
Fry, E.B. (1980). Comments on the preceding Harris and Jacobson comparison of the Fry, Spache, and Harris-Jacobson readability formulas. The Reading Teacher, 22, 924-926.
Fry, E.B. (1969). The readability graph validated at primary levels. The Reading Teacher, 22, 534-538.
Fry, E. (1968). A readability formula that saves time. Journal of Reading, 11, 513-516, 575-578.
Gage, N.L., & Berliner, D.C. (1984). Educational psychology (3rd ed.). Boston, MA: Houghton Mifflin.
Gagne, R.M. (1968). Learning hierarchies. Educational Psychologist, 6, 1-9.
Gagne, R.M. (1969). The acquisition of knowledge. Psychological Review, 62, 355-365.
Gagne, R.M. (1965). The analysis of instructional objectives for the design of instruction. In R. Glaser (Ed.), Teaching machines and programmed learning II: Data and directions. Washington, DC: Department of Audio-Visual Instruction, National Education Association.
Gambrell, L.B., Wilson, R.M., & Gantt, W.N. (1981). Classroom observations of task attending behaviors of good and poor readers. Journal of Educational Research, 26, 400-404.
Geoffrion, L.D., & Geoffrion, O.P. (1978). Computers and reading instruction. Reading, MA: Addison-Wesley.
Gerbens, A. (1978). Read any good books lately? Kilobaud, 104-106.
Gill, D., Polin, R.M., Vinsonhaler, J.F., & VanRoekel, J. (1980). The impact of training on diagnostic consistency. E. Lansing, MI: The Institute for Research on Teaching, Michigan State University.
Gillmore, J.V. (1974). The relation between certain oral reading habits and oral and silent reading comprehension. Unpublished doctoral dissertation, Harvard University, Cambridge, MA.
Glaser, N.A. (1964). A comparison of specific reading skills of advanced and retarded readers of fifth grade reading achievement. Unpublished doctoral dissertation, University of Oregon, Eugene.
Glaser, R. (Ed.). (1965). Teaching machines and programmed learning II: Data and directions. Washington, DC: Department of Audiovisual Instruction, National Education Association.
Glasser, W. (1969). Schools without failure. New York: Harper & Row.
Goodman, D., & Schwab, S. (1980, April). Computerized testing for readability. Creative Computing, 46-51.
Goodman, K.S., & Fleming, J.T. (Eds.). (1969). Psycholinguistics and the teaching of reading. Newark, DE: International Reading Association.
Goodman, K.S. (1967). Reading: A psycholinguistic guessing game. Journal of the Reading Specialist, 6, 126-135.
Goodman, Y.M., & Burke, C.L. (1972). Reading miscue inventory manual: Procedure for diagnosis and evaluation. New York: Macmillan.
Gray, L., & Reese, D. (1957). Teaching children to read. New York: Ronald Press.
Gray, W.S., & Leary, B.E. (1935). What makes a book readable? Chicago: University of Chicago Press.
Gray, W.S. (1915). Standardized oral reading paragraphs. Bloomington, IL: Public School Publishing Co.
Guthrie, J. (1974). The maze technique to assess, monitor reading comprehension. The Reading Teacher, 28, 161-168.
Guzak, F.J. (1970). Dilemmas in informal reading assessments. Elementary English, 61, 666-670.
Harris, A.J., & Jacobson, M.D. (1980). Comparison of the Fry, Spache, and Harris-Jacobson readability formulas for primary grades. The Reading Teacher, 22, 920-924.
Harris, A.J., & Jacobson, M.D. (1976). Predicting twelfth graders' comprehension scores. Journal of Reading, 22, 43-47.
Harris, A.J., & Jacobson, M.D. (1974, October). Revised Harris-Jacobson readability formula. Paper presented at the annual meeting of the College Reading Association, Bethesda, MD.
Harris, A.J., & Sipay, E.R. (1980). How to increase reading ability (7th ed.). New York: Longman.
Hinkle, D.E., Wiersma, W., & Jurs, S.G. (1979). Applied statistics for the behavioral sciences. Boston, MA: Houghton Mifflin.
Huey, E.B. (1908, reprinted 1968). The psychology and pedagogy of reading. New York: Macmillan.
Irving, S.L., & Arnold, W.B. (1979, September). Measuring readability of text. Personal Computing, 34-36.
Irwin, J.W., & Davis, C.A. (1980). Assessing readability: The checklist approach. Journal of Reading, 22, 129-130.
Jacobson, M.D., Kirkland, C.E., & Selden, R.W. (1978). An examination of the McCall-Crabbs standard test lessons in reading. Journal of Reading, 22, 224-230.
Johnson, M.S., & Kress, R. (1965). Informal reading inventories. Newark, DE: International Reading Association.
Johnston, P.H. (1984). Assessment in reading. In P.D. Pearson (Ed.), Handbook of reading research. New York: Longman.
Jorgenson, G.W. (1977). Relationship of classroom behavior to the accuracy of the match between material difficulty and student ability. Journal of Educational Psychology, 62, 24-32.
Judd, C.H., & Buswell, G.T. (1922). Silent reading: A study of the various types (Supplementary Educational Monographs, No. 23). Chicago: University of Chicago Press.
Keller, P.T.G. (1982). Maryland micro: A prototype readability formula for small computers. The Reading Teacher, 22, 778-782.
Kibby, M.W. (1979). Passage readability affects the oral reading strategies of disabled readers. The Reading Teacher, 22, 390-396.
Killgallon, P.A. (1942). A study of relationships among certain pupil adjustments in language situations. Unpublished doctoral dissertation, Pennsylvania State University, State College.
Kincaid, J.P., Fishburne, R., Rogers, R.L., & Chissom, B.S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Branch Report 8-75). Millington, TN: Chief of Naval Training.
Klare, G.R. (1984). Readability. In P.D. Pearson (Ed.), Handbook of reading research. New York: Longman.
Klare, G.R., & Buck, B. (1954). Know your reader: The scientific approach to readability. New York: Hermitage House.
Klare, G.R. (1963). The measurement of readability. Ames: Iowa State University Press.
Lantz, B. (1945). Some dynamic aspects of success and failure. Psychological Monographs, No. 271.
Latimer, E.H. (1948, April). A comparative study of recent techniques for judging readability (Abstracts of Doctoral Dissertations). University of Pittsburgh Bulletin, 26, 246-256.
Leibert, R.E. (1965). An investigation of the differences in reading performance on two tests of reading. Unpublished doctoral dissertation, Syracuse University, Syracuse, NY.
Lennon, R.T. (1951). The stability of achievement test results from grade to grade. Educational and Psychological Measurement, 22, 121-127.
Leslie, L., & Osol, P. (1978). Changes in oral reading strategies as a function of quantities of miscues. Journal of Reading Behavior, 26, 442-445.
Lewerenz, A.S. (1930). Vocabulary grade placement of typical newspaper content. Educational Research Bulletin, Los Angeles City Schools, 10, 4-6.
Lively, B.A., & Pressey, S.L. (1923). A method for measuring the vocabulary burden of textbooks. Educational Administration and Supervision, 9, 389-398.
Lorge, I. (1939). Predicting reading difficulty of selections for children. Elementary English Review, 16, 229-233.
Lorge, I. (1949). Readability formulae: An evaluation. Elementary English, 26, 86-95.
Lumsdaine, A.A. (1964). Educational technology, programmed learning, and instructional science. In E.R. Hilgard (Ed.), Theories of learning and instruction (63rd yearbook of the National Society for the Study of Education, Part I). Chicago: University of Chicago Press.
Lumsdaine, A.A., & Glaser, R. (Eds.). (1950). Teaching machines and programmed learning: A source book. Washington, DC: Department of Audiovisual Instruction, National Education Association.
McCall, W.A., & Crabbs, L.M. (1925). Standard test lessons in reading: Teacher's manual for all books. New York: Bureau of Publications, Teachers College, Columbia University.
McCracken, P. (1962, February). Standardized tests and informal reading inventories. Education, 366-369.
McElroy, J. (1953, June). Fog count readability formula. In Guide for Air Force writing (Air Force Manual 11-3). Maxwell, AL: Department of the Air Force, Maxwell Air Force Base, Air University.
McLaughlin, G.H. (1969). SMOG grading: A new readability formula. Journal of Reading, 12, 639-646.
MECC #749, School utilities: Vol. 2. (1980). Minnesota Educational Computing Consortium.
Miller, G.R., & Coleman, E.B. (1967). A set of thirty-six passages calibrated for complexity. Journal of Verbal Learning and Verbal Behavior, 6, 851-854.
Miller, G.R., & Coleman, E.B. (1972). The measurement of reading speed and the obligation to generalize to a population of reading materials. Journal of Reading Behavior, 6, 48-56.
Miller, R.B. (1962). Task description and analysis. In R.M. Gagne (Ed.), Psychological principles in system development. New York: Holt, Rinehart & Winston.
Mills, R.E., & Richardson, J.R. (1963). What do publishers mean by grade level? The Reading Teacher, 26, 359-362.
Monroe, M. (1932). Children who cannot read. Chicago: University of Chicago Press.
Mosenthal, P. (1976-1977). Psycholinguistic properties of aural and visual comprehension as determined by children's abilities to comprehend syllogisms. Reading Research Quarterly, 22, 55-92.
Mosenthal, P. (1978). The new and given in children's comprehension of presuppositive negatives in two modes of processing. Journal of Reading Behavior, 26, 267-278.
Ojemann, R.H. (1934). The reading ability of parents and factors associated with reading difficulty of parent education materials. University of Iowa Studies in Child Welfare, 2, 11-32.
Paolo, M.F. (1977). A comparison of readability graph scores and oral reading errors on trade books for beginning readers. Unpublished master's thesis, Rutgers: The State University of New Jersey, New Brunswick.
Payne, C. (1930). The classification of errors in oral reading. Elementary School Journal, 22, 142-146.
Peterson, M.J. (1956). Comparison of Flesch readability scores with a test of reading comprehension. Journal of Applied Psychology, 62, 35-36.
Pikulski, J.J., & Shanahan, T. (1982). Informal reading inventories: A critical analysis. In J.J. Pikulski & T. Shanahan (Eds.), Approaches to the informal evaluation of reading. Newark, DE: International Reading Association.
Pitner, R. (1913). Oral and silent reading of fourth grade pupils. Journal of Educational Psychology, 4, 330-337.
Polin, R.M. (1981). A study of preceptor training of classroom teachers in reading diagnosis (Reading Series No. 110). E. Lansing, MI: The Institute for Research on Teaching, Michigan State University.
Powell, W.R. (1969). Reappraising the criteria for interpreting informal inventories. In D. DeBoer (Ed.), Reading diagnosis and evaluation: Proceedings of the thirteenth annual convention. Newark, DE: International Reading Association.
Powell, W.R., & Dunkeld, C. (1971). Validity of the IRI reading levels. Elementary English, 62, 637-642.
Raygor, A.L. (1977). The Raygor readability estimate: A quick and easy way to determine difficulty. In P.D. Pearson (Ed.), Reading: Theory, research and practice (26th Yearbook of the National Reading Conference). Clemson, SC: National Reading Conference.
Rumelhart, D. (1977). Toward an interactive model of reading. In S. Dornic (Ed.), Attention and performance VI. Hillsdale, NJ: Erlbaum.
Samuels, S.J. (1979). The method of repeated readings. The Reading Teacher, 32, 403-408.
Samuels, S.J., & Kamil, M.L. (1984). Models of the reading process. In P.D. Pearson (Ed.), Handbook of reading research. New York: Longman.
Schlieper, A. (1977). Oral reading errors in relation to grade and level of skill. The Reading Teacher, 22, 283-287.
Schuyler, M.R. (1982). A readability program for use on microcomputers. Journal of Reading, 22, 560-591.
Sears, P.S. (1940). Level of aspiration in academically successful and unsuccessful children. Journal of Abnormal and Social Psychology, 22, 498-536.
Sherman, G.B., Weinshank, A., & Brown, S. (1979). Training reading specialists in diagnosis (Research Series No. 31). E. Lansing, MI: The Institute for Research on Teaching, Michigan State University.
Silvaroli, N.J. (1965). Classroom reading inventory. Dubuque, IA: Wm. C. Brown Co.
Singer, H., & Dolan, D. (1980). Reading and learning from text. Boston: Little, Brown & Co.
Singer, H. (1975). The SEER technique: A non-computational procedure for estimating readability level. Journal of Reading Behavior, 7, 255-267.
Sipay, E.R. (1964). Comparison of standardized reading scores and functional reading levels. The Reading Teacher, 22, 265-268.
Smith, J.K. (1977). Perspectives on mastery learning and mastery testing. Princeton, NJ: ERIC Clearinghouse on Tests, Measurement and Evaluation.
Smith, L. (1976). Miscue research and readability. In P.D. Allen & D.J. Watson (Eds.), Findings of research in miscue analysis: Classroom implications. Urbana, IL: National Council of Teachers of English, ERIC Clearinghouse on Reading and Communication Skills.
Spache, G.D. (1953). A new readability formula for primary grade reading materials. Elementary School Journal, 53, 410-413.
Spache, G.D., & Spache, E.B. (1977). Reading in the elementary school (4th ed.). Boston, MA: Allyn & Bacon.
Spache, G.D. (1972). Diagnostic reading scales. Monterey, CA: California Test Bureau.
Spache, G.D. (1974). Good reading for poor readers (Rev. 9th ed.). Champaign, IL: Garrard Publishing Co.
Spiro, R.J., & Myers, A. (1984). Individual differences and underlying cognitive processes in reading. In P.D. Pearson (Ed.), Handbook of reading research. New York: Longman.
Stadlander, E.L. (1936). A scale for evaluating the difficulty of reading materials for the intermediate grades (Abstract of Doctoral Dissertation). University of Pittsburgh Bulletin, 22, 347-352.
Stanovich, K.E. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16, 32-71.
Stolurow, L.M., & Newman, J.R. (1959). A factorial analysis of objective features of printed language presumably related to reading difficulty. Journal of Educational Research, 22, 243-251.
Stone, C. (1957). Measuring difficulty of primary reading material: A constructive criticism of Spache's measure. Elementary School Journal, 22, 36-41.
Taylor, W.L. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415-433.
Thorndike, E.L. (1921). Educational psychology: I. The original nature of man; II. The psychology of learning; III. Work and fatigue, individual differences. New York: Bureau of Publications, Teachers College, Columbia University Press.
Thorndike, E.L. (1917). Reading as reasoning: A study of mistakes in paragraph reading. Journal of Educational Psychology, 8, 323-332.
Thorndike, E.L. (1921). The teacher's word book. New York: Teachers College, Columbia University.
Traxler, A.E. (1950). Reading growth of secondary school pupils during a five-year period. Educational Records Bulletin, 22, 98-107.
Vacca, R.T. (1981). Content area reading. Boston: Little, Brown & Co.
Veatch, J. (1978). Reading in the elementary school (2nd ed.). New York: John Wiley & Sons.
Washburne, C., & Vogel, M. (1926). What books fit what children? School and Society, 22, 22-24.
Washburne, C., & Vogel, M. (1928). An objective method of determining grade placement of children's reading materials. Elementary School Journal, 22, 373-381.
Weiner, B. (1972). Theories of motivation: From mechanism to cognition. Chicago: Markham.
Weinshank, A.B. (1980). Investigation of the diagnostic reliability of reading specialists, learning disabilities specialists, and classroom teachers: Results and implications. E. Lansing, MI: The Institute for Research on Teaching, Michigan State University.
Wells, C.A. (1950). The value of an oral reading test for diagnosis of the reading difficulties of college freshmen of low academic performance. Psychological Monographs, 66, 1-35.
Winer, B.J. (1971). Statistical principles in experimental design. New York: McGraw-Hill.