This is to certify that the dissertation entitled

The Effect of Item Text Characteristics on Children’s Growth in Reading

presented by Hye-Sook Park has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Psychology.

Major professor
Date: Dec 18 1998

The Effect of Item Text Characteristics on Children’s Growth in Reading

By Hye-Sook Park

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology and Special Education

1999

ABSTRACT

The Effect of Item Text Characteristics on Children’s Growth in Reading

By Hye-Sook Park

This study investigates children’s growth in reading as reflected in the Peabody Individual Achievement Test (PIAT) reading comprehension item responses from the National Longitudinal Survey of Youth data over several years. Based on the idea that reading comprehension is determined by characteristics of both readers and texts, this study investigates the relative impact of both. Using a three-level hierarchical generalized linear model, in which items (level-1) are nested within time points (level-2) and time points are nested within individuals (level-3), this study assesses relationships among text characteristics, cognitive abilities, environmental factors, and reading ability (as indexed by the Peabody test). Reading ability did not grow at a constant rate; in fact it exhibited variable patterns that were influenced by verbal memory and text characteristics in different ways at different points in children’s reading development. In general, short sentences, frequently used vocabulary, and high density facilitated reading comprehension, but the temporal influences of the patterns of three text characteristics differed. The effect of age on children’s reading comprehension was manifested differentially depending upon sentence characteristics. In the case of sentence length, the effect of age was manifested only with short sentences.
The positive contribution that frequently used vocabulary made to reading comprehension increased over years, but the growth rates were also different. The effect of age on reading comprehension was greater with sentences written using high frequency vocabulary than with low frequency vocabulary. The effect of propositional density increased constantly. The effect of age on reading comprehension was manifested greatly with high density sentences, that is, coherent sentences, rather than with low density sentences. In addition, verbal memory was statistically significant in predicting both the average effect of sentence length over time and the rate of growth of sentence length slope. There was an interaction effect between verbal memory and length of sentences over time. In the case of short sentences, the effect of verbal memory was practically as well as statistically significant. However, in the case of long sentences, the effect of verbal memory was almost absent. As verbal memory increased, vocabulary frequency had a greater effect on reading ability. However, verbal memory did not influence the effect of propositional density. The differential contribution of each psycholinguistic variable over time implies that achievement, as measured by a reading comprehension test, is a complex entity that is greatly dependent on the nature of the text contained in the test.

Copyright by Hye-Sook Park 1999

To the Almighty who made this possible

ACKNOWLEDGEMENTS

Although my dissertation research was conducted based on information processing theory due to the affordances/constraints of the outcome measure, my work acknowledges that knowledge is socially constructed. I owe directly and indirectly so many things to professors and friends who have made this work possible. Looking backwards, due to their support, I feel that the period of conducting this research was the happiest one in my graduate program.
I was blessed to work with my committee members, who are the model educators and researchers that I have admired. As my dissertation director, Professor P. David Pearson provided indescribable mentoring and support. He provided an intellectual niche for me and provided me with different types of financial support. In spite of his overwhelmingly busy schedule, I could get feedback from him whenever I needed it. His accessibility to me both via e-mail and face-to-face interaction within 24 hours facilitated my progress. His father-like support has sustained my energy, spirit, and enthusiasm for learning. I also would like to express my deepest thanks to Professor Stephen W. Raudenbush, the great motivator in inspiring my interests and creating potentials as a researcher in quantitative research. Because of him, I came to like statistics. He has provided valuable feedback on each phase of my dissertation, even as he was moving to another university. His warm and understanding smiles have also given me the energy to go over work again and again during various stages of my graduate program. He always told me to do “ground-breaking work,” and at the same time he did not forget to tell me to enjoy the Michigan sunshine. Sincere thanks also go to Professor Ralph T. Putnam, my advisor at the last stage of my graduate program, for setting high standards on my study, listening to me, editing my papers, and supporting me in many situations. I have known Professor Thomas J. Luster since the first year during my master’s program. He informed me about the National Longitudinal Survey of Youth data for this research. His on-going kindness and warm support have made my life at Michigan State University easy. Special thanks go to Michael C. Rodriguez, for his kindness and timely help. I learned something about measurement from him, too. I am also grateful to Professor Barbara K. Abbott because of her kindness when I felt ambiguity in analyzing propositions.
I also would like to say thank you to Dr. Randall Fotiu for some technological support at the beginning of my study; Yasuo Miyazaki and other HLM classmates for posing intellectual challenges to me and providing feedback on my study; Professors William Mehrens, Betsy Jane Becker, and Kenneth Frank for their kindness in giving me the opportunities to learn more about measurement and statistics; and Professor Stephen L. Yelon for his advice and kindness during my wandering periods. My thanks extend to Professors Richard Prawat, John Schwille, and James Gavelek, my former advisor, for providing me with direct and indirect financial support and encouragement. Thanks go to friends, Rena and Robert Atherton, my host family; Haj-young and Dr. Sung-Jun Kim, Hee-Sook Park, Ok-Sook Park, Mr. Yoon-seng Kim, Sheila Moore, Ailing Kong, Drs. In-Kyung Lee, Kedmon Hungwe, Myung-Ae Bang, and Leticia and Rodolfo Altamirano for unforgettable friendships, help, and prayers for me especially during my physical ailment. Thanks go to my office-mates and Gary Afiholter for their support. Thanks to Professors Shin-You Lee and Sang-Ok Park at Hong-Ik University, Dr. N. M. Mohan Pankaj at Australian National University, Mr. Jae-Hyuk Yang, Shinwon Middle School principal, for their encouragement, and my student, Ju-sub Yoon, for reminding me that I am realizing my dream. Finally, my thanks go to my family members. To my parents, Keum-Nam Park and Young-Su Kim, for their high expectations for my education and their sacrificial support for their children’s education; to my sister, Kyung-Sook Park, for her warm support and cheering; to my younger brother and his wife, Jae-Hyun Park and Keum-mi Lee, especially to their children, Ji-Su and Sang-Su, for their smiles and cheering; and to my older brother, Kyung-Sun Park, who has indirectly led me to become a persistent learner.
I also would like to extend my thanks to many other friends and faculty members who have helped me grow as a researcher at Michigan State University.

Effect of Individual Characteristics on Achievement
Changing Home Environment
Patterns of Growth in Ability
Interaction Between Verbal Memory and Text Characteristics over Time
  Sentence length
  Vocabulary frequency
  Propositional density
Summary

CHAPTER 5
CONCLUSIONS AND DISCUSSION
Summary
Reading Ability Growth Pattern
Psycholinguistic Variable Growth Pattern
Relationship Between Verbal Memory and Psycholinguistic Variables
Discussion
Implications for Test Development and Methodology
Limitations
Direction for Future Research

APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

1. Correlations Among Cognitive Stimulation Scores and Total Home Scores
2. Non-Linear Model with the Logit Link Function: Unit-Specific Model
3. Reading Ability by Time
4. Descriptive Statistics for Level-1 Variables
5. The Effect of Sentence Length by Time in Log-odds
6. The Effect of Vocabulary Frequency by Time in Log-odds
7. The Effect of Propositional Density by Time in Log-odds
8. Descriptive Statistics for Level-3 Variables
9. Full Model
10. The Effect of Verbal Memory in Log-odds
11. Descriptive Statistics for Level-2 Variables
12. The Effect of Home Cognitive Stimulation Score
13. Interaction Effect Between Verbal Memory and Sentence Length over Time on Reading Comprehension in Log-odds
14. Interaction Effect Between Verbal Memory and Vocabulary Frequency over Time on Reading Comprehension in Log-odds
15. Interaction Effect Between Verbal Memory and Propositional Density over Time on Reading Comprehension in Log-odds

LIST OF FIGURES

1. Information About Forming Ceiling and Basal Items
2. Patterns of Change in Ability and Change in the Importance of Item Text Characteristics
3. The Growth of Children’s Ability in Reading
4. The Effect of Sentence Length
5. The Effect of Vocabulary Frequency
6. The Effect of Propositional Density
7. Patterns of Change in Ability and in the Importance of Item Text Characteristics
8. Interaction Effect Between Verbal Memory and Sentence Length over Time on Reading Comprehension in Log-odds
9. Interaction Effect Between Verbal Memory and Vocabulary Frequency over Time on Reading Comprehension in Log-odds
10. Interaction Effect Between Verbal Memory and Propositional Density over Time on Reading Comprehension in Log-odds over Time

CHAPTER 1

INTRODUCTION

For decades, studies on readability have been conducted to understand the effect of text characteristics on reading comprehension. However, no studies have been conducted to investigate how the effect of text characteristics on reading comprehension changes as children grow older. This study investigates how the linguistic characteristics of text interact with characteristics that children bring to the classroom either by virtue of nature or experience.
This study explores factors that explain or account for the growth in beginning readers’ abilities at ages 6, 8, and 10 in terms of potentially explanatory variables: (a) psycholinguistic variables such as sentence length, word frequency, and idea density; (b) changing home environmental factors; and (c) time invariant individual characteristics such as race, gender, verbal memory, and testing time. In addition, this study investigates how individual characteristics interact with psycholinguistic variables in explaining growth in reading. Reading achievement was measured by the Peabody Individual Achievement Test (PIAT) Reading Comprehension items across three time points over four years as a part of the National Longitudinal Survey of Youth (NLSY). These data are the primary outcome measures for this investigation. It is commonly believed that reading comprehension is determined by the joint influence of the characteristics of readers and the texts they read, with the assumption that ability is itself the joint effect of biological (genetic) and environmental factors. The present study builds on a long tradition of readability studies in the sense that it incorporates text characteristics in the model. Traditionally, studies on readability have used regression models to explain the difficulties of texts. Some of these studies put linguistic and psycholinguistic factors into the model to explain text difficulties. Early readability studies (Chall et al., 1948; Flesch, 1943) investigated only observable text characteristics (e.g., number of words in a sentence, number of syllables in a word, number of prepositions, and vocabulary frequencies). More recent studies have tried to explain text difficulties by incorporating reader factors, such as reader’s prose-processing capability (Kintsch, 1979). Carver (1977) and Stenner (1997) measured both the difficulties of texts and the ability of readers by attempting to place the two constructs on the same scale.
However, in spite of these researchers’ contributions to the area of reading comprehension, questions still remain regarding how the importance of text characteristics differs with respect to readers’ abilities. Text characteristics interact with the characteristics of readers, and readers’ abilities may influence the perception of text characteristics. Thus, it is important to investigate the changing patterns of influence of linguistic and psycholinguistic variations of texts, especially as they are moderated by changes in children’s underlying reading abilities and cognitive growth. Based on information processing theory, the study will investigate the NLSY children’s PIAT reading comprehension item responses using hierarchical generalized linear models (HGLM). The NLSY PIAT reading comprehension item responses provide important information for understanding beginning readers’ development from ages 6 to 10. According to Chall’s (1983) reading development scheme, children undergo six different reading stages before reaching adulthood: pre-reading, initial decoding, reading for confirmation of knowledge, reading for obtaining conventional knowledge, reading with multiple view points, and the construction/reconstruction of knowledge. However, no studies have investigated the changing impact of text difficulty as children progress from one stage to the next. In addition, no studies have investigated how the text characteristics interact with children’s individual characteristics. Especially rare is the use of item responses by the same subjects to the same test across several time points over several years, as is the case in this study. This longitudinal perspective will help us examine the complex array of factors that influence reading development more extensively and more accurately.
The use of a common metric and a single group of subjects across years eliminates the confounding that might occur if either a different assessment instrument were to be employed across time or different subjects were incorporated at each time point. In addition, this study will avoid some problems commonly found in this sort of research: If only two time points are used, it is difficult to assess the trends (growth) of children’s reading development across years; cross-sectional designs obscure the assessment of the individual children’s development across years due to cohort effects. To explain the immediate text processing phenomena at each time point, this study will be based on information processing theory. In fact, the very structure of the PIAT suggests a grounding in information processing theory. The characteristics of the PIAT items, procedures, and underlying assumptions about reading processes are consistent with information processing theory. The PIAT reading comprehension test comprises 66 items, each a single sentence. As the test progresses, sentences get longer and the words used become less common. The PIAT reading comprehension test is conducted by asking children to read a sentence only once, turn a page, and then to select one of four pictures that describes the sentence. The PIAT uses a range-finding approach to item selection for each individual by giving a certain range of items that is appropriate to readers’ ability levels based on the PIAT reading recognition score (a word identification test). A basal level (a range of easy items) and a ceiling level (a range of very hard items) are found for each child, and a final score for any individual is based upon performance on these items that fall between the basal and ceiling levels. The assumption of reading found in the PIAT test is that comprehension is a process of finding meaning in a text.
The meaning of the text exists independently of the reader, since the reader has to choose the one correct meaning out of four options. When children take the test they must engage recall (Carroll, 1972), or short-term memory. (They turn the page after reading the sentence in order to see the four picture choices.) Carroll (1972) argued that having readers answer questions without the text present overemphasizes the memory component rather than measuring pure comprehension of text reflected in lexical knowledge, grammatical knowledge, and an ability to locate facts in a paragraph. As suggested, this test also requires the child to invoke short-term memory (STM) or working memory. As indicated by Jorm (1983) and Morrison, Giordani, and Nagy (1977), there exists a relationship between reading ability and STM. Poor readers have difficulty storing and processing information in STM. Since the attention (mechanism) and memory size change as children grow older, this study will investigate how children’s reading abilities, which influence the children’s perception of text difficulties, change over four years. The psycholinguistic model (so named because psycholinguistic factors are used in the model) in this study takes into account the limitations of STM capacity. To investigate how STM is related to reading comprehension, test items are analyzed according to three psycholinguistic variables: sentence length, word frequency, and propositional density. According to Baddeley et al. (1975), the phonological loop in immediate memory performance is directly influenced by the spoken length of memory items. In this sense, using length of word for determining sentence difficulty is related to the efficiency of STM or working memory.
Especially when considering that beginning readers undergo a decoding stage and that the children in this study are beginning readers in 1988, the first year of the data collection, the phonological loop in working memory is assumed to be involved in children’s early stage of oral reading. Also, familiar words do not take much memory space because of the effect of automaticity. Propositions, psychological representations of meaning, are composed of a predicator and arguments (Kintsch, 1974). For example, the sentence, “John runs fast.” consists of two propositions: [run, John] and [fast, run]. In addition to this, the more propositions in a sentence, the more STM space they require since the number of propositions is comparable to the number of conceptual meaning units or memory chunks. In this sense, the use of these three variables is directly related to the capacity of STM or working memory. However, in order to avoid a probable multicollinearity between sentence length and number of propositions, the density of propositions, which is obtained by dividing the number of propositions by the number of words in a sentence, will be used. In addition, studies of early literacy show the importance of home environment and intra-individual characteristics. However, for theoretical consistency, variables such as home cognitive stimulation score and variables that reflect intra-individual characteristics will be used to investigate how children’s reading ability and the relative contribution of psycholinguistic variables change over time. Children from enriched home environments and children who have high verbal memory typically demonstrate better reading achievement. This study will investigate the pattern of the children’s ability while controlling for intra-individual and home environmental factors. 
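As a small illustration of the density measure defined above (the number of propositions divided by the number of words in a sentence), the following Python sketch computes it for the example sentence "John runs fast." The function name, the tuple representation of propositions, and the whitespace-based word count are assumptions made for this illustration; they are not taken from the study's coding procedure, in which propositions were identified by hand.

```python
from fractions import Fraction


def propositional_density(sentence, propositions):
    """Return the number of propositions per word for one sentence.

    `propositions` is a hand-coded list of (predicator, argument, ...) tuples.
    Identifying propositions is a manual analysis step that this sketch does
    not automate. Words are counted by splitting on whitespace, which is an
    assumption of this illustration.
    """
    words = sentence.split()
    if not words:
        raise ValueError("sentence must contain at least one word")
    return Fraction(len(propositions), len(words))


# Example from the text: two propositions, [run, John] and [fast, run].
example = "John runs fast."
coded = [("run", "John"), ("fast", "run")]

density = propositional_density(example, coded)
print(density)  # 2 propositions over 3 words
```

Dividing by sentence length keeps the measure from simply growing with the number of words, which is the multicollinearity concern raised above.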
In addition, this study will examine the relative impact of each cluster of variables on children’s reading development, while controlling for other contextual characteristics. The methodology employed in this study, HGLM, provides a vehicle to evaluate my research questions. In the HGLM to be used in this study, item responses (which have linguistic characteristics) are nested within testing occasions. Occasions, in turn, are nested within individuals who differ from one another on several characteristics. The following are the specific research questions:

1. Do children’s reading abilities change at a constant rate?
   a) Do abilities increase at constant or variable rates over time?
   b) Do changes in reading abilities differ across individuals?
2. How does the importance of each linguistic/psycholinguistic variable change as children grow older? Is the rate of change for each linguistic variable constant or variable?
3. How do individual children’s characteristics such as verbal memory interact with text characteristics, such as sentence length, vocabulary frequency, and propositional density?
   a) Does the effect of sentence length on reading comprehension depend on children’s verbal memory?
   b) Does the effect of vocabulary frequency on reading comprehension depend on children’s verbal memory?
   c) Does the effect of the propositional density on reading comprehension depend on children’s verbal memory?
4. How do contextual factors influence children’s growth in reading? To what extent does children’s growth in reading depend on
   a) verbal memory?
   b) home environment?
   c) race?
   d) gender?
   e) the initial test month?

CHAPTER 2

LITERATURE REVIEW

This study draws on three relevant bodies of literature related to the use of the Peabody Reading Comprehension Test. The first is the information processing view of cognitive processes, including reading.
The second is the long-standing empirical tradition of estimating the readability of text by examining its linguistic characteristics. The third is a developmental stage-wise view of reading, one that suggests that the cognitive demands of reading change as the task increases in complexity.

Theoretical Perspectives

Miller (1993) argued that information processing is not a single theory, but rather a framework which characterizes a large number of research programs. The flow of information begins with an input, or stimulus. It ends with an output, which could be a bit of information stored in long-term memory (LTM) or an observable behavior such as a speech act or a decision of choosing one answer over another. Since mental operations occur in short-term memory (STM) during the real time between input and output, the consideration of STM (or working memory) is useful for this study. William James (1890) proposed that the essence of attention is focalization, concentration, and consciousness. Attention requires withdrawal from some things in order to deal effectively with others. Because of the limited capacity for attending to stimuli (Broadbent, 1958; Treisman, 1960; Posner, 1982), performance may break down if the attentional demands of the task exceed the performer’s capacity (Anderson, 1982). However, as practice increases, performance becomes more automatic, requiring less attention (LaBerge & Samuels, 1974) and less STM or working memory space. Chunking, which can be regarded as organizing stimuli into a meaningful unit, is also related to automatization in the sense that the perceptual system rapidly parses the stimulus, forming a hierarchical structure of instantiated chunks (VanLehn, 1989). Miller (1956), observing that STM has a limited capacity, posited his now famous 7 ± 2 rule, specifying that STM can only deal with about seven chunks of information concurrently.
According to Miller, although the size of a chunk might differ among individuals, the number of chunks remains the same. However, his conclusion is based on research with adults; children’s memory chunks are smaller and change both quantitatively and qualitatively as they develop. Two general sources of changes in processing are the acquisition of particular cognitive skills and increases in the capacity or rate of processing (Miller, 1993). Baddeley and Hitch (1974) presented a working memory model in which there are three components in working memory: a central executive component, a phonological loop, and a visuo-spatial sketch pad. The central executive component regulates information flow within working memory, retrieves information from other memory systems such as LTM, and processes and stores information. However, the processing resources used by the central executive are limited in capacity. The efficiency with which the central executive fulfills a particular function depends upon whether other constraints are placed on it (Gathercole & Baddeley, 1993). The central executive is supplemented by two components or slave systems--the phonological loop and visuo-spatial sketch pad. The phonological loop maintains verbally coded information, whereas the visuo-spatial sketch pad is involved in the short-term processing and maintenance of material which has a spatial component. These two systems as well as LTM size undergo changes as children grow older. A study by Gathercole et al. (1991) showed that the phonological loop is related to verbal memory and vocabulary knowledge. A study by Scarborough (1998) showed that kindergartners’ verbal memory score is more strongly related to their future reading achievement than digit span, word span, and pseudo-word repetition measures. This study will investigate how much the verbal memory obtained around the age of four influences children’s reading abilities over three points in time.
Changes in reading ability may come about through certain kinds of experiences. Some experiences are stored as schemas or scripts in the LTM, which can be brought into the working memory when needed. For example, schema theory explains that text comprehension varies directly with experiential background--that readers can easily understand text when it matches their experience (Anderson & Pearson, 1984). Experiences include encountering conflict between different predictions, becoming more familiar with the task materials, trying out a strategy that works, and acquiring more knowledge about the physical and social world (Miller, 1993). These experiences lead to new rules or strategies, which in turn lead to better memory, representation, and problem-solving. In this sense, experience is one major factor inducing cognitive development. However, the social environmental experience is not the initial or central interest of information processing theory (Gardner, 1987) although numerous studies have shown the importance of home environment on children’s cognitive development. In the NLSY data set, home score, which is the combined score of cognitive stimulation score and emotional support score, exists. However, for theoretical consistency, I used home cognitive stimulation score to investigate the effect of home environment on the children’s reading ability over time. In addition, through this study, I will investigate whether individual differences exist after controlling for a changing environmental factor and individual differences.

Need for Readability Study

Attainment of literacy in reading is directly related to academic, economic, societal, political, and personal life and values (Harris, 1990). As far back as 1935, Gates described reading as the most important and the most troublesome subject in primary schools.
Since mastering reading is essential to learning almost every other school subject, failure in the primary school is directly related to deficiencies in reading. Along the same line, Ogle, Absalam, and Rogers (1991) reported that students who have difficulty in reading are more likely to experience unemployment upon leaving school. Reading is a vital developmental task that should be mastered. Recently, national attention has been drawn to reading, or more precisely reading disabilities; a report issued by the National Research Council (Snow & Burns, 1998) showed the devastating consequences of a reading disability. In most cases, unsatisfactory achievement in reading has a handicapping effect on an individual’s life. Because of the importance of reading, for decades researchers have tried to find various ways to improve students’ reading ability. Numerous individuals and commissions have offered their analyses and recommendations to improve reading. Texts were the central aspect in these reports and emphasis on quantifiable standards brought renewed interests in readability studies (Bruce & Rubin, 1988). Research studies (Hahn, 1987) also showed that if texts are too difficult, children exhibit behavioral problems during class by being less attentive. Carver (1994) also implied that easy text books, which are characterized by the existence of less than 1 percent of unknown words, are not appropriate for enhancing children’s vocabulary. Thus, an optimal level of text difficulty is needed to induce children’s learning. Developmentally appropriate texts are neither so easy that they offer no challenge to children, nor so difficult that children feel frustrated. The prediction of text readability has been championed as a tool to enhance or maximize students’ learning because it affords the selection of developmentally appropriate texts.
However, no studies have been conducted to investigate the importance of text characteristics over time, especially with the same students across several years.

History of Measuring Text Difficulty

According to Klare (1985), readability concerns itself with qualities of writing which are related to reader comprehension. A readability formula is a predictive device (Klare, 1963) intended to provide quantitative and objective estimates of reading difficulty (Klare, 1985). Readability formulas have been used as an indicator of comprehension difficulty of reading materials (Carver, 1977-78). Readability has been studied in two traditions, prediction and production. In the prediction tradition, readability of a text has been investigated to predict how readable a piece of writing is likely to be for the intended reader or to predict the grade level of the written materials. In the production tradition, readability of a text has been manipulated experimentally to produce readable texts for readers in a target population. Prediction research has been done by applying psychometric theory, and its validity and reliability have been high compared to production research studies. Prediction research studies can be generalized because a large sample of the criterion variable is used. However, production research studies, which are done in the psycholinguistic tradition, have comparatively low reliability, which influences their replicability and validity. As production research studies are implemented experimentally, they can be used to test causal inferences regarding the effects of particular texts. Even so, results of text experiments are often questioned on grounds of generalizability to a population of passages because of the small number of sample passages in a given study (Klare, 1984). According to Klare’s (1963, 1974-5, 1984) historical accounts of readability measurement, the development of readability formulas goes back to the early 1920s. H. D.
Kitson (1921) can be considered as its pioneer. He used the number of syllables in a word and the number of words in a sentence as indices of the relative difficulty of newspapers and magazines. Since then, numerous readability formulas using linear regression have sprung up. Among them, Lively and Pressey’s formula (1923) used a word frequency index based on Thorndike’s Teacher’s Word Book to estimate vocabulary difficulty. Lorge’s (1939) formula used semantic and syntactic factors, which are still the most widely used variables. Flesch’s (1943) formula was designed for adult materials. According to Flesch, formulas then existing were not fit for adult materials because of their emphasis on vocabulary frequency at the expense of other factors. Flesch’s formula put emphasis on abstract words. Using magazine articles as criterion variables, he found that counting abstract words and affix morphemes¹, as a means of measuring abstractness, was closely related to the magazine levels. However, the tediousness of counting affixes as a means of measuring abstractness and the often misleading methods of counting personal references led to the development of two formulas. One of them, the most popular, is Flesch’s Reading Ease Formula. This formula used the number of syllables in a word and the number of words in a sentence as indices of the syntactic difficulty of a systematically selected 100-word sample of materials (Klare, 1963/1984). The formula correlated 0.70 with the McCall-Crabbs criterion. The other formula is Flesch’s Human Interest Formula, which used personal words per 100 words and personal sentences per 100 sentences. Personal words include personal names, for example, “Mike said that. . . .” Personal sentences are those sentences aimed directly at readers, for example, “You should do. . . .” This formula correlated 0.43 with the McCall-Crabbs criterion.
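As a rough illustration of how a formula of this kind is computed, the following sketch calculates a Reading Ease score from counts of words, sentences, and syllables. The constants (206.835, 1.015, and 84.6) are the commonly published Reading Ease coefficients and are not given in the text above. The syllable counter is a crude vowel-group heuristic assumed here for illustration only, and all function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Crude estimate: count runs of vowels (an assumption, not a hand-counting rule)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Reading Ease from raw counts; higher scores indicate easier text."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    # Coefficients are the commonly published ones, not taken from the source text.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def reading_ease_of_text(text: str) -> float:
    """Apply the formula to a text sample using the crude counters above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return reading_ease(len(words), len(sentences), syllables)
```

Because both predictors enter with negative weights, longer sentences and longer words lower the score, which matches the direction of the relationship described above.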
To supplement some deficiencies found in Flesch’s original formula, Dale and Chall (1948) used familiar words to determine semantic difficulty using Dale’s list of 3,000 words and sentence length (in words) in their formula. Dale-Chall formula scores correlated 0.70 with McCall-Crabbs criterion scores (which are based on multiple-choice items and have been widely used as a measure of comprehension). Dale-Chall’s formula is highly predictive of text difficulties.

¹ Affixes are the additions to stems, roots, and words to modify the meaning of words. For example, im- in impossible is used as a prefix and -ness in goodness is used as a suffix.

Gray and Leary’s (1935) work was also salient because of its comprehensiveness and the methods of conducting factor analysis for building a formula. This formula is also intended for adults. Gray and Leary (1935) employed survey methods to isolate factors contributing to readability. Existing work and surveys of experts’ opinions and reactions of library patrons yielded 289 factors. These are grouped into four major categories: content, style of expression and presentation, format, and general features of organization. To understand adult reading ability, they developed the Adult Reading Test and found that 44 factors out of the 82 style factors were significantly related to reading score. Due to high correlations among these 44 factors, five of these factors -- number of personal pronouns, number of words per sentence, number of prepositional phrases, and number of different hard words -- were singled out to be used in the readability formula. Most formula developers used children’s material in the developmental process, which raised validity issues (Klare, 1975). However, Flesch’s, Gray and Leary’s, and Dale-Chall’s formulas were intended for adult materials. Some formulas also yielded grade-level scales.
For example, the Fog Index developed by Gunning (1952), the Degrees of Reading Power (which can be rescaled into grade equivalent units), and Stenner’s Lexile scale all yielded grade level estimates of difficulty. Some of these programs and the research underlying them will be discussed in a later section.

Several approaches measure text difficulties without relying on readability formulas: clinical approaches, tests, and cloze procedures². The clinical or individual approach was also frequently used as a means of measuring readability (Klare, 1963). For example, Dewey (1931) interviewed children to understand the nature and limitations of comprehension in reading history. However, due to subjective judgment that is prone to errors, the clinical approach is often used in conjunction with the readability formula. Tests are also used for measuring text difficulties. However, constructing and administering a test is a difficult and time-consuming process compared to predicting readability.

² Cloze procedure is the deletion of words in a text at stated intervals, in which readers are asked to fill in words correctly (Zakaluk & Samuels, 1988).

Taylor (1953) developed the cloze procedure, which requires students to fill in blanks of a text that appear after every few words, usually every five words. Klare (1963) criticized the cloze procedure, saying that it is not a formula. However, it is a quick and easy testing technique that may be used for developing criteria in the construction and validation of readability formulas. Unlike traditional readability formulas, which do not require testing of human subjects to provide readability scores for passages, the cloze procedure does take into account the reader factor (Klare, 1984). However, Carver (1977-78) criticized the cloze test because the cloze difficulty estimate depends on the ability level of the particular group to whom the test was administered as well as the difficulty level of the material.
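The mechanics of the cloze procedure described above can be sketched as follows. This is a minimal illustration, assuming every fifth word is deleted and that responses are scored by exact match (ignoring case); the scoring rule and the function names are assumptions made here, not details given in the text.

```python
from typing import List, Tuple


def make_cloze(text: str, every: int = 5, blank: str = "____") -> Tuple[str, List[str]]:
    """Replace every `every`-th word with a blank and return the deleted words."""
    words = text.split()
    answers: List[str] = []
    for i in range(every - 1, len(words), every):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


def cloze_score(answers: List[str], responses: List[str]) -> float:
    """Proportion of blanks filled with the exact deleted word (assumed scoring rule)."""
    if not answers:
        raise ValueError("the passage is too short to contain any blanks")
    correct = sum(a.lower() == r.lower() for a, r in zip(answers, responses))
    return correct / len(answers)
```

A group's mean score on a passage then depends on both the passage and the group that took the test, which is the dependence Carver objected to.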
Even when an ability adjustment for cloze was developed, it was still an impractical method in many situations because it was always necessary to have a norm group before a language difficulty estimate was obtained (Carver, 1977-78). The most comprehensive exploration of variables was completed by Bormuth (1966). Using correlation and regression, Bormuth (1966) explored more than 100 structural variables. Among them, more than 60 variables were significant in predicting comprehension difficulty of a criterion variable which was measured by the cloze test. According to Pearson (1969, 1974-75), Bormuth’s contribution in the area of readability was significant in that he was able to estimate readability using multiple regression at the level of the word (R = 0.51), the independent clause (R = 0.67), the sentence (R = 0.68), and the passage (R = 0.93), whereas traditional formulas cannot be reliably applied below the passage level. In addition to this, the parts-of-speech ratios that Bormuth explored significantly predicted text difficulty. For example, he found highly explanatory linguistic ratios, such as pronoun/conjunction (r = 0.81), interjection/pronoun (r = 0.62), and verb/conjunction (r = 0.73). He also used quadratic terms in his regression model and showed the existence of a nonlinear relationship between outcome variables and a predictor. In his study, Bormuth also applied Yngve’s (1960) word depth analysis as a means of measuring sentence complexity. According to Yngve, the notion of word depth comes from mechanical translation of language by electronic computers. Embedded sentences, such as “the cat that the dog chased was gray,” require more memory because the machine has to store information from the beginning of the sentence (the cat) up to the end of the sentence (was gray). However, Bormuth’s use of many variables was not based on any consistent theoretical perspective.
Bormuth’s major concern seemed to be in the explanatory power of variables such as sentence length, parts of speech ratio, and depth of words. Pearson’s (1969) summary on the variables found in 31 readability formulas, which was mentioned in Klare (1963), showed that word frequency (18), sentence length measure (17), number of syllables (9), sentence complexity (9), and conceptual measure (10) were widely used. As was seen in many earlier readability formulas, text difficulties have been measured by semantic and syntactic factors. Among semantic factors, vocabulary difficulty was one of the most significant predictors of text difficulties (Dale, 1965; Davis, 1968; Chall, 1983). As a measure of syntactic difficulties, sentence length or word length has been frequently used. However, short sentences do not necessarily make a text easy to comprehend (Chall, 1958; Klare, 1963; Kintsch, 1979; Pearson, 1969). Besides this, using factors other than semantic and syntactic was not successful in predicting text difficulty. A recent study by Stenner (1997) using the PIAT reading comprehension test showed that the combination of sentence length (the log of mean sentence length) and word frequency (the mean of the log word frequencies) explained 85 percent of the variance in the PIAT item rank-order difficulty. However, Stenner’s study did not incorporate the effect of word order or syntax, which has been shown to operate somewhat independently of sentence length (e.g., Pearson, 1974-5). Notice also that there are some sentences in which sentence length cannot be a genuine explanatory factor: If we were to scramble the order of words in a sentence, it could be difficult or even incomprehensible even though sentence length had not changed at all. Readability formulas have not had strong theoretical perspectives (Kintsch, 1979), and formulas have been based on apparent, or surface level, text characteristics.
For example, Bormuth’s (1964/66) exploration of more than 60 variables which contributed to the variance of the criterion variable, using the cloze test, was not based on a consistent reading theory, although some of these variables seemed quite reasonable and plausible. Before Kintsch’s readability formula (1979), which incorporated some aspects of the psychological processes of the reader, most readability formulas confined themselves to measuring observable text characteristics. Most traditional readability formulas have not directly taken the reader’s ability into account. According to Baker, Atwood, and Duffy (1988), the traditional readability model regards the process of reading as a passive activity, in which the reader decodes the text to obtain meaning. Therefore, reading can be defined in terms of the skills necessary to decode words and sentences. Because reading is viewed as decoding words and sentences, the difficulty of the text is indexed in terms of word (lexical features) and sentence characteristics. If literacy is determined by the reader’s ability as well as the difficulty of the text (Bormuth, 1966), then the earlier formulas are problematic because they do not take into account the reader factor (Kintsch & Vipond, 1979). According to Bruce and Rubin (1988), readability formulas have limitations because formulas do not measure all the factors that influence the comprehensibility of a text. Since existing formulas have measured only one aspect of writing, the difficulty of style, they have not touched content, organization, word order, format, or imagery of writing; nor have they embraced reader factors such as purpose, maturity, or intelligence (Klare, 1963). A good readability score does not mean that the piece of writing was written well. Formulas have not taken into account other elements such as content, or other aspects of style, such as mood.
In addition to this, the grade level indices produced by different traditional readability formulas yielded different results (Bruce & Rubin, 1988). A grade level score for an individual based on a typical reading test means that he/she reads as well as some normative group. Along with this, in the traditional readability study, reading is viewed as a general process independent of domain knowledge. The typical formulas are applied regardless of the nature of tasks, subject, and expertise of the reader (Baker, Atwood, & Duffy, 1988). However, Kintsch’s readability approach is different. Kintsch (1979) regarded readability “not as immutable property of text, but as the result of a reader-text interaction.” Unlike traditional readability formulas, Kintsch’s model is based on information processing theory. His model came out of empirical observations such as recall or text processing time. In his model there are two given conditions: the reader, who usually has a goal schema to understand the text or at least to find out what is new in it, and the text, which is represented as propositions. Examining text as a semantic representation, Kintsch codes the text into a set of propositions or conceptual structures that represent the meaning of the text. Kintsch wanted to identify the process that occurs between input propositions (lowest level) and readers’ goal schema (highest level). The lowest level of propositions is needed to predict a part and the level of the input propositions that people recall. The input propositions construct a coherent network, identifying places where inferences are required to obtain coherence. To predict the summaries that people make of a text, the hierarchical macrostructure is also needed. In this model, information flows both bottom-up and top-down. According to Kintsch, to connect new information with old, readers need to search for old information, which is called reinstatement search.
If readers have to make a large number of reinstatement searches and a large number of inferences, then reading will be difficult. Based on this model, Kintsch’s readability formula includes such variables as the number of reinstatement searches made by the model in processing the paragraph, the average word frequency, propositional density, the number of inferences, the number of processing cycles, and the number of different arguments in the proposition list. The first two variables -- reinstatement searches and word frequency -- explained most of the variance, but all six variables together explained 97 percent of the variance of the outcome variable, recall of the text. The role of propositions was also investigated by Pearson’s experimental study of the reading process with above average 3rd and 4th grade readers. According to Pearson (1969/1974-75), readers do not process a text analytically as was indicated by transformational grammarians. Transformational grammarians think that if the sentence we read or hear is close to the deep structure (the meaning), then less transformation is applied, which facilitates comprehension. Pearson’s study also did not support the idea of traditional readability studies which show the length of the sentence as a significant index of readability of texts. As was indicated by Klare (1963/1984) and Kintsch (1979), Pearson’s study also implies that reducing the length of a sentence does not necessarily facilitate children’s recall of text. Instead, children try to make a coherent whole when they process text, which is more consistent with propositional analysis. Studies conducted in the psychometric tradition have incorporated both the reader’s ability and characteristics of texts. Carver’s (1977) and Stenner’s (1997) studies took into account both the reader’s ability and text difficulty. Carver (1977-78) maintained that the prediction of reading comprehension is made by the ability level of the reader and the characteristics of text.
In traditional readability studies, ability levels were often scaled using standardized tests and these measures initially were not scaled with respect to the difficulty of the text (Carver, 1977-78). In Carver’s (1977-78) National Reading Standards, each grade ability score on the test (Ga) had been calibrated to reflect a 0.50 probability that an individual can read and understand, or comprehend, the passages at the same grade of difficulty (Gd) according to the Rauding scale. The Rauding scale measured the grade difficulty of reading and understanding. A grade 5 ability means that the average accuracy is likely to be 75 percent of grade 5 materials. A choice of a 75 percent target comprehension rate is obtained through empirical evidence (Square, Huitt, and Segars, 1983; Crawford et al., 1975). The theoretical assumption of comprehension in using the Rauding scale is that the rate of reading is constant and the accuracy of comprehension during reading can be predicted from a measure of material difficulty and individual ability. However, the Rauding theory was criticized because it is very mechanical, serial, and not comprehensive. In this sense, “the theoretical assumption does not support every day reading phenomena such as skimming and studying” (Pearson, 1977-78). Stenner’s (1997) study on the Lexile framework (reading comprehension scale) also took into account both the reader’s ability and text difficulty. In order to obtain generalizability, that is, the scale of a single object being independent of conditions, scores obtained from different test administrations should be tied to a common zero (anchor). To obtain general objectivity, theoretical logit difficulties obtained were transformed to scales that could be compared to each other without ambiguity. Measurements for all persons and all texts are reportable in a Lexile framework.
Some studies which investigated developmental aspects of children’s reading used grade appropriate assessments in a cross-sectional design. These studies employed linear models using GE (grade equivalent) scores that were extrapolated beyond the grades that were actually assessed (Klare, 1984; Chall, 1970).³ However, no studies have been done to measure both the rate (acceleration/deceleration) of growth in readers’ ability and the relative importance of each text characteristic over time using reading materials that can accommodate a wide range of readers. Gray and Leary’s (1935) and Bormuth’s (1964) studies provided evidence that linguistic variables do not predict comprehension difficulty equally well for subjects with different levels of achievement. Besides, Draper et al.’s (1971) study and Chall et al.’s (1990) study indicated that vocabulary explains text difficulty better at more advanced than at early stages of reading development. This present study investigated how the importance of the psycholinguistic/linguistic characteristics of text changes across years. In this study, in addition to the most popular variables -- frequency of vocabulary and length of sentence -- propositional density was used to investigate the significance that propositions play in the readability formula at each time point.

³ Extrapolation beyond the grade level that was used in the criterion measure is not a valid assessment.

Reading Development

A Developmental Perspective

Chall (1983) categorized six developmental stages, from stage 0 to stage 5, which characterize prototypical reading development. According to Chall, stage 0 is a prereading stage covering birth to age 6. At this stage a child gains some insight into the nature of words before going to school. Stage 1 is an initial decoding stage covering grades 1-2 (6-7 years old). Children associate the arbitrary letters that they learn with the corresponding parts of spoken words. Stage 2 covers grades 2-3 (7-8 years old).
At this stage, the child reads not for gaining new information, but for confirming what is already known. Children pay attention to the printed words, usually the most common and high frequency words. Stage 3 reading is also characterized by the growing importance of word meanings and of prior knowledge. This stage is composed of two phases: Phase 1 of stage 3 covers grades 4-6 (9-11 years old) and children develop the ability to read beyond an egocentric purpose, reading texts that convey conventional knowledge of the world. Phase 2 of stage 3 covers grades 7-8 (12-14 years old): This stage brings readers close to the ability to read on a general adult level. Stage 4 reading is characterized by a child’s capacity to adopt multiple viewpoints. This stage covers high school grades (14-18 years old). Stage 4 is mostly acquired through formal education. Stage 5 covers college level and is characterized by construction and reconstruction of a world view. Since the NLSY children in this study undergo three reading developmental stages, starting from stage 1 to stage 3, it provides a great opportunity to investigate beginning readers’ reading development, although it must be conceded that the PIAT does not lend itself to even a weak test of the validity of Chall’s stage theory.

Contextual Variables

The 1994 NAEP (National Assessment of Educational Progress) reading assessment shows that contextual influences, such as school and home environment, affect children’s reading proficiency. However, it is assumed that the effect of these contextual variables may differ as a function of the developmental level of children. Luster and Dubow’s (1992) study of environmental factors on children’s verbal intelligence shows that the effect of environment changes depending upon the children’s developmental level.
Evidence from an adoption study by Plomin and Daniels (1987) also indicates that the effect of shared home environment is reduced as children grow older, while the effect of the non-shared environment, such as schooling effects, becomes greater. In this sense, a developmental study is needed to investigate differential effects of contextual factors. To understand the effect of changing home environment on children’s reading abilities, the home cognitive stimulation score will be used. Because of access to the larger NLSY database, the effect of other intra-individual factors such as gender, race, verbal memory, and testing time will be investigated.

Operationalization of the Factors in the Present Study

Building on information-processing theory, this study will investigate both factors that are internal to the text, such as linguistic and psycholinguistic variables, and factors that are external to the text, such as individual differences among readers. Children’s internal characteristics, such as verbal memory, are used in order to investigate the pattern of reading development, while controlling for their effect on the growth of children’s reading ability. Understanding children’s reading development is related, at least indirectly, to the item development process underlying the PIAT. A better understanding of children’s reading development would be one of the essentials for selecting and constructing the crucial subtest and its items. Norm referenced tests could benefit from a better knowledge of the qualitative changes in reading (Chall, 1983). Although the PIAT reading comprehension test has certain limitations, especially because the text of each item consists of one single sentence, it will also show various characteristics that children face in understanding texts at different time points.
Thus far I have discussed information processing theories, linguistic and psycholinguistic correlates of text difficulty, particularly as they are related to readability formulas, and matters of reading development, as they are reflected in individual differences among children. The statistical models used in the current study permit me to investigate each of these potentially important sources of variation. For example, the level-1 model represents the nesting of test items within each occasion (3 time points across 4 years) and affords the evaluation of linguistic/psycholinguistic variables; the level-2 model represents the nesting of occasions within a child, which measures the pattern of development and the changing environmental effect on a child’s reading development; and the level-3 model represents the intra-individual characteristics. By building the model from a lower to a higher level, I can investigate how the importance of each variable changes across occasions; how individual children’s reading ability changes due to the changing environmental characteristics; and how time-invariant individual characteristics influence the development of an individual child’s reading ability.

CHAPTER 3

METHODOLOGY

Subjects

The subjects for this study are 477 children from the National Longitudinal Survey of Youth (NLSY) data set, chosen based on age and scores on the PIAT Reading Recognition Test. Children’s ages ranged from 6.0 years to 6.11 years in 1988. There were 220 boys and 257 girls; among them were 89 Hispanic children, 153 Black children, and 235 non-Black, non-Hispanic (White) children. The children’s responses to reading comprehension items were observed over three time points, approximately every two years: 1988, 1990, and 1992. Those who scored over 15 on the PIAT Reading Recognition Test were given the Reading Comprehension test. These are the children in the sample for this study.
According to Chall’s (1983) developmental scheme, which divides children’s reading development into six stages ranging from 0 to 5, the NLSY children in 1988 would be roughly categorized into stage 1, and can thus be defined as beginning readers. Children who took the PIAT Reading Comprehension tests were the offspring of individuals selected for the National Longitudinal Survey of Youth (NLSY ’79) project. The NLSY mothers have been interviewed annually since 1979, when they were 14 to 21 years of age. The NLSY ’79 child sample, when weighted, represents a cross-section of children born to a nationally representative sample of women who were between the ages of 29 and 36 on January 1, 1994 (NLSY, 1997). It is estimated that the children in the sample typify approximately the first 70 to 75 percent of children born to the contemporary cohort of American women (NLSY, 1997). The original NLSY ’79 sample included 6238 women in 1979, 456 of whom were in the military at that time. However, none of the subjects in this study were from these mothers because most of them were dropped before my data collection. In addition, children born to the economically disadvantaged White women were not available because of financial constraints of the NLSY project. Every two years from 1986 to 1994, a series of assessments were administered to the children of NLSY mothers as a means of measuring the children’s cognitive ability. Children of Hispanic, Black, and non-Hispanic and non-Black (White) ethnic groups of both sexes were investigated for this study. Data up to 1992 were gathered primarily in person using paper and pencil assessment techniques. However, information about children’s item responses was not available in the 1986 data. Also, due to large attrition, the 1994 data were not included in this study. Thus, the result can only be generalized to the population with the above characteristics.
Outcome Measure

General Characteristics of the PIAT Reading Comprehension Test

The Reading Comprehension test in this study is one of five subtests from the Peabody Individual Achievement Test Battery: Mathematics, Reading Recognition, Reading Comprehension, Spelling, and General Information. However, the NLSY data has information only about three subtests: Mathematics, Reading Recognition, and Reading Comprehension. The PIAT Reading Comprehension test was designed for children in kindergarten through grade 12. It was originally intended for children scoring at age 5 years and over on the Peabody Picture Vocabulary Test (PPVT) and at least 19 on the Reading Recognition assessment. Interviewers in the NLSY study administered the PIAT Reading Comprehension tests to children whose Reading Recognition score was over 15. Scores were calculated by deducting the number of incorrect responses from the ceiling item number, that is, the highest numbered item (in a sequence from easy to hard) that the child missed. Children who scored less than 19 on the Reading Recognition test were assigned their Reading Recognition score as their Reading Comprehension test score. Total raw scores ranged from 0 to 84. The PIAT Comprehension test items were numbered from 19 to 84 (66 items in total). The PIAT Reading Comprehension subtest measures children’s ability to derive meaning from sentences that are read silently (Dunn & Markwardt, 1970). Item construction was based on the assumptions that “reading is the facility to derive meaning from printed words” (Dunn & Markwardt, 1970) and that the effective reader can retain the meaning after exposure to the illustrations in the absence of the passage. Thus, the PIAT Reading Comprehension Test is highly memory dependent. The individually administered test is composed of 66 one-sentence items of increasing difficulty. According to Dunn and Markwardt (1970), difficulty is based on sentence complexity, vocabulary, and sentence length.
The child silently reads a sentence displayed on a separate page, the interviewer shows the child four pictures on the other side of the page, and the child is asked to select the correct picture. The PIAT Reading Comprehension test is a recall type of reading comprehension assessment because the children are asked to select, without reading the text again, the one picture that best depicts the sentence. In other words, the PIAT Reading Comprehension test depends heavily on short term memory and attention. It is a combination of a time and power test (Nunnally, 1978) in that children are encouraged to respond to each item within 30-40 seconds, although Dunn and Markwardt intended this to be a power test. The PIAT Reading Comprehension has no written directions for the children to respond to each item. In this aspect, the PIAT reading comprehension test eliminates some problems related to validity that might arise from the gap between text understanding and question understanding, as found in other types of reading comprehension tests. Due to misinterpretation of directions or questions in some tests, children may not respond to questions correctly although they understand the body of the text. The PIAT Comprehension test is an adaptive test. Complete responses to all items are seldom, if ever, collected. Items are arranged in ascending order of difficulty, with the easiest questions being comparable to kindergarten or first-grade level. None of the children attempt all of the items. Instead, interviewers test children with the items in the children’s critical range by constructing a basal level and a ceiling for each child. A basal level is derived from a series of correct responses, and a ceiling is determined from a series of continuous errors. The basal level is determined by finding the highest cluster of five consecutive items answered correctly. The lowest numbered item in that cluster is designated as the basal item.
Most coders for the NLSY data actually coded the highest item number in a set of five consecutive correct items as the basal item. However, this coding mistake made no difference in imputing missing values below the basal item number. The ceiling is obtained by continuing to present increasingly challenging items until the child has made five consecutive errors. The last item missed in the set of five is regarded as the ceiling item. In contrast to the errors made in coding basal items, most coders applied the procedure for determining ceiling items appropriately. This process is illustrated in Figure 1, where the basal range is from item 22 to item 26 and the basal item is item 22. The ceiling range is from item 31 to item 35, and the ceiling item is item 35.

Item#     Score
19-21     1 (imputation)
22-26     1 (basal range; *basal item# = 22)
27-30     observed responses
31-35     0 (ceiling range; *ceiling item# = 35)
36-84     imputation

Figure 1. Information about forming ceiling and basal items, where score = 1 is correct and score = 0 is incorrect.

Information about the basal and ceiling items is available with the NLSY data (information about the basal item number is not available for 1988). However, partly because of mis-coding by the PIAT interviewers, the recorded ceiling and basal item numbers are not always correct, so I corrected them for the purpose of imputation. For this study, all the raw reading comprehension item responses were checked one by one to establish ceiling and basal levels for the imputation. If there was no clear-cut information for forming the basal and ceiling levels, the item responses were coded as missing.
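As an illustration only, the basal and ceiling rules described above, together with the raw-score rule (ceiling item number minus the number of incorrect responses), can be sketched in Python. The function names are hypothetical, and the responses to items 27 through 30 are invented for the example; this is not the procedure actually used to recode the NLSY files.

```python
def find_basal_and_ceiling(responses, run=5):
    """Locate the basal and ceiling items from administered responses.

    responses: dict mapping item number -> 1 (correct) or 0 (incorrect).
    Basal   = lowest item in the highest run of `run` consecutive correct items.
    Ceiling = last item in the first run of `run` consecutive errors.
    """
    items = sorted(responses)
    basal, ceiling = None, None
    for start in range(len(items) - run + 1):
        window = items[start:start + run]
        if window[-1] - window[0] != run - 1:
            continue  # skip windows with gaps in the item numbering
        scores = [responses[i] for i in window]
        if all(s == 1 for s in scores):
            basal = window[0]        # later (higher) clusters overwrite earlier ones
        elif all(s == 0 for s in scores) and ceiling is None:
            ceiling = window[-1]     # keep the first run of consecutive errors
    return basal, ceiling


def raw_score(responses, ceiling):
    """Ceiling item number minus the number of errors up to the ceiling."""
    errors = sum(1 for item, s in responses.items() if item <= ceiling and s == 0)
    return ceiling - errors


# Basal range 22-26 and ceiling range 31-35 follow Figure 1;
# the responses to items 27-30 are hypothetical.
responses = {i: 1 for i in range(22, 27)}
responses.update({27: 0, 28: 1, 29: 0, 30: 1})
responses.update({i: 0 for i in range(31, 36)})

basal, ceiling = find_basal_and_ceiling(responses)
print(basal, ceiling, raw_score(responses, ceiling))  # prints: 22 35 28
```

Under these invented responses the child makes seven errors up to the ceiling item, so the raw score is 35 - 7 = 28, which equals the number of items credited as correct once items below the basal item are counted as correct.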
However, while recoding the data, I found that some interviewers did not assess children on enough reading comprehension items, and some interviewers gave more opportunities to respond than the procedure calls for. Especially in 1988, interviewers did not give enough opportunities to form a ceiling, partly because they could not form basal levels in many cases. Because of the many missing item responses outside the range of actual item responses, the raw data were consulted in order to impute scores. Imputations for the items below the basal item (the lowest numbered item in the lowest set of five consecutive correct responses) were made by assuming that children would answer all lower-level items correctly (imputed as 1). Imputations for the items beyond the ceiling item number were made by regarding these as wrong (imputed as 0). Since the PIAT is a multiple-choice test with four options, the probability of a correct response by blind guessing, if children are given an opportunity to respond, is 0.25. To solve this problem of unequal opportunity, responses to the untried items beyond the top-most difficult item were assigned by randomly generating real numbers between 0 and 1. If the randomly generated number was greater than or equal to 0.75, the item was imputed as correct (1); otherwise, it was imputed as incorrect (0).

Validity and Reliability of the PIAT Reading Comprehension

The reading comprehension subtest of the PIAT is generally considered to be a highly reliable and valid achievement test and has been used extensively for research purposes (NLSY, 1992). Because of the format and the high probability that any given child will not complete the entire test, test-retest reliability is the only viable index available to evaluate consistency.
According to Dunn and Markwardt (1970), the median test-retest reliability was 0.65 (ranging from r = 0.61 to 0.78), and standard errors of measurement for raw scores at selected grade levels ranged from 2.48 (grade 1) to 7.39 (grade 8), which implies that the PIAT is less reliable for measuring older children's reading abilities.

Dunn and Markwardt (1970) defined reading as a functional ability, the facility to derive meaning from printed words. The reading comprehension test construction was not based simply on finding the meaning of individual words, but on the ability to comprehend passages in context. Although the passages are composed of single sentences of varying length and difficulty, they have content validity, covering kindergarten to grade 12 reading levels. Bormuth's study (1966) also validated the efficacy of assessing sentence-level reading comprehension using multiple correlation with other predictors (R = 0.68).

Item discrimination and difficulty indices were used for the PIAT. For each item, a curve was drawn showing the percentage of children passing at each successive grade level. Items that showed the sharpest curves were retained and were placed at the grade level where approximately 50 percent of the subjects passed. Internal consistency was built in by selecting items that correlated most highly with the total score. Concurrent validity was assessed by examining the correlation between the Peabody Picture Vocabulary Test and the PIAT Reading Comprehension test. The correlation coefficients ranged from 0.42 to 0.70 across different grade levels. This version was normed in the late 1960s and renormed in 1990. Norms, however, are not a major consideration in this study because raw score growth patterns rather than normed scores are the primary data of interest.
Model and the Predictor Variables

In this study, to understand the nature of growth in reading comprehension, a three-level hierarchical generalized linear model (HGLM) (Bryk, Raudenbush, & Condon, 1996) was used. Item responses (level-1) were considered as being nested within testing occasions (level-2), and testing occasions as being nested within individuals (level-3). Since children took the same test on three occasions, each item was nested within each time point (occasion). In addition, the time (in months) at which children took the test varied, and sometimes the number of occasions (the frequency of taking the test) also varied, so time points can be considered as nested within individuals.

In this study, ability and text characteristics were put into the model. However, the scores on children's abilities were not obtained directly; instead, ability was regarded as the intercept in the HGLM when all the text characteristics and other contextual effects were controlled for. By building the model in this way, this study investigated how the level of intra-individual characteristics influences the importance of each item variable over time. In addition, individual reading ability was observed while controlling for changing home environmental factors. Also, children's reading abilities and the importance of each psycholinguistic variable at each time point were observed while controlling for the time-invariant individual characteristics in the level-3 model.

The HGLM can assess the probability of binomial data, which the hierarchical linear model (HLM) cannot estimate. In addition, the hierarchical model affords investigation into the contextual effects that influence individual development (Bryk & Raudenbush, 1992). Although the NLSY data contain some missing values, the HGLM can deal effectively with missing values in the level-1 model. In the case of the level-2 and level-3 models, the HGLM program does not allow missing data.
For cases in which there were missing values for level-2 or level-3 variables, scores were imputed for each subject based on existing information. This procedure is discussed in detail in a later section.

The level-1 model examines item characteristics and seeks to explain performance by reference to the linguistic features of the items. The level-2 model estimates the patterns of growth by examining performance across occasions, in other words, by putting time factors into the model. The level-3 model incorporates the intra-individual characteristics, such as gender, race, and verbal memory. The goal of this analysis is to find the probability, $P_{ijk}$, of a correct response by child $k$ at one particular occasion $j$ on an item $i$ with specified characteristics.

Since the outcome of each PIAT reading comprehension item was binomially distributed (Bernoulli distribution), a transformation of the probability of a correct response (the log-odds of a correct response) was used. Because the outcome is dichotomous, the probability can be estimated more reasonably with the logit model. If the logit is a linear function of other variables, the outcome, $P_{ijk}$, is a nonlinear, S-shaped function with a probability range between 0 and 1 (Hamilton, 1992; Bryk, Raudenbush, & Condon, 1996).

Level-1 Model: Item Text Characteristics

The level-1 model in HGLM consists of three parts: (a) a sampling model, (b) a link function, and (c) a structural model. The sampling model in the level-1 HGLM is as follows:

$$Y_{ijk} \mid P_{ijk} \sim B(n_{ijk}, P_{ijk}) \tag{1}$$

It denotes that $Y_{ijk}$ has a binomial distribution with $n_{ijk}$ trials and probability of a correct response $P_{ijk}$. $Y_{ijk}$ is 1 if person $k$'s response to item $i$ at time point $j$ is correct; $Y_{ijk}$ is 0 if person $k$'s response to item $i$ at time point $j$ is incorrect.
According to the binomial distribution, the expected value and variance of $Y_{ijk}$ are

$$E(Y_{ijk} \mid P_{ijk}) = n_{ijk} P_{ijk}, \qquad \mathrm{Var}(Y_{ijk} \mid P_{ijk}) = n_{ijk} P_{ijk}(1 - P_{ijk}). \tag{2}$$

When $n_{ijk} = 1$, $Y_{ijk}$ takes on values of either zero or unity, which is a Bernoulli distribution. Unlike the hierarchical linear model (HLM), the HGLM allows estimation of models for both $n_{ijk} = 1$ (the Bernoulli case) and $n_{ijk} > 1$. For the Bernoulli case, the predicted value of the binary outcome $Y_{ijk}$ is equal to the probability of a correct response, $P_{ijk} = \mu_{ijk}$.

When the level-1 sampling model is binomial, the HGLM uses the logit link function,

$$\eta_{ijk} = \log\left(\frac{P_{ijk}}{1 - P_{ijk}}\right),$$

where $\eta_{ijk}$ is the log of the odds of a correct response. While $P_{ijk}$ is constrained to the interval (0, 1), $\eta_{ijk}$ can take on any real value. Predicted log-odds can be converted to predicted probabilities by computing

$$P_{ijk} = \frac{1}{1 + \exp(-\eta_{ijk})}.$$

Thus, whatever the value of $\eta_{ijk}$, this procedure will produce a $P_{ijk}$ between zero and one. The structural model is

$$\eta_{ijk} = P_{0jk} + P_{1jk}(\text{sentence length})_{ijk} + P_{2jk}(\text{vocabulary frequency})_{ijk} + P_{3jk}(\text{propositional density})_{ijk} \tag{3}$$

Here,

$P_{0jk}$: ability of child $k$ at time point $j$, controlling for item-level sentence characteristics
$P_{1jk}$: effect of sentence length for child $k$ at time point $j$, controlling for the other sentence characteristics of the item
$P_{2jk}$: effect of vocabulary frequency for child $k$ at time point $j$, controlling for the other sentence characteristics of the item
$P_{3jk}$: effect of propositional density for child $k$ at time point $j$, controlling for the other sentence characteristics of the item

At level-1, the probability of child $k$'s correct response to a certain item is a function of item characteristics such as sentence length, vocabulary frequency, and propositional density. These variables were grand-mean centered (centered on the overall mean of each predictor), so that $P_{0jk}$ is the log-odds that a child answers an average item correctly when all item characteristics are controlled. $P_{0jk}$ can therefore be considered a measure of ability on the log-odds metric.
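To make the logit link concrete, the following minimal sketch computes the log-odds from a level-1 structural model and converts it to a probability. All coefficient values are hypothetical and chosen only for illustration; they are not estimates from this study. The value 14.04 is the average sentence length reported in this chapter, and the function names are my own.

```python
import math


def log_odds(p0, slopes, centered_values):
    """Level-1 structural model: intercept plus slope * centered predictor."""
    return p0 + sum(s * x for s, x in zip(slopes, centered_values))


def probability(eta):
    """Inverse logit: converts a log-odds value into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients (NOT estimates from the study):
# intercept, then slopes for sentence length, vocabulary frequency, density.
p0 = 0.8
slopes = (-0.15, 0.10, 2.0)

# An average item has all grand-mean-centered predictors equal to zero,
# so its log-odds equals the intercept (the ability term).
average_item = (0.0, 0.0, 0.0)

# A 20-word sentence that is average on the other two characteristics.
long_item = (20 - 14.04, 0.0, 0.0)

for name, item in (("average", average_item), ("long", long_item)):
    eta = log_odds(p0, slopes, item)
    print(name, round(eta, 3), round(probability(eta), 3))
```

Because the predictors are centered, the intercept alone gives the log-odds for an average item, which is why it can be read as ability on the log-odds metric.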
Three variables were selected because research studies (see chapter 2) show the selection of these variables to be appropriate. Vocabulary difficulty and sentence length are the most widely used variables in readability formulas. Stenner's (1997) study of the PIAT reading comprehension test shows that the log of the mean sentence length and the mean of the log word frequencies combined explain 85 percent of the variance (r = 0.92). As some previous studies (Shankwiler & Crain, 1986; Stenner, 1997) indicated, the correlation between item rank-order difficulty and sentence length was the highest among the linguistic/psycholinguistic variables. The correlation between item rank-order difficulty and sentence length was 0.91 ($R^2$ = 0.83).

For this study, I selected sentence length, vocabulary frequency, and propositional density. Because the raw data were less skewed than the log-transformed data, I used the raw data. With the raw data, the correlation between item rank-order difficulty and sentence length was the highest among all the linguistic/psycholinguistic variables that I used for this study (r = 0.91). Sentence length ranged from 5 to 31 words. The average sentence length was 14.04 words.

Vocabulary difficulty was measured by the Standard Frequency Index (SFI) based on the total corpus used in the Educator's Word Frequency Guide (EWFG) (Zeno, Ivens, Millard, & Duvvuri, 1995). The most frequently used words receive high values on the SFI. Instead of using either the highest or the lowest SFI in a sentence, the mean SFI was used for this study. Mean SFI reflects a more contextual effect than either the lowest or the highest SFI in a sentence. Since it is possible to understand a text without knowing the meaning of every single word, I used average word frequency in measuring vocabulary difficulty. In the EWFG corpus, observed SFI values ranged between 3.5 and 88.3. In the PIAT, SFI values ranged from 20.8 to 88.3.
Derivative words that were not found in the EWFG manual were assigned the lowest value among the words from the same origin. Compound words were treated as one word. The mean of average vocabulary difficulty was 63.67, and the average vocabulary difficulty ranged from 49.45 to 72.70. The correlation between item rank-order difficulty and SFI average was 0.66 ($R^2$ = 0.44).

Proposition analysis was based on Kintsch (1974). According to Kintsch, propositions represent ideas and language expresses propositions. A proposition contains a predicator and $n$ arguments ($n \geq 1$). Because longer sentences were assumed to have more propositions, a high correlation between sentence length and number of propositions was expected. In fact, the correlation between sentence length and number of propositions was 0.92. Therefore, to avoid the problem of multicollinearity, propositional density, obtained by dividing the number of propositions by the number of words in a sentence, was used. The correlation between rank order and propositional density was 0.11. The number of propositions ranged from 1 to 13, and the propositional density ranged from 0.11 to 0.67.

Indefinites, such as both, every, some, any, and everything, were not analyzed as predicators. For example, the following sentence has two propositions:

The postman must carefully measure every package.
(1) (measure, postman, package)
(2) (carefully, 1)

In addition, the genitive cases (e.g., my, your, his) were not analyzed as forming a meaning unit:

Try kicking your feet in the brook.
(1) (kick, you, foot)
(2) (try, 1)
(3) (place: in, 1, brook)

Also, verbs in idiomatic expressions were analyzed as one unless they had a unique meaning in the sentence:

A windstorm is making a ruin of the cottage.
(1) (ruin, windstorm, cottage)

However, since I only counted the number of propositions to obtain the propositional density (number of propositions ÷ length of sentence), the method of counting did not distinguish between different meanings, as seen in the following examples. The following sentences have the same number of propositions, but their meanings are totally different:

(1) A dog bites a man. (bite, dog, man)
(2) A man bites a dog. (bite, man, dog)

Level-2 Model: Age and Cognitive Stimulation Score

In the level-2 model, the level-1 parameters, that is, the constant (intercept) and the variable coefficients (slopes), are modeled as a function of time, which was measured by age in months at three time points. The value of age was centered around the grand mean, so the estimates of the intercepts, $B_{00k}$, $B_{10k}$, $B_{20k}$, and $B_{30k}$, will be approximately the predicted values for child $k$ at time point two (at about 8.5 years old). At this level, each parameter (coefficient) from level-1 becomes an outcome:

$$P_{0jk} = B_{00k} + B_{01k}(\text{age linear})_{jk} + B_{02k}(\text{age quadratic})_{jk} + B_{03k}(\text{cognitive stimulation})_{jk} + R_{0jk}$$
$$P_{1jk} = B_{10k} + B_{11k}(\text{age linear})_{jk} + B_{12k}(\text{age quadratic})_{jk}$$
$$P_{2jk} = B_{20k} + B_{21k}(\text{age linear})_{jk} + B_{22k}(\text{age quadratic})_{jk}$$
$$P_{3jk} = B_{30k} + B_{31k}(\text{age linear})_{jk} + B_{32k}(\text{age quadratic})_{jk}$$

$B_{00k}$: expected ability of individual child $k$ at age 8.5, controlling for cognitive stimulation score and sentence characteristics of items such as length, vocabulary, and density
$B_{01k}$: linear growth rate of child $k$'s ability at age 8.5 on a typical item, controlling for cognitive stimulation score
$B_{02k}$: acceleration effect of child $k$'s ability on a typical item, controlling for cognitive stimulation score
$B_{03k}$: effect of home cognitive stimulation score for child $k$ on a typical item, at age 8.5
$B_{10k}$: average effect of sentence length for child $k$ at age 8.5, controlling for cognitive stimulation score and the other sentence characteristics of items
$B_{11k}$: linear effect of age (growth rate) on the sentence length slope at age 8.5 for child $k$, controlling for all the other variables
$B_{12k}$: acceleration effect on the sentence length slope for child $k$, controlling for all the other variables
$B_{20k}$: average effect of the vocabulary frequency slope for child $k$ at age 8.5, controlling for cognitive stimulation score and the other sentence characteristics of items
$B_{21k}$: linear effect of age on the vocabulary frequency slope at age 8.5 for child $k$, controlling for all the other variables
$B_{22k}$: acceleration effect on the vocabulary frequency slope for child $k$, controlling for all the other variables
$B_{30k}$: average effect of the propositional density slope for child $k$ at age 8.5, controlling for cognitive stimulation score and the other sentence characteristics of items
$B_{31k}$: linear effect of age on the propositional density slope for child $k$ at age 8.5, controlling for all the other variables
$B_{32k}$: acceleration effect on the propositional density slope for child $k$, controlling for all the other variables

Using the level-2 model, this study can measure whether and how the importance of the item characteristics changes across occasions.
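The role of centering and of the quadratic term in the level-2 equations can be illustrated with a small sketch. It evaluates one level-2 equation (an intercept, an age linear term, and an age quadratic term) and its rate of change. The coefficient values and the grand-mean age of 102 months (about 8.5 years) are assumptions made for the example, and the cognitive stimulation term and the residual are omitted for brevity.

```python
def level2_value(b0, b_linear, b_quadratic, age_months, grand_mean_age):
    """Evaluate b0 + b_linear * (age - mean) + b_quadratic * (age - mean)**2."""
    age_c = age_months - grand_mean_age   # age linear term (grand-mean centered)
    age_c_sq = age_c ** 2                 # age quadratic term (square of the linear term)
    return b0 + b_linear * age_c + b_quadratic * age_c_sq


def growth_rate(b_linear, b_quadratic, age_months, grand_mean_age):
    """Rate of change of the same equation with respect to age in months."""
    return b_linear + 2.0 * b_quadratic * (age_months - grand_mean_age)


# Hypothetical values, NOT estimates from this study.
GRAND_MEAN_AGE = 102.0            # assumed grand mean, about 8.5 years in months
B00, B01, B02 = 1.0, 0.08, -0.001

for age in (78.0, 102.0, 126.0):  # three illustrative testing occasions
    value = level2_value(B00, B01, B02, age, GRAND_MEAN_AGE)
    rate = growth_rate(B01, B02, age, GRAND_MEAN_AGE)
    print(age, round(value, 3), round(rate, 3))
```

At the grand-mean age the centered terms vanish, so the equation returns the intercept and the growth rate equals the linear coefficient. This is why the intercepts and linear coefficients are interpreted "at age 8.5." A negative quadratic coefficient, as assumed here, means that growth decelerates with age.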
Earlier readability research suggested the advisability of examining the effect of these variables at different ages. Gray and Leary's (1935) and Bormuth's (1964) studies provided evidence that linguistic variables did not predict comprehension difficulty equally well for subjects with different levels of achievement. In addition, Draper et al. (1971) and Chall (1990) indicated that at the early stage of reading development, knowledge of vocabulary did not explain text difficulty as effectively as it did at the advanced level of development. This study investigated how the importance of the psycholinguistic/linguistic text characteristics changes across years.

In the HGLM level-1 model, the intercept, which represents individual reading ability, varies randomly. Because it can change across occasions, changes in individual ability can be estimated across occasions using the HGLM. By incorporating quadratic terms, this model can estimate the nature of growth more realistically, looking for both linear increments and non-linear spurts and valleys in growth. According to Chall (1970) and Klare (1984), most readability formulas use linear regression equations, which may not capture the true growth pattern. Bormuth (1964, 1966) suggested the use of nonlinear models in building readability formulas. Since the observations were made at three time points, including both linear and quadratic terms in the level-2 model makes it possible to investigate whether reading ability and the effects of linguistic/psycholinguistic variables change linearly or curvilinearly. Also, with this model the rate of growth across adjacent occasions can be assessed. In addition, since early reading development is influenced by environmental factors such as interaction with parents, I investigated whether any linear or curvilinear trends remain after controlling for the home environment at each occasion.
Environmental factors are not the major focus of this research because many existing studies have demonstrated these effects already. Nonetheless, their interactions with the variables of interest are important because they might modulate any interpretations I might wish to make about the target variables. If level-1 represents micro-level text processing, level-2 represents the developmental aspect across time. By incorporating changing environmental variables at level-2 and other time-invariant individual variables at level-3, this model assesses patterns of reading development and the perceived difficulty of text characteristics.

There were no missing values in the 1988 data because age was one of the criteria in selecting subjects. However, for the missing values in the level-2 age linear term, imputation was conducted in the following manner: After estimating a regression equation using 1988 data as independent variables, the standard error of regression was used to generate a random error term, which was added to the predicted values for missing values in 1990. To complete the imputation for 1992, I used the same method, estimating a regression equation based on 1990 data and adding a random error term generated from the standard error of regression. Therefore, imputation of children's age at testing in 1990 and 1992 was conducted without changing the nature of the distributions. The value of the age quadratic term was obtained by squaring the age linear term.

Another level-2 variable, the home cognitive stimulation score, is a composite of variables including the number of books that the children have, information on the frequency of parents' reading to the children, and the number of hours spent watching TV. To replace the missing values, imputation was conducted by using both total home scores and cognitive stimulation scores from the other two years as predictors. The home score was the combination of the home stimulation and home emotional support scores.
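The regression-based imputation of age described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes a single predictor, assumes that the random error term is drawn from a normal distribution with standard deviation equal to the standard error of regression (the distribution is not stated in the text), and uses invented ages. All function names are hypothetical.

```python
import random
import statistics


def fit_simple_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns (intercept, slope, standard error of regression).
    """
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ser = (sse / (len(x) - 2)) ** 0.5   # sqrt(SSE / (n - 2))
    return intercept, slope, ser


def impute_with_error(x_missing, intercept, slope, ser, rng):
    """Predicted value plus a random error with standard deviation `ser`.

    The normal error distribution is an assumption of this sketch.
    """
    return [intercept + slope * xi + rng.gauss(0.0, ser) for xi in x_missing]


# Invented ages in months for children observed in both years.
age_1988 = [70.0, 72.0, 75.0, 77.0, 80.0]
age_1990 = [95.0, 96.0, 98.0, 102.0, 104.0]

intercept, slope, ser = fit_simple_regression(age_1988, age_1990)
rng = random.Random(1998)               # fixed seed so the sketch is repeatable
imputed_1990 = impute_with_error([74.0, 79.0], intercept, slope, ser, rng)
print(round(intercept, 2), round(slope, 3), round(ser, 3), imputed_1990)
```

Adding the random error, rather than imputing the predicted value alone, keeps the imputed ages from clustering on the regression line, which is consistent with the stated aim of not changing the nature of the distributions.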
Probably due to coding errors, there were some cases in which information on home cognitive stimulation was missing but information on the home score was available. As indicated in Table 1, the correlation between the total home score and the home cognitive stimulation score within each year was 0.85 or higher, which is higher than the correlation between cognitive stimulation scores of adjacent years. Thus, I used the total home score information first for imputing missing values, and then I used the cognitive stimulation scores of the other two years as predictors of expected outcomes. Using the same method as above, a regression equation was estimated and a random error term was added to the predicted values to impute the missing data.

Table 1
Correlations Among Cognitive Stimulation Scores and Total Home Scores

              Cog Sti 88   Cog Sti 90   Cog Sti 92   Home 88   Home 90   Home 92
Cog Sti 88    1.00
Cog Sti 90    0.59         1.00
Cog Sti 92    0.55         0.66         1.00
Home 88       0.85         0.59         0.54         1.00
Home 90       0.56         0.86         0.63         0.64      1.00
Home 92       0.52         0.59         0.86         0.57      0.66      1.00

Level-3 Model: Child Characteristics

In level-3, time-invariant child characteristics such as gender, race, the initial test month, and children's verbal memory pretest were used to investigate the effect of each variable on the growth of children's ability. In order to investigate the effect of verbal memory on the importance of each sentence characteristic, and on the rate of change of each linguistic/psycholinguistic slope, verbal memory was used as a predictor of both the intercept and the rate-of-change slope of each linguistic/psycholinguistic variable. At this level, each parameter (coefficient) from level-2 becomes an outcome:

$$B_{00k} = G_{000} + G_{001}(\text{test month})_k + G_{002}(\text{sex})_k + G_{003}(\text{verbal memory})_k + G_{004}(\text{Hispanic})_k + G_{005}(\text{black})_k + U_{0k}$$
$$B_{01k} = G_{010} + G_{011}(\text{verbal memory})_k$$
$$B_{02k} = G_{020}$$
$$B_{03k} = G_{030}$$
$$B_{10k} = G_{100} + G_{101}(\text{verbal memory})_k$$
$$B_{11k} = G_{110} + G_{111}(\text{verbal memory})_k$$
$$B_{12k} = G_{120}$$
$$B_{20k} = G_{200} + G_{201}(\text{verbal memory})_k$$
$$B_{21k} = G_{210} + G_{211}(\text{verbal memory})_k$$
$$B_{22k} = G_{220}$$
$$B_{30k} = G_{300} + G_{301}(\text{verbal memory})_k$$
$$B_{31k} = G_{310} + G_{311}(\text{verbal memory})_k$$
$$B_{32k} = G_{320}$$

$G_{000}$: expected ability of a typical child at age 8.5, controlling for gender, race, the initial test month, verbal memory, text characteristics of items, and cognitive stimulation score
$G_{001}$: effect of the initial test month at age 8.5 on child $k$'s ability, controlling for all the other variables
$G_{002}$: gender gap in ability at age 8.5, controlling for all the other variables (boys are coded as 1 and girls are coded as 0)
$G_{003}$: effect of verbal memory at age 8.5 on child $k$'s ability, controlling for all the other variables
$G_{004}$: adjusted mean ability difference between Hispanic and White children at age 8.5 (Hispanic children are coded as 1 and others are coded as 0), controlling for all the other variables
$G_{005}$: adjusted mean ability difference between Black and White children at age 8.5 (Black children are coded as 1 and others are coded as 0), controlling for all the other variables
$G_{010}$: average growth rate in ability at age 8.5, controlling for all the other variables
$G_{011}$: verbal memory effect on the growth rate of child $k$'s ability at age 8.5, controlling for all the other variables
$G_{020}$: average acceleration of ability, controlling for all the other variables
$G_{030}$: average effect of home cognitive stimulation score at age 8.5, controlling for all the other variables
$G_{100}$: average effect of sentence length at age 8.5, controlling for all the other variables
$G_{101}$: average effect of verbal memory on sentence length at age 8.5, controlling for all the other variables
$G_{110}$: average linear growth rate effect of sentence length at age 8.5, controlling for all the other variables
$G_{111}$: average effect of verbal memory on the sentence length growth rate at age 8.5, controlling for all the other variables
$G_{120}$: average acceleration on sentence length, controlling for all the other variables
$G_{200}$: average effect of vocabulary frequency at age 8.5, controlling for all the other variables
$G_{201}$: average effect of verbal memory on the vocabulary frequency effect at age 8.5, controlling for all the other variables
$G_{210}$: average linear growth rate effect of vocabulary frequency at age 8.5, controlling for all the other variables
$G_{211}$: average effect of verbal memory on the vocabulary frequency growth rate at age 8.5, controlling for all the other variables
$G_{220}$: average acceleration on vocabulary frequency, controlling for all the other variables
$G_{300}$: average effect of propositional density at age 8.5, controlling for all the other variables
$G_{301}$: average effect of verbal memory on propositional density at age 8.5, controlling for all the other variables
$G_{310}$: average linear growth rate effect of propositional density at age 8.5, controlling for all the other variables
$G_{311}$: average effect of verbal memory on the propositional density growth rate at age 8.5, controlling for all the other variables
$G_{320}$: average acceleration on propositional density, controlling for all the other variables
$U_{0k}$: random effect associated with an individual child $k$ at age 8.5, controlling for initial test month, sex, verbal memory, race, and home cognitive stimulation score

Initial test month at level-3 was used to investigate whether there were any other environmental effects on the assessment. The month of taking the test in 1988 ranged from May to December, with 97 percent of children taking the test between June and October. Verbal memory, which was assessed around two years before the collection of the PIAT item responses, was used because it has been shown to be a good indicator of children's cognitive development, especially language learning.
A study with Spanish children using the McCarthy Verbal Memory sub-scale (McCarthy, 1972) showed a moderately high correlation with reading achievement (from r = 0.43 to r = 0.57). Verbal memory also correlated with PIAT Reading Recognition (r = 0.59) and PIAT Reading Comprehension (r = 0.39). Verbal memory was also correlated (r = 0.42) with vocabulary knowledge (PPVT-R), an indicator of verbal intelligence (Baker et al., 1993). In addition, Baddeley et al.'s (1975) study of the effect of articulation on retrieval indicated that the phonological loop in working memory was the key gateway to verbal memory. Older children articulated more rapidly than younger children, and the repetition of words prevented the decay of information from the phonological store. Thus, articulation speed was directly related to recall. Because many if not most children in this study were in the decoding stage of reading at the beginning of data collection in 1988, this study indirectly investigated the effect of verbal memory on children's reading abilities.

As I indicated in chapter 2, verbal memory was assessed roughly two years before the PIAT item responses were collected. The correlation between the month of taking the verbal memory test and the verbal memory score was low (r = -0.167, n = 448). Imputation for missing values was also conducted by adding randomly generated errors to the mean. The verbal memory subtest selected for assessing the NLSY children is only one part of the complete McCarthy assessment battery. Verbal memory was assessed by asking the child to repeat words or sentences said by the interviewer. The child listens to what the interviewer says and retells the words or sentences. There are three parts in the verbal memory subtest: In part A, a child repeats a series of words, ideally in the same sequence. In part B, a child repeats key words. Based on the combined score of parts A and B, part C (story telling) is administered.
Since there are many missing values on part C due to low combined scores on parts A and B, I used a standardized combined score of parts A and B for this study. Verbal memory in the level-3 model was used as an intercept (child's reading ability) predictor. The development of children's reading abilities was observed while controlling for verbal memory along with other intra-individual variables.

Research Questions

1. Do children's reading abilities change at a constant rate?
   a) Do abilities increase at constant or variable rates over time?
   b) Do changes in reading abilities differ across individuals?
2. How does the importance of each linguistic/psycholinguistic variable change as children grow older?
   a) Is the rate of change for each linguistic variable constant or variable?
3. How do individual children's characteristics such as verbal memory interact with text characteristics, such as length of sentence, vocabulary frequency, and propositional density?
   a) Does the effect of sentence length on reading comprehension depend on children's verbal memory?
   b) Does the effect of vocabulary frequency on reading comprehension depend on children's verbal memory?
   c) Does the effect of propositional density on reading comprehension depend on children's verbal memory?
4. How do contextual factors influence children's growth in reading? To what extent does children's growth in reading depend on
   a) verbal memory?
   b) home environment?
   c) race?
   d) gender?
   e) the initial test month?

Summary

By considering items as being nested in occasions, and occasions as being nested in individual children, a three-level HGLM was constructed. The model was used to investigate the patterns of importance of text characteristics along with the patterns of individual children's reading ability. For level-1, sentence length, average vocabulary frequency, and propositional density were included as predictors.
The selection of level-1 predictors was based on information processing theory. To understand the pattern of development over years, three level-2 predictors, age linear, age quadratic, and home cognitive stimulation, were also included. In addition, the effects of intra-individual factors on children's reading abilities were investigated. Indirectly, this study investigated the possible sources of variance that each cluster of variables explained.

CHAPTER 4

RESULTS

This study was conducted to understand how the characteristics of test items that interact with a child's background shape our beliefs about growth in reading. More specifically, this study was an investigation of the developmental patterns of children's reading abilities and the changing patterns of the importance (effect) of linguistic and psycholinguistic variables in the texts children encountered. In addition, several other factors that might conceivably influence young children's reading abilities, such as characteristics of individuals and characteristics of the contexts in which children learn and develop, were investigated. By building a three-level hierarchical generalized linear model (HGLM), a strong test of this developmental model was possible. The level-1 model represented item characteristics, the level-2 model represented change over time, and the level-3 model represented characteristics of individuals.

The following analyses were based on the results for 466 six-year-old children out of the 477 who scored more than 15 on the PIAT Reading Recognition test. Due to missing information on reading comprehension responses, 11 cases were deleted automatically when the three-level HGLM was run.
Patterns of Children's Reading Ability

In order to answer how children's reading abilities change over time, a model (see Figure 2) with both linear and quadratic terms was constructed after building a level-1 model with the three variables, length (sentence length), frequency (average vocabulary frequency), and density (propositional density). Since this study was intended to investigate a typical child's growth pattern over time, the results, which are presented in Table 2, were based on the unit specific model.4

Level-1 Model PfOb