CHILDREN’S ACQUISITION OF TONE 3 SANDHI IN MANDARIN

By

Chiung-Yao Wang

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
LINGUISTICS
2011

ABSTRACT
CHILDREN’S ACQUISITION OF TONE 3 SANDHI IN MANDARIN
By
Chiung-Yao Wang

The purpose of this dissertation is to examine Mandarin-speaking children’s acquisition of a
syntax-dependent phonological rule, Tone 3 Sandhi (T3S). A Tone 3 (low dipping tone) is
changed to a Tone 2 (mid rising tone) when it is followed by another Tone 3. Application of T3S
in fact involves a complex process. Setting up the prosodic domains within which T3S applies
refers partially to syntax. Cyclic and non-cyclic parsing strategies are used for different
syntactic contexts: a non-cyclic strategy is used for flat structures (e.g. digit sequences), a cyclic
strategy for NPs, and a mixture of both strategies is necessary for sentences. T3S application is
also variable because of optional T3S rules. Such variability creates ambiguity in the language
input for children. Very little is known about how children acquire T3S. The current work aims
to bridge the gap between T3S theories and child language acquisition. This dissertation presents
five studies, targeting children’s application of T3S in various contexts.
Study 1 (Natural speech) examines the production data of seven children (ages 4-6) and their
caretakers (five adults). T3S application is variable in both children and adults.
Study 2 (Flat structures) is an elicited production study with 46 children (3- and
5-year-olds) and 20 adults. We tested the use of a non-cyclic strategy in sequences of two, three,
and five digits. The results show that children were able to apply T3S non-cyclically in
sequences of digits. However, under-application and over-application are two common error
types in children. A surface pattern produced by adults was not found in children.

Study 3 (NPs) is also an elicited production study, focusing on the cyclic strategy in NPs.
Ninety-four children (ages 3-6) and 20 adults participated in this study. Children were able to
apply T3S cyclically in three-syllable compound nouns and four-syllable NPs. However, when
the structures become more complex, they may default to the non-cyclic strategy.
Study 4 (Natural Speech Repetition) and Study 5 (Robot Talk Repetition) used repetition of
sentences to test T3S application at the sentence level, where an integration of cyclic and
non-cyclic strategies is necessary. Twenty-one children (4- and 6-year-olds) and 11 adults
participated in Study 4. Forty-three children (4- and 6-year-olds) and 14 adults participated in
Study 5. Children were able to repeat the 4- and 6-syllable sentences which have T3S in Study 4
(Natural Speech Repetition). However, in Study 5 (Robot Talk Repetition), where we used
identical sentences with the T3S effect removed, 4-year-olds had a lot of difficulty.
Six-year-olds were able to integrate cyclic and non-cyclic strategies in T3S application, but they still
do not have adults’ mastery of T3S. Six-year-olds have all the T3S patterns adults have, and also
approximate adults in their preference of the patterns.
Overall, the findings of these studies do not support early acquisition of T3S. The results
indicate that although children know to change a Tone 3 to a Tone 2 when it is followed by a
Tone 3, it takes time to learn how to set up the prosodic domains in which T3S applies and to
reach adult-like mastery of the intricacies of T3S application.

Copyright by
CHIUNG-YAO WANG
2011

ACKNOWLEDGMENTS

As I thought about all who helped me to make this dissertation possible, I hope to
appropriately and adequately show my gratitude to each one. This dissertation work originates
from a few curious questions and thoughts I shared with my co-chair Professor Cristina Schmitt
while I was taking Child Language Acquisition with her in my second semester at Michigan
State University. I told her that while I was waiting at a bank in Taiwan where an automated
announcing system for number calling was used, I noticed that the recorded speech was without
the Tone 3 Sandhi Rule (T3S). It could be understood, yet it sounded a little strange. Professor
Schmitt came up with the brilliant idea of “Robot Talk” which became one of the studies in this
dissertation. Over the years, Cristina led me persistently in the research with her insights and
expertise. Her valuable feedback and the endless time she spent advising me in the research are
priceless to me. She has continued to inspire me. She genuinely takes interest in this research and
her encouragement means a lot to me. Without her, this dissertation would not have been
accomplished.
I also would like to express my deepest gratitude for my co-chair Professor Yen-Hwei Lin’s
patient guidance, wise advice, prompt feedback, and encouragement over the years. The research
on Tone 3 Sandhi would not have been the way it is without her keen insights in linguistic issues.
In my mind, she is the best Chinese linguist— extremely knowledgeable and intelligent, and at
the same time, very humble and hard-working. She mentors her students with an understanding
of their needs. Professor Lin’s book The Sounds of Chinese (2007) has also greatly helped me to
better understand Tone 3 Sandhi. I have the two best advisors for co-chairs, freely giving me
their time, teaching me and sharing with me their ideas, and contributing their best to the
proposals for funds for the research. At one point when I considered quitting, they gently, but
firmly said to me, “I think you can do it,” “You have what it takes.” Running this race has not
been easy, but my co-chairs did an awesome job to help me finish this race.
I am also truly indebted and thankful to my other two committee members, Professor Grover
Hudson and Professor Alan Munn, for their expertise, patience, and heartfelt kindness. They
have been generous with their time to help me improve the quality of my research work. Their
input, insights, comments, and advice are invaluable to me. I am especially grateful that
Professor Grover Hudson remained as my committee member after his retirement. His
graciousness to me has been greatly appreciated. He is a scholar I highly respect.
The T3S studies in this dissertation would not have been carried out without the generous
funding of the NSF Doctoral Dissertation Improvement Grant (# 0820614) and Predissertation
Travel Award (College of Arts and Letters and International Studies and Programs, MSU). A
pilot study which helped tremendously in building our later studies was supported by SCRAM
(Special College Research Abroad Money, College of Arts and Letters, MSU). Even though the
pilot study is not part of this dissertation, I would like to acknowledge all those who contributed
to the pilot study as well as the studies in this thesis. I wholeheartedly appreciate each child who
participated in these studies, their parents, their teachers, the principals and staff who facilitated
at Perfection Kindergarten, Hong De Kindergarten, Situn Kindergarten, Wei Li Kindergarten
(Taichung Methodist Church), and Chuan Shin Daycare in Taichung, Taiwan. I also am grateful
for each adult who participated in the studies, and the children and mothers for their participation
in the natural speech study. I thank each friend who helped me recruit subjects. My gratitude also
goes to Chia-Hsin Yeh for the transcription and coding work, observant feedback, and continual
support. I also thank Yi-Jen Huang for her endurance in completing the transcription work. I am
indebted to statistician Dr. Wei-Wen Hsu who generously provided professional statistical
consultation. His advice and time are greatly appreciated.
I am grateful also for the Department of Linguistics and Languages as well as the Acquisition
Lab headed by Professor Cristina Schmitt and Professor Alan Munn at MSU where I received fine
training and constructive feedback while conducting linguistic research. Also very much
appreciated are the Assistantships from the department over the years as well as the Dissertation
Completion Fellowship (College of Arts and Letters, MSU) which allowed me to concentrate on
dissertation writing in the final semester.
I thank each of my friends in the US and in Taiwan, especially Bernice Chang, Christina
Chang, Claire Hsieh, Linda Yang, Lisa Lin, Mei-Hua Liang, and my former 7th-grade students
for their encouragement and prayer. I thank Lyn Boudreau who drove me to Michigan, helped
me get settled in my first semester, and continues to show support, and Daphne Lin and her
daughter Belle who faithfully sent me cards of encouragement. Special thanks go to the prayer
team led by Dr. Jim Rawlinson and the Bible study group led by Shirley McGee and Sue Boutni
at Trinity Church. I have learned so much from you. Thank you for your love and prayer support.
I cannot express how much I love and appreciate each of my family members for their
unconditional love, uplifting encouragement, and unfailing support and prayer on this long
journey. Mom, Dad, my brother and two sisters, you are the best! Thank you for always listening
to me and believing in me. I thank my parents for their wise counsel and their sacrifice for me as
well as my late grandfather, who believed in the importance of education, and for the impact he
had on my life. I appreciate my nieces and nephews and Belle, who are my little enthusiastic
supporters for their heartwarming drawings and cards. Thanks to Niece Jane who continually
asked me, “So how many pages (of the dissertation) have you written?” I think I wrote faster
because of this very favorite question of hers. My initial interest in the topic was sparked
upon hearing Nephew Han over-applying/mis-applying T3S at age 2, so I thank him as well.
Last but not least, I thank the Lord Jesus for His provision and His faithfulness. He has given
me courage when I am fearful, peace when I am anxious, and strength when I am weak.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER ONE
INTRODUCTION
1.0 Introduction
1.1 Study 1: Natural Speech
1.2 Study 2 – Study 5: T3S in various domains of application
1.3 Summary of findings
1.4 The structure of the dissertation

CHAPTER TWO
LINGUISTIC BACKGROUND ON TONE 3 SANDHI
2.0 Introduction
2.1 Mandarin Tones and T3S
2.1.1 Four lexical tones
2.1.2 Tone 3 Sandhi
2.1.2.1 T3S in Flat structures
2.1.2.2 T3S depends on syntax
2.1.2.3 T3S variation
2.1.2.4 Summary
2.2 Two major Tone 3 Sandhi Models
2.2.1 Word-and-Phrase level Model
2.2.1.1 T3S in Flat structures
2.2.1.2 T3S depends on syntax
2.2.1.3 T3S variation
2.2.2 Stress-foot Model
2.2.2.1 T3S in Flat structures
2.2.2.2 T3S depends on syntax
2.2.2.3 T3S variation
2.3 Some issues
2.3.1 Word-and-Phrase level Model
2.3.2 Stress-foot Model
2.4 Conclusion

CHAPTER THREE
PREVIOUS CHILD ACQUISITION STUDIES ON TONE 3 SANDHI
3.0 Introduction
3.1 The acquisition of tones: an overview
3.2 Previous acquisition studies on Mandarin tones and T3S
3.2.1 Children’s acquisition of Mandarin tones
3.2.2 Children’s acquisition of T3S
3.3 Conclusion

CHAPTER FOUR
NATURAL SPEECH
4.0 Introduction
4.1 Additional linguistic background
4.2 T3S in natural speech
4.3 Hypotheses and predictions
4.4 Method
4.4.1 Subjects
4.4.2 Data Collection and transcription
4.4.3 Coding procedures
4.5 Results
4.6 Discussion
4.7 Conclusion

CHAPTER FIVE
FLAT STRUCTURES
5.0 Introduction
5.1 T3S in Flat structures
5.1.1 Previous theoretical studies
5.1.2 Kuo et al. (2007) experimental study
5.1.3 Re-thinking the linguistic environment for investigating flat structures
5.2 Research questions, hypotheses, and predictions
5.2.1 Research questions and Hypotheses
5.2.2 Predictions of T3S application in flat structures
5.2.2.1 A two-syllable flat structure
5.2.2.2 A three-syllable flat structure
5.2.2.3 A five-syllable flat structure
5.3 Study 2: Flat structures
5.3.1 Method
5.3.1.1 Subjects
5.3.1.2 Procedure
5.3.1.3 Design
5.3.1.4 Materials
5.3.1.5 Coding
5.3.2 Results
5.3.2.1 Overall correct rates in control items and test items
5.3.2.2 Surface patterns in flat structures
5.3.2.3 Errors in children
5.3.3 Checking hypotheses
5.3.4 Discussion
5.3.4.1 Correct surface patterns
5.3.4.2 T3S Errors
5.3.4.3 General discussion
5.4 Conclusions

CHAPTER SIX
NPs AND EVIDENCE FOR A SYNTACTIC PARSING
6.0 Introduction
6.1 Linguistic Background
6.1.1 Cyclic T3S application at the Word Level
6.1.2 Compound Nouns and NPs
6.2 Research questions, hypotheses, and predictions
6.2.1 Research questions and Hypotheses
6.2.2 Predictions of T3S application in NPs
6.2.2.1 Three-syllable compound nouns
6.2.2.2 Four-syllable noun phrases
6.3 Study 3: NP Experiment
6.3.1 Method
6.3.1.1 Subjects
6.3.1.2 Procedure
6.3.1.3 Design
6.3.1.4 Materials
6.3.1.5 Coding
6.3.2 Results
6.3.2.1 Three-syllable compound nouns
6.3.2.2 Four-syllable NPs
6.3.3 Discussion
6.3.3.1 Three-syllable compound nouns
6.3.3.2 Four-syllable NPs
6.4 Conclusions

CHAPTER SEVEN
T3S IN SENTENCES
7.0 Introduction
7.1 Research questions, hypotheses and predictions
7.1.1 Research questions
7.1.2 Hypotheses and predictions for both experiments
7.2 Study 4: NSR = Natural Speech (with Sandhi) Repetition
7.2.1 Method
7.2.1.1 Subjects
7.2.1.2 Procedure
7.2.1.3 Design
7.2.1.4 Materials
7.2.1.5 Coding
7.2.2 Results and discussion for control items in NSR
7.2.3 Results and discussion for test items in NSR
7.2.3.1 Results for 4w4σ and 4w6σ items
7.2.3.2 Discussion for 4w4σ and 4w6σ items
7.2.3.3 Results for PRO-5w6σ and NP-5w6σ items
7.2.3.4 Discussion for PRO-5w6σ and NP-5w6σ items
7.3 Study 5: RTR = Robot Talk (without sandhi) Repetition
7.3.1 Background
7.3.2 Method
7.3.2.1 Subjects
7.3.2.2 Procedure
7.3.2.3 Design
7.3.2.4 Materials
7.3.2.5 Coding
7.3.3 Results and discussion for control items in RTR
7.3.4 Results and discussion for test items in RTR
7.3.4.1 Results for 4w4σ and 4w6σ items
7.3.4.2 Discussion for 4w4σ and 4w6σ items
7.3.4.3 Results for PRO-5w6σ and NP-5w6σ items
7.3.4.4 Discussion for PRO-5w6σ and NP-5w6σ items
7.4 General discussion
7.5 Conclusions

CHAPTER EIGHT
CONCLUSION
8.0 Introduction
8.1 Hypotheses
8.2 Summary and discussion of findings
8.3 Future research

APPENDICES
Appendix A Study 1 Possible frozen chunks
Appendix B Study 2 Experimental materials
Appendix C Study 3 List of test and control items
Appendix D Study 3 Experimental materials
Appendix E Study 4 (NSR) Experimental materials
Appendix F Study 5 (RTR) Experimental materials
Appendix G Predicted surface patterns for test items in Study 4 NSR and Study 5 RTR
Appendix H Statistics notes

REFERENCES

LIST OF TABLES

Table 2.1 Four lexical tones in Mandarin Chinese
Table 2.2 T3S variation
Table 4.1 Study 1: Distribution of the subjects
Table 4.2 Study 1: Number of T3* (adjacent and non-adjacent) and total syllables produced
Table 4.3 Study 1: Number of T3-sequences: two, three, and four or more T3*
Table 4.4 Study 1: T3S frequency
Table 4.5 Study 1: T3S correct rates
Table 4.6 Study 1: Number of T3S applications within words, within constituents and across constituents
Table 4.7 Study 1: T3S application rates (%) within constituents and across constituents by subject
Table 4.8 Study 1: Frequency (%) of cliticization of subject pronouns in two, three, and four adjacent T3*
Table 4.9 Study 1: Two adjacent T3* that belong to two prosodic domains
Table 4.10 Study 1: Two adjacent T3* that belong to two prosodic domains (%)
Table 4.11 List of possible frozen chunks and number of tokens produced by each participant
Table 5.1 List of grammatical and ungrammatical patterns in flat structures
Table 5.2 Study 2: Distribution of the subjects
Table 5.3 Study 2: Sample answers and their coding categories for data analysis
Table 5.4 Study 2: Sample T3S errors and their coding categories for error analysis
Table 5.5 Study 2: Control items: 3-year-olds’ data excluded from the analysis
Table 5.6 Study 2: Test items: 3-year-olds’ data excluded from the analysis
Table 5.7 Study 2: Control items (non-T3 digits)
Table 5.8 Study 2: Test items (T3 digits)
Table 5.9 Summary of discrepancies between attested and predicted patterns in a 5-syllable flat structure
Table 5.10 Possible parsing of odd number of syllables if left-to-right parsing is not the only option
Table 5.11 Study 2: 32 combinations of T2 and T3 in a five-digit sequence
Table 5.12 Study 2: Percentages of attested and unattested error patterns in children
Table 5.13 Checking empirical data against theoretical predictions
Table 6.1 Study 3: Predicted patterns for the structures tested
Table 6.2 Study 3: Distribution of the subjects
Table 6.3 Study 3: Sample answers and their coding categories for analysis of T3S application
Table 6.4 Study 3: Sample T3S errors and their coding categories for error analysis
Table 6.5 Study 3: Three-syllable compound nouns (Control items): data excluded from the analysis
Table 6.6 Study 3: Three-syllable compound nouns (Test items): data excluded from the analysis
Table 6.7 Study 3: Four-syllable compound nouns (Control items): data excluded from the analysis
Table 6.8 Study 3: Four-syllable compound nouns (Test items): data excluded from the analysis
Table 6.9 Study 3: Correct rate (%) in three-syllable compound nouns: Control items (no T3S)
Table 6.10 Study 3: Correct rate (%) in three-syllable compound nouns: Test items (with T3S)
Table 6.11 Study 3: Correct rate (%) in four-syllable NPs: Control items (no T3S)
Table 6.12 Study 3: Correct rate (%) in four-syllable NPs: Test items (with T3S)
Table 6.13 Study 3: Error types (%) in four-syllable NPs
Table 6.14 Study 3: List of tests and controls in three-syllable compounds (in Appendix C)
Table 6.15 Study 3: List of tests and controls in four-syllable NPs (in Appendix C)
Table 7.1 Study 4: Distribution of the subjects
Table 7.2 Study 4: Tokens for test and control items
Table 7.3 Study 4: Sample answers and their coding categories for data analysis
Table 7.4 Study 4: Sample correct responses and their coding categories
Table 7.5 Study 4: Number of items included (I) and excluded (E) in control items
Table 7.6 Study 4: Control items: data excluded from the analysis
Table 7.7 Study 4: Correct rates (%) in control items
Table 7.8 Study 4: Test items (4w4σ and 4w6σ): data excluded from the analysis
Table 7.9 Study 4: Test items (PRO-5w6σ and NP-5w6σ): data excluded from the analysis
Table 7.10 Study 4: Number of items by pattern in 4w4σ and 4w6σ test items
Table 7.11 Study 4: Number of items by pattern in PRO-5w6σ and NP-5w6σ test items
Table 7.12 Study 5: Distribution of the subjects
Table 7.13 Study 5: Number of items included (I) and excluded (E) in control items
Table 7.14 Study 5: Control items: data excluded from the analysis
Table 7.15 Study 5: Correct rates (%) in control items
Table 7.16 Study 5: Test items (4w4σ and 4w6σ): data excluded from the analysis
Table 7.17 Study 5: Test items (PRO-5w6σ and NP-5w6σ): data excluded from the analysis
Table 7.18 Study 5: Number of items by pattern in 4w4σ and 4w6σ test items
Table 7.19 Study 5: Number of items by pattern in PRO-5w6σ and NP-5w6σ test items
Table 9.1 An example of the statistics output and the interpretation of the results

LIST OF FIGURES

Figure 1.1 Basic ingredients for T3S
Figure 4.1 Study 1: Percentages of T3S application at three levels by subjects
Figure 5.1 Study 2: A child’s hand, (a) – (c) for two, three, and five digits respectively
Figure 5.2 Study 2: Saying a digit two, three, and five times
Figure 5.3 Study 2: Total correct rates in control and test items by age groups
Figure 5.4 Study 2: Correct rates of two T3-digits by age group
Figure 5.5 Study 2: Correct rates of three T3-digits by age group
Figure 5.6 Study 2: Correct rates of five T3-digits by age group
Figure 5.7 Study 2: Children’s error rates by type in flat structures
Figure 5.8 Study 2: List of materials (in Appendix B)
Figure 6.1 Study 3: Sample materials in three-syllable compound nouns
Figure 6.2 Study 3: Sample materials in four-syllable NPs
Figure 6.3 Study 3: Correct rates in three-syllable compound nouns by age
Figure 6.4 Study 3: Correct rates in four-syllable NPs by age
Figure 6.5 Study 3: Errors in four-syllable right-branching NPs by age
Figure 6.6 Study 3: Errors in four-syllable mixed-branching NPs by age
Figure 6.7 Study 3: Experimental materials for three-syllable compounds (in Appendix D)
Figure 6.8 Study 3: Experimental materials for four-syllable NPs (in Appendix D)
Figure 7.1 Study 4: Sample materials
Figure 7.2 Study 4: Correct rates for the control items by age
Figure 7.3 Study 4: Correct rates in 4w4σ and 4w6σ sentences
Figure 7.4 Study 4: Correct rates in PRO-5w6σ and NP-5w6σ sentences
Figure 7.5 Study 5: Sample materials
Figure 7.6 Study 5: Sample Praat Spectrogram for a test item: three T3*
Figure 7.7 Study 5: Sample Praat Spectrogram for a test item: all T3*
Figure 7.8 Study 5: Sample Praat Spectrogram for a control item
Figure 7.9 Study 5: Correct rates in control items by age
Figure 7.10 Study 5: Correct rates in 4w4σ and 4w6σ sentences
Figure 7.11 Study 5: Correct rates in PRO-5w6σ and NP-5w6σ sentences
Figure 7.12 Study 4: List of materials (in Appendix E)
Figure 7.13 Study 5: List of materials (in Appendix F)

LIST OF ABBREVIATIONS
Ø                 empty beat (in a foot)
%                 intonational break (when used in derivations for phrases or sentences)
σ                 syllable
4w4σ              four words, four syllables
4w6σ              four words, six syllables
CL                classifier
D                 determiner
DP(s)             determiner phrase(s)
H                 (syntactic) head
M-branching       mixed-branching
MRUs              Minimal rhythmic units
NH                (syntactic) nonhead
NHS               Nonhead Stress
NP-5w6σ           Subject NP, five words, six syllables
NP(s)             noun phrase(s)
NSR               Natural Speech Repetition
opt rule pattern  pattern derived from T3S optional rules
OR                Odds Ratio
P                 preposition
PP(s)             prepositional phrase(s)
pro               pronoun
PRO-5w6σ          Subject pronoun, five words, six syllables
PRT               particle
R-branching       right-branching
RTR               Robot Talk Repetition
ST                surface tones (ST1 = Surface pattern 1, ST2 = Surface pattern 2…)
(T)0-(T)4         Mandarin Tone 0 (neutral tone), Tone 1, Tone 2, Tone 3, and Tone 4
T3S               Tone 3 Sandhi
T3*               Tone 3s (plural T3s)
UT                underlying tones
v                 variable and can be either the sandhi tone, Tone 2, or Tone 3
V                 verb
VP(s)             verb phrase(s)

CHAPTER 1
INTRODUCTION
1.0 Introduction
In acquiring their first language, children need to figure out not only the properties of the
different linguistic components of the language they are acquiring (phonetics, phonology, syntax,
etc.), but they also need to understand how these components interact. Although we have a good
idea of the milestones in the acquisition of syntax and phonology, we know very little about the
acquisition of the mapping rules between phonology and syntax. We know that infants as young
as 7.5 months can use the rhythmic biases of their language to guide initial segmentation of
speech (Jusczyk & Luce 2002; Werker & Curtin 2005) and by age three, children have mastered
the basic properties of the phrase structure of whatever language(s) they are acquiring (Brown
1973; Guasti 2004; Hirsh-Pasek & Golinkoff 1996; Roeper 2007). We also know that prosodic
structure may play a role in the acquisition of morpho-syntax (Demuth 2001; Gerken 1996; Goad
& Buckley 2006; Lleó & Demuth 1999). However, very little is known about the acquisition of
syntax-dependent phonological rules. This dissertation focuses on children’s acquisition of a
syntax-dependent phonological rule, Tone 3 Sandhi (henceforth T3S), in Mandarin Chinese.¹ In
this work, I present a series of studies which aim at widening our empirical knowledge of
children’s acquisition of T3S.

¹ I use the term Mandarin Chinese or just Mandarin to refer to Standard Mandarin or Standard
Chinese.

Mandarin has four lexical tones, and each morpheme generally has an underlying lexical tone
(except for functional words such as the question particles ma and ne, which carry a neutral
tone). The pitch level and contour of a neutral tone vary depending on the tone that precedes it
(Bao 1999; Chao 1968; Chen 2000; Cheng 1973; Duanmu 2000/2007; Erbaugh 1992; Jeng 1979;
Lin 2007 among others). Very little is known about how children acquire T3S although this
syntax-dependent rule is the most extensively studied tone sandhi phenomenon in Mandarin
Chinese. The rule can be simplistically described as in (1).
(1)  T3 T3 → T2 T3   (Chen 2000:364; Shih 1997:81)

The rule in (1) states that a Tone 3 is changed to a Tone 2 when followed by another Tone 3.
Given what we know about the acquisition of prosodic patterns and statistical abilities (Morgan
& Saffran 1995; Pierrehumbert 2003, for example) this rule should be acquired very early, and in
fact various studies have argued for early T3S acquisition in children (Jeng 1979; Jeng 1985; Li
& Thompson 1977; Zhu 2002; Zhu & Dodd 2000). However, none of these works have
examined T3S in a multiplicity of environments. The simplicity of (1) is quite deceptive because
the application of T3S “becomes rather complicated when there are more than two T3*² in a
word or phrase” (Lin 2007:204). T3S application is a process that involves setting up prosodic
domains within which T3S applies, and both cyclic and non-cyclic strategies are used for parsing
the syllables. In addition, there are optional rules: in the relevant contexts T3S can apply, but it
does not need to. For children, acquiring T3S is a matter of learning not only the lexical tones
and the rule in (1), but also when and how to use the right parsing strategies as well as the
optional rules.

² In order to avoid the confusion that the similarity between “T3S” (Tone 3 Sandhi) and “T3s”
(plural Tone 3s) may cause, “T3*” is used to refer to plural Tone 3s (T3s).

T3S also involves the mapping between syntax and phonology, and researchers do not always
agree on the extent to which T3S application relies on syntax. There are competing theories to
describe and explain (i) its cyclic and syntax-dependent component and (ii) its non-cyclic and
syntax-independent component. In flat structures (sequences that have no internal syntactic
structure, such as a string of digits in phone numbers), researchers agree that disyllabic feet are
built non-cyclically from left to right, and T3S applies in each foot (Chen 2000:368; Duanmu
2000/2007:239; Lin 2007:206; Shih 1986; Shih 1997). Example (2) shows a four-digit string
which is parsed into two disyllabic feet.
(2)  wu     wu     wu     wu
     five   five   five   five     ‘five-five-five-five’
     T3     T3     T3     T3       UT (= underlying tones)
     (T2    T3)    T3     T3
     (T2    T3)    (T2    T3)      ST (= surface tones)

Although syllables are parsed non-cyclically in flat structures, in noun phrases and in compounds
foot building is cyclic and dependent on the phrase structure of the units involved (cf. Chen 2000;
Cheng 1973; Duanmu 2000/2007; Shih 1986; Shih 1997), applying first to the innermost
constituent of a compound or a noun phrase, and then proceeding to the next level up as
exemplified in (3) and (4).
(3)  σσ           σ
     [[shuiguo]   niao]
     fruit        bird       ‘birds made with fruits’
     T3 T3        T3         UT
     (T2 T3)      T3
     (T2 T2       T3)        ST

(4)  σ        σσ
     [zhi     [haima]]
     paper    seahorse     ‘seahorses made with paper’
     T3       T3 T3        UT
     T3       (T2 T3)
     (T3      T2 T3)       ST

This picture is further complicated at the sentence level, where a mixed system of cyclic and
non-cyclic strategies is required, as we see in (5).

(5)  [[xiao  [[duan  tui]  ma]]   [hen  [ke]]]
     small   short   leg   horse  very  thirsty    ‘The small short-legged horse is very thirsty.’
     3       3       3     3      3     3          UT
     3       (2      3)    3      3     3
     3       (2      2     3)     3     3
     (3      2       2     3)     3     3
     (3      2       2     3)     (2    3)         ST

In (5), T3S applies cyclically in xiao duan tui ma ‘small short-legged horse.’ Starting with
the innermost constituent duan tui ‘short-legged,’ T3S applies. In the next cycle, ma ‘horse’ is
incorporated, and T3S applies again. When xiao ‘small’ is incorporated, T3S does not apply
because there are no adjacent T3* at this point. T3S applies non-cyclically in the remaining
syllables hen ke ‘very thirsty.’
Prosodic domains are not always built with reference to syntax, and consequently, they do
not always map to syntactic constituents. According to one of the major T3S models, the
Word-and-Phrase level Model (Chen 2000; Shih 1986; Shih 1997), at the “Word level”, T3S applies
cyclically, and at the “Phrase level”, T3S applies non-cyclically. However, if no foot has been
parsed at the Word level, then, at the Phrase level foot building refers to syntax to form a
disyllabic foot for the smallest domain first. Once a disyllabic foot has been formed for the
smallest domain, the remaining syllables are parsed non-cyclically from left to right. Therefore,
to be able to apply T3S involves not only knowing the rule stated in (1), but also the right parsing
strategies at the right levels.
Independent of the details of each model, this brief description of T3S shows its application
depends on lexical, syntactic and phonological knowledge, as schematically represented in Fig.
1.1.

3 This model will be described and reviewed in Chapter 2.

[Figure 1.1: three sources of information feed into T3S application: (i) lexical information (underlying tones of each syllable, σ); (ii) syntactic information (phrase structure in Mandarin Chinese); (iii) prosodic information (possible feet and foot-building processes in Mandarin Chinese).]
Figure 1.1 Basic ingredients for T3S
At the lexical level children need to know the underlying tones of the units they will use to
build syntactic structures. In order to build syntactic structures, children need to have some
knowledge of the basic phrase structure properties of Mandarin. At the prosodic level, children
need to know the possible ways to build feet in Mandarin. In order to apply T3S in an adult-like
manner, all three types of information must be integrated in particular ways and in many cases
the syntax and prosody go hand in hand.
As if the picture were not complex enough, children also have to deal with a fair amount
of T3S variability in the input. Although there is some debate with respect to the nature of the
variation, most researchers agree that some variation is associated with differences in speech
rates (Chen 2000; Cheng 1973; Lin 2007; Shih 1986; Shih 1997), because larger domains can be
formed in fast speech. There are two optional rules in the Word-and-Phrase level model: (i) T3S
is optional across prosodic domains, and (ii) the fast speech rule. In (6) below, ST1 is derived
through regular parsing. In fast speech, a larger domain is formed, and T3S applies iteratively
from left to right, which gives ST2.

(6)  [wo    [xiang  [mai   bi]]]     (Lin 2007: 215)
     I      want    buy    pen       ‘I want to buy pens.’
     T3     T3      T3     T3        UT
     T3     T3      T3     T3        Word level: not applicable
     T3     T3      (T2    T3)       Phrase level: Disyllabic foot for the smallest domain, T3S
     (T2    T3)     (T2    T3)       Phrase level: Disyllabic foot for the rest, T3S; ST1 (surface tones, Surface pattern 1)

     Optional in fast speech:
     (T3    T3      T3     T3)       Larger domain in fast speech
     (T2    T2      T2     T3)       T3S from left to right; ST2 (surface tones, Surface pattern 2)
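
The two surface patterns in (6) differ only in how the four syllables are grouped into T3S domains. The contrast can be sketched in a few lines of Python. This is a minimal illustration, not part of the Word-and-Phrase level model itself: the function name `t3s_in_domains`, the integer tone codes, and the hand-supplied domain boundaries are all assumptions made for the example, and the optional rule that lets T3S apply across domain boundaries is not modeled.

```python
def t3s_in_domains(tones, domains):
    """Apply T3S from left to right inside each prosodic domain.

    tones   -- underlying tones as integers (3 = Tone 3)
    domains -- (start, end) index pairs, end exclusive (hypothetical encoding)
    """
    surface = list(tones)
    for start, end in domains:
        for i in range(start, end - 1):
            # A T3 followed by another T3 in the same domain surfaces as T2.
            if surface[i] == 3 and surface[i + 1] == 3:
                surface[i] = 2
    return surface


underlying = [3, 3, 3, 3]  # wo xiang mai bi

# Regular parsing: (wo xiang)(mai bi)
print(t3s_in_domains(underlying, [(0, 2), (2, 4)]))  # [2, 3, 2, 3]

# Fast speech: one larger domain (wo xiang mai bi)
print(t3s_in_domains(underlying, [(0, 4)]))          # [2, 2, 2, 3]
```

Passing the domains in by hand keeps the sketch independent of any particular theory of how the domains are built; only the grouping changes between ST1 and ST2.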

Notice that the sentence in (6), used to depict T3S variability, is fairly short. Depending on
the syntactic structure, the number of adjacent T3* and the length of the sentence, the number of
T3S surface patterns varies. It is not uncommon for a given T3S sentence to have two or more
T3S surface patterns, and children have to cope with this variability. How do children
acquire this complicated syntax-dependent phonological rule?
Not only is there an absence of studies targeting children’s T3S acquisition exclusively, there
has also been a lack of experimental evidence of adults’ T3S production. In the T3S literature,
the grammaticality judgments are commonly the researchers’ own judgments, but where there are
multiple surface patterns, we do not have a good idea of which pattern adults tend to favor or
disfavor. How the patterns are chosen and used (i.e. the frequency of each T3S pattern) is relevant
to acquisition studies since adult speech is crucial for acquisition. Although experiments that
gather empirical evidence on adults’ T3S variability are not difficult
to design, the task becomes very challenging when we consider testing both adults and children on
identical tasks. This is because with children, the choice of vocabulary and the experimental
design are much more limited. In this thesis, T3S application in various syntactic structures in
children ages 3 – 6 is examined in order to trace children’s developmental path.

4 Refer to Chapter 2 for more details on optional rules and T3S variation.

If children do acquire T3S at an early age and are almost error-free as previous studies have
suggested, we expect that children ages 3-6 will have no difficulties applying T3S in phrases or
sentences.
Our studies seek to investigate whether or not children can apply T3S actively in
words/phrases/sentences that are novel or have combinations of morphemes/words that are not
likely to have been treated by the child as frozen expressions, such as the example in (7).
(7)  [xiao  [[duan  tui]  ma]]
     small  short   leg   horse    ‘(a) small short-legged horse’
     3      3       3     3        UT
     3      (2      3)    3        Word level: T3S
     3      (2      2     3)       Word level: T3S
     (3     2       2     3)       Word level: no T3S, ST
     *(2    3)      (2    3)
In (7), although the individual words are commonly known to children, the combination of
the words in the phrase is unlikely to have been learned as a frozen chunk. Therefore, children’s
application of T3S in (7), correctly or incorrectly, will provide useful information. For instance,
cyclic application of T3S gives T3T2T2T3, the correct pattern, whereas non-cyclic parsing from
left to right gives T2T3T2T3, the ungrammatical pattern.
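
The contrast between the two parses of (7) can be made concrete with a short Python sketch. It is an illustration under simplifying assumptions, not an implementation of any published model: constituents are encoded as nested lists of integer tone codes, the names `cyclic_t3s`, `noncyclic_t3s`, and `flatten` are hypothetical, and optional rules and alternative parses are ignored.

```python
def flatten(node):
    """Return the tones of a nested constituent as a flat list."""
    if isinstance(node, int):
        return [node]
    return [tone for child in node for tone in flatten(child)]


def cyclic_t3s(node):
    """Bottom-up (cyclic) T3S: innermost constituents are processed first."""
    if isinstance(node, int):
        return [node]
    surface = []
    for child in node:
        child_tones = cyclic_t3s(child)
        # When material is incorporated, a T3 at the right edge of what has
        # been built so far becomes T2 if the incoming material begins with T3.
        if surface and surface[-1] == 3 and child_tones[0] == 3:
            surface[-1] = 2
        surface.extend(child_tones)
    return surface


def noncyclic_t3s(tones):
    """Syntax-blind parse: disyllabic feet from left to right, T3S per foot."""
    surface = list(tones)
    for i in range(0, len(surface) - 1, 2):
        if surface[i] == 3 and surface[i + 1] == 3:
            surface[i] = 2
    return surface


# [xiao [[duan tui] ma]], every syllable underlyingly Tone 3
phrase = [3, [[3, 3], 3]]
print(cyclic_t3s(phrase))              # [3, 2, 2, 3]
print(noncyclic_t3s(flatten(phrase)))  # [2, 3, 2, 3]
```

Because the two functions disagree on this structure, a child's surface pattern for (7) indicates which strategy was used.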
If children produce multi-word utterances, they must have knowledge from all three components
(lexical tones, phrase structure, and the setting up of prosodic domains) to be able to apply T3S. We
therefore ask the following questions:
1. How do children set up the domain of application of T3S?
2. Do they differ from adults and if so, how?
3. Is there T3S variability? Does the variability in the input influence the acquisition of T3S?
4. Is there a structure-less rhythmic bias or a syntactic bias at different stages of
development?

In order to answer these questions, we conducted one natural speech study and four
experimental studies where we examined T3S in various syntactic contexts. Since T3S involves
both cyclic and non-cyclic parsing strategies, our experimental studies were designed so that
cyclic and non-cyclic strategies were tested separately. Importantly, T3S in sentences where the
integration of the two strategies is required was also tested.
To test the non-cyclic parsing strategy, we used flat structures which require a structure-less
rhythmic foot-building from left to right.
To test the cyclic parsing strategy, we used NPs which require a syntax-based strategy.
Prosodic domains within which T3S applies are built bottom-up.
To test an integration of cyclic and non-cyclic parsing strategies, we used sentences. For NPs
embedded in the sentences, the cyclic strategy is expected. For the remaining syllables in the
sentence, the non-cyclic parsing strategy is expected.
For all the studies, both the experimental studies and the natural speech study, we are
interested in whether or not children differ from adults with respect to T3S variability and the
parsing strategies they use.
Study 1 (Natural Speech) examines production data of children and their caretakers. Seven
children ages 4 – 6 and their caretakers (five adults) participated in this study.
Study 2 (Flat Structures) is an elicited production study, testing non-cyclic T3S application (a
left-to-right parsing strategy) in sequences of digits which have no internal structure. Forty-six
children (19 3-year-olds, 27 5-year-olds) and 20 adults were tested.
Study 3 (NPs) is an elicited production study, testing cyclic T3S application (a bottom-up parsing
strategy) in NPs. Ninety-four children (3-year-olds: 24, 4-year-olds: 20, 5-year-olds: 27,
6-year-olds: 23) and 20 adults participated in the study.

Study 4 (Natural Speech Repetition) and Study 5 (Robot Talk Repetition) use repetition of
sentences to test T3S application at the sentence level where an integration of cyclic (bottom-up)
and non-cyclic (left-to-right) strategies is required. Twenty-one children (4-year-olds: 11,
6-year-olds: 10) and 10 adults participated in Study 4. Forty-three children (4-year-olds: 20,
6-year-olds: 23) and 14 adults participated in Study 5.
1.1 Study 1: Natural Speech
Many studies have demonstrated that the input children are exposed to partially determines
the system they are acquiring and the rate of acquisition (Demuth 1995; Demuth 2001; Miller
2007; Miller & Schmitt 2009; Morgan 1986; Yang 2002). We also know that variability in the
input can cause delays in acquisition as children take longer converging on an adult grammar
(Miller 2007; Miller & Schmitt 2009).
The first step is to examine the input the child is exposed to. In particular, we are interested
in (i) how much T3S there is in the input and how much T3S children produce, and (ii) the T3S
variation in children and adults. Our research questions are as follows.
How much T3S do adults produce (language input for children)?
How much T3S do children produce?
Is there T3S variability within a speaker and across speakers?
In Study 1 we examine the natural speech of seven children and their caretakers, five adults,
for approximately thirty minutes of talk between the children and their caretakers. Both adults
and children produced T3S, and even though the sample is small, T3S variation in contexts
where T3S is optional across domains was found.

1.2 Study 2 – Study 5: T3S in various domains of application
In order to apply T3S in an adult-like way, children need to have information from various
linguistic levels and also need to integrate syntactic and prosodic information in particular ways,
both recursive and non-recursive. A series of experimental studies was designed to determine how
children apply T3S in flat structures and at the word, phrase, and sentence levels at different
developmental stages: Study 2, Flat structures; Study 3, NPs; Study 4, Natural Speech Repetition
(with Tone 3 Sandhi); and Study 5, Robot Talk Repetition (without Tone 3 Sandhi). We ask the
following questions:
1. Cyclic and non-cyclic T3S strategies:
Do children know to use non-cyclic strategies when there is no internal syntactic structure as in
digit-sequences? Do children know to use cyclic strategies in NPs? Can they integrate the two
strategies at the sentence level? Can children integrate the subject and the VP into a domain
where T3S applies? What do children do when the non-cyclic parse (a prosody-based strategy,
from left to right) and the cyclic parse (a syntax-based strategy, bottom-up) mismatch?
2. Development in T3S acquisition:
How do children go from zero T3S to adult-like T3S? Is T3S acquired early and almost error-free, as previous literature indicates? If not, is there a developmental pattern? Do younger
children and older children behave similarly in T3S application? Does T3S variability of children
reflect that of adults?

5 Robot Talk Repetition is a study that used sentences in which T3S was artificially removed (see Chapter 7 for details).

The following hypotheses were tested:
Syntax-Prosody Alignment Hypothesis (Gerken 1996)
We hypothesize that T3S cases where a left-to-right parse and the phrase-structure-dependent
parse produce the same result will cause less trouble than cases in which left-to-right domain
building produces a different result from domain building based on the syntax. In other words, T3S
cases where prosody and syntax mismatch are expected to be more difficult than T3S cases where
prosodic domains and syntactic domains are in alignment.
Structural complexity Hypothesis
T3S at the clausal level requires the integration of DPs and compounds into a larger prosodic unit.
We hypothesize that children will take longer to acquire T3S at the sentence level than at the
phrasal level. Particularly it may be difficult for children to integrate the subject and the VP into
a domain where T3S can apply.
Variational Hypothesis (Miller 2007; Pearl 2007; Yang 2002)
If there is more variation in particular types of structures in the input, these structures will
provide evidence for more than one possible analysis, generating a certain amount of noise in the
input. If the input is noisier, we expect that children will require more data to converge on the
adult language because certain outputs may not unambiguously support one or the other
hypothesis.
1.3 Summary of findings
The research questions we asked in Study 1 (Natural speech) concern how much T3S adults
and children produce, and if T3S variability is found. We found T3S variability. In addition, in
most cases, T3-sequences had only two adjacent T3*. Longer sequences of T3* were much rarer.
Children ages 4-6 did not appear to have trouble in T3S application in the T3-sequences.
In Study 2 (Flat structures), we asked whether or not children (age 3 and age 5) know how to
use the non-cyclic parsing strategy in digit-sequences. Two-, three-, and five-digit sequences
were tested. The results show that children know the non-cyclic parsing strategy. However, both
under-application and over-application were found in children. In the five-digit sequence, there is
a pattern in adults that was not found in children, suggesting that children may still be in the
process of acquiring all the patterns that adults use.
In Study 3 (NPs), we asked whether or not children (ages 3-6) know to use the cyclic parsing
strategy in NPs. We tested T3S application in three-syllable compound nouns ([σ[σσ]] and
[[σσ]σ]) and four-syllable NPs (right-branching [σ[σ[σσ]]] and mixed-branching [σ[[σσ]σ]]). The
results show that children knew how to use the cyclic parsing strategy. They did well in the
three-syllable compound nouns. In the case of right-branching [σ[σ[σσ]]] NPs whose predicted
pattern is (T2T3)(T2T3), children did not have very much difficulty. However, in the case of
mixed-branching [σ[[σσ]σ]] NPs whose predicted pattern is (T3T2T2T3), children had a lot of
difficulties. A common error found in children was *(T2T3)(T2T3). This error indicates that the
parsing is non-cyclic, i.e. without reference to the morphosyntactic structure of the phrase. It
appears that when the structure is more complex, they default to the left-to-right non-cyclic
parsing.
In the right-branching [σ[σ[σσ]]] NPs, cyclic parsing and non-cyclic parsing produce the
same result whereas in mixed-branching [σ[[σσ]σ]] NPs, cyclic parsing and non-cyclic parsing
produce different results. Put differently, there is a mismatch between syntax and prosody in the
mixed-branching NPs. The evidence that the mixed-branching [σ[[σσ]σ]] NPs are much more
difficult for children than the right-branching [σ[σ[σσ]]] NPs supports the Syntax-Prosody
Alignment Hypothesis.
In Study 4 (Natural Speech Repetition) and Study 5 (Robot Talk Repetition), the research
questions asked were (i) whether children can integrate cyclic and non-cyclic strategies at
the sentence level; (ii) whether they can integrate the subject and the VP into a domain where T3S
can apply; (iii) whether a subject NP and a subject pronoun differ in how feet are formed in the
sentence; and (iv) whether there is T3S variability in children and adults.
The findings show that when children age 4 and age 6 heard a correct T3S pattern in Natural
Speech Repetition, they could repeat the sentence, although not always with the same surface
pattern. Adults, too, did not always repeat the pattern they heard. However, in Robot Talk
Repetition where T3S was artificially removed, the correct rates of children age 4 and age 6
dropped dramatically. While 4-year-olds had a lot of trouble in Robot Talk Repetition, 6-year-olds
had all the surface patterns adults had. This suggests that 6-year-olds, but not 4-year-olds, know
to integrate cyclic and non-cyclic strategies at the sentence level.
Unlike an NP, a pronoun is prosodically weak, and we asked if its behavior in T3S
application is different from that in full NPs. A monosyllabic subject NP and a monosyllabic
pronoun are both found to be integrated into the VP by a left-to-right strategy. However, there is
a distinction between them. While a monosyllabic subject NP has the option to stand alone in its
own foot, a monosyllabic pronoun does not. The subject-NP sentences can keep the subject
separate from the predicate whereas the subject-pronoun sentences do not. The contrast between
a subject-NP sentence and a subject-pronoun sentence lies in the frequency of forming a foot
with the following syllable(s) and the inability of a monosyllabic subject pronoun to form a foot
by itself. Evidence shows that 6-year-olds, but not 4-year-olds, were able to integrate the subject
and the VP into a domain where T3S can apply, but they have not reached adult-like mastery.
There is T3S variability in children and adults. Although even 6-year-olds still do not have
adult-like accuracy in T3S application, their preferences among T3S patterns are approaching
those of adults.
In order to apply T3S, children need to have lexical information (the underlying tones of each
syllable), syntactic information (phrase structure), and prosodic information (how to build
prosodic domains), to integrate these ingredients to apply T3S, and on top of these, to learn
appropriate T3S variation. Based on the overall findings of our studies, we conclude that T3S is
not easy and it takes years for children to reach adult-like mastery of it. The results show that
children know the cyclic and non-cyclic strategies. They know to apply T3S iteratively from left
to right in flat structures, and they refer to the morphosyntactic structure in NPs in their T3S
application. Younger children (ages 3–4) had a lot of difficulties in the more complex structures
(mixed-branching NPs and sentences), while older children (ages 5–6) are becoming more adult-like
in many ways: they have a much higher correct rate in T3S application and more T3S patterns, and
in many cases where there is T3S variability, the frequency with which these children use each T3S
pattern is fairly similar to that of adults.
1.4 The structure of the dissertation
The organization of the dissertation is as follows. Chapter 2 gives linguistic background on
T3S and reviews two major T3S models. This chapter also provides basic information on T3S
and is the foundation of how our experimental studies were designed. In addition, the discussion
of T3S application in various syntactic environments will be based on the Word-and-Phrase level
model, so it is important that Chapter 2 be read before Chapters 3–7.
Chapter 3 illustrates the findings of previous studies on children’s T3S acquisition. In this
chapter, I discuss what can be learned from these studies and what is lacking and needs to be
learned.
Chapter 4 is a natural speech study where spontaneous speech between seven children ages 4
to 6 and their caretakers (five adults) is examined.
Chapter 5, Chapter 6, and Chapter 7 present four cross-sectional experimental studies
targeting T3S application at various levels. Participants include children age 3–6 as well as
adults. Chapter 5 investigates non-cyclic T3S application in flat structures. Chapter 6 studies
cyclic T3S application in NPs. Chapter 7, the most complicated study in the series, examines
T3S application at the sentence level where an integration of cyclic and non-cyclic parsing
strategies is required. Two studies are included in Chapter 7. The experimental sentences were
identical in these two studies, with tones manipulated in one study but not in the other.
Although the experimental chapters do not have to be read in order, reading sequentially is
recommended as they progress from ‘easy’ to ‘difficult’ in terms of the amount of T3S
application workload we required of children.
Chapter 8 summarizes the dissertation with the major findings each of our studies provides,
suggests what still needs to be learned, and what future T3S studies can investigate.

CHAPTER 2
LINGUISTIC BACKGROUND OF TONE 3 SANDHI
2.0 Introduction
T3S is the most extensively studied tone sandhi phenomenon in Mandarin, and there is a lot
of literature on T3S (Chen 2000; Cheng 1987; Dell 2004; Duanmu 2000/2007; Lin 2005; Lin
2007; Shih 1986; Shih 1997; Xu 1992; Zhang & Lai 2010; N. Zhang 1997; Z. Zhang 1988). To
better understand children’s acquisition of T3S, it is important to understand T3S application and
T3S variation in various syntactic structures. The purpose of this chapter is to provide basic and
crucial linguistic information on T3S, which subsequent chapters will refer to.
The organization of this chapter is as follows. Section 2.1 will provide basic information and
relevant background for Mandarin tones and T3S. For the purpose of this dissertation, the focus
will be placed on T3S. Section 2.2 will present two major theoretical models of T3S, the
Word-and-Phrase level Model (Chen 2000: Ch 9; Shih 1986; Shih 1997) and the Stressed-foot Model
(Duanmu 2000/2007). The former is adopted for predicting surface T3S patterns in the
experimental studies in this dissertation. Section 2.3 will discuss some theoretical issues. Section
2.4 concludes this chapter with a brief summary.
2.1 Mandarin Tones and T3S
2.1.1 Four lexical tones
There are four phonemically contrastive tones in Mandarin Chinese, Tone 1 (T1), Tone 2
(T2), Tone 3 (T3), and Tone 4 (T4). Tones are used for distinguishing meanings, so the same
syllable with a different tone is different in meaning: e.g. mai (T3) ‘to buy’ and mai (T4) ‘to sell’,
and shu (T1) ‘book’ and shu (T4) ‘tree’. In identifying the four lexical tones (T1–T4), there are
two other commonly used systems. First, H (High), M (Mid), and L (Low) are often used in
linguistic analysis, and such labeling efficiently communicates the pitch height. For instance, a
sequence of MH indicates a tone that starts at a mid pitch and rises to a high pitch.
Second, a system that uses numbers to express pitch heights is also widely adopted. On a scale of
1-5, 1 is the lowest pitch level, and 5 is the highest (Chao 1968). The tonal features H (high), M
(mid), and L (low) correspond to the pitch values ‘4 or 5,’ ‘3,’ and ‘1 or 2,’ respectively (Lin 2007:
194). The two systems mentioned above describe the linguistic properties of the four tones in a
way that the labeling of the four tones with “T1 - T4” does not. Either a two-scale (H and L) or
three-scale (H, M, and L) system is commonly used for phonological analysis.
Table 2.1 summarizes the four lexical tones in Mandarin, with the alternative naming system
and examples. The Chinese characters are provided in the examples as there are homophones for
identical syllables with identical tones.
Table 2.1 Four lexical tones in Mandarin Chinese

Four lexical tones   T1                T2                T3                      T4
Descriptive naming   High level tone   Mid-rising tone   Low dipping tone        High falling tone
Pitch level naming   55                35                214 (phrase final)      51 (phrase final)
(1-5)                                                    21 (non-phrase-final)   53 (non-phrase-final)
Pitch level naming   HH                MH                LH (phrase final)       HL (phrase final)
(H/M/L)                                                  LL (non-phrase-final)   HM (non-phrase-final)
Examples             bī 逼             bí 鼻             bǐ 筆                   bì 必
                     (to force)        (nose)            (pen)                   (must)

In addition to four lexical tones, there is also what is known as neutral tone. Neutral tone
occurs only in unstressed syllables (Chao 1968; Chen 2000; Cheng 1973; Duanmu 2000/2007;
Jeng 1979; Lin 2007 among others). Neutral tone (T0) is relatively more limited, compared to
the other four lexical tones. Also, the frequency of the neutral tone can vary depending on the
variety of Mandarin. For instance, while neutral tone is found in grammatical categories as well
as in content words in Beijing Mandarin, it tends to be found in functional categories, and is less
common in content words in Taiwan Mandarin. A few examples with neutral tone are ma, which
is a question particle placed at the end of a sentence; the classifier (CL) ge; and the second syllable of
many kinship terms such as baba (T4T0) ‘father,’ mama (T1T0) ‘mother,’ yeye (T2T0) ‘grandpa’
and nainai (T3T0) ‘grandma.’ In these four kinship terms, we see that neutral tone can be
preceded by any of the four lexical tones. An example of differences in the use of neutral tone in
content words is xiansheng ‘mister’ which is read as T1T1 in Taiwan Mandarin, but T1T0 in
Beijing Mandarin. The neutral tone is left unmarked in the Romanization pinyin writing, but for
the purpose of distinguishing it from the other four lexical tones, it is sometimes referred to as T0
or T5 in the linguistic literature. The use of these names (T1 - T4 and T0) is common to native
speakers of Chinese as well as to linguists. Children are taught with the naming of T1 – T4 (and
T0) in the elementary education of Mandarin in Taiwan. In this dissertation, for ease of
presentation, I label the four lexical tones with T1, T2, T3, and T4, and the neutral tone with T0.
Lastly, in Mandarin, each morpheme has an underlying lexical tone, but functional words
such as the question markers ma or ne do not have an underlying lexical tone (Chen 2000;
Duanmu 2000/2007; Erbaugh 1992; Lin 2007). A lexical tone, or phonemic tone, may undergo
some change and surface as a tone with different phonetic pitch through tonal rules or processes
(Chao 1968; Chen 2000; Cheng 1973; Duanmu 2000/2007; Lin 2007; Shih 1986; Shih 1997; Xu
1997). One of these tonal rules is the Tone 3 Sandhi rule, which we now turn to.

2.1.2 Tone 3 Sandhi
T3S is commonly described as changing a T3 to a T2 when it is followed by another T3, as
shown in (1).
(1)  T3 T3 → (T2 T3)  (Chen 2000:364; Shih 1997:81)

Lin (2007) points out that the rule T3T3 → (T2T3) is deceptively simple because T3S application
becomes very complicated when there is a sequence of more than two T3* (2007: 204). Prosodic
domains are important for the application. (How such domains are built will be illustrated in
more detail in later sections.)
2.1.2.1 T3S in flat structures
Flat structures, which lack hierarchical internal structure, are found in phone numbers and in
transliterated proper nouns (e.g. Mìxīgēn (T4T1T1) ‘Michigan’). Syllables in flat structures are
parsed from left to right in binary feet, and at the end, if there is an unfooted syllable, it is
incorporated into the neighboring foot (Chen 2000:368; Lin 2007:206; Shih 1986; Shih 1997).
Duanmu takes the same view, stating that in polysyllabic names and digits, disyllabic feet are built
from left to right (Duanmu 2000/2007; Duanmu 2004:70).
(2)  four T3-digits
     jiu   jiu   jiu   jiu     ‘9999’ (Lin 2007:206)
     3     3     3     3       UT (= underlying tones)
     (2    3)    (2    3)      ST (= surface tones)

6 Following Lin (2007), the bold type T2 (Tone 2) indicates a Tone 2 (sandhi tone) that is derived from Tone 3 because of the Tone 3 Sandhi rule.
7 Following the convention in the linguistics literature, parentheses ( ) refer to prosodic domains and square brackets [ ] refer to syntactic constituents.
8 Throughout this dissertation, prosodic domains refer to T3S domains.
9 In this dissertation, I use numerals 1, 2, 3, 4 (and 0 for neutral tone), with the “T” omitted, in the derivations of T3S to refer to the lexical tones Tone 1, Tone 2, Tone 3, and Tone 4.

(3)  five T3-digits
     jiu   jiu   jiu   jiu   jiu     ‘99999’ (Lin 2007:206)
     3     3     3     3     3       UT
     (2    3)    (2    3)    3
     (2    3)    (2    2     3)      ST

In (2), the four digits are parsed into two disyllabic feet. T3S applies within each foot, and
the surface pattern is (T2T3)(T2T3). In (3), syllables are parsed from left to right in disyllabic
feet, and T3S applies within each foot. The unparsed syllable on the right edge is then
incorporated into the foot preceding it at the end and T3S applies again. The surface pattern is
(T2T3)(T2T2T3).
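
The derivations in (2) and (3) follow a fixed procedure, which can be sketched in Python as follows. The sketch is illustrative only: the names `flat_t3s` and `bracketed`, the integer tone codes, and the index-pair encoding of feet are assumptions made for the example, and a string of a single syllable is simply left unfooted.

```python
def flat_t3s(tones):
    """Parse a flat string into disyllabic feet from left to right.

    T3S applies inside each foot; an unfooted final syllable is then
    incorporated into the preceding foot, where T3S may apply once more.
    Returns the surface tones and the feet as [start, end) index pairs.
    """
    surface = list(tones)
    n = len(surface)
    feet = [[i, i + 2] for i in range(0, n - 1, 2)]
    for start, _ in feet:
        if surface[start] == 3 and surface[start + 1] == 3:
            surface[start] = 2
    if n % 2 == 1 and feet:
        # Leftover syllable on the right edge joins the last foot.
        feet[-1][1] = n
        if surface[n - 2] == 3 and surface[n - 1] == 3:
            surface[n - 2] = 2
    return surface, feet


def bracketed(surface, feet):
    """Render the surface tones with one pair of parentheses per foot."""
    return "".join(
        "(" + " ".join(str(t) for t in surface[a:b]) + ")" for a, b in feet
    )


for digits in ([3, 3, 3, 3], [3, 3, 3, 3, 3]):
    print(bracketed(*flat_t3s(digits)))
# (2 3)(2 3)
# (2 3)(2 2 3)
```

No syntactic information enters the computation, which is what distinguishes flat structures from the cases discussed next.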
2.1.2.2 T3S depends on syntax
Unlike in flat structures, T3S in phrases and sentences that have internal structures heavily
depends on syntax, and T3S applies cyclically as shown in (4) – (7).
(4)  Three adjacent T3*
  a. [[σσ]        σ]
     [[laoshu]    pao]      (Lin 2007:212)
     mouse        run       ‘The mouse is running.’
     3 3          3         UT
     (2 3)        3
     (2 2         3)        ST
     *(3 2        3)

  b. [σ      [σ     σ]]
     [mai    [mi    jiu]]    (Lin 2007:212)
     buy     rice   wine     ‘to buy rice wine’
     3       3      3        UT
     3       (2     3)
     (3      2      3)       ST1 (Surface tones, Surface Pattern 1)
 or  (2      2      3)       ST2 (Surface tones, Surface Pattern 2)

In (4a) and (4b), the application of T3S starts from the innermost constituent, and then in the
next step the remaining syllable, which has not been parsed yet, is incorporated into the
disyllabic foot that has been formed. T3S applies one more time in (4a) in the next step when the
unfooted syllable (the third syllable) is incorporated into the disyllabic foot. Crucially, (T3T2T3)
is ungrammatical in (4a). In (4b), T3S applies in the first cycle, and no further application of T3S
is needed in the second cycle. This is because there are no adjacent T3* after the unfooted
syllable (the first syllable) is incorporated into the disyllabic foot that has been formed. In (4b),
there are two surface patterns. We will discuss T3S variation in the next subsection, Section
2.1.2.3. We now turn to four-syllable structures.
(5)  Four adjacent T3*
  a. left-branching structure
     [[[zhanlan]   guan]   zhang]       (Chen 2000:383; Lin 2007:212)
     exhibition    hall    director     ‘exhibition hall director’
     3   3         3       3            UT
     (2  3)        3       3
     (2  2         3)      3
     (2  2         2       3)           ST
     *(2 3         2       3)
     *(3) (2       2       3)

  b. right-branching structure
     [xiao   [mu      [laohu]]]     (Lin 2007: 212)
     small   female   tiger         ‘small female tiger’
     3       3        3  3          UT
     3       3        (2 3)
     3       (3       2  3)
     (2      3        2  3)         ST1
 or  (3)     (2       2  3)         ST2
 or  (2      2        2  3)         ST3

  c. mixed-branching structure
     [[Mi     [laoshu]]   hao]      (Lin 2007:207)
     Mickey   mouse       good      ‘Mickey Mouse is good.’
     3        3  3        3         UT
     3        (2 3)       3
     (3       2  3)       3
     (3       2  2        3)        ST
     *(2      3  2        3)
     ?/*(2    2  2        3)

In (5a), (5b) and (5c), T3S applies cyclically from the innermost constituents, resulting in the
surface patterns of (T2T2T2T3), (T2T3T2T3), and (T3T2T2T3) respectively. The difference in
the surface patterns is accounted for by the syntactic differences in these structures. (5b) has two
additional patterns through different parsing. For (5a), (T2T3T2T3) and (T3)(T2T2T3) are both
ungrammatical. For (5c), T2T3T2T3 is ungrammatical, and T2T2T2T3 may be marginal or
ungrammatical. The reason why T2T2T2T3 is marginal or ungrammatical is unclear.
I have shown in (4) and (5) how syntax plays a crucial role in T3S application. Next, we turn
to the issue of T3S variation.
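
The cyclic derivations in (4) and (5) can be sketched in code. The following Python fragment is only an illustration under simplifying assumptions and is not part of any cited analysis: a bracketed expression is encoded as a nested list of underlying tones, each cycle rescans its material from left to right, and the function names apply_t3s and cyclic_t3s are hypothetical. Optional rules and alternative parses (e.g. ST2 and ST3 in (5b)) are not modeled.

```python
def apply_t3s(tones):
    """Scan one domain from left to right: a T3 becomes T2 before another T3."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def cyclic_t3s(node):
    """Apply T3S bottom-up over a nested list of underlying tones.

    A leaf is an integer tone (1-4); a constituent is a list of leaves
    and/or smaller constituents. This encoding is an assumption made
    for illustration only.
    """
    if isinstance(node, int):
        return [node]
    flat = []
    for child in node:
        # Inner constituents are processed first (earlier cycles).
        flat.extend(cyclic_t3s(child))
    # The current cycle applies T3S to the enlarged domain.
    return apply_t3s(flat)


examples = {
    "(4a) [[laoshu] pao]": [[3, 3], 3],
    "(4b) [mai [mi jiu]]": [3, [3, 3]],
    "(5a) [[[zhanlan] guan] zhang]": [[[3, 3], 3], 3],
    "(5b) [xiao [mu [laohu]]]": [3, [3, [3, 3]]],
    "(5c) [[Mi [laoshu]] hao]": [[3, [3, 3]], 3],
}

for label, structure in examples.items():
    print(label, cyclic_t3s(structure))
```

Under these assumptions the sketch returns T2T2T3 for (4a), T3T2T3 for (4b), and T2T2T2T3, T2T3T2T3, and T3T2T2T3 for (5a), (5b), and (5c) respectively, which are the patterns labeled ST or ST1 above.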
2.1.2.3 T3S variation
In this subsection, data on T3S variation are presented. Cases of two to four adjacent T3* will
be used in the discussion. T3S applies obligatorily when two adjacent T3* belong to the same
prosodic domain; when the two adjacent T3* belong to different prosodic domains, T3S is
optional. Variation arises when there is an optional rule or an alternative parse.
The examples in (6) and (7) exemplify obligatory and optional T3S application respectively
in cases where there are only two adjacent T3*.
(6)     Two adjacent T3*
    a.  Two T3* belonging to the same prosodic domain
        [mai   jiu]       (Chen 2000:366)
        buy    wine       ‘buy wine’
        3      3          UT
        (2     3)         ST

    b.  Two T3* belonging to different prosodic domains
        [Tou-nao]   [jian-dan]    (Chen 2000:373, 416-417)
        brain       simple        ‘simple-minded’
        2 3         3 1           UT
        (2 3)       (3 1)         ST1; no T3S
        (2 2)       (3 1)         ST2; optional T3S applied

In (6a), we see a sequence of two T3* surface as T2T3 in the output. In (6b), the two T3*

belong to different prosodic domains, and T3S does not have to apply. ST1 is the surface pattern
when T3S does not apply. When optional T3S applies across domains, we have the other surface
pattern in ST2. In the simplest case of two adjacent T3*, we already see T3S variation. Next, let
us consider three- and four-syllable cases. I will use some examples we saw in (4) and (5).
Table 2.2 T3S variation

a. Three adjacent T3*
   [[σσ]      σ]
   [[laoshu]  pao]       (Lin 2007:212)
   mouse      run        ‘The mouse is running.’
   33         3          UT
   (23)       3
   (22        3)         ST

b. Three adjacent T3*
   [σ     [σ     σ]]
   [mai   [mi    jiu]]   (Lin 2007:212)
   buy    rice   wine    ‘to buy rice wine’
   3      3      3       UT
   3      (2     3)
   (3     2      3)      ST1
or (2     2      3)      ST2

c. left-branching structure
   [[[zhanlan]   guan]   zhang]      (Chen 2000:383; Lin 2007:212)
   exhibition    hall    director    ‘exhibition hall director’
   3 3           3       3           UT
   (2 3)         3       3
   (2 2          3)      3
   (2 2          2       3)          ST

d. right-branching structure
   [xiao    [mu       [laohu]]]      (Lin 2007:212)
   small    female    tiger          ‘small female tiger’
   3        3         3 3            UT
   3        3         (2 3)
   3        (3        2 3)
   (2       3         2 3)           ST1
or (3)      (2        2 3)           ST2
or (2       2         2 3)           ST3


Lin (2007:212) points out that expressions with the embedded constituents on the left edge
usually have one surface pattern as (a) and (c) in Table 2.2 show, whereas expressions with the
embedded constituents on the right edge have more than one surface pattern ((b) and (d) in Table
2.2).
Researchers do not always agree on the source of a particular surface pattern. For instance, in
Table 2.2, the surface pattern ST2 (T2T2T3) in (b) and the surface pattern ST3 (T2T2T2T3) in
(d) are considered a fast speech pattern where a larger domain is formed and T3S applies from
left to right in one step (Chen 2000; Lin 2007; Shih 1986; Shih 1997), but Duanmu (2000)
disagrees with the fast speech account and takes this pattern as a permissible alternative pattern
through a different parsing strategy. I take the view that the larger domain pattern is an

23

alternative pattern because: (i) in the experimental study of Kuo et al. (2007) where slow, normal,
and fast speech T3S production were compared, the larger domain pattern was found even in
slow speech; and (ii) evidence from our own experimental studies (see §5.3.2.2) showing that
participants produced the larger domain pattern even though the experiments were in the normal
speech setting, rather than the fast speech setting. Examples cited from previous literature will be
provided with the authors’ own views regarding the larger domain pattern (fast speech pattern or
just a permissible alternative pattern).
Some claims of T3S variation in the T3S literature are based on the researchers’
grammaticality judgments. As can be expected, there are dialectal differences.
(7)  [Gou   [[bi    ma]     xiao]]     (Zhang 1997:315)
     dog    than    horse   small      ‘A dog is smaller than a horse.’
     3      3       3       3          UT
     (3)    (2      2       3)         ST1
     (2     2       2       3)         ST2
     (2     3)      (2      3)         ST3


In presenting a reanalysis of the sentence in (7), Wang and Lin (2011) found that ST3 was
not grammatical for some native speakers of Mandarin, particularly for some (not all) Taiwan
Mandarin speakers. There is also a tendency for Beijing Mandarin speakers to consider ST2 and
ST3 grammatical while Taiwan Mandarin speakers consider them ungrammatical.
2.1.2.4 Summary
In this section, I have discussed several important issues. First, I showed how T3S applies
when there is no internal syntactic structure (i.e. flat structures). Second, I showed that syntax
plays a crucial role in T3S application in phrases and sentences where there are hierarchical
syntactic structures. Third, I presented the phenomenon of T3S variation and discussed how
grammaticality judgments may differ because of dialectal differences. Dialectal
differences regarding T3S variation have not attracted much attention, and the issue is worth
investigating further. Variation, when interpreted in another sense, can refer to the situation
where speakers vary in their own production of T3S surface patterns, producing different surface
patterns at different times, and this also has to be accounted for. We now turn to two major T3S
models and see how T3S is analyzed and how the T3S variation is accounted for.
2.2 Two major Tone 3 Sandhi Models
Two major T3S models are the Word-and-Phrase Level Model (Chen 2000: Ch 9; Shih 1986;
Shih 1997) and the Stress-foot Model (Duanmu 2000/2007: Ch 11). The two models have the
same empirical coverage for multiple T3S patterns. In what follows, I first review the
Word-and-Phrase Level Model, followed by a review of the Stress-foot Model.
2.2.1 Word-and-Phrase level Model (Chen 2000: Ch 9; Shih 1986; Shih 1997)
Duanmu calls this approach the “stressless-foot approach” (2000:242), to contrast his use of
stress in foot-building in his model. The Word-and-Phrase level Model was developed by Chen
(2000), based on Shih (1986, 1997). Lin adopts this model for T3S in a chapter on tonal
processes (2007: Ch 9). I will refer to this model as the “Word-and-Phrase level Model” because
its major characteristic is the separation of the Word level from the Phrase level with respect to
differences in T3S application mode (cyclic vs. non-cyclic).
To know how T3S applies, we need to know how the domain within which T3S applies is
defined. According to Chen (2000:366), connected speech is broken into units which are referred
to as Minimal rhythmic units (MRUs), and T3S application is obligatory within MRUs. In other
words, MRUs are the prosodic domain, or T3S domain, within which T3S must apply. Syllables
are grouped into binary MRUs from left to right in unstructured expressions, but the building of
the MRUs is sensitive to the morphosyntax in structured expressions (Chen 2000:367-369). Chen
(2000:373) points out that intra-MRU T3S application is obligatory and takes precedence over
the inter-MRU T3S application, which is optional.
It should be emphasized that the formation of MRUs in structured expressions has an
important condition which prevents certain elements from being split into different prosodic
domains. Shih (1986, 1997) refers to this condition as Immediate Constituency, defined as “join
immediate constituents into disyllabic feet.” In his analysis, Chen (2000:371) uses a constraint,
Congruence:
“Group X forms an MRU with its closest morphosyntactic mate.”
Building domains according to Immediate Constituency is the first step of T3S application.
For the next step, Shih (1986, 1997:98) proposes a constraint, Duple Meter,[10] which is described
as “scanning from left to right, join monosyllabic syllables into binary feet.” Chen (2000:374)
suggests that MRUs are first built for “word-size units,” and then by “phrasal constructions.”
That is, T3S is dealt with at the Word level and then at the Phrase level. In the final step, by the
incorporation rule,[11] any leftover unparsed syllable is incorporated into an adjacent binary foot
(Shih 1986; Shih 1997). Shih (1997:98) points out that evidence was found for unspecified
directionality for incorporation of the unparsed syllable, and more specifically, in a structure
where there is a disyllabic subject followed by a verb and a disyllabic object, the verb can be
incorporated in either direction. Thus, Shih (1997) suggests that the flexibility of directionality
be built into the rules, a modification of her earlier work (Shih 1986), where directionality
follows the syntactic branching.

[10] The definition of this rule in Shih (1986) contains the phrase “unless they branch to the
opposite direction,” which was removed in the modified version of the rule in Shih (1997).
[11] In Shih (1986), this rule of incorporating an unparsed syllable has the condition “according to
the direction of syntactic branching,” which was removed in the modified version in Shih (1997).

In the Word-and-Phrase level model, there are two basic optional rules: (i) T3S is optional
across prosodic domains, and (ii) there is a fast speech rule, by which a larger domain is formed
and T3S applies from left to right iteratively.
In what follows, I clarify several aspects of the model. The notion of Immediate
Constituency proposed by Shih (1986, 1997) serves to apply T3S cyclically within what Chen
(2000) calls a “closest morphosyntactic mate.” This corresponds to the Word level in Lin (2007).
At the Word level, T3S is applied cyclically, that is, with a bottom-up parsing strategy. Lin (2007)
says that in this model, compound nouns as well as NPs are both regarded as being at the Word
level, even though NPs are syntactically phrasal.
“… a noun with a modifier that describes or specifies the noun such as xiao laoshu ‘small
mouse,’ (see Chen 2000: §9.3 for details) is also treated as a word, although syntactically
such a complex noun is often classified as a noun phrase. That is, a simple noun, a compound
noun, and a complex noun [modifier + noun] are all treated as words rather than phrases.”
(Lin 2007:207)
At the Phrase level, T3S is applied non-cyclically, except when no foot is formed at the Word
level; in that case, a disyllabic foot is formed for the smallest domain first, before the rest of
the syllables are parsed from left to right. In other words, a left-to-right parsing strategy is used
at the Phrase level without reference to syntax; the only exception arises when no foot has been
built at the Word level, in which case foot-building must first refer to syntax to form a disyllabic
foot for the smallest constituent. To summarize, once the parsing is finished at the Word level,
all the remaining syllables are parsed into disyllabic feet from left to right. After this step, if
there is any remaining unparsed syllable, it is then incorporated into a neighboring foot.
We now take a look at some simple examples and see how their surface patterns are derived
with the principles that have just been mentioned. We first look at how flat structures are
analyzed in the Word-and-Phrase level Model, followed by the role syntax plays in this model,
and finally, I will discuss how T3S variation is handled in this model.
2.2.1.1 T3S in Flat structures
In this model, disyllabic feet are built from left to right in flat structures. After that, if there is
any unfooted syllable, it is incorporated into the neighboring foot (Chen 2000:368; Shih 1986;
Shih 1997).
(8)  Four T3-digits
     [jiu    jiu    jiu    jiu]      (Lin 2007:206)
     nine    nine   nine   nine      ‘nine-nine-nine-nine’
     3       3      3      3         UT
     (2      3)     (2     3)        disyllabic feet from left to right, T3S; ST1

     Optional in fast speech:
     (2      2      2      3)        ST2
     *(2     2      3)     (3)

(9)  Five T3-digits
     [jiu    jiu    jiu    jiu    jiu]     (Lin 2007:206)
     nine    nine   nine   nine   nine     ‘nine-nine-nine-nine-nine’
     3       3      3      3      3        UT
     (2      3)     (2     3)     3        disyllabic feet from left to right, T3S
     (2      3)     (2     2      3)       incorporation of the unparsed syllable; ST1

     Optional in fast speech:
     (2      2      2      2      3)       ST2
     *(2     3)     (2     3)     (3)
     *(2     2      3)     (2     3)


In (8), we see an even number of syllables, perfectly divided into two disyllabic feet. T3S
applies within both feet. In (9), two disyllabic feet are parsed, and T3S applies. The unparsed
syllable then joins the foot that precedes it and forms a three-syllable domain, and T3S applies
again (Lin 2007: 206). In fast speech, a larger domain parsing may be used (Chen 2000; Lin
2007; Shih 1997), and therefore (8) and (9) may have an additional pattern (T2T2T2T3) and
(T2T2T2T2T3) respectively. According to Chen (2000:368) and Shih (1997:98), the surface
pattern of (T2T2T3)(T2T3) is ungrammatical.
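
The flat-structure procedure illustrated in (8) and (9) can be summarized in a short sketch. The Python fragment below is an illustration only, assuming that tones are given as integers and that a leftover syllable always joins the foot preceding it; the function names apply_t3s and flat_t3s are hypothetical and do not come from the cited works.

```python
def apply_t3s(tones):
    """Scan one domain from left to right: a T3 becomes T2 before another T3."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def flat_t3s(tones):
    """Parse a flat tone sequence into feet and apply T3S within each foot.

    Step 1: build disyllabic feet from left to right and apply T3S.
    Step 2: incorporate a leftover syllable into the preceding foot
            and apply T3S again within that foot.
    """
    tones = list(tones)
    paired = len(tones) - len(tones) % 2
    feet = [apply_t3s(tones[i:i + 2]) for i in range(0, paired, 2)]
    if paired < len(tones):
        leftover = tones[-1]
        if feet:
            feet[-1] = apply_t3s(feet[-1] + [leftover])
        else:
            # A single syllable has no foot to join.
            feet.append([leftover])
    return feet


print(flat_t3s([3, 3, 3, 3]))      # four T3 digits, as in (8)
print(flat_t3s([3, 3, 3, 3, 3]))   # five T3 digits, as in (9)
print(apply_t3s([3, 3, 3, 3, 3]))  # one larger domain, as in the fast speech option
```

With these assumptions, the sketch yields (T2T3)(T2T3) for (8) and (T2T3)(T2T2T3) for (9), and treating the whole string as one domain yields T2T2T2T2T3.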
(10) Suo-   ma-    li-    ya        ‘Somalia’ (Chen 2000:369)
     3      3      3      3         UT
     (2     3)     (2     3)        disyllabic feet from left to right; ST


In (10), we see that in the translation for Somalia, the four syllables are parsed from left to
right into two disyllabic feet, and T3S applies within each foot. The procedure we see here is the
same as that in sequences of digits.
2.2.1.2 T3S depends on syntax
In this subsection, the analysis of phrases and sentences is presented and we will see how
syntax plays a role in this model. Some of the phrases or sentences we saw earlier will be used
for illustration.
(11) Three adjacent T3*
     [[laoshu]  pao]       (Lin 2007:212)
     mouse      run        ‘The mouse is running.’
     33         3          UT
     (23)       3          Word: disyllabic foot, T3S
     (22        3)         Phrase: incorporation, T3S; ST

(12) Three adjacent T3*
     [mai   [mi    jiu]]   (Lin 2007:212-213)
     buy    rice   wine    ‘to buy rice wine’
     3      3      3       UT
     3      (2     3)      Word: disyllabic foot, T3S
     (3     2      3)      Phrase: incorporation, T3S; ST1

     Optional in fast speech:
     (3     3      3)      one prosodic domain in fast speech
     (2     2      3)      T3S; ST2


In (11), a disyllabic foot is first formed for the inner constituent laoshu ‘mouse,’ and T3S
applies. When pao ‘run’ is incorporated into this foot at the Phrase level, T3S applies again. The
surface pattern is (T2T2T3). In (12), the normal foot-building process applies and T3S gives
(T3T2T3), but with optional fast speech domain building, one large domain is formed and T3S
applies from left to right in one step and produces ST2 (T2T2T3) (Lin 2007:213).
(13) left-branching structure
     [[[zhanlan]   guan]   zhang]      (Chen 2000:383; Lin 2007:212)
     exhibition    hall    director    ‘exhibition hall director’
     3 3           3       3           UT
     (2 3)         3       3           Word: disyllabic foot, T3S
     (2 2          3)      3           Word: incorporation, T3S
     (2 2          2       3)          Word: incorporation, T3S; ST


According to Chen (2000:383), the compound noun in (13) is a complex word and T3S must
apply cyclically from the innermost constituent zhanlan ‘exhibit,’ and then to the next domain
zhanlan guan ‘exhibit hall,’ and finally, to the outermost domain zhanlan guan zhang ‘exhibit
hall director.’ The surface pattern is T2T2T2T3. The pattern derived through the optional fast
speech rule is also T2T2T2T3.
(14) right-branching structure
     [xiao    [mu       [laohu]]]      (Lin 2007:212)
     small    female    tiger          ‘small female tiger’
     3        3         3 3            UT
     3        3         (2 3)          Word: disyllabic foot, T3S
     3        (3        2 3)           Word: incorporation, no T3S
     (2       3         2 3)           Word: incorporation, T3S; ST1

     Optional in fast speech I:
     3        (3        3 3)           one prosodic domain for [mu [laohu]]
     3        (2        2 3)           T3S from left to right
     (3       2         2 3)           incorporation, no T3S; ST2

     Optional in fast speech II:
     (3       3         3 3)           one prosodic domain for all syllables
     (2       2         2 3)           T3S from left to right; ST3

According to Lin (2007:213), in (14), through cyclic foot-building, the normal pattern is
T2T3T2T3. In fast speech, either mu laohu ‘female tiger’ or xiao mu laohu ‘small female tiger’
in the phrase forms a larger domain, and two additional patterns are (T3T2T2T3) and
(T2T2T2T3) respectively. Next, we turn to some sentences.

(15) [[Mi      [laoshu]]   hao]        (Lin 2007:209)
     Mickey    mouse       good        ‘Mickey Mouse is good.’
     3         33          3           UT
     3         (23)        3           Word: disyllabic foot; T3S
     (3        23)         3           Word: incorporation; no T3S
     (3        22          3)          Phrase: incorporation; T3S


In (15), at the Word level, a disyllabic foot is parsed for the smallest domain laoshu ‘mouse,’
and T3S applies. Next, the unfooted syllable is incorporated into the disyllabic foot that has been
built. T3S does not apply since we do not have adjacent T3* at this point. At the Phrase level, the
unparsed syllable hao ‘good’ is incorporated into the adjacent three-syllable foot, and T3S
applies (Lin 2007: 209). Now we consider a sentence with a different structure in (16).
(16) [wo   [xiang  [mai   [bi]]]]     (Lin 2007:215)
     I     want    buy    pen         ‘I want to buy pens.’
     3     3       3      3           UT
     3     3       3      3           Word: not applicable
     3     3       (2     3)          Phrase: disyllabic foot for the smallest domain, T3S
     (2    3)      (2     3)          Phrase: disyllabic foot for the rest, T3S; ST1

     Optional in fast speech:
     (3    3       3      3)          one prosodic domain in fast speech
     (2    2       2      3)          T3S from left to right; ST2


In (16), T3S is not applicable at the Word level. At the Phrase level, after the disyllabic foot
is formed for the smallest domain, the rest of the syllables are parsed from left to right. The
optional rule in fast speech yields the surface pattern of (T2T2T2T3) through left-to-right T3S
application in one step (Lin 2007:214-215). Clearly, the derived surface patterns differ in (15)
and (16) because of their structural differences. Let us see how T3S works in a longer sentence.

(17) [[Mi     [laoshu]]  [xiang  [zhao      [hao   [mi    jiu]]]]]    (Lin 2007:221)
     Mickey   mouse      want    look for   good   rice   wine
     ‘Mickey Mouse wants to look for good rice wine.’
     3        33         3       3          3      3      3       UT
     3        (23)       3       3          3      (2     3)      Word: T3S
     (3       23)        3       3          (3     2      3)      Word: incorporation; no T3S
     (3       23)        (2      3)         (3     2      3)      Phrase: disyllabic foot from left
                                                                  to right, T3S; ST


In (17), disyllabic feet are formed for laoshu ‘mouse’ and mi jiu ‘rice wine’ at the Word level.
In the next step, Mi ‘Mickey’ and hao ‘good’ are incorporated into their following feet. At this
point, foot-building and T3S application are completed at the Word level. At the phrase level, a
disyllabic foot is formed non-cyclically, from left to right. We see in (17) that T3S is applied
with reference to syntax (a bottom-up strategy) at the Word level, but without reference to syntax
(a left-to-right strategy) at the Phrase level.
2.2.1.3 T3S variation
In the Word-and-Phrase level model (Chen 2000: Ch 9; Shih 1986; Shih 1997), there are two
basic optional rules: (i) T3S is optional across prosodic domains, and (ii) there is a fast speech
rule by which a larger domain is formed and T3S applies from left to right in one step. Let us
first look at the examples we saw earlier in (11) and (12), repeated below in (18) and (19), to
illustrate T3S variation.
(18) Three adjacent T3*
     [[laoshu]  pao]       (Lin 2007:212)
     mouse      run        ‘The mouse is running.’
     33         3          UT
     (23)       3          Word: disyllabic foot, T3S
     (22        3)         Phrase: incorporation, T3S; ST1

     Optional in fast speech:
     (33        3)         one prosodic domain in fast speech
     (22        3)         T3S; ST2 (=ST1)

(19) Three adjacent T3*
     [mai   [mi    jiu]]   (Lin 2007:212-213)
     buy    rice   wine    ‘to buy rice wine’
     3      3      3       UT
     3      (2     3)      Word: disyllabic foot, T3S
     (3     2      3)      Phrase: incorporation, T3S; ST1

     Optional in fast speech:
     (3     3      3)      one prosodic domain in fast speech
     (2     2      3)      T3S; ST2

While there is only one surface pattern (T2T2T3) in (18), there are two surface patterns
(T3T2T3) and (T2T2T3) in (19). This is because in (18) the cyclic application and the larger
domain parsing in fast speech result in the same sequence of T2T2T3. In (18) and (19), we see
how syntax and the optional pattern interact, giving different results— one without variants, and
the other with two variants in the output.
In addition to the optional rule for fast speech, T3S is optional across prosodic domains. Lin
(2007) clarifies how they are different in the derivational steps as we see in (20) and (21).
(20) [wo   [xiang  [mai   hua]]]     (Lin 2007:215)
     I     want    buy    flower     ‘I want to buy flowers.’
     3     3       3      1          Word: not applicable
     3     3       (3     1)         Phrase: disyllabic foot for the smallest domain, no T3S
     (2    3)      (3     1)         Phrase: disyllabic feet for the rest, T3S; ST1

     Optional rule between two T3* in different prosodic domains:
     (2    2)      (3     1)         T3S across domains; ST2

     Optional in fast speech:
     (3    3       3      1)         one prosodic domain for all syllables
     (2    2       3      1)         T3S from left to right; ST3

(21) [xiao   [mu       [yezhu]]]     (Lin 2007:215)
     small   female    boar          ‘small female boar’
     3       3         (31)          Word: disyllabic foot, no T3S
     3       (2        31)           Word: incorporation, T3S
     (3      2         31)           Word: incorporation, no T3S; ST1

     Optional in fast speech:
     (3      3         31)           one prosodic domain in fast speech
     (2      2         31)           T3S from left to right; ST2


In (20), in normal speech, T3S is not applicable at the Word level. At the Phrase level, since
no foot has been formed yet, a disyllabic foot is formed for the smallest domain, mai hua ‘buy
flowers,’ and T3S does not apply. In the next step, the remaining syllables are parsed into a
disyllabic foot from left to right, T3S applies within this foot, and ST1 (T2T3)(T3T1) is derived.
ST2 (T2T2)(T3T1) surfaces
when T3S applies across the domains. Alternatively, as we see in ST3, (T2T2T3T1) results from
applying T3S from left to right in one step in fast speech. ST2 and ST3 are of the same sequence
of T2T2T3T1, although their prosodic domains differ.
In (21), cyclic T3S application gives ST1 (T3T2T3T1). The parsing of one prosodic domain
in fast speech gives ST2 (T2T2T3T1) where T3S applies from left to right in one step. While in
(20) the sequence T2T2T3T1 can result from either one of the two paths (optional T3S across
domains or one larger domain in fast speech), in (21) there is only one path in deriving
T2T2T3T1.
(20) and (21) have the same branching and the same sequence of underlying tones, but their
structural differences, along with optional T3S rules, account for the variants in (20) and (21).
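
The two optional rules discussed for (20) can be made concrete with a small sketch. The Python fragment below is an illustration under stated assumptions, not an implementation from the cited sources: feet are given as lists of integer tones after obligatory T3S has applied, each boundary between feet is treated as an independent choice point, and the names cross_domain_variants and larger_domain are hypothetical.

```python
from itertools import product


def apply_t3s(tones):
    """Scan one domain from left to right: a T3 becomes T2 before another T3."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def cross_domain_variants(feet):
    """Enumerate patterns produced by optional T3S across foot boundaries.

    A boundary is a choice point when the foot on its left ends in T3
    and the foot on its right begins with T3.
    """
    boundaries = [k for k in range(len(feet) - 1)
                  if feet[k][-1] == 3 and feet[k + 1][0] == 3]
    results = []
    for choice in product([False, True], repeat=len(boundaries)):
        new_feet = [list(foot) for foot in feet]
        for k, applies in zip(boundaries, choice):
            if applies:
                new_feet[k][-1] = 2
        results.append(new_feet)
    return results


def larger_domain(tones):
    """Fast speech option: one domain, T3S from left to right."""
    return apply_t3s(tones)


# (20) wo xiang mai hua, after normal foot-building: (2 3)(3 1)
print(cross_domain_variants([[2, 3], [3, 1]]))
# (20) as one larger domain built on the underlying tones 3 3 3 1
print(larger_domain([3, 3, 3, 1]))
```

Under these assumptions the first call returns (T2T3)(T3T1) and (T2T2)(T3T1), and the second returns T2T2T3T1, so the same tone sequence is reached by two different routes, as described for ST2 and ST3 in (20).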
2.2.2 Stress-foot Model (Duanmu 2000/2007)
Duanmu (2000/2007) suggests that the alternation of strong and weak beats is an important
property of stress and rhythm, and each alternation is what we call a foot. As stress is part of a
foot, a stress implies the existence of a foot, and a foot implies there is stress (Duanmu
2000/2007:126). He assumes that there have to be (at least) two beats in a foot, and that if a
syllable is stressed, it must be heavy (Duanmu 2000/2007:130).
For Duanmu (2000/2007), T3S domains are set up and T3S applies cyclically with reference
to syntax throughout the derivation, from the smallest constituent to the sentence level. He
follows Cinque (1993) in the stress assignments between heads and non-heads in the syntactic
structure. For Mandarin, he suggests that the syntactic nonhead is on the left in compounds, but
on the right in most phrases and, therefore, stress assignment is on the left for compounds and on
the right for most phrases (Duanmu 2004:70). The central notion of the Stress-foot Model is (22).

(22) Nonhead Stress (NHS): Syntactic nonheads must have stress (Between a syntactic head and
a syntactic nonhead, the nonhead has more stress). (Duanmu 2000/2007:130-131)
Duanmu (2000) uses X’s in showing that the stress is placed on the nonheads, based on
NHS stated in (22). If we take a DP (determiner phrase), for example, the concept that nonheads get stress can be illustrated in (23).
(23) DP
     [D    NP]
           X
     (X    X)

In (23), the DP is constituted by a D (determiner) and an NP (noun phrase). Suppose the D
and the NP in (23) are both monosyllabic, and they form a foot. The head of the DP is D, and the
NP is a nonhead. According to NHS (22), the NP, being the nonhead, should get stress, which is
marked by an X above the foot formed by the D and the NP. (24) schematically presents the
steps of how a nonhead gets stress, with H and NH referring to head and nonhead respectively.
(24)  H    NH

           X
     (X    X)
      H    NH

In (24), two syllables for head and nonhead form a foot. Recall that the existence of a foot
implies stress, and vice versa (Duanmu 2000/2007:126). As a foot is formed, there has to be
stress. According to NHS (22), NH gets stress, which is marked at the top line above NH in (24).
Simplified marking of NHS is used in examples (25), provided by Duanmu (2000/2007) in
illustrating the point of “Nonheads get stress.”
(25) Examples of “Nonheads get stress” (Duanmu 2000/2007:131)
     a.  a DP (determiner phrase)
              X
         a    house
         [D   NP]

     b.  a PP (prepositional phrase)
              X
         in   school
         [P   NP]

     c.  a VP (verb phrase)
              X
         eat  dinner
         [V   NP]

In (25a) – (25c), a, in, and eat are the heads of the DP, PP, and VP. House, school, and
dinner, sisters of D, P, and V respectively, are the nonheads and they must get stress according
to the Nonhead Stress principle stated in (22). Rules describing how T3S operates in the
Stress-foot Model are in (26).
(26) T3S (Duanmu 2000/2007:248, 250)
     a.  Feet are determined by NHS (in (22)) at all branches of the syntactic tree (not just the
         lowest branches).
     b.  T3S is cyclic starting from each foot.
     c.  T3S need not apply between two cyclic branches.
     d.  A T3 can, but need not change to T2 before a T2 that came from a T3.
     e.  In flat structures, feet are built by left-to-right construction of syllable trochees.

In the following sections, we will see the process of foot-building and stress assignment, and
how T3S is applied in this model. We begin with flat structures, followed by structured phrases
or sentences. Finally, we present how this model accounts for T3S variation.
2.2.2.1 T3S in Flat structures
Regarding flat structures, Duanmu has the same view as Shih (1986, 1997) and Chen (2000),
stating that in polysyllabic names and digits, disyllabic feet are built from left to right (Duanmu
2000/2007; Duanmu 2004:70).
(27) wu     wu     wu     wu        Duanmu (2000/2007:239)
     five   five   five   five      ‘five-five-five-five’
     3      3      3      3         UT
     (2     3)     (2     3)        ST

(28) yi     wu     wu     qi        Duanmu (2000/2007:239)
     one    five   five   seven     ‘one-five-five-seven’
     1      3      3      1         UT
     (1     3)     (3     1)        ST1
 or  (1     2)     (3     1)        ST2


In flat structures, the Stress-foot Model has the same prediction of two disyllabic prosodic
domains for (27) and (28). There is no mention of odd-number syllables in flat structures, so the
position of the model regarding the incorporation of an unparsed syllable is unclear.
2.2.2.2 T3S depends on syntax
We begin with two simple examples in Duanmu (2000/2007).
(29)  X
     [hao    jiu]         Duanmu (2000/2007:249)
     good    wine         ‘good wine’
     (3      3)           Foot
     (2      3)           T3S

(30)         X
     [mai    jiu]         Duanmu (2000/2007:249)
     buy     wine         ‘buy wine’
     3       (3 Ø)        Foot (Ø = empty beat)
     3       (3 Ø)        T3S cycle 1 (no effect)
     2       (3 Ø)        T3S cycle 2


In (29), the two syllables form a foot; with the head being jiu ‘wine,’ the stress is on the
nonhead hao ‘good.’ T3S applies in the disyllabic foot formed by hao ‘good’ and jiu ‘wine.’ In
(30), according to Duanmu (2000/2007), given that the object jiu ‘wine’ is the nonhead, it gets
stress. Stress implies the presence of a foot, so jiu ‘wine’ must be in a foot. Since a foot must be
composed of two beats, Duanmu (2000/2007) proposes that there is an empty beat (Ø) in the foot
that jiu ‘wine’ is in. T3S does not apply in the first cycle. Duanmu (2000/2007:249) states that the
second cycle gives the surface form T2T3.
In some cases, such as three syllables in structured expressions, foot merger needs to be
applied. Foot merger happens when a monosyllabic word that carries the main stress is followed
by another foot. Then the monosyllabic word and the foot can be merged and form one foot
(Duanmu 2000/2007: 180). If stress on the monosyllabic word is to be maintained, the stress
from the disyllabic word which the monosyllabic word is merged with must be deleted when the
two words join, according to the foot merger process in Duanmu (2000/2007:133).
(31)[12]
      X
      X     X                 X
     [zhi3 [lao3hu3]]       ([3 [33]])       v23 (Duanmu 2000/2007:249)
     paper (old)-tiger                       (“v” indicates T2 or T3)
     ‘paper tiger’


According to Duanmu (2000:249), in the inner bracket, Nonhead Stress is placed on lao ‘old,’
and in the outer bracket, stress goes to zhi ‘paper.’ Then, as zhi is monosyllabic, foot merger
applies; the stress from laohu ‘tiger’ is deleted; and only stress on zhi ‘paper’ remains when the
three syllables form one foot. Cyclic T3S application gives the surface pattern of T3T2T3 or
T2T2T3. The variable surface patterns result from the optional rule which states T3S is optional
when a T3 is followed by a derived T2 (Duanmu 2000/2007:250); therefore, T3S can optionally
apply in the first syllable zhi ‘paper,’ resulting in the variation of either T3T2T3 or T2T2T3 in
the output.
In (32), the derivation of zhi laohu ‘paper tiger’ is the same as that in (31), and in the last
cycle, xiao ‘small’ gets stress through NHS. Because it is monosyllabic, foot merger applies, and
the result is one foot with four syllables. Cyclic application gives vv23, that is, T2T3T2T3,
T3T2T2T3, or T2T2T2T3 (Duanmu 2000/2007:250).

[12] The numbers following the syllables indicate underlying tones in Duanmu’s (2000/2007)
presentation.

(32)  X
      X      X                        X
     [xiao3 [zhi3 [lao3hu3]]]       ([3 [3 [33]]])     vv23 (Duanmu 2000/2007:250)
     small  paper (old)-tiger
     ‘small paper tiger’


In (33), stress is placed on zhan ‘show,’ and zhan lan ‘exhibit,’ which is already stressed, is
the nonhead when guan ‘hall’ joins it. Finally, zhan lan guan ‘exhibit hall’ is the nonhead when
li ‘inside’ joins it and should get stress by NHS, but it has stress already. Cyclic application
gives only one surface pattern, T2T2T2T3.
(33)    X                               X
     [[[zhan3 lan3] guan3] li3]     ([[[3 3] 3]3])     2223 (Duanmu 2000/2007:250)
     show-see       hall   inside
     ‘inside of exhibition hall’


(32) and (33) show the contrast of a right-branching structure and a left-branching structure,
and how their surface patterns differ because of their structural differences.
Lastly, let us look at how T3S applies in a sentence in (34), taken directly from Duanmu
(2000/2007). (35) shows how the sentence is processed cyclically in stress assignment, starting
with the smallest constituent, shu ‘book.’
(34) T3S in a sentence (Duanmu 2000/2007:251)
      X                   X
     [Wo3 [xiang3 # [mai3 shu1 Ø]]]       (2 3) # 3 (1 Ø)
     I     want      buy  book
     ‘I want to buy books.’
     (# indicates boundary between cyclic branches)

     Optional rule:
     (2 2) # 3 (1 Ø)       T3S is optional across T3S domains

(35) Steps of stress assignment for the sentence in (34) according to NHS
     a.   X
         (X Ø)
         [shu]
         book
         ‘book’

     b.       X
         [mai shu]
         buy  book
         ‘buy books’

     c.              X
         [xiang [mai shu]]
         want   buy  book
         ‘want to buy books’

     d.   X              X
         [Wo [xiang [mai shu]]]
         I   want   buy  book
         ‘I want to buy books.’

In (34), the stress assignment in the VP mai shu ‘buy books’ is the same as the VP mai jiu
‘buy wine’ in (30). Duanmu (2000/2007:251) argues that the object shu ‘book’ is the nonhead of
mai shu ‘buy books,’ so that it gets stress and forms a trochaic foot with an empty beat. In xiang
mai shu ‘want to buy books,’ mai shu ‘buy books’ is the nonhead, and should get stress. It
already has stress (from the previous cycle). At the sentence level, wo ‘I,’ the subject of the
sentence, is the nonhead and should get stress. According to Duanmu (2000/2007:251), the
monosyllabic subject pronoun wo ‘I’ must form a foot with the following syllable xiang ‘want,’
and T3S applies within the two feet separately, meeting at #, which indicates the boundary
between cyclic branches. As can be seen in (35d), there are two stresses (on the first and last
syllables), indicating that there are two feet, with the first and the second syllable being the first
foot, and the third and the fourth syllable being the second foot. Even though stress assignment is
cyclic, the first two syllables are grouped together prosodically at the end, even though they do
not form a syntactic constituent. In short, stress assignment is purely syntax-based in the
Stress-foot Model, but this approach does not ignore prosodic well-formedness and does have the
prosodic component built in.
Finally, since T3S is optional between cyclic branches, we have variable patterns. When T3S
does not apply across the two domains, we have T2T3T3T1, and when T3S does apply across
the two domains, we have T2T2T3T1.

2.2.2.3 T3S variation
In this model, T3S variation arises through syntactic structures as well as the optional
application of T3S across domains. The right-branching structure in (32) and the left-branching
structure in (33) in the previous section show that not only the surface patterns but also the
number of surface patterns differ. These examples are repeated in (36) and (37) for convenience.
The multiple surface patterns in (36) arise because T3S is optional when a T3 is followed by a
derived T2 (Duanmu 2000/2007:250).
(36)  X      X     X
     [xiao3 [zhi3 [lao3hu3]]]
      small  paper (old)-tiger
     ‘small paper tiger’
       X
     ([3 [3 [33]]]) → vv23 (Duanmu 2000/2007:250)
     (T2T3T2T3, T3T2T2T3, or T2T2T2T3)

(37)     X
     [[[zhan3 lan3] guan3] li3]
        show-see    hall   inside
     ‘inside of exhibition hall’
         X
     ([[[3 3] 3]3]) → 2223 (Duanmu 2000/2007:250)

As mentioned earlier, Lin (2007:212) points out that expressions with the embeddedness of
constituents on the left edge usually have one pattern whereas those with embeddedness of
constituents on the right edge have more than one surface pattern. The contrast shown in (36)
and (37) is supportive evidence.
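
The contrast between (36) and (37) can be checked with a small sketch. The Python code below is only an illustration under simplifying assumptions, not Duanmu's own procedure: it assumes strictly binary trees, applies T3S once per cycle at the juncture between two sister constituents, treats T3S as obligatory before a surface T3 and optional before a derived T2, and ignores stress and foot building altogether. The function names cyclic_t3s and surface_patterns are hypothetical.

```python
def cyclic_t3s(node):
    """Return the set of possible surface forms for a binary tree.

    A node is either an int (an underlying tone) or a pair [left, right].
    Each syllable is tracked as (tone, derived); derived is True for a T2
    that came from an underlying T3.
    """
    if isinstance(node, int):
        return {((node, False),)}
    left, right = node
    results = set()
    for l in cyclic_t3s(left):
        for r in cyclic_t3s(right):
            last_tone, _ = l[-1]
            next_tone, next_derived = r[0]
            changed = l[:-1] + ((2, True),) + r
            if last_tone == 3 and next_tone == 3:
                # T3 before a surface T3: sandhi applies (assumed obligatory).
                results.add(changed)
            elif last_tone == 3 and next_tone == 2 and next_derived:
                # T3 before a derived T2: sandhi is optional, keep both forms.
                results.add(l + r)
                results.add(changed)
            else:
                results.add(l + r)
    return results


def surface_patterns(node):
    """Surface tone strings for a tree, sorted for easy comparison."""
    return sorted("".join(str(tone) for tone, _ in form)
                  for form in cyclic_t3s(node))


left_branching = [[[3, 3], 3], 3]    # [[[zhan3 lan3] guan3] li3], as in (37)
right_branching = [3, [3, [3, 3]]]   # [xiao3 [zhi3 [lao3hu3]]], as in (36)

print(surface_patterns(left_branching))
print(surface_patterns(right_branching))
```

Under these assumptions the left-branching tree yields a single pattern and the right-branching tree yields three, because only the right-branching tree ever places a T3 before a derived T2. Since feet are ignored, the sketch should not be expected to reproduce the sentence-level patterns in (34).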
Regarding T3S variation, the Stress-foot Model is similar to the Word-and-Phrase level
Model in that both syntactic structures and optional T3S across domains are the causal factors of
multiple surface patterns. The differences between them are: (i) the fast speech account is used in
the Word-and-Phrase level Model, but not in the Stress-foot Model; and (ii) a T3 can, but need not,
change to T2 before a T2 that came from a T3 in the Stress-foot Model, but not in the
Word-and-Phrase level Model. (38) is an example that shows T3S variation because of optional T3S
across domains in the Stress-foot Model.

(38)    X               X
     [([xiu1-gai3]) # ([gao3-jian4])]
        revise          manuscript
     ‘to revise a manuscript’
     (1 3) # (3 4) or (1 2) # (3 4) (Duanmu 2000/2007:250)

In (38), two disyllabic feet are formed for the two words, and T3S applies independently
within each foot. Because of the boundary between the two feet, T3S does not have to apply
across them (Duanmu 2000/2007:250). The surface pattern (T1T3)(T3T4) in (38) becomes
(T1T2)(T3T4) if the optional rule is applied across the two feet.
2.3 Some issues
Most of the previous T3S studies focused on developing a better T3S model that can account
for the multiple T3S surface patterns (Chen 2000: Ch 9; Duanmu 2000/2007: Ch11; Shih 1986;
Shih 1997; Zhang 1997). From previous sections, I have established that T3S is a phonological
rule that heavily depends on syntax. Both syntax and prosody play essential roles in T3S
application, and without either one it is impossible to build proper T3S domains within which
T3S applies. In the following sections, I will review and discuss issues concerning the two T3S
models. Some general issues with T3S research will also be discussed after the review of the two
models.
2.3.1 Word-and-Phrase level Model
The Word-and-Phrase level Model provides a fairly effective way to capture T3S variation. It
is criticized for its fast speech account of variability, however. Duanmu (2000/2007) argues
against the claim that fast speech explains a variant.
Fast speech
Fast speech is often regarded as the parsing strategy of one large domain (Chen 2000; Lin
2007; Shih 1986; Shih 1997) or larger domains (Zhang 1997). The ‘fast speech’ account is
commonly accepted in the literature. The fast speech pattern is derived by parsing the syllables in
one large domain, and then applying T3S from left to right in one step, according to the
Word-and-Phrase level Model. Zhang (1997:308) characterizes “fast speech” differently: “In a more
casual or faster style of speaking, a TS domain can be larger than two syllables. It can be as large
as an intonational phrase, which roughly corresponds to a syntactic clause.” Duanmu
(2000/2007:247-248) argues against attributing T3S variability to different speech rates, showing
that for a given expression, the variant surface patterns can easily be produced at the same
speech rate.
The examples used in the literature for illustrating fast speech are often short sentences where
parsing all the syllables in one large domain and applying T3S from left to right in one step is
easy. It needs to be further investigated whether the effect of T3S application in one step
remains as the number of syllables grows. For instance, it is probably much less likely that all
the syllables are parsed in one domain in a sentence of ten syllables than in a sentence of four
syllables.
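
The fast speech parse described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the whole utterance is a single domain and that tones are given as integers 1 to 4; the function name fast_speech_t3s is hypothetical.

```python
def fast_speech_t3s(tones):
    """Apply T3S from left to right within one large domain.

    Scanning left to right, each T3 is compared with the tone that follows
    it, which has not been changed yet at that point, so every T3 followed
    by an underlying T3 becomes T2 in a single pass.
    """
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


# wo xiang mai hua 'I want to buy flowers' (underlying T3 T3 T3 T1)
print(fast_speech_t3s([3, 3, 3, 1]))
# Lao Li mai hao jiu 'Old Li buys good wine' (underlying T3 x 5)
print(fast_speech_t3s([3, 3, 3, 3, 3]))
```

For these two inputs the sketch returns T2T2T3T1 and T2T2T2T2T3, the patterns labelled as fast speech variants in the examples of this chapter. It says nothing about whether such a parse is plausible for longer sentences, which is the open question raised above.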
Optional T3S across domains
In (39), ST1 is derived through normal parsing (not fast speech parsing).
(39) [[Mi [laoshu]] [xiang [zhao [hao [mi jiu.]]]]]      (Lin 2007:221)
     Mickey mouse want look for good rice wine
     ‘Mickey Mouse wants to look for good rice wine.’

      Mi   lao  shu    xiang  zhao    hao  mi   jiu
      3    3    3      3      3       3    3    3      UT
      3   (2    3)     3      3       3   (2    3)     Word: T3S
     (3    2    3)     3      3      (3    2    3)     Word: incorporation; no T3S
     (3    2    3)    (2      3)     (3    2    3)     Phrase: disyllabic foot from left to right, T3S; ST1
     Optional T3S across domains:
     (3    2    3)    (2      2)     (3    2    3)     T3S applies across second and third domains; ST2
Cases with optional T3S across domains in the literature are often in the context of two
domains, such as in (T2T3)(T3T1) → (T2T2)(T3T1). In ST1 in (39), we see that there are two
adjacent T3* belonging to two domains (in the second and the third prosodic domains). If an
optional rule applies across the two domains, we have ST2.
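
The step from ST1 to ST2 can be written out as a short sketch. The representation (a list of domains, each a list of surface tones) and the function name cross_domain_t3s are assumptions made for illustration only; the rule is applied at one chosen boundary and simply reports when its context is not met.

```python
def cross_domain_t3s(domains, boundary):
    """Optionally apply T3S across the boundary after domains[boundary].

    Returns a new list of domains, or None when the last tone of the left
    domain and the first tone of the right domain are not both T3.
    """
    left, right = domains[boundary], domains[boundary + 1]
    if left[-1] != 3 or right[0] != 3:
        return None
    new_left = left[:-1] + [2]
    return domains[:boundary] + [new_left] + domains[boundary + 1:]


# ST1 of (39): (Mi lao shu)(xiang zhao)(hao mi jiu)
st1 = [[3, 2, 3], [2, 3], [3, 2, 3]]

# Boundaries at which the optional rule could apply.
eligible = [i for i in range(len(st1) - 1)
            if cross_domain_t3s(st1, i) is not None]
print(eligible)

st2 = cross_domain_t3s(st1, 1)
print(st2)
```

In ST1 only the boundary between the second and third domains has T3 on both sides, so that is the only place where the optional rule produces a new pattern.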
Fast speech or optional T3S across domains?
There are two optional rules in this model: the fast speech rule and T3S across domains. These
two rules sometimes predict the same sequence, as we saw in (20), repeated here in (40).
(40) [wo  [xiang  [mai  hua]]]       (Lin 2007:215)
      I    want    buy  flower       ‘I want to buy flowers.’
      3    3       3    1            Word: not applicable
      3    3      (3    1)           Phrase: disyllabic foot for the smallest domain, no T3S
     (2    3)     (3    1)           Phrase: disyllabic feet for the rest, T3S; ST1

     Optional rule between two T3* in different prosodic domains:
     (2    2)     (3    1)           T3S across domains; ST2

     Optional in fast speech:
     (3    3       3    1)           one prosodic domain for all syllables
     (2    2       3    1)           T3S from left to right; ST3

In (40), ST2 and ST3 are derived by different paths, but the sequences in the two patterns are
the same. In empirical data where both analyses are possible, it may be difficult to distinguish
which parsing strategy is used by the speaker. More sentences should be investigated to see
whether or not positing only one optional rule can adequately account for all variation patterns.
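
The ambiguity in (40) can be made concrete by grouping the derivations by their surface output. The sketch below hard-codes the parse given in (40) and is meant only as an illustration; the helper names are hypothetical, and the two Phrase-level steps of (40) are collapsed into one because they give the same result here.

```python
def t3s_in_domain(tones):
    """Apply T3S from left to right inside a single domain."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def flatten(domains):
    """Drop the domain brackets and return the surface tones as a string."""
    return "".join(str(t) for d in domains for t in d)


underlying = [3, 3, 3, 1]  # wo xiang mai hua

# Normal parse: (wo xiang)(mai hua), with T3S inside each domain.
st1 = [t3s_in_domain(underlying[:2]), t3s_in_domain(underlying[2:])]
# Optional T3S across the boundary (T3 on both sides of it in ST1).
st2 = [st1[0][:-1] + [2], st1[1]]
# Fast speech: all syllables in one domain.
st3 = [t3s_in_domain(underlying)]

derivations = {"ST1": flatten(st1), "ST2": flatten(st2), "ST3": flatten(st3)}

# Group derivation labels by the surface string they produce.
by_surface = {}
for label, pattern in derivations.items():
    by_surface.setdefault(pattern, []).append(label)
print(by_surface)
```

Once the brackets are removed, ST2 and ST3 map to the same string, which is all that is available in production data; the surface form alone does not reveal which optional rule was used.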
Directionality
Shih (1986) required that at the Phrase level, syllables be parsed into disyllabic feet unless
they branch in opposite directions and, in addition, that an unparsed syllable be incorporated
according to the direction of syntactic branching. In her later work (Shih 1997:98), the
directionality component was removed because of evidence that directionality is irrelevant. The
sentence in (41) shows a case where the monosyllabic verb can be parsed with the subject or the
object. For ease of presentation, I adopt the derivational process presented in Lin (2007) in the
following examples.
(41) [[Lao  Li]  [mai  [hao   jiu]]]      (Shih 1997:85)
       Lao  Li    buy   good  wine        ‘Old Li buys good wine.’
       3    3     3     3     3           UT
      (2    3)    3    (2     3)          Word: T3S
      (2    2     3)   (2     3)          Phrase: incorporation (leftwards), T3S; ST1
   or (2    3)   (3     2     3)          Phrase: incorporation (rightwards), no T3S; ST2

     Optional in fast speech:
      (2    2     2     2     3)          one large domain in fast speech, T3S; ST3

In (41), ST1 results from leftward incorporation of the verb mai ‘buy’, whereas ST2 results
from rightward incorporation, and both are grammatical. Leaving the directionality unspecified
accounts for the flexibility of directionality in cases like (41). It is not clear, however, to what
extent the irrelevance of directionality applies to other sentences of the same or similar
structures. If this is found in some cases but not in others, the source of the variability should be
sought. If the choice of directionality can be made freely, we are left with the consequence that
there are two possible derivations at the point of incorporation at the Phrase level. Whether or not
the two possibilities are always grammatical requires more investigation.
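
The two derivations available at the point of incorporation in (41) can be sketched as follows. This is an illustration only, assuming that domains are lists of surface tones and that T3S inside a merged domain is triggered by a following surface T3 (so a T3 before a derived T2 stays unchanged, as in the Word-and-Phrase level Model). The function names are hypothetical.

```python
def t3s_within(domain):
    """Apply T3S from left to right to the surface tones of one domain."""
    out = list(domain)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def incorporate(domains, index, direction):
    """Merge the unparsed syllable at `index` into a neighbouring domain.

    direction is "left" or "right"; T3S then applies inside the merged
    domain.
    """
    stray = domains[index]
    if direction == "left":
        merged = t3s_within(domains[index - 1] + stray)
        return domains[:index - 1] + [merged] + domains[index + 1:]
    if direction == "right":
        merged = t3s_within(stray + domains[index + 1])
        return domains[:index] + [merged] + domains[index + 2:]
    raise ValueError("direction must be 'left' or 'right'")


# Word-level output of (41): (Lao Li) mai (hao jiu)
word_level = [[2, 3], [3], [2, 3]]

print(incorporate(word_level, 1, "left"))   # mai joins (Lao Li)
print(incorporate(word_level, 1, "right"))  # mai joins (hao jiu)
```

Leftward incorporation puts mai after the T3 of Li and triggers T3S, giving ST1; rightward incorporation puts mai before the derived T2 of hao, so nothing changes and the result is ST2.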
Resistance of T3S in certain cases
In short proper nouns like [Mi-[laoshu]] (T3T3T3) ‘Mickey Mouse,’ [Ma [Yo-Yo]] (T3T3T3)
‘Yo-Yo Ma (a cellist),’ or even common nouns [ye [laohu]] (T3T3T3) ‘wild tigers’ or [xiao
[laoshu]] (T3T3T3) ‘little mice,’ we would expect that parsing all three syllables in one domain
and applying T3S from left to right is possible, and (T2T2T3) should be grammatical. However,
this pattern appears to be either ungrammatical or marginally acceptable.

2.3.2 Stress-foot Model
This T3S model makes reference to syntax throughout—not just at the word level, but
beyond the word level up to the highest, sentential, level. Syntax is crucial in this model in that
stress assignment is based on the relationship of two constituents. Foot building is through the
NHS principle— nonheads get stress. Once the stress assignment for the whole expression is
finished, if there are unfooted syllables, they will follow prosodic parsing (e.g. two unfooted
syllables will form a foot if there is stress assigned to either of these two unfooted syllables). An
advantage of this model is that there is no need to assume that speech rate is the source of a
variant pattern (Duanmu 2000/2007:254). In addition, there is no need to separate the phrases or
sentences into the Word level and the Phrase level.
Syntactic and prosodic components
Regarding stress assignment, the approach is purely syntactic. However, prosody also plays
an important role as we saw in (34), repeated in (42) below.
(42) T3S in a sentence (Duanmu 2000/2007:251)
      X                   X
     [Wo3 [xiang3 # [mai3 shu1 Ø]]]      (2 3) # 3 (1 Ø)
      I    want      buy  book
     (# indicates boundary between cyclic branches)
     ‘I want to buy books.’
     Optional rule:   (2 2) # 3 (1 Ø)      T3S is optional across T3S domains

The presence of stress indicates the presence of a foot (Duanmu 2000/2007); the stress on wo
‘I’ indicates the presence of a foot. Subsequently, wo ‘I’ is parsed with xiang ‘want’ that follows
it. At this final stage, the approach relies on prosody. Although the stress assignment is purely
syntax-based, foot-building is not completely syntax-based, particularly as we see in (42) that the
first two syllables are grouped in a disyllabic foot not because they are a syntactic constituent,
but because of prosodic well-formedness.
Empty beats
In the examples provided by Duanmu (2000), the empty beats occur ‘in the final position,’
including sentence-final position (e.g. Wo (I) xiang (want) mai (buy) shu (book) Ø ‘I want to
buy books’) and at a major boundary (e.g. xiang (want) MAI (emphatic: BUY) Ø gupiao (stock)
‘want to buy stocks’). The empty beats in phrase- or sentence-final position might be related
to the lengthening effect in this position. However, Dell (2004:55) argues that empty beats are
a serious weakness of this model and that the environments where one can invoke empty beats
need to be precisely indicated.
Prosodic domains
T3S application does not appear to be restricted within a foot in (30), for instance, repeated
here in (43).
(43)          X
     [mai    jiu]         Duanmu (2000/2007: 249)
      buy    wine         ‘buy wine’
      3     (3 Ø)         Foot (Ø = empty beat)
      3     (3 Ø)         T3S cycle 1 (no effect)
      2     (3 Ø)         T3S cycle 2

Unlike the Word-and-Phrase level Model, where the phrase in (43) would be parsed in a foot
([mai jiu] ‘buy wine’ (T3T3) → (T2T3)) and T3S applies within a foot, in the Stress-foot
Model T3S can apply outside the foot. Mai ‘buy’ is unfooted, but it still undergoes T3S.
Dell (2004:50) points out that, “…some syllables are left out of foot structure, and this does not
prevent them from undergoing tone sandhi.” How unfooted syllables are handled and what
prosodic domain T3S applies within are not very clear. The domain, or foot, within which T3S
applies needs to be made explicit.

2.4 Conclusion
In this chapter, I have presented two T3S models and have discussed how each model
accounts for T3S in flat structures, phrases, and sentences. Although the prosodic domain within
which T3S applies largely depends on syntax, it also relies on prosody. Multiple T3S surface
patterns are accounted for by optional rules or an alternative parse. For the purpose of the central
focus of the thesis, I adopt the Word-and-Phrase level Model in predicting surface T3S patterns
and summarize what is needed for children to acquire T3S.
To acquire T3S, children will need to learn both cyclic and non-cyclic parsing strategies, and
importantly, to be able to use them at the right levels. Children need to learn that for flat
structures, a non-cyclic parsing strategy is used, and for NPs, a cyclic parsing strategy is used. At
the sentence level, they need to integrate the two strategies. In addition to these, they also have to
learn the optional rule or an alternative parse which produces multiple T3S patterns. The
experimental studies in this dissertation investigate whether or not children know how to apply
T3S non-cyclically in flat structures, and cyclically in NPs, and how to integrate the two
strategies in sentences.

CHAPTER 3
PREVIOUS CHILD ACQUISITION STUDIES ON TONES AND TONE SANDHI
3.0 Introduction
The acquisition of tones or tone sandhi rules has not attracted much attention. Although tones
have been extensively studied in Mandarin (Chao 1968; Chen 2000; Cheng 1973; Duanmu
2000/2007; Lin 2007 among others), how children acquire lexical tones and tone sandhi rules
remains an area we do not know very much about. Demuth (1989:82, 85) points out that a child
acquiring a language has to learn what kind of language it is: lexical tone (e.g. Chinese),
grammatical tone (e.g. Sesotho and other Bantu languages), stress/intonational (e.g. English), or
accentual (e.g. Japanese). By age 2, Sesotho-speaking children are well aware that their
language is a grammatical tone language. Mandarin-speaking children have also been
reported to acquire tones early (generally by age 2) (Chang 1991; Clumeck 1977;
Clumeck 1980; Jeng 1979; Jeng 1985; Li & Thompson 1977; Li 1978; Zhu 2002; Zhu & Dodd
2000). There have not been many studies on the acquisition of sandhi rules (tonal changes in
certain contexts), specifically, the T3S rule in Mandarin. The main purpose of this chapter is to
summarize the findings reported in previous studies on tones and tone sandhi rules, with the
focus placed on the acquisition of T3S.
Section 3.1 gives an overview of children’s acquisition of tones and tone sandhi, including
findings on acquisition of tones and tone sandhi rules in several languages. In Section 3.2,
previous studies on children’s acquisition of Mandarin tones and T3S will be reviewed and
discussed. Section 3.3 concludes the chapter with a report of major findings of previous studies
on Mandarin tones and T3S, and areas which still need to be investigated.

3.1 The acquisition of tones: an overview
Previous studies have shown that Mandarin-speaking children’s tonal acquisition is
completed before segmental acquisition (typically by age 2) (Chang 1991; Clumeck 1977;
Clumeck 1980; Jeng 1979; Jeng 1985; Li & Thompson 1977; Li 1978; Zhu 2002; Zhu & Dodd
2000). Studies on tonal acquisition of other languages report similar findings of early acquisition
of lexical tones. In their study on phonological acquisition of Cantonese-speaking children, So
and Dodd (1995) found that contrastive use of tones is acquired by age two.
Demuth conducted many studies on children’s acquisition of Sesotho, a Southern Bantu
language, and reports the acquisition of lexical tones (High tone and Low tone) by age 2
(Demuth et al. 2010; Demuth 1989; Demuth 1993; Demuth 1995; Demuth 2003; Demuth 2007).
Sandhi rules are acquired later, such as the High tone spreading rule, acquired by age 3. Sandhi
rules that involve OCP (Obligatory Contour Principle) are acquired later (Demuth 1995; Demuth
2003). Demuth (1989; 1993) suggests that the sandhi rules possibly impede the acquisition of
lexical tones. In tonal languages, tonal rules (or sandhi rules) may greatly differ between
languages.
Mandarin T3S is a type of tone sandhi rule different from the Sesotho sandhi rules. In this
thesis we concentrate on the acquisition of T3S, beginning in the next section with some
background on Mandarin-speaking children’s acquisition of lexical tones.
3.2 Previous acquisition studies on Mandarin tones and T3S
This section reviews previous studies on Mandarin tones and T3S, with focus on the latter.
3.2.1 Children’s acquisition of Mandarin tones
In this section, I review several studies on Mandarin-speaking children’s acquisition of
phonology, including studies focused on the segmental aspect, the tonal aspect, or both. First, I
describe chronologically the emergence of studies on phonological acquisition in
Mandarin-speaking children. Then, the findings of these studies are presented and discussed.
Chao (1951) was an early study that reported on the phonological acquisition of a
Mandarin-speaking child. A small number of studies on child acquisition of Mandarin phonology
(Clumeck 1977; Jeng 1979; Li & Thompson 1977; Li 1978) were conducted in the late 1970s.
Sporadic case studies of Mandarin-speaking children’s tonal or segmental acquisition appeared in
the 1980s and early 1990s (Clumeck 1980; Erbaugh 1992; Jeng 1985). Almost a decade passed
before a pioneering large-scale study of Chinese children’s phonological acquisition was carried
out in Beijing, China (Zhu & Dodd 2000). This study gave a better picture of the acquisition order
of segments and tones, based on over 100 children. Its scale provides a large amount of
systematic empirical data, unlike most Mandarin child acquisition studies, which were
based on a small number of subjects.
Prosodic development in infants
Chen and Kent (2009) studied the prosodic development of Taiwanese¹³ infants (0;7–1;6).
Because the tonal contours that babies produce do not always map onto the lexical tones
(especially in the babbling stage and before the first word is produced), prosodic patterns in
this study are categorized as high, mid, or low, along with falling, rising, and level contours,
instead of as lexical tones (T1–T4).¹⁴

¹³ Mandarin and Taiwanese are both spoken in Taiwan, with the former being the major
language used in class instruction in schools and in the majority of media. In everyday life, either
Mandarin or Taiwanese can be the major language spoken, depending on the region of Taiwan.
Taiwanese tends to be spoken more than Mandarin in southern Taiwan. The study of
Chen and Kent (2009) is relevant to both Mandarin and Taiwanese since, as babies grow up, they
may use either language as the major language, or use both languages equally well, although this
is less likely. If the language input is one of the minority languages, such as Hakka, the child will
of course acquire Hakka (not Taiwanese), along with the major language, Mandarin.


Falling contours were found to occur more often than rising or level contours in infants and
in child-directed speech (no significant difference), and high prosodic patterns are produced
significantly more often than mid and low prosodic patterns in infants and in child-directed
speech. Chen & Kent (2009:80) also found that infants used significantly more mid prosodic
patterns and fewer low(er) patterns than adults. These findings indicate that falling contours and
high(er) prosodic patterns were more easily acquired and acquired early.
From Chen and Kent (2009), we know that falling, rising, and level contours were all found
in infants as were high, mid, and low prosodic patterns. This study shed some light on what kind
of contours and prosodic patterns appear to be easier than others in the prosodic acquisition of
Taiwanese infants. Most likely, the prosodic development of the infants is closely related to the
later acquisition of four lexical tones and the T3S rule. Now we turn to the acquisition of
Mandarin lexical tones.
Early acquisition of lexical tones
Previous studies agree that the acquisition of tones is complete before the acquisition of
segments (Clumeck 1977; Clumeck 1980; Jeng 1979; Jeng 1985; Li & Thompson 1977; Li 1978;
Zhu 2002; Zhu & Dodd 2000). T1 (High level tone) and T4 (High falling tone) are reported to be
acquired before T2 (Mid rising tone) and T3 (Low dipping tone) (Clumeck 1980; Jeng 1979;
Jeng 1985). Li (1978: 311) studied his son (from 2 to 3 years old) and daughter (from 13 to 20
months), and suggests that children acquire tones very early and accurately. In the study of his
two sons’ phonological development carried out in Taiwan, Jeng (1979) reports that T2 and T3
were acquired at about the same time, at 19.5 months for one son and between 16.5 and
18.5 months for the other. In another study of a child (0;9–2;6) in Taiwan, Jeng (1985) reports
that T1 and T4 were acquired early, whereas T2 and T3 developed from 1;0 and were completed
by 2;3. In Chao’s (1951) study conducted in the US, his own granddaughter’s (2;4) spontaneous
speech was observed for a month, and T2 and T3 are reported to be produced.

¹⁴ The contours and prosodic patterns referred to in Chen and Kent (2009) do not translate
directly to the four lexical tones in Mandarin.

Clumeck (1980) in his longitudinal study of the tonal acquisition of two Mandarin-speaking
children (Child P: 2;3–3;5 and Child J 1;10–2;10), used only words uttered in isolation or in
utterance-final position in order to avoid possible contextual effect on pitch. Both children
showed a lower accuracy in T2 and T3 than in T1 and T4 (Clumeck 1980:268, 270). He found
that both children were able to produce all four tones accurately throughout the period of study,
although there were errors. Both children were reported to reach almost complete mastery of T1
and T4, but have much greater difficulty in T2 and T3, and T2 and T3 were mostly allophones of
each other in those errors (Clumeck 1980:268-270). The findings suggest that the four lexical
tones could be produced accurately by one child as early as 1;0. Clumeck (1980:269) reports that
Child P’s production of T2 and T3 had achieved almost the accuracy of T1 and T4 at the end of
the study, and for Child J there was no evidence that T2 and T3 had been mastered by the end of
the study. Clumeck (1980) points out that, in terms of perception, while T1 and T4 are stable in
these two children, there is variation between T2 and T3. In summary, T1 and T4 appear to be
easier than T2 and T3 for these two children.
Li and Thompson (1977) studied 17 Mandarin-speaking children from 1;6 to 3;0 in Taiwan.
Free speech data as well as children’s responses in picture-naming tasks were used. Zhu and
Dodd (2000) carried out an experimental study on 129 Mandarin-speaking children (1;6–4;6) and
a longitudinal study on four children (from age under two to about two years of age) in Beijing,
China. Both Li and Thompson (1977) and Zhu and Dodd (2000) further distinguish the
acquisition order of T2 and T3 and suggest the acquisition order T1, T4 before T2, and T3 last.
Wong et al. (2005) investigated the perception and production of T1 vs. T2, T1 vs. T4, T2
vs. T4, and T2 vs. T3 by thirteen 3-year-olds (2;10–3;4, mean age: 3;0) in the US. Seventy-two
pictures and 72 words were used for the picture-pointing task in the perception study and the
picture-naming task in the production study. They found accurate perception of the four lexical
tones by age 3, and acquisition of T1, T2, and T4 before T3 (Wong et al. 2005). This finding is
slightly different from that of other studies, which suggest the acquisition order T1 and T4 before
T2 and T3 (Clumeck 1980; Jeng 1979; Jeng 1985) or T1 and T4 before T2, with T3 last (Li &
Thompson 1977; Zhu & Dodd 2000).
Taken together, all these studies agree that T1 and T4 are acquired first and are stable from
early on. Also consistent is that T3 is acquired last. There is less agreement on the acquisition
order of T2. Although the age of the children studied varied in these studies and the specific age
at which T2 and T3 are acquired is not always provided, the findings of these studies point to the
completion of the acquisition of T2 and T3 between age 2 and age 3. Overall, these studies agree
on early acquisition of four lexical tones and that these lexical tones are not acquired
simultaneously. They reported slightly different acquisition orders of the four tones.
Why are T2 and T3 acquired later than T1 and T4?
From previous studies, we find that T2 and T3 are acquired later than T1 and T4. According
to Li and Thompson (1977), there is confusion between T2 and T3 until the two-to-three-word
stage. Phonetic similarity is believed to cause the delayed acquisition of T2 and T3 (Clumeck
1980; Li & Thompson 1977). Li and Thompson (1977:194) proposed the Similarity Hypothesis
and the Difficulty Hypothesis to account for the confusion between T2 and T3. The
Similarity Hypothesis refers to the perceptual similarity between T2 and T3, and the Difficulty
Hypothesis refers to the greater physiological effort required for rising tones (both T2 and T3
have a rising contour) than for the other two tones (T1, a level tone, and T4, a falling tone). The
similarity is in the rising part of T2 and T3: the pitch contour ‘35’ in T2 (Mid rising tone; 35)
and the tail portion ‘14’ in T3 (Low dipping tone; 214). The pitch changes from 3 to 5 in T2 and
from 1 to 4 in T3 may be very similar and cause confusion.
Clumeck (1980:274) agrees that the similarity of T2 and T3 lies in the fact that both have a
rising end component which causes the difficulty. Nevertheless, Clumeck (1977; 1980) disagrees
with Li and Thompson’s (1977: 194) alternative account of difficulty in production, according to
which a falling contour can be produced faster than a rising contour and may require less
physiological strength than a rising pitch. T4 is a falling contour while T2 is a rising contour, so
the fact that T2 is acquired after T4 could result from a difficulty in production. Clumeck (1977)
reports that at 1;10, Child M in his study acquired T2 first, which indicates that the rising tone,
T2, is not hard to produce. Furthermore, he points out that in Thai children’s acquisition of tones,
the rising tone is acquired before the high-level tone and the falling tone (Clumeck 1980:271). It
remains controversial as to precisely what causes the delayed acquisition of T2 and T3.
Why is tonal acquisition completed before segmental acquisition?
Clumeck (1980:260) says that children have “relative ease in approximating the phonetic
values of tones in the adult language.” He explains that, given that there are many more segments
than tones for children to acquire, it would be expected that the acquisition of tones is completed
relatively quickly with ease. He acknowledges that the T3S rule may cause difficulties in the
process of tonal acquisition, suggesting that it may take a longer time to arrive at the level of
consistent and correct use of tonal allophones in various environments.
In a sequence of T2 and T3 in the same prosodic domain, because of the T3S rule, the surface
T2 could be a ‘true T2’ (an underlying T2 surfacing as a T2) or a sandhi tone (a T2
derived through the T3S rule from an underlying T3). Likewise, a T3 can surface as a T2 or a T3
because of the T3S rule. For instance, in xiao ma ‘a small horse, a pony’ (T3T3 → T2T3), the word
xiao ‘small’ is produced with T2, but the same word is produced with T3 in xiao mao ‘a
small cat, a kitten’ (T3T1 in both underlying and surface tones). Upon encountering a T2
or a T3, children may therefore have some disambiguation to do. They need to know whether a T2
is a true T2 or a T2 derived through the tone sandhi rule. They also need to know that a T3 does
not always surface faithful to its underlying form and that a sandhi tone is a ‘disguised T3.’
Clumeck (1980:269) argues that T2 words are always heard as T2, whereas T3 words are heard
as T2 or T3. He suggests that this possibly leads to children’s overgeneralization of the tones and
that it may take some time before they discover that, although the two tones are phonologically
contrastive, they alternate in one environment. This takes us to the next discussion, the
acquisition of T3S.
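
The disambiguation problem can be illustrated with a small sketch that lists the underlying forms compatible with a given surface form. It is a deliberate simplification: it assumes a single prosodic domain, obligatory left-to-right T3S, and that a surface T2 is the only tone that may hide a different underlying tone (namely T3). The function names are hypothetical.

```python
from itertools import product


def t3s(tones):
    """Obligatory T3S from left to right within one domain."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def underlying_candidates(surface):
    """Underlying tone sequences that surface as `surface` under T3S."""
    # A surface T2 may be a true T2 or a T3 in disguise; other tones are
    # assumed to be faithful to their underlying form.
    options = [(t, 3) if t == 2 else (t,) for t in surface]
    return [list(c) for c in product(*options) if t3s(c) == list(surface)]


print(underlying_candidates([2, 3]))  # e.g. surface form of xiao ma
print(underlying_candidates([3, 1]))  # e.g. surface form of xiao mao
print(underlying_candidates([2, 1]))  # a T2 before T1
```

A surface T2T3 sequence is compatible with both an underlying T2T3 and an underlying T3T3, whereas T3T1 and T2T1 each have a single source under these assumptions. This is the sense in which a T2 heard before a T3 is ambiguous for the learner.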
3.2.2 Children’s acquisition of T3S
It was established in the previous chapter that acquiring T3S involves more than simply
knowing the rule T3T3 → T2T3. What Mandarin-speaking children encounter is an extremely
complex application of the T3S rule. In addition to the fact that T2 and T3 are inherently more
difficult than T1 and T4, whether in perception, in production, or in both, T3S also requires
building prosodic domains and mapping between syntax and prosody. Mandarin-speaking
children also have to know the optional application of T3S: (i) in fast speech, and (ii) across
prosodic domains. This is not all. T3S variability also presents a great challenge. That is,
children still have to discover the one-to-many mapping relationships between underlying tones
and surface tones in a sentence with multiple potential cases of T3S.
Early acquisition of T3S reported in previous studies
Previous studies (Jeng 1979; Jeng 1985; Li & Thompson 1977; Zhu 2002; Zhu & Dodd 2000)
suggest early acquisition of T3S. The existing findings regarding children’s acquisition of T3S in
the literature are typically just a small portion of the studies whose focus is the general
phonological acquisition and/or acquisition of individual tones. However limited, these findings
do provide useful information and help to better our understanding and to advance our
knowledge of T3S acquisition.
When is the T3S rule acquired?
In studying children’s language acquisition, an important piece of information researchers
(and other people including readers, parents, and non-linguists) are interested in learning is: at
what age is the grammar under investigation acquired? Some aspects of grammar may be
acquired instantaneously, while others may take time to develop, and in that case there is a
period from the time of the emergence of the grammar to adult-like competence. If T3S does
take some time before it is fully acquired, we ask: when does the acquisition process begin; when
is T3S completely acquired; and how long does it take for children to develop adult-like mastery.
Previous T3S studies
The three most often cited papers are the pioneering work of Chao (1951), the first
cross-sectional phonological study of 17 Mandarin-speaking children by Li and Thompson (1977),
and the first large-scale cross-sectional experiment with 129 Mandarin-speaking children by Zhu
and Dodd (2000). We begin with Chao (1951), a case study of his granddaughter at 2;4 that
continued for a month. It is the earliest work to show interest in children’s
acquisition of T3S. With respect to T3S, Chao (1951) provides the following examples.
(1)  Biao    you
     watch   existential (there is)      ‘There is a watch.’
     3       3        UT
     2       3        ST
     *3      3        Child production

(2)  Bi      you
     pen     existential (there is)      ‘There is a pen.’
     3       3        UT
     2       3        ST
     *3      3        Child (2;4) production (first try)
     2       3        Child (2;4) production (self-correction a few seconds after the first try)

Chao (1951) says that T3S is “only beginning to be learned,” and does not discuss it further
or offer his interpretation of what the data may indicate. Nor did he claim early acquisition of
T3S. In the later literature, we see that the data were interpreted differently by different
researchers. Hong (1980:11) reported that this child had acquired sandhi rules. Jeng (1979:157)
stated that the child generally had no problem with the tone sandhi phenomena, and in a later
paper, he says that this child was just beginning to learn the rule (Jeng 1985:19). Wong and
colleagues reported only that the child “had some difficulties with the tone sandhi rules”
(Wong et al. 2005:1066).
With (1) and (2) being the only pieces of T3S data in the Chao (1951) study, it is difficult to
conclude one way or the other. The child, at 2;4, did not apply T3S in (1). She did not apply T3S
in (2) at first, but corrected herself a few seconds later. This may indicate that at 2;4, she was
aware of the T3S rule, although she may not have been able to apply it in an adult-like fashion.
The acquisition of T3S had started and was possibly in the process of developing into
adult-like “proficiency.” Since no additional examples concerning T3S were provided, we do not
have enough information to piece together how much of the T3S rule the child had acquired. We
now turn to a few other studies that claim early acquisition of T3S.
Li and Thompson (1977) in a study of 17 children age 1;6 – 3;0 report that the tone sandhi
rules are acquired, with infrequent errors, as soon as the child’s multi-word utterances begin. In
the study of his son JW’s tonal acquisition from 0;2 – 1;9, Jeng (1979) claims that he had no
problem with T3S. In a later study, Jeng (1985) reports that Child K (0;9 – 2;6) developed T2,
T3, and the tone sandhi rule from 1;0. Child K did not begin to apply T3S correctly until 1;9,
when he had good control of T3. Jeng (1985) argues that the emergence of T3S implies that
T3 is already acquired. Furthermore, he says that without a 90% correct rate of T3, acquisition of
T3S is impossible. At 2;3, Child K’s acquisition of tone sandhi rules was virtually complete
(Jeng 1985:20-22).
Zhu and Dodd (2000) carried out two studies, a cross-sectional experimental study with 129
children aged 1;6 to 4;6 and a longitudinal study with four young children under age 2, and
reported similar findings on early acquisition of T3S. T3S errors were found occasionally in the
two younger age groups (1;6 – 2;0 and 2;1 – 2;6) in the experimental study. No T3S errors were
found in the free speech study. Acquisition of T3S was reported to have stabilized by 1;9 for all
four children. (We will return to the Zhu and Dodd (2000) study for a more detailed discussion.)
Evidence of T3S acquisition in previous studies
We saw that researchers interpret differently the two pieces of evidence provided in Chao
(1951). In what follows, the evidence provided in T3S studies will be presented and discussed.
Not all these studies include sample sentences, so only those that were available in the literature
are presented here. The presentation of the phrases and sentences from different studies is
slightly modified from the original for the purpose of consistency, and also to
provide information (such as the predicted surface patterns) that can be compared to the patterns
produced by the children.
(3)  [You                       [xiao     [yu]]]    (Li & Thompson 1977)
     there are (existential)    small     fish      ‘There are small fish.’
     3                          3         2         UT
     2                          3         2         ST
     *3                         3         2         Child production

(4)  [Hui     [yao     [ni]]]    (Li & Thompson 1977)
     will     bite     you       ‘will bite you’
     4        3        3         UT
     4        2        3         ST
     *4       3        3         Child production


An important point Li and Thompson (1977) made was that if the child correctly produced a
simple noun such as xiaoniao (T3T3 → T2T3) (‘birdie,’ literally ‘small bird’) as T2T3, this cannot
serve as evidence that the child can actively apply the rule, not until “he is able to make up his
own multi-word predictions” (Li & Thompson 1977:195). They therefore excluded cases such as
xiaoniao (T3T3) ‘birdie’ as evidence that the child had acquired T3S. The two examples of T3S
errors they provided were (3) and (4), where T3S should have applied but the child failed to apply
the rule. The age of the child or children who produced (3) and (4) is not specified, but we know
that the 17 children in this study ranged in age from 1;6 to 3;0. There are no sample sentences of
correct application of T3S.
Jeng (1979) reports that his son generally had no problem with T3S, with the data in (5) as
supporting evidence.
(5)  a.  [Wo    [ye      [yao     [chu          qu]]]]              (Jeng 1979)
         I      also     want     to go out     directional comp    ‘I also want to go out.’
         3      3        4        1             4                   UT
         2      3        4        1             4                   ST
         2      3        4        1             4                   Child production (at 21.5 months)

     b.  wo     de                (Jeng 1979)
         I      possessive PRT    ‘Mine.’
         3      0                 UT
         3      0                 ST
         3      0                 Child production (at 21.5 months)


(5b) and (5a) show a T3 surfacing as a T3 in a non-T3 sequence and as a T2 in a T3-sequence,
respectively. Even though (5a) is clearly an example of T3S application, it is the only piece of
evidence in the study. With this sole example, and no description of other environments
where T3S application occurred (e.g. T3S applied at a certain location in a novel sentence), it is
not clear whether T3S had been acquired completely. In a study of a different child, Jeng (1985)
provided phrases/sentences as in (6) – (8), with the number of tokens.
(6)  Hao     yuan      (Jeng 1985:22)
     so      far       ‘It’s so far.’
     3       3         UT
     2       3         ST
     2       3         Child (1;9) production (one token)
     *1      3         Child (1;9) production (one token)

(7)  Gei     wo        (Jeng 1985:22)
     give    me        ‘Give me.’
     3       3         UT
     2       3         ST
     2       3         Child (1;9) production (two tokens)

(8)  Hao     kongbu    (Jeng 1985:22)
     so      scary     ‘How terrible! (How scary!)’
     3       34        UT
     2       34        ST
     2       34        Child (1;9) production (one token)


In (6), the child applied T3S correctly at one time, but at another time T3S was not applied
and hao ‘so’ was pronounced with a T1, instead of the underlying tone T3, or the sandhi tone T2.
In (7) and (8) where there are also two adjacent T3*, the child applied T3S correctly.
In the sample examples from the previous studies presented in (1) – (8), we see that there are
cases of correct T3S application in (2) and (5) – (8), cases of non-application of T3S in (1) – (4),
a case of self-correction in (2), and a case that involved using a tone other than T2 or T3 in (6).
The examples in (1) – (8) provide helpful information. Notice, however, that in all the
examples we see in (1) – (8), there are only two adjacent T3*, with one T3S application. A
sequence of two T3*, regardless of word category or sentential position, cannot provide
the much-needed information on how T3S is applied, because simply changing the first T3 to a T2
in a T3T3-sequence always yields the correct surface form. A child may know to change a T3T3
sequence to a T2T3 sequence, but not know about cyclicity in T3S (i.e. cyclic and non-cyclic
strategies in T3S application). This argument is not to diminish the value of application of T3S in
cases where there are two adjacent T3*, but we should not ignore the fact that T3S occurs in
environments with two adjacent T3* as well as others with more than two.
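The limitation of two-syllable evidence can be made concrete with a small sketch. It assumes a simplified cyclic procedure: innermost constituents are processed first, and a T3 changes to a T2 whenever the syllable that follows it at a constituent junction is still a T3. Under that assumption, a two-T3 string always yields T2T3, whereas left-branching and right-branching three-T3 strings yield different surface patterns. The function name, the nested-list encoding of constituents, and the three-syllable inputs are illustrative assumptions, not data from the studies reviewed here.

```python
def apply_t3s_cyclically(node):
    """Return the surface tones of a constituent (simplified cyclic T3S).

    A constituent is either an int (the underlying tone of one syllable)
    or a list of sub-constituents. This encoding is an illustrative
    assumption, not the notation of any study cited in the text.
    """
    if isinstance(node, int):
        return [node]
    surface = []
    for child in node:
        child_tones = apply_t3s_cyclically(child)  # inner cycles first
        # At the junction, a T3 followed by a T3 becomes a T2.
        if surface and surface[-1] == 3 and child_tones[0] == 3:
            surface[-1] = 2
        surface.extend(child_tones)
    return surface


# Two adjacent T3s: only one outcome is possible.
print(apply_t3s_cyclically([3, 3]))        # [2, 3]
# Three adjacent T3s: the outcome depends on the bracketing.
print(apply_t3s_cyclically([[3, 3], 3]))   # [2, 2, 3]
print(apply_t3s_cyclically([3, [3, 3]]))   # [3, 2, 3]
```

Only the three-syllable inputs distinguish a learner who tracks structure from one who simply rewrites T3T3 as T2T3.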
Do children know how to apply T3S when there are three or more adjacent T3* and T3S is
applied multiple times? Unfortunately, only some studies (Chao 1951; Jeng 1979; Jeng 1985; Li
& Thompson 1977) provide the sample phrase/sentences produced by children, so it is not
always clear what kind of evidence other studies were based on in their argument of early
acquisition of T3S.
Contrary to the belief that T3S is acquired early, Lee (1996:300) says, “There is fairly good
agreement that children approximate the phonetic values of tones fairly early, and articulatory
control of tone is completed before segmental acquisition. However, it is less clear that the
phonology of tone (including the various tone sandhi rules, for example) is acquired in full any
earlier than the segmental system.” I agree with Lee’s (1996) view and believe that more in-depth studies should be carried out before we come to any conclusion.
Chen-Wilson (2003) also points out that children may have learned T3S on an item-by-item
basis in her review of Zhu’s (2000) study of Mandarin-speaking children’s acquisition of Chinese
phonology. In other words, the results should be interpreted with caution since it is not clear if an
utterance in which T3S applies correctly is indeed an active application, rather than a lexicalized
item that is acquired through common expressions in daily life. Researchers should try to avoid
using utterances that might be learned as “chunks” in drawing a conclusion for T3S acquisition.
The claim is weakened without sufficient evidence of active T3S application, and can be
misleading if the conclusion of early acquisition of T3S is based on only a few tokens of data.
In what follows, I will briefly discuss what we can learn for future studies from previous
work regarding the acquisition of T3S.
Zhu and Dodd (2000)
The only existing large-scale child acquisition study of Chinese phonology is Zhu and Dodd
(2000). Refreshing and different, their study sets a new milestone and offers a rich resource for
research on phonological acquisition by Mandarin-speaking children which can greatly enhance
our understanding of their phonological development. As the focus of Zhu and Dodd (2000) is on
the segmental acquisition, T3S is not studied deeply. Nevertheless, it would be unwise to
overlook their findings on the acquisition of tones and T3S. Zhu and Dodd (2000) is a journal
article, which later was incorporated into a book chapter in Zhu (2002), where more information
was provided. Zhu (2002) concerns both normally-developing children and children with
functional speech disorders. In this review, only the data regarding normally-developing children
will be discussed.
Zhu and Dodd (2000) contains two studies. In Study 1, picture-naming and picture-description
tasks were administered to 129 children aged 1;6 to 4;6. The results show that (i) errors in the
four lexical tones are rare even in the youngest age group, and (ii) five out of 21 children
from the 1;6 to 2;0 age group and three out of 24 from the 2;1 to 2;6 age group occasionally
made tone sandhi mistakes. In Study 2, they examine longitudinal natural speech from four
children (0;10-2;0, 1;0-2;0, 1;1-2;0, and 1;2-1;8). Natural speech data were collected in
child-parent interactions. The findings were that (i) T1 and T4 emerged earlier than T2, which
was followed by T3; (ii) tone sandhi rules stabilized (66.7% accuracy criterion) soon after their
first emergence (by 1;9 for all four children); and (iii) T3S errors were not found.
Regarding experimental materials, 44 words/phrases that young children are likely to know
were used, including 39 nouns such as taiyang (T4T2) ‘sun,’ pingguo (T2T3) ‘apple,’ bizi (T2T0)
‘nose’; four phrases/short expressions xiexie (T4T0) ‘thank you,’ zaijian (T4T4) ‘bye-bye,’ xi
lian (T3T3 → T2T3) ‘wash (your) face’, and shua ya (T1T2) ‘brush (your) teeth’; and one color
word hong (T2) ‘red’ (Zhu 2002:201-202; Zhu & Dodd 2000:14).
In the picture-naming task, although the items for testing acquisition of consonants, vowels
and individual tones were well selected, the use of certain items on the list for testing application
of T3S poses some problems. For instance, shouzhi (T3T3 → T2T3) ‘finger’ and xi (T3) lian (T3)
→ (T2T3) ‘wash (your) face’ may have been learned as frozen chunks without further
analysis on the child’s part, considering that these are commonly used vocabulary. Regarding T3S
application within a word, Zhu and Dodd (2000) acknowledge that in such cases the word may be
produced without a child’s knowledge of the T3S rule; furthermore, it appears that they believe
that T3S might be acquired in an instantaneous fashion, rather than through an acquisition
process that takes time:
“As Li & Thompson (1977) pointed out, a child who is able to adjust tones in a single
word context may not necessarily have acquired the tone sandhi rule. It is likely that s/he
manages to learn the single words as adjusted forms without being aware of tone sandhi
rule… the scarcity of tone sandhi errors in the study may be an artifact of the cross-sectional design, in that tone sandhi rules may be acquired during a very short period of
time and such a study is unable to capture such changes.” (Zhu & Dodd 2000:21-22)

In fact, if items that might have been learned as frozen chunks are to be excluded, then the two
nouns composed of two adjacent T3* (as well as the common expressions) in this study should
be excluded from the test of T3S acquisition. No other items on the list of experimental
items provided by Zhu (2000) have a T3-sequence (i.e. at least two adjacent T3*) that would
trigger T3S. The additional information on the number of items used for testing T3S provided by
Zhu (2002:204) shows that three items were used; these could be lexicalized items and were
not excluded, even though Zhu and Dodd (2000) agree with Li and Thompson (1977)
that T3S in a noun does not serve well as evidence of T3S acquisition.
Another source of T3S data in Zhu and Dodd (2000) is the picture-description task.
Unfortunately, no sample T3S production data, either correct or incorrect T3S applications, were
presented. Therefore, we do not know in what environments T3S was applied correctly in most
age groups, as they report that only the two youngest age groups make T3S
errors occasionally. We also do not know how the errors were made (e.g. under-application or
over-application), and in what kind of environments they occurred. Children’s T3S production
data (correct applications or incorrect applications including under-application, mis-application
or over-application) provide information that would help us learn how children process the T3S
rule. It would have been very helpful if some sample phrases/sentences for T3S acquisition had
been included by Zhu and Dodd (2000) and Zhu (2002). Although the findings on the acquisition
of T3S presented by Zhu and Dodd (2000) might be flawed, their work makes a significant
contribution to our understanding of Mandarin-speaking children’s acquisition of phonology as
well as their developmental patterns.

3.3 Conclusion
In general, previous studies show that Mandarin-speaking children acquire lexical tones by
age 2, with T1 and T4 being acquired before T2 and T3. T3S has been reported to be acquired
early. However, what counts as mastery of T3S has not been clearly defined, and it is unclear
what was used in those studies as evidence for adult-like use. Without careful examination of the
T3S phenomenon, the arguments can be misleading or overstated. A more in-depth study of children’s
acquisition of T3S is needed to consider whether or not children can apply T3S in novel
contexts; whether they can correctly use a non-cyclic parsing strategy in flat structures and a
cyclic parsing strategy in NPs; and whether or not they can integrate the two strategies at the
sentence level. Do they know the T3S optional rules and produce different T3S patterns as adults
do? These are the questions that have not been answered in previous studies, and these are
questions the current work seeks to answer.

CHAPTER 4
NATURAL SPEECH
4.0 Introduction
To understand children’s acquisition of T3S, spontaneous speech of child-adult interaction
provides valuable information through which we can learn, for instance, how the child and adult
apply T3S in various syntactic contexts, the variability in T3S application in adults and children,
and the approximate frequency of the T3S input.
Although T3S is the most extensively studied tone sandhi phenomenon in Mandarin, very
little is known about T3S in the context of child-parent interactions. This study seeks to fill in
some gaps. First, T3S application in the caretakers’ speech has not been studied or described in
previous research. Second, we do not have detailed reports on children’s application of T3S
within a word or across words, within constituents or across constituents. Furthermore, we do not
know the frequency of T3S in the input. It would be useful to have a general idea of how much
T3S input a child receives, and the types of T3S application (e.g. cyclic or non-cyclic,
application or non-application of optional T3S) and the T3S variability in the input. Finally, do
children and adults behave similarly in spontaneous speech with respect to T3S?
The questions we address in this chapter are the following: what is the frequency of T3S
application produced in spontaneous speech samples of children and caretakers (the number of
T3S applications is compared to total syllables produced by each participant)? How do children
and caretakers apply T3S at different levels (within words, within constituents, and across
constituents)? Is there T3S variation within and across speakers?
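One way to operationalize the frequency measure is sketched below: a T3S application is counted wherever an underlying T3 surfaces as a T2, and the count is divided by the number of syllables produced. The function names and the two-utterance sample, which reuses the tone patterns of hao shao ‘so little’ (T3T3 surfacing as T2T3) and hao duo ‘so much’ (T3T1, no T3S), are illustrative assumptions and not the coding script used in this study.

```python
def count_t3s_applications(underlying, surface):
    """Count syllables whose underlying T3 surfaces as T2."""
    if len(underlying) != len(surface):
        raise ValueError("tone lists must have the same length")
    return sum(1 for ut, st in zip(underlying, surface) if ut == 3 and st == 2)


def t3s_rate(utterances):
    """Return T3S applications per syllable over (underlying, surface) pairs."""
    applications = sum(count_t3s_applications(ut, st) for ut, st in utterances)
    syllables = sum(len(ut) for ut, _ in utterances)
    return applications / syllables if syllables else 0.0


sample = [
    ([3, 3], [2, 3]),  # hao shao: T3S applies once
    ([3, 1], [3, 1]),  # hao duo: no T3-sequence, no T3S
]
print(t3s_rate(sample))  # 0.25
```

Dividing by total syllables rather than by T3-sequences keeps the measure comparable across speakers who produce different amounts of speech.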
The chapter is organized as follows. Section 4.1 provides additional information on T3S
(which was not included in Chapter 2) that is relevant to our discussion in this natural speech
study. Section 4.2 briefly discusses T3S in natural speech. Our hypotheses and predictions are in
Section 4.3. In Section 4.4, the methodology used in the study is described. Section 4.5 presents
the results. Section 4.6 discusses the results and findings. Section 4.8 concludes the chapter with
a summary of our findings.
4.1 Additional linguistic background
The Word-and-Phrase level Model (Chen 2000; Shih 1986; Shih 1997), reviewed in Chapter
2, is used in demonstrating and discussing the sample sentences produced in our natural speech
study. Before we discuss the natural speech data in our study, a point regarding T3S at the Word
level which is relevant to the analysis of the natural speech data will be addressed first. For ease
of presentation, I follow Lin’s (2007) derivational processes.
According to the immediate constituents principle (Shih 1997:98), in the Word-and-Phrase
level Model immediate constituents are joined in disyllabic feet, such as in (1) and (2).
(1)  [Gou    [yao    [[hao-xin]      ren]]]    (Shih 1997:97-99)
     dog     bite    good-natured    person    ‘Dogs bite a good-natured person.’
     3       3       31              2         UT
     3       3       (31)            2         Word: no T3S
     3       3       (31             2)        Word: incorporation, no T3S
     (2      3)      (31             2)        Phrase: T3S; ST1
     (2      2)      (31             2)        Phrase: T3S across domains; ST2
or   (3)     (2      31              2)        derived by cyclic application; ST3


Shih (1997:98) suggests that the two syllables hao-xin ‘good-natured’ are joined to form a
foot because they are immediate constituents. At the Word level, there is no T3S. At the Phrase
level, the first two syllables are parsed and form a foot, and T3S applies. We have ST1
(T2T3)(T3T1T2) (ST1 is not listed in Shih (1997), but it is included here as the model predicts
this pattern as well). If optional T3S applies across the domains, we have ST2 (T2T2)(T3T1T2).
Shih (1997:97) argues that for the first two monosyllabic words, prosodic restructuring occurs
as these two syllables form a domain, and such an operation can ignore a very strong syntactic
boundary (the subject-predicate boundary in this case). According to Shih (1997:97), the parsing of
ST3 is derived by cyclic T3S application in terms of syntactic structure.
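The step from ST1 to ST2 can be sketched as follows, under the simplifying assumption that optional T3S across domains changes the final T3 of a prosodic domain to a T2 when the next domain begins with a T3. The list-of-lists representation and the function name are assumptions made for illustration; they are not part of the Word-and-Phrase level Model itself.

```python
def apply_t3s_across_domains(domains):
    """Apply optional T3S at every boundary between prosodic domains.

    `domains` is a list of lists of surface tones, one inner list per
    domain, e.g. [[2, 3], [3, 1, 2]] for ST1 (T2T3)(T3T1T2).
    """
    result = [list(domain) for domain in domains]  # leave the input intact
    for left, right in zip(result, result[1:]):
        if left and right and left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return result


st1 = [[2, 3], [3, 1, 2]]             # (T2T3)(T3T1T2)
print(apply_t3s_across_domains(st1))  # [[2, 2], [3, 1, 2]], i.e. ST2
```

Because the cross-domain step is optional, both the input (ST1) and the output (ST2) of this function count as predicted patterns.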
(2)  [[ta    [[da    zhong]     shou]]    le]              (Shih 1997:108)
     he      hit     swollen    hand      aspect marker    ‘He hit his hand and it became swollen.’
     1       3       3          3         0                UT
     1       (2      3)         3         0                Word: T3S
     1       (2      3)         (3        0)               Phrase: no T3S
     (1      2       3)         (3        0)               Phrase: no T3S; ST1
     (1      2       2)         (3        0)               Phrase: T3S across domains; ST2
     *(1     3)      (2         3         0)               ungrammatical


In addition to the immediate constituents condition by which an adjective hao-xin ‘good-natured’
is parsed first at the Word level, the unit “verb + resultative complement” is also dealt
with at the Word level, by the immediate constituents condition (Shih 1997:108). Shih suggests
that by this condition the verb da ‘hit’ and the resultative complement
zhong ‘swollen’ in (2) should be grouped together first because the resultative complement is the
complement of the verb, i.e. the sister of the verb in the syntactic structure.
As we see in (2), although da zhong ‘hit-swollen’ (‘it was hit and it became swollen’) is
composed of a verb and a resultative complement, it is dealt with at the Word level because of
the immediate constituents condition. ST1 is not included in Shih (1997), but is listed in (2) as it
is a possible pattern predicted by the model. When T3S applies across domains, we have ST2. In
the unacceptable pattern in (2), the verb and its complement are not dealt with first (i.e. they do
not form a foot at the Word level). The first two syllables ta ‘he’ and da ‘hit’ are grouped
together from left to right at the Phrase level, requiring the immediate constituents da ‘hit’ and
zhong ‘swollen’ to be separated, and such parsing is unacceptable. Let us look at one more
example in (3).

(3)  [[[xiuli]    hao]    [biao]]    (Chen 2000:395-396) [fn. 15]
     repair       fine    watch      ‘repair the watch so that it works fine now (that the watch works fine is the result of the repair)’
     13           3       3          UT
     (13)         3       3          Word [fn. 16]: no T3S
     (12          3)      3          Word: T3S
     (12          2       3)         Phrase: incorporation, T3S, ST


In (3), we see a disyllabic verb, followed by a resultative complement. In this example,
parsing of the syllables begins with the lexical item xiuli ‘repair,’ and T3S does not apply.
Similar to (2), where the unit of a verb followed by a resultative complement is handled at the
Word level, xiuli ‘repair’ + hao ‘fine’ is dealt with at the Word level, and T3S applies. At the Phrase
level, biao ‘watch’ is incorporated in the preceding foot, and T3S applies again. The surface
pattern is T1T2T2T3.
The discussion above is relevant to the analysis and discussion of sentences in the current
study. I will follow the immediate constituents condition (Chen 2000; Shih 1986; Shih 1997) in
the analysis of the data. For instance, xizao (T3T3) → (T2T3) ‘bathe, take a shower’ is taken as a
lexical item and parsed at the Word level, rather than the Phrase level.

15 Chen (2000:394-395) says that there are two additional interpretations of the meanings of the
sentence: (i) hao ‘fine’ modifies the noun ‘watch’: [[xiuli] [hao [biao]]] ‘to repair a fine watch,’
and (ii) hao as an aspect marker indicating having finished something [[[xiuli] hao] biao]
‘finished with fixing the watch.’ These two readings are not the intended meaning for his
analysis in (3).
16 Chen (2000:395) uses “Lexical MRU (Minimal Rhythmic Unit)” which corresponds to the
foot building at the Word level in Lin (2007).

4.2 T3S in natural speech
In order to study T3S, adjacent T3* are needed because T3S is triggered only in such an
environment. If there are two T3-sequences in the same sentence, interrupted by one or more
non-T3 syllables, T3S operates separately within these two T3-sequences and never
goes across the non-T3 syllable(s), as illustrated in (4), a sentence produced by an adult
participant in this study.
(4)  Two T3-sequences interrupted by a non-T3
     [[Wo    [ye     [hao      [xiang    [qu    xizao]]]]]       wo]    (Adult CL)
     I       also    really    want      go     take a shower    PRT    ‘I also really want to go take a shower!’
                                                                        (speaking for an animal while playing with the child)
     3       3       3         3         4      33               0      UT
     3       3       3         3         4      (23)             0      Word: T3S
     (2      3)      (2        3)        4      (23)             0      Phrase: disyllabic feet; T3S
     (2      3)      (2        3)        (4     23)              0      Phrase: incorporation
     (2      3)      (2        3)        (4     23               0)     Phrase: incorporation; ST


In (4), the non-T3 syllable qu ‘go’ interrupts the sequence of T3*, resulting in two separate
T3-sequences. T3S applies within each of the two T3-sequences without going across the non-T3
syllable qu ‘go.’ T3S in xizao ‘take a shower’ is a case of application at the Word level, as
discussed in Section 4.1. The other two T3S applications in the first two prosodic domains are
cases of T3S application at the phrase level. A sentence with many adjacent T3* as we see in (4)
does not occur very frequently. On the other hand, cases of two adjacent T3* within words or
across words are fairly common as shown in (5) and (6) respectively.
(5)  Two adjacent T3*: within a word
     [shuimu]    ‘jellyfish’
     33          UT
     (23)        Word: T3S; ST

(6)  Two adjacent T3*: across words
     [[hao]    [shao]]
     so        little     ‘so little’
     3         3          UT
     3         3          Word: no T3S
     (2        3)         Phrase: disyllabic foot, T3S; ST

In (5) and (6), where there are two adjacent T3*, the first T3 surfaces as a sandhi tone, T2.
For a lexical item that has two underlying T3* as in (5), the surface tones T2T3 are always what
children hear in the input. In (6), hao ‘so,’ when parsed with another T3 in the same prosodic
domain, surfaces as a T2, but in (7) it retains its underlying tone when followed by a non-T3.
(7)  No adjacent T3*, no T3S application
     [[hao]    [duo]]
     so        much      ‘so much’
     3         1         UT
     3         1         Word: no T3S
     (3        1)        Phrase: no T3S; ST

Hao ‘so’ retains its underlying tone T3 in (7) since there is no T3-sequence to trigger
T3S application. (6) and (7) are simple examples that show a short phrase in a T3-sequence and a
non-T3 sequence respectively. Contrastive examples like (6) and (7) may be simple, but such
examples as well as multiple T3S applications in a more complex context such as in (4) are
essential for children to figure out what the underlying tones are and when and how to apply T3S
correctly.
By studying what children hear in the input, not only do we learn how adults actually apply
the rule, we also have a better understanding of what kind of T3S input children receive and how
they apply T3S. We know that T3S is triggered where there are at least two adjacent T3*. Since
there are four lexical tones in Mandarin Chinese, the probability of having all syllables in T3 in a
sentence is relatively low.
In the current study, all the T3-sequences produced by children and adults in natural speech
are extracted to investigate how T3S is applied in various contexts. In the next section, we
present hypotheses for investigating T3S in natural speech.
4.3 Hypotheses and predictions
The age range of child participants is 4 to 6 years of age. Since these children are older than
those in previous spontaneous speech studies, we expect that these children will have little
trouble with T3S application. More specifically, they will have no trouble applying T3S
cyclically at the Word level and non-cyclically at the Phrase level.
Because these children are older and presumably are more mature not only in their
phonological development but also in their syntactic development, they probably can produce
longer and more complex sentences than the younger children in previous studies.
In what follows, “within constituents” refers to syntactic units such as a verb phrase. “Across
constituents” refers to units that are not typically grouped together syntactically as in the case of
a unit formed by a subject and a verb, even though such a unit is not uncommon in prosodic
parsing. We hypothesize that T3S application occurs more frequently within constituents than
across constituents. For instance, T3S application in a ‘subject + verb’ unit (i.e. a non-constituent)
is expected to be less frequent than T3S application in a ‘verb + object’ unit (i.e. a constituent).
Finally, if adult speech has multiple T3S patterns, we expect that children’s speech will also.
Since we cannot compare identical sentences across participants in spontaneous speech, we will
observe whether there are identical or very similar sentences across speakers. We hypothesize
that there will be T3S variation in identical or similar sentences. Our hypotheses are summarized
in (8).
(8)

Hypotheses for T3S in spontaneous speech in 4- to 6-year-olds and their caretakers
H1: Children age 4 – 6 can apply T3S cyclically at the Word level and non-cyclically both
in constituents and across constituents at the Sentence level.
H2: T3S application occurs more frequently within constituents than across constituents.
H3: Variability in T3S application is expected due to the various strategies that can be
used.

4.4 Method
For the recording of the interactions between the caretaker and the child, we provided a set of
toy wild animals, farm animals, farm vehicles, and a play mat which had sections of meadows,
farmlands, barns, a pond, etc. The child and the caretaker could play with these toys, but were
free to play with their own toys and engage in a typical play session if they wished.
4.4.1 Subjects
Seven children aged 4;5 to 6;6 and five caretakers participated in this study (Table 4.1).
Three recordings each involve one child and one caretaker. One recording is of twin boys
(BR 6;6 and ER 6;6) playing with their mother (Adult TT), and one recording is of a boy (CH
4;5) and a girl (LI 4;6) who are cousins playing with the girl’s mother (Adult CZ). The children
had no known language or hearing deficits at the time of the recording. Three of the adult
participants are elementary school teachers.
Table 4.1 Study 1: Distribution of the subjects

Children    Children’s age    Caretakers    Time (duration in minutes)
Child CH    4;5               Adult CZ      35
Child LI    4;6               Adult CZ      35
Child IU    4;6               Adult LU      30
Child ES    5;5               Adult CL      30
Child GK    5;9               Adult EE      36
Child BR    6;6               Adult TT      25
Child ER    6;6               Adult TT      25


4.4.2 Data Collection and transcription
The data were collected in Miaoli, Taiwan. The children and the caretakers were audio- and
video-recorded at participants’ homes for approximately half an hour. After setting up the
equipment, the investigator and the research assistant normally left or stayed on the other side of
the room to make sure that the interaction between the child and the mother would be as natural
as possible. All subjects’ responses were recorded on a Marantz PMD660 with an Audio-Technica
miniature clip-on microphone (AT831B Cardioid Condenser Lavalier microphone).
A research assistant specializing in phonology and phonetics transcribed all the recordings
following CHILDES (Child Language Data Exchange System) conventions (MacWhinney 2000).
All the sentences were transcribed in Mandarin Chinese, and the surface tones produced were
recorded. Since Mandarin Chinese was used in the transcription, the underlying tone of each
character is evident as each character has a lexical tone assigned to it. Surface tones provide the
information on how the sentences were said.
4.4.3 Coding procedures
For the purpose of examining T3S application in children and adults, all the sentences with
T3-sequences were extracted and further analyzed. Children’s repetition of the caretakers’
utterances was excluded from the analyses.
Sequences of adjacent T3* were considered T3S environments which trigger the application
of T3S. The minimal number of adjacent T3* that triggers T3S application is two. However, if
the two T3* belong to different prosodic domains, T3S application is optional. There is no upper
limit on the number of adjacent T3* to be counted as a T3-sequence. Each occurrence of adjacent
T3* was regarded as one single T3-sequence. A sentence in spontaneous speech may have no
T3-sequence at all, or it may have one or more T3-sequences (i.e. only one T3-sequence, or
more than one T3-sequence interrupted by one or more non-T3 syllables).
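The extraction criterion just described can be sketched as follows: each maximal run of two or more adjacent underlying T3* counts as one T3-sequence, and any non-T3 syllable ends the run. The function name and the syllable-by-syllable list encoding are assumptions made for illustration, not the procedure actually used to code the recordings. The sample input is the underlying tone string of sentence (4) in Section 4.2, with T0 written as 0.

```python
def find_t3_sequences(underlying):
    """Return (start, end) index pairs of maximal runs of 2+ adjacent T3s.

    `end` is exclusive, so underlying[start:end] is the T3-sequence.
    """
    sequences = []
    start = None
    for i, tone in enumerate(list(underlying) + [None]):  # sentinel ends a final run
        if tone == 3:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= 2:
                sequences.append((start, i))
            start = None
    return sequences


# Wo ye hao xiang qu xi-zao wo: T3 T3 T3 T3 T4 T3 T3 T0
print(find_t3_sequences([3, 3, 3, 3, 4, 3, 3, 0]))  # [(0, 4), (5, 7)]
```

A lone T3 flanked by non-T3 syllables is skipped, which matches the statement that at least two adjacent T3* are needed to trigger T3S.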
For each T3-sequence that triggers T3S, the T3S application by the speaker was categorized
at three levels: (i) Word level, (ii) within constituents, and (iii) across constituents. Examples
of these levels are in (9) – (11).
Where there are multiple patterns predicted for the phrases or sentences produced by children
and adults, the pattern that was produced will be marked ‘used.’ For instance, ST1 and ST2 may
both be predicted to be grammatical patterns, and if ST2 was used by the speaker, “ST2 (used)”
will show. It should be emphasized that this does not mean that ST1 is ungrammatical. It may be
used by other speakers, or the same speakers at other times. It only means that it was not used by
the speaker at the time of the recording.
(9)  Within words
     a.  [keyi] ‘can (auxiliary)’ (LI 4;6)
         33      UT
         (23)    T3S; ST

     b.  [nali] ‘where’ (LI 4;6)
         33      UT
         (23)    T3S; ST

     c.  [suoyi] ‘so’ (ES 5;5)
         33      UT
         (23)    T3S; ST

     d.  [zhiyou] ‘only’ (GK 5;9)
         33      UT
         (23)    T3S; ST

     e.  [laohu] ‘tiger’ (BR 6;6)
         33      UT
         (23)    T3S; ST


In (9), T3S applies within the lexical items. Evidence that the first syllables of the items in (9)
are underlyingly T3 rather than T2 comes from keshi ‘but’ (T3T4), nar ‘where’ (T3T0), suode
‘income’ (T3T2), zhiyao ‘as long as’ (T3T4), and laoshi ‘teacher’ (T3T1). Each of these words
has the same first syllable as the corresponding item in (9); that syllable surfaces as a T3 and does
not undergo T3S because it is followed by a non-T3. In addition, each character is assigned an
underlying tone, so it is clear what the underlying tones of the items in (9) are. In (10), T3S applies
in the syntactic constituents gei wo ‘let me’ and zhao wo ‘give back (the change; the amount of
money) to me’, and these are cases of T3S application within constituents at the sentence level.

(10) Within constituents
a.  [[Gei         wo]    kan]       (ER 6;6)
      let/allow   me     see        ‘Let me see.’
      3           3      4          UT
      2           3      4          Word: no T3S
     (2           3)     4          Phrase: disyllabic foot, T3S
     (2           3      4)         Phrase: incorporation, no T3S; ST

b.  [Ni    [yao       [[zhao        wo]    [shi    kuai]]]]     (GK 5;9)
     you    have to     give back    me     ten     dollar      ‘You have to give me ten dollars back.’
     3      4           3            3      2       4           UT
     3      4           2            3     (2       4)          Word: no T3S
    (3      4)         (2            3)    (2       4)          Phrase: disyllabic feet, T3S; ST

There are adjacent T3* in both (11a) and (11b). T3S does not apply within words in (11a)
and (11b). At the Phrase level, when the subject pronoun wo ‘I’ is incorporated into the foot that
follows it, T3S applies. In (11a) and (11b), T3S applies across the subject-predicate boundary,
and wo ‘I’ surfaces as a T2.
(11) Across constituents
a.  [Wo    [xiang    wan]]       (LI 4;6)
     I      want     play        ‘I want to play.’
     3      3        2           UT
     3      3        2           Word: no T3S
     3     (3        2)          Phrase: disyllabic foot for smallest domain, no T3S
    (2      3        2)          Phrase: incorporation, T3S; ST

b.  [Wo    [xihuan    [wanju    che]]]      (IU 4;6)
     I      like       toy      car         ‘I like toy cars.’
     3      31         24       1           UT
     3     (31)       (24)      1           Word: no T3S
     3     (31)       (24       1)          Word: incorporation, no T3S
    (2      31)       (24       1)          Phrase: incorporation, T3S; ST

c.  [[Nali]    [you     [[rou]    [hao    chi]]]    a]       (ES 5;5)
      where     have      meat     to     eat       PRT      ‘Where can I find meat to eat?’
      33        3         4        3      1         0        UT
     (23)       3         4        3      1         0        Word: T3S
     (23)      (3         4)      (3      1)        0        Phrase: disyllabic foot, no T3S
     (23)      (3         4)      (3      1         0)       Phrase: incorporation, no T3S; ST1
     (22)      (3         4)      (3      1         0)       Phrase: T3S across domains; ST2 (used)

In (11c), you ‘have,’ rou ‘meat,’ hao ‘good,’ and chi ‘eat’ appear in the structure “you
‘have’ + noun + hao ‘lit. good’ + verb,” which is commonly used to express, for instance, ‘there
is something to drink/eat/read/say.’ There are three adjacent T3* in (11c). At the Word level,
T3S applies in the word nali ‘where.’ This is a case of T3S application within a word. At the
Phrase level, T3S applies across nali ‘where’ and you ‘there is,’ which is a case of T3S application
across constituents.
Once all the data had been coded, total T3S applications, total correct applications, and T3S
applications at different levels were counted for each participant. Each T3S production was
counted as one T3S token, including the same item said multiple times. T3S type counts refer to
the number of different phrases or contexts the T3S instances occurred in.
The sentence in (12) illustrates how the number of T3S applications is counted.
(12) [[Gei    [wo]]    [wu-shi     kuai]]      ‘Give me fifty dollars.’
       give    me       five-ten    dollar
       3       3        32          4          UT
       3       3       (32)         4          Word: no T3S
       3       3       (32          4)         Word: incorporation, no T3S
      (2       3)      (32          4)         Phrase: disyllabic foot, T3S; ST1 (used by GK 5;9)
      (2       2)      (32          4)         Phrase: optional T3S across domains; ST2
                                               (used by IU 4;6, Adult LU, Adult EE)

In (12), ST1 has only one sandhi tone (gei ‘give’) that undergoes T3S, so ST1 has one T3S
application. ST2 has two sandhi tones (gei ‘give’ and wo ‘I’), so ST2 of Child IU, Adult LU and
Adult EE has two T3S applications.
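The counting convention just described (one T3S application per underlying T3 that surfaces as T2) can be written out as a small check. This is a minimal sketch under the assumption that an utterance is represented as two equal-length lists of tone numbers, one underlying and one surface; the function name `count_t3s_applications` is hypothetical.

```python
def count_t3s_applications(underlying, surface):
    """Count the syllables whose underlying T3 surfaces as the sandhi tone T2."""
    if len(underlying) != len(surface):
        raise ValueError("underlying and surface tone lists must align")
    return sum(1 for u, s in zip(underlying, surface) if u == 3 and s == 2)


# Example (12): gei wo wu-shi kuai, underlying tones 3 3 3 2 4
underlying = [3, 3, 3, 2, 4]
st1 = [2, 3, 3, 2, 4]  # (2 3)(32 4), used by GK (5;9)
st2 = [2, 2, 3, 2, 4]  # (2 2)(32 4), used by IU (4;6), Adult LU, Adult EE

print(count_t3s_applications(underlying, st1))  # 1
print(count_t3s_applications(underlying, st2))  # 2
```

Note that the underlying T2 of shi in wu-shi is not counted, because only an underlying T3 that changes to T2 counts as a T3S application.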
Finally, examples of how T3S is counted at the three levels will be presented. Recall that
each case of adjacent T3* is counted as one T3-sequence. Within each T3-sequence, there can be
one T3S application as in (13), or there can be more than one T3S application at different levels
as we see in (14).


(13) One T3S application within a word
     [nali] ‘where’ (LI 4;6)
      33      UT
     (23)     T3S; ST

(14) Two T3S applications within constituents and across constituents
     [Wo    [xiang(yao)    [xiao     de]]]      (BR 6;6)
      I      want           small    one        ‘I want the small one.’
      3      3              3        0          UT
      3      3             (3        0)         Word: no T3S
     (2      3)            (3        0)         Phrase: disyllabic foot, T3S; ST1
     (2      2)            (3        0)         Phrase: T3S across domains; ST2 (used)

In (13), the first syllable undergoes T3S, and this T3S application occurs within a word. The
example in (13) is counted as one T3-sequence and one T3S application within words.
There are two surface patterns for (14), and BR (6;6) produced ST2. In (14), T3S is not
applicable at the Word level. At the Phrase level, wo ‘I’ and xiang ‘want’ are parsed in one foot,
and T3S applies within this foot. ST1 (T2T3)(T3T0) is derived. When optional T3S is applied
across the two prosodic domains, we have ST2 (T2T2)(T3T0). In this case, the first two syllables
undergo T3S. T3S applies across the subject-predicate boundary and wo ‘I’ surfaces as the
sandhi tone, T2. This is a case of T3S application across constituents because wo xiang ‘I want’
is not a syntactic constituent. The second syllable, which also surfaces as the sandhi tone T2, is a
case of T3S application within a constituent because xiang xiao de ‘want the small one’ is a
syntactic constituent (i.e. the VP is a syntactic constituent, as opposed to, for instance, wo xiang
‘I want’ which is not a syntactic constituent). The example in (14) is counted as one token of
T3-sequence (that is, adjacent T3*), but two T3S applications.
In (14), ST2 is derived through optional T3S across domains. Alternatively, left-to-right
parsing in fast speech will produce (T2T2T3T0). However, Child BR (6;6) produced this in a
natural speech setting, not a fast speech setting. The fast speech parsing (T2T2T3T0) and ST2
(T2T2)(T3T0) are both strings of T2T2T3T0 when we look only at the sequence of tones
produced, without the prosodic domains they are in.
4.5 Results
In this section, we first report the production of T3*, including adjacent and non-adjacent
T3* produced by each participant in Table 4.2.
Table 4.2 Study 1: Number of T3 (adjacent and non-adjacent) and total syllables produced
Participants    T3: Non-adj    T3: Adj    T3: total    Total N of σ    % of T3
Adult CZ¹⁷      753            258        1011         4655            21.72 (1011/4655)
Adult LU        582            275        857          3301            25.96 (857/3301)
Adult CL        467            222        689          2912            23.66 (689/2912)
Adult EE        295            219        514          2131            24.12 (514/2131)
Adult TT¹⁸      546            160        706          2589            27.27 (706/2589)
CH (4;5)        24             12         36           176             20.45 (36/176)
LI (4;6)        136            39         175          770             22.73 (175/770)
IU (4;6)        237            106        343          1395            24.59 (343/1395)
ES (5;5)        323            103        426          1635            26.06 (426/1635)
GK (5;9)        320            135        455          1817            25.04 (455/1817)
BR (6;6)        135            57         192          790             24.30 (192/790)
ER (6;6)        146            39         185          838             22.08 (185/838)

17 Adult CZ is the mother of LI (4;6) and the caretaker of CH (4;5). The two children LI and CH
are cousins and they play together.
18 Adult TT is the mother of twin boys BR (6;6) and ER (6;6).

The production of adjacent T3* is further divided into two, three, and four or more¹⁹
adjacent T3*. Table 4.3 shows the number of tokens of two, three, and four or more T3*
produced by each participant.
Table 4.3 Study 1: Number of T3-sequences: two, three, and four or more T3*
UT            Two T3*: 33                Three T3*: 333             Four or more T3*: 3333(3)                      % of T2
ST            23    3)(3   # T3   # T2   223   323   # T3   # T2    2223   2323   3)(323   3223   # T3   # T2      (Total T2/Total T3*)
Adult CZ      95    8      206    95     9     7     48     25      1      0      0        0      4      3         47.67 (123/258)
Adult LU      115   3      236    115    9     4     39     22      0      0      0        0      0      0         49.82 (137/275)
Adult CL²⁰    88    2      180    88     5     4     30     15      1      1      1        0      12     6         49.10 (109/222)
Adult EE²¹    85    3      176    85     6     4     30     16      2      0      0        0      13     6         48.86 (107/219)
Adult TT      57    2      118    57     5     5     30     15      0      1      0        2      12     6         48.75 (78/160)
CH (4;5)      3     0      6      3      2     0     6      4       0      0      0        0      0      0         58.33 (7/12)
LI (4;6)²²    14    0      30     15     3     0     9      6       0      0      0        0      0      0         53.85 (21/39)
IU (4;6)      44    0      88     44     3     3     18     9       0      0      0        0      0      0         50.00 (53/106)
ES (5;5)      40    1      82     40     3     4     21     10      0      0      0        0      0      0         48.54 (50/103)
GK (5;9)      52    2      108    52     2     3     15     7       1      0      0        2      12     7         48.89 (66/135)
BR (6;6)      12    0      24     12     7     4     33     18      0      0      0        0      0      0         52.63 (30/57)
ER (6;6)      14    0      28     14     0     1     3      1       2      0      0        0      8      6         53.85 (21/39)
(# T3 = total number of underlying T3; # T2 = total number of sandhi tones, i.e. T3 that surfaced
as T2; “3)(3” or “3)(323” indicates that the two adjacent T3* belong to different prosodic domains.)
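The “# T3” and “# T2” columns of Table 4.3 follow arithmetically from the counts of surface patterns. The sketch below reproduces the Adult CZ row as an illustration; the function name `tally_tones` and the dictionary representation are assumptions of this example, and it relies on the fact that every tone digit in a pattern label corresponds to one underlying T3.

```python
def tally_tones(pattern_counts):
    """Return (number of underlying T3, number of sandhi T2) for a mapping
    from surface-pattern labels (e.g. "23", "3)(3") to token counts."""
    n_t3 = 0
    n_t2 = 0
    for pattern, count in pattern_counts.items():
        digits = [ch for ch in pattern if ch.isdigit()]  # ignore ")(" marks
        n_t3 += count * len(digits)        # each digit is one underlying T3
        n_t2 += count * digits.count("2")  # each "2" is one sandhi tone
    return n_t3, n_t2


# Adult CZ's pattern counts as reported in Table 4.3
adult_cz = {"23": 95, "3)(3": 8, "223": 9, "323": 7, "2223": 1}
n_t3, n_t2 = tally_tones(adult_cz)
print(n_t3, n_t2, round(100 * n_t2 / n_t3, 2))  # 258 123 47.67
```

Rows with a footnote (for example Adult CL, who also produced one (T2T3)(T3T4) sequence) need the footnoted sequences added to the mapping before the totals match.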
19 There is only one case of five adjacent T3*, produced by one adult (Adult EE). There are no
cases where the number of adjacent T3* goes beyond five.
20 For the category of three T3*, Adult CL also produced a sequence of (T2T3)(T3T4).
21 For the category of “four or more T3*,” Adult EE produced two sequences of four adjacent
T3* (4 T3* × 2 = 8 T3*) and one sequence of five adjacent T3* (5 T3* × 1 = 5 T3*).
22 Child LI (4;6) made a T3S error in a two-T3 sequence, where she produced *T3T2 instead of
T2T3. There are 15 two-T3 sequences, with a total of 30 underlying T3* and 15 sandhi tones.
Of the 15 sandhi tones, one is a misapplication.

Table 4.4 shows the T3S frequency (total number of T3S applications divided by total
syllables produced) for each participant.
Table 4.4 Study 1: T3S frequency
Caretakers    Total T3S       Total syllables    T3S (%)    Children    Total T3S       Total syllables    T3S (%)
              applications    produced                                  applications    produced
Adult CZ      132             4655               2.84       CH (4;5)    7               176                3.98
                                                            LI (4;6)    20              770                2.60
Adult LU      153             3301               4.63       IU (4;6)    57              1395               4.09
Adult CL      123             2912               4.22       ES (5;5)    58              1635               3.55
Adult EE      91              2131               4.27       GK (5;9)    72              1817               3.96
Adult TT      90              2589               3.48       BR (6;6)    24              790                3.04
                                                            ER (6;6)    20              838                2.39
(T3S % = Total T3S applications/Total syllables produced)

The numbers of T3S applications in Table 4.4 are the number of times the T3S rule is applied
and an underlying T3 changes to a T2. The first question is whether or not the T3S frequencies of
children and adults are similar. In Table 4.4, we see that the T3S frequency for each child and
adult is under 5%. All children except for CH (4;5) have a slightly lower T3S frequency than
their caretakers. Overall, however, the T3S frequency of children closely resembles that of adults.
Next, we turn to how accurately children applied T3S.
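The T3S frequency reported in Table 4.4 is a simple ratio, and the two observations above (every value is under 5%, and only CH exceeds the caretaker) can be rechecked directly. The sketch below uses the counts from Table 4.4; the function name `t3s_frequency` and the data layout are illustrative assumptions, and the child-caretaker pairing follows the rows of the table.

```python
def t3s_frequency(applications, syllables):
    """T3S frequency (%) = total T3S applications / total syllables produced."""
    return 100.0 * applications / syllables


# (T3S applications, total syllables) from Table 4.4
adults = {"CZ": (132, 4655), "LU": (153, 3301), "CL": (123, 2912),
          "EE": (91, 2131), "TT": (90, 2589)}
# (T3S applications, total syllables, caretaker) for each child
children = {"CH": (7, 176, "CZ"), "LI": (20, 770, "CZ"), "IU": (57, 1395, "LU"),
            "ES": (58, 1635, "CL"), "GK": (72, 1817, "EE"),
            "BR": (24, 790, "TT"), "ER": (20, 838, "TT")}

for name, (apps, syll, caretaker) in children.items():
    child_rate = t3s_frequency(apps, syll)
    adult_rate = t3s_frequency(*adults[caretaker])
    relation = "lower" if child_rate < adult_rate else "higher"
    print(f"{name}: {child_rate:.2f}% ({relation} than Adult {caretaker}, {adult_rate:.2f}%)")
```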
Table 4.5 presents all the child and adult participants’ overall production of T3S, token
counts and correct rates. Each different case of T3S is regarded as a type. Each type may be said
one time only, or multiple times. “Token counts” in Table 4.5 refers to total tokens of T3S
applications.


Table 4.5 Study 1: T3S correct rates
Caretakers    Type      Token     Correct    Correct     Children    Type      Token     Correct    Correct
              counts    counts    tokens     rate (%)                counts    counts    tokens     rate (%)
Adult CZ      85        132       130        98.48       CH (4;5)    4         5         5          100
                                                         LI (4;6)    11        18        17         94.44
Adult LU      85        142       142        100         IU (4;6)    19        53        53         100
Adult CL      68        115       115        100         ES (5;5)    41        57        57         100
Adult EE      52        80        80         100         GK (5;9)    41        65        64         98.46
Adult TT      61        88        87         98.86       BR (6;6)    19        22        21         95.45
                                                         ER (6;6)    13        16        16         100
(Correct % = Total correct tokens/Total tokens)

All children and adults have a 100% or near 100% correct rate. Two T3S errors were found
in Adult CZ and Adult TT as shown in (15a) and (15b) respectively.
(15) T3S errors in adults
a.  [Ni    [hui    [na     [shenme   dongxi]]   [[gen    tamen]   [yiqi       zhu]]]]     (Adult CZ)
     you    will    take    what     thing        with    them     together   cook
     3      4       2       30       11           1       10       13         3           UT
     3      4       2      (30)     (11)          1      (10)     (13)        3           Word: no T3S
    (3      4)      2      (30)     (11)          1      (10)     (13)        3           Phrase: no T3S
    (3      4       2)     (30)     (11)         (1       10)     (12         3)          Phrase: incorporation; T3S; ST
   *(3      4       2)     (30)     (11)         (1       10)     (13         3)          ST (used; two tokens)
    ‘What will you take (out) to cook with them?’

b.  [Zhuan   [hen     yuan]]      (Adult TT)
     spin     very    round
     3        3       2           UT
     3       (3       2)          Word: disyllabic foot for the smallest domain, no T3S
    (2        3       2)          Phrase: incorporation, T3S; ST
   *(3        3       2)          ST (used; one token)
    ‘(It) spins round and round; (it) spins and it’s round.’

The small number of T3S tokens in some children, such as Child CH (4;5) who had only 5
T3S tokens, may not be very telling. Also, Child IU (4;6) produced a large number of T3S tokens,
but many of them were of the same types, so the type counts were much lower. This was the case
for three caretaker adults CZ, LU, and CL as well. It is not surprising as repetition of the same


sentence is common in child-parent interactions. Sentences such as “…gei wo” (‘give (something)
to me’) and “Wo xiang…” (‘I want…’) are two common sentence types with identical T3 strings
that were said multiple times by some of these children and their caretakers. Overall, children
appeared to apply T3S without much difficulty.
Let us now look more specifically at T3S application at different levels—within words,
within constituents, and across constituents. Table 4.6 shows the number of T3S applications at
the three levels by subjects. The percentage for each level by subjects is presented in Figure 4.1.
Table 4.6 Study 1: Number of T3S applications within words, within constituents, and across
constituents
                Application levels
Participants    Within words    Within constituents    Across constituents    T3S applications total
CH (4;5)        3               2                      2                      7
LI (4;6)        11              4                      5                      20
IU (4;6)        12              41                     4                      57
ES (5;5)        19              27                     12                     58
GK (5;9)        13              57                     2                      72
BR (6;6)        3               15                     6                      24
ER (6;6)        6               10                     4                      20
Adult CZ        48              60                     25                     132
Adult LU        34              83                     36                     153
Adult CL        42              66                     15                     123
Adult EE        9               61                     21                     91
Adult TT        27              48                     15                     90


Figure 4.1 Study 1: Percentages of T3S application at three levels by subjects
[Stacked bar chart: T3S within words, within constituents, and across constituents; vertical axis 0–100 (%). Data table from the figure:]
Participants    Across constituents    Within constituents    Within words
CH (4;5)        28.57                  28.57                  42.86
LI (4;6)        25                     20                     55
IU (4;6)        7.02                   71.93                  21.05
ES (5;5)        20.69                  46.55                  32.76
GK (5;9)        2.78                   79.17                  18.06
BR (6;6)        25                     62.50                  12.50
ER (6;6)        20                     50                     30
Adult CZ        18.94                  45.45                  36.36
Adult LU        23.53                  54.25                  22.22
Adult CL        12.20                  53.66                  34.15
Adult EE        23.08                  67.03                  9.89
Adult TT        16.67                  53.33                  30
For interpretation of the references to color in this and all other figures, the reader is referred to
the electronic version of this dissertation.
In Figure 4.1, the proportion of T3S applications within words is greatest in the two
four-year-olds CH (4;5) and LI (4;6). As mentioned earlier, the small T3S counts make these
results very preliminary. Except for these two children, T3S applications occur most frequently
within constituents. T3S applications at all three levels are attested in all the participants.
However, there is no clear trend of increase or decrease in T3S application with age at any
particular level.
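The percentages plotted in Figure 4.1 are each level's share of a participant's total T3S applications in Table 4.6. A minimal sketch of that conversion is given below for two of the children; the function name `level_percentages` is hypothetical.

```python
def level_percentages(within_words, within_constituents, across_constituents):
    """Convert counts of T3S applications at the three levels into
    percentages of the participant's total, rounded to two decimals."""
    total = within_words + within_constituents + across_constituents
    return tuple(round(100 * n / total, 2)
                 for n in (within_words, within_constituents, across_constituents))


# Counts from Table 4.6: (within words, within constituents, across constituents)
print(level_percentages(11, 4, 5))    # LI (4;6): (55.0, 20.0, 25.0)
print(level_percentages(12, 41, 4))   # IU (4;6): (21.05, 71.93, 7.02)
```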
The second question asked is whether or not children ages 4 – 6 can apply T3S cyclically at
the Word level and non-cyclically at the constituent level. Unfortunately, we are unable to
conclude whether or not children ages 4 – 6 could apply T3S cyclically within words due to the
lack of evidence of multiple T3S applications within a lexical item. In almost all the Word-level
lexical items used by the children there were at most only two adjacent T3*, such as keyi (T3T3)

‘can,’ suoyi (T3T3) ‘so,’ nali (T3T3) ‘where,’ laohu (T3T3) ‘tiger,’ xizao (T3T3) ‘take a
shower/bath,’ yongyuan (T3T3) ‘forever,’ and buru (T3T3) dongwu (T4T4) ‘mammals (lit.
breast-feeding animals).’ All of these underlying T3T3 sequences surface as T2T3 sequences.
These vocabulary items most likely are learned as frozen chunks²³ (see Appendix A for a
complete list of possible frozen chunks). The only example with more than two adjacent T3* at
the Word level in children’s data was Mi-laoshu ‘Mickey Mouse’ (T3T3T3 → T3T2T3),
produced by ES (5;5). As a proper noun (it may be a lexicalized item), it does not serve as good
evidence that the child applied T3S cyclically. Due to the lack of child spontaneous speech data
on cyclic T3S application in multi-layered morphosyntactic structures, such as compound nouns
or NPs where cyclic application is required, we cannot reach a conclusion regarding children’s
cyclic T3S application within words.
With respect to the non-cyclic strategy, the data suggest that children can apply T3S
non-cyclically both within constituents and across constituents. To determine whether or not
children apply T3S non-cyclically, T3S applications at the Phrase level are examined. Crucially,
T3S applications in sequences of three or more T3* must be examined. Correct T3S application
in a sequence of two T3* is insufficient because T2T3 can be derived from either cyclic or
non-cyclic parsing, as illustrated in (16), where T1 is used for the first syllable; T2 or T4, which
are also non-T3*, would give the same prediction regarding how T3S is applied.

23 Although disyllabic lexical items which have a sequence of T3T3 underlyingly and surface as
T2T3 are most likely to be learned as frozen chunks, it is not clear whether or not separate
monosyllabic T3-lexical items, when combined (such as wo ‘I’/ ni ‘you’ and xiang ‘want’ to
form “I/you want…” which are common in child-parent interactions), are also learned as frozen
chunks.

(16) [T1 [T3 [T3 …]]]
a. a cyclic parsing strategy (bottom-up)
   T1 (T2 T3) → (T1T2T3)
b. a non-cyclic parsing strategy (left-to-right)
   (T1 T3) T3 → (T1T2T3)
In (16), we see that even though derivational processes of cyclic and non-cyclic parsing
strategies differ, the surface patterns in (16a) and (16b) are the same. If the two adjacent T3* are
the first two syllables, and the third syllable is a non-T3, the surface pattern will be (T2T3T1)
through both cyclic and non-cyclic parsing strategies.
In what follows, I will show how a sequence of at least three T3* at the Phrase level allows
us to see more clearly what parsing strategy is used at this level. Several cases produced by some
children will be discussed.
T3S application in three adjacent T3* from three hierarchical layers can offer us some
evidence as shown in a simplistic way in (17). Disyllabic T3T3 lexical items (e.g. xizao T3T3
(T2T3) ‘bathe/take a shower’) preceding or following another T3 are excluded from the
discussion of (17) because T3S always applies to T3T3 lexical items before it applies to a third
T3.
(17) [T3 [T3 [T3 …]]]
a. a cyclic parsing strategy (bottom-up)
   T3 (T2 T3) → (T3T2T3)
b. a non-cyclic parsing strategy (left-to-right)
   (T2 T3) T3 → (T2T2T3)
As we see in (17), a cyclic parsing strategy predicts (T3T2T3) while a non-cyclic parsing
strategy predicts (T2T2T3). A small number of sentences were produced by children in this
study that fit the description of the T3-sequence in (17). Sample sentences are in (18) – (20),
with the adjacent T3* underlined. The derivational process in (18), (19), and (20) follows the
Word-and-Phrase level Model (non-cyclic parsing at the Phrase level).
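The contrast in (16) and (17) can be made concrete with a small simulation. This is a deliberately simplified sketch, not an implementation of the Word-and-Phrase level model: it assumes a purely right-branching structure [T [T [T …]]], ignores foot formation and optional T3S across domains, and the function names are hypothetical.

```python
def t3s_cyclic_right_branching(tones):
    """Bottom-up application on [T [T [T ...]]]: start with the innermost
    (rightmost) pair and move outward, checking the current surface tones."""
    out = list(tones)
    for i in range(len(out) - 2, -1, -1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def t3s_noncyclic(tones):
    """Left-to-right application: each T3 changes to T2 if the syllable
    that follows it is still a T3 when the pair is scanned."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


print(t3s_cyclic_right_branching([1, 3, 3]), t3s_noncyclic([1, 3, 3]))  # same output, as in (16)
print(t3s_cyclic_right_branching([3, 3, 3]), t3s_noncyclic([3, 3, 3]))  # different outputs, as in (17)
```

With only two adjacent T3* the two strategies are indistinguishable on the surface, which is why sequences of three or more T3* are the crucial test cases.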


(18) [[Mama,]²⁴   [wo    [ye     [xiangyao   wan]]]]     (LI 4;6)
       Mommy       I      also    want       play        ‘Mommy, I also want to play.’
       32          3      3       34         2           UT
      (32)         3      3      (34)        2           Word: no T3S
      (32)        (2      3)     (34)        2           Phrase: disyllabic foot, T3S
      (32)        (2      3)     (34         2)          Phrase: incorporation, no T3S; ST1
      (32)        (2      2)     (34         2)          Phrase: T3S across domains; ST2 (used)

(19) [Wo    [yao     [zhao       [wo    [baba]    [mama]]]]]     (ES 5;5)
      I      want     look for    my     daddy     mommy         ‘I want my mommy and daddy.’
      3      4        3           3      32        32            UT
      3      4        3           3     (32)      (32)           Word: no T3S
     (3      4)      (2           3)    (32)      (32)           Phrase: disyllabic feet, T3S; ST1
     (3      4)      (2           2)    (32)      (32)           Phrase: T3S across domains; ST2 (used)

(20) [[Wo   [xiang   [[hen    duo    zhi]   [hen    duo    zhi]]]]   ye]      (BR 6;6)
       I     want      very   many   CL      very   many   CL        PRT      ‘I want many many (of the animals)!’
       3     3         3      1      1       3      1      1         0        UT
       3     3        (3      1      1)     (3      1      1)        0        Word: no T3S²⁵
      (2     3)       (3      1      1)     (3      1      1)        0        Phrase: disyllabic foot, T3S
      (2     3)       (3      1      1)     (3      1      1         0)       Phrase: incorporation, no T3S; ST1
      (2     2)       (3      1      1)     (3      1      1         0)       Phrase: T3S across domains; ST2 (used)
In (18), (19), and (20), non-cyclic parsing at the Phrase level predicts the three-T3 sequence
to surface as either (T2T3)(T3..) or (T2T2)(T3..). The former has two adjacent T3*, but since
they belong to two different prosodic domains, it is grammatical. The three children who
produced the examples in (18) – (20) used (T2T2)(T3..) which does not have any adjacent T3*.

24 Mama ‘mommy’ is underlyingly T1T0 originally, but it has undergone some change in
Taiwan Mandarin speakers, especially in children and in child-directed speech where mama
‘mommy’ is T3T2 underlyingly. Later in the discussion section, I will return to discuss the new
forms (new underlying tones) in some vocabulary (mostly kinship terms) which Yeh (2010)
investigates. I will also discuss how these new forms relate to this current T3S study.
25 The parsing of hen duo zhi ‘very many CL (classifier)’ is simplified here, without going
through the cyclic parsing for simple nouns, compound nouns, or NPs at the Word level in the
Word-and-Phrase level model. Cyclic parsing for [[hen duo] zhi] ‘very many CL’ is
(T3T1)T1 → (T3T1T1).

Although the sentences in (18) – (20) suggest that these children know how to apply
T3S non-cyclically at the sentence level, similar data are scarce in the current study, so
we cannot claim conclusively that children use non-cyclic parsing at the phrase or sentence level.
In order to test our second hypothesis (H2: T3S application occurs more frequently within
constituents than across constituents), the frequency of T3S being applied within constituents
and across constituents is compared. Let us take a look at the examples in (21) and (22). Application
of T3S (ST2) and non-application of T3S (ST1) are both grammatical in these examples. If T3S
applies more easily within constituents than across constituents, we should find a higher T3S
application rate in the former than in the latter.
(21) Within constituents
     [Ta    [you    [wasi   lu]]]       (Adult CZ)
      he     has     gas    stove       ‘He has a gas stove.’
      1      3       31     2           UT
      1      3      (31     2)          Word: no T3S
     (1      3)     (31     2)          Phrase: no T3S; ST1
             ↑ non-application
     (1      2)     (31     2)          Phrase: T3S across domains; ST2 (used)
             ↑ application

(22) Across constituents
     [[laohu]   [ye     [shi    [liang   zhi]]]   ye]      (Adult CL)
       tiger     also    are     two     CL       PRT      ‘There are also two tigers!’
       33        3       4       3       1        0        UT
      (23)       3       4      (3       1)       0        Word: T3S
      (23)      (3       4)     (3       1)       0        Phrase: disyllabic foot, no T3S; ST1 (used)
        ↑ non-application
      (22)      (3       4)     (3       1)       0        Phrase: T3S across domains; ST2
        ↑ application

In (21), T3S is not applicable within words. T3S does not apply across the two prosodic
domains in ST1. The non-application is grammatical because the two adjacent T3* belong to

different prosodic domains. ST1 is a case of non-application within a syntactic constituent you
wasi lu ‘has a gas stove.’ Adult CZ’s production is ST2, with the application of T3S across the
two prosodic domains. This is a case of T3S application within a syntactic constituent.
In (22), T3S applies within a word. At the Phrase level, ye shi ‘also are’ form a disyllabic
foot and T3S does not apply across the first two prosodic domains. This is ST1 that Adult CL
used. ST2 is also a grammatical pattern where T3S applies across the first two prosodic domains.
As we see in (21) and (22), application and non-application of T3S can occur within constituents
as well as across constituents.
Application and non-application of T3S within constituents and across constituents were
examined for each participant. T3S application rates in constituents and across constituents are
calculated separately (T3S application % = total application of T3S/(total application of T3S +
total non-application of T3S)) for each individual. It should be emphasized that all these tokens
used for the calculation of the rate of T3S application are grammatical patterns (i.e. application
or non-application of T3S in these contexts are both correct, and T3S errors were not included in
such calculation). Table 4.7 shows the T3S application rates within constituents and across
constituents by subject.
Table 4.7 Study 1: T3S application rates (%) within constituents and across constituents by
subject
Caretakers          Within constituents    Across constituents    Children              Within constituents    Across constituents
Adult CZ            83.33 (60/72)          86.21 (25/29)          CH (4;5)              100 (2/2)              100 (2/2)
                                                                  LI (4;6)              100 (4/4)              100 (5/5)
Adult LU            95.40 (83/87)          90 (36/40)             IU (4;6)              100 (41/41)            66.67 (4/6)
Adult CL            95.65 (66/69)          83.33 (15/18)          ES (5;5)              100 (27/27)            75 (12/16)
Adult EE            96.83 (61/63)          95.45 (21/22)          GK (5;9)              98.28 (57/58)          28.57 (2/7)
Adult TT            94.12 (48/51)          62.50 (15/24)          BR (6;6)              100 (15/15)            66.67 (6/9)
                                                                  ER (6;6)              100 (10/10)            80 (4/5)
Average (adults)    92.98 (318/342)        84.21 (112/133)        Average (children)    99.36 (156/157)        70 (35/50)


As seen in Table 4.7, all adults have application and non-application of T3S within
constituents and across constituents. Except for Adult CZ, all adults have a higher T3S
application rate within constituents than across constituents. Adult TT’s T3S application rate for
across constituents is much lower than that for within constituents (within constituents: 94.12%;
across constituents: 62.50%), but the difference is not great in other adults.
Almost all children applied T3S all the time within constituents as we see a 100% T3S
application rate in all children except GK (5;9) whose T3S application rate within constituents is
close to 100%. Children applied T3S across constituents 70% of the time. There was very little
T3S data for two 4-year-olds, CH (4;5) and LI (4;6), so that the 100% within constituents and
across constituents for them may not be as meaningful as for other participants. In fact, for all
children, the data for T3-sequences across constituents are very few; therefore, these percentages
of Table 4.7 may not accurately reflect how each of them actually applies T3S across
constituents. Children vary greatly regarding application of T3S across constituents (e.g. GK 5;9:
28.57%, ER 6;6: 80%, and LI 4;6 100%).
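The application rates in Table 4.7 follow the formula given earlier: applications divided by the sum of applications and non-applications, counting only grammatical tokens. The sketch below recomputes one individual rate and the pooled group averages from the counts in the table; the function name `application_rate` and the list layout are illustrative assumptions.

```python
def application_rate(applied, not_applied):
    """T3S application % = applications / (applications + non-applications)."""
    return 100.0 * applied / (applied + not_applied)


# GK (5;9), across constituents: 2 applications out of 7 optional contexts
print(round(application_rate(2, 7 - 2), 2))

# Pooled averages: (applications, total optional contexts) per participant
children_within = [(2, 2), (4, 4), (41, 41), (27, 27), (57, 58), (15, 15), (10, 10)]
children_across = [(2, 2), (5, 5), (4, 6), (12, 16), (2, 7), (6, 9), (4, 5)]
adults_within = [(60, 72), (83, 87), (66, 69), (61, 63), (48, 51)]
adults_across = [(25, 29), (36, 40), (15, 18), (21, 22), (15, 24)]


def pooled_rate(pairs):
    """Pool all tokens of a group before dividing, as in the Average row."""
    applied = sum(a for a, _ in pairs)
    total = sum(t for _, t in pairs)
    return round(application_rate(applied, total - applied), 2)


print(pooled_rate(children_within), pooled_rate(children_across))
print(pooled_rate(adults_within), pooled_rate(adults_across))
```

Because the averages pool tokens across participants, children with very few tokens (such as CH and LI) contribute little to the group figure.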
If we compare adults’ and children’s average rates of T3S application within constituents and
across constituents, adults show only a slight tendency to apply T3S more in one case than the
other (within constituents: 92.98% vs. across constituents: 84.21%; a difference of 8.77
percentage points), whereas children appear to apply T3S more within constituents than across
constituents (within constituents: 99.36% vs. across constituents: 70%; a difference of 29.36
percentage points). While adults apply T3S fairly similarly within constituents and across
constituents, children seem to distinguish them: they almost always apply T3S within constituents,
but apply T3S only 70% of the time across constituents. This may indicate that children are still
developing toward the stage where adults apply T3S within constituents and across constituents rather


similarly. Children may not yet apply T3S as freely and automatically as adults would across
constituents. Our second hypothesis (H2: T3S application occurs more frequently within
constituents than across constituents) cannot be confirmed for the following reasons.
(i) Adults’ data: T3S application occurred only slightly more frequently within constituents than
across constituents (a difference of 8.77 percentage points).
(ii) Children’s data: Although T3S application occurred much more frequently within
constituents than across constituents in children (a difference of 29.36 percentage points), the
amount of across-constituent data from each child is relatively small, so further investigation is
needed to confirm whether or not children do apply T3S much more frequently within
constituents than across constituents.
We now turn to T3S variation attested in the spontaneous speech data. Despite the fact that
sentences produced in spontaneous speech are not controlled for the purpose of comparison
across subjects, sentences of similar T3S environments were extracted for testing our third
hypothesis—H3: Variability in T3S application is expected due to the various types of strategies
that are available. We expect that there will be T3S variation in identical or similar sentences.
We begin with one sentence that was produced by four participants.
(23) [[Gei    [wo]]    [[wu-shi]    kuai]]      ‘Give me fifty dollars.’
       give    me        five-ten    dollar
       3       3         3 2         4          UT
       3       3        (3 2)        4          Word: no T3S
       3       3        (3 2         4)         Word: incorporation, no T3S
      (2       3)       (3 2         4)         Phrase: disyllabic foot, T3S; ST1 (used by GK 5;9)
      (2       2)       (3 2         4)         Phrase: T3S across domains; ST2 (used by IU 4;6,
                                                Adult LU, Adult EE)

In (23), Child GK (5;9) did not apply T3S across the prosodic domains, but Child IU (4;6),
Adult LU, and Adult EE all did. Next, we will focus on sentences that begin with a T3-subject
pronoun.

Pronouns occur frequently in mother-child interactions, which allows us to compare speakers’
parsing strategies. The frequently used T3-subject pronouns wo ‘I’ and ni ‘you’ provide a chance
for us to compare across subjects how T3S is applied in sentences with subject pronouns.
Furthermore, they allow us to observe and to better understand the cliticization of the pronouns
in T3S application. In what follows, we focus on various sentences containing a T3-subject
pronoun. The predicted pattern(s) and the attested pattern(s) are both listed for comparison.
Where there is more than one surface pattern, the pattern produced by the speaker(s) will be
noted (e.g. ST1, ST2 (used), ST3).
(24)
a.  [Ni    [gei    wo]]       (IU 4;6, GK 5;9, Adult LU)
     you    give   me         ‘You give (something) to me.’
     3      3      3          UT
     3      3      3          Word: no T3S
     3     (2      3)         Phrase: disyllabic foot for the smallest domain
    (3      2      3)         Phrase: incorporation; ST1 (used)
    (2      2      3)         Larger domain in fast speech, T3S; ST2

b.  [Wo    [gei    ni]]       (IU 4;6, GK 5;9)
     I      give   you        ‘I give (something) to you.’
     3      3      3          UT
     3      3      3          Word: no T3S
     3     (2      3)         Phrase: disyllabic foot for the smallest domain
    (3      2      3)         Phrase: incorporation; ST1 (used)
    (2      2      3)         Larger domain in fast speech, T3S; ST2

In (24), based on the Word-and-Phrase level model, a disyllabic smallest domain has to be
parsed first at the Phrase level because no foot has been formed at the Word level as there are no
nouns, and pronouns are clitics. After the disyllabic smallest domain has been parsed at the
Phrase level, the subject pronoun is incorporated. No further T3S application needs to apply at
this point since there are no adjacent T3*. ST1 (T3T2T3) was the pattern used in three different
individuals. According to the Word-and-Phrase level model, one large domain can be formed in

fast speech and T3S applies from left to right, so that ST2 (T2T2T3) is also possible for (24a)
and (24b). Given that a three-syllable sequence is quite short, it should be easy to form a
three-syllable domain and apply T3S non-cyclically, but this pattern was not found in our
participants, although it is a possible pattern. Next, we look at (25), which contains (24b) inside
a longer sentence.
(25) [[Wo   [[gei    [ni]]   [shi    kuai]]]   [jiu     [haole]]]       (Adult EE)
       I      give    you     ten    dollar     then     sufficient
       3      3       3       2      4          4        30             UT
       3      3       3      (2      4)         4       (30)            Word: no T3S
      (2      3)      3      (2      4)         4       (30)            Phrase: disyllabic foot, T3S
      (2      2       3)     (2      4)        (4        30)            Phrase: incorporation, T3S; ST1
      (3)    (2       3)     (2      4)        (4        30)            ST2 (used)
     ‘I’ll give you ten dollars, and that will be enough (that is what I am willing to pay).’

In (25), the first three syllables are the same as (24). Unlike in (24), there are syllables parsed
at the Word level in (25), so that at the Phrase level T3S should apply non-cyclically, starting
with the first two syllables wo gei ‘I give,’ followed by the incorporation of the third syllable ni
‘you.’ Such procedure predicts (T2T2T3) for the first three syllables in (25); however, T3T2T3
was produced for the first three syllables, identical to the pattern in (23) and (24).
The surface pattern in (25) indicates that the non-cyclic parsing from left to right at the
Phrase level did not occur. One possibility is that because of the strong syntactic boundary, the
subject pronoun makes a degenerate foot by itself without being parsed with a neighboring foot,
despite the fact that pronouns are prosodically weak and are prone to cliticize. The idea of
“strong syntactic boundary” was mentioned in an example of Shih (1997:97-99) that we saw
earlier, repeated in (26).


(26) [Gou    [yao     [[hao-xin]        ren]]]       (Shih 1997:97-99)
      dog     bite      good-natured    person       ‘Dogs bit a good-natured person.’
      3       3         31              2            UT
      3       3        (31)             2            Word: no T3S
      3       3        (31              2)           Word: incorporation, no T3S
     (2       3)       (31              2)           Phrase: T3S, ST1
     (2       2)       (31              2)           Optional T3S across domains, ST2
  or (3)     (2         31              2)           derived by cyclic application; ST3

When optional T3S applies across the domains, we have ST2 (T2T2)(T3T1T2). Shih
(1997:97) argues that the first two syllables, in prosodic restructuring, can ignore a very strong
syntactic boundary and form a foot. According to Shih (1997:97), ST3 is derived by cyclic T3S
in terms of syntactic structure. Given that there is a strong syntactic boundary, an alternative
explanation for ST3 is that it is a subject-predicate parsing.
Non-cliticization of the pronoun was found in children as well, as (27) shows.
(27) [Ni     [[gei   [wo]]   [liang  kuai]]]    (GK 5;9)
     you     give    me      two     dollar     ‘Give me two dollars.’
     3       3       3       3       4          UT
     3       3       3       (3      4)         Word: no T3S
     (2      3)      3       (3      4)         Phrase: disyllabic foot, T3S
     (2      2       3)      (3      4)         Phrase: incorporation, T3S; ST1
     (2      2       2)      (3      4)         Phrase: T3S across prosodic domains; ST2
     (3)     (2      2)      (3      4)         ST3 (used)

In (27), the monosyllabic foot containing ni ‘you’ cannot be derived from non-cyclic parsing at
the Phrase level. The sentences in (25) and (27) provide counterevidence to the claim that, at the
Phrase level, syllables are parsed from left to right. In ST3 in (27), although prosodically weak,
the subject pronoun stands alone in its own monosyllabic domain. As mentioned, the speaker possibly
preferred maintaining the subject-predicate boundary and not applying T3S across it. It appears that
a monosyllabic foot followed by a disyllabic foot is better than a ternary foot in these cases. For
(25) and (27), if we compare the prosodic domains in the surface patterns which the speakers used
with the syntactic constituents, we find that prosody and syntax align rather nicely.

While we found sentences where T3S does not apply across the subject-predicate boundary
when it could in (25) and (27), we also found numerous examples of T3S applying across the
subject-predicate boundary where the subject is also a pronoun as in (28) – (30).
(28) [[Wo    [xiang  [[dakai]  lai]]]             ye]     (ES 5;5)
     I       want    open      directional comp   PRT     ‘I want to open this up!’
     3       3       31        2                  0       UT
     3       3       (31)      2                  0       Word: no T3S
     (2      3)      (31)      (2                 0)      Phrase: T3S; ST1
     (2      2)      (31)      (2                 0)      Phrase: T3S across domains; ST2 (used)

(29) [[Ni    [ye     [xihuan   [wanju  che]]]]    wo]     (Adult LU)
     you     also    like      toy     car        PRT     ‘You also like toy cars!’
     3       3       31        24      1          0       UT
     3       3       31        (24     1)         0       Word: no T3S
     (2      3)      (31)      (24     1          0)      Phrase: T3S; ST1
     (2      2)      (31)      (24     1          0)      Phrase: T3S across domains; ST2 (used)

(30) [Na     [[wo    [keyi     [mai    gou]]]     ma]]            (Adult EE)
     then    I       can       buy     dog        question PRT    ‘Then can I buy dogs?’
     4       3       33        3       3          0               UT
     4       3       (23)      3       3          0               Word: T3S
     (4      3)      (23)      (2      3)         0               Phrase: disyllabic foot, T3S
     (4      3)      (23)      (2      3          0)              Phrase: incorporation, no T3S; ST1
     (4      2)      (22)      (2      3          0)              Phrase: T3S across domains; ST2 (used) [26]

Sentences in (28) – (30) show cases where T3S applies across the subject-predicate boundary.
The non-application of T3S across the subject-predicate boundary in (25) and (27) involves three
and four adjacent T3* respectively. The application of T3S across the subject-predicate boundary
in (28) and (29) involves three adjacent T3*, and in (30) five adjacent T3*. In the very long
sequence of T3* in (30), the subject-predicate boundary would have been a good ‘break’ for
dividing the T3-sequence into domains. That is, if T3S did not apply across this boundary, the
sequence of five adjacent T3* would at least be reduced to one T3 to the left of the boundary and
four T3* to the right of it. What was attested for Adult EE, however, was a T2T2T2T2T3 sequence
in which T3S applied across three prosodic domains, as shown in ST2. An alternative account for
this adult speaker’s surface pattern is the larger domain parsing strategy. If this is the case, there
is only one extremely large domain for the whole sentence, and T3S applies from left to right. It is
not clear whether this speaker used larger domain parsing and applied T3S from left to right in
one step even though the setting was not one of fast speech.

[26] For this speaker, optional T3S was applied in both the second syllable and the fourth syllable.
If only one optional T3S occurs, namely (T4T2)(T2T3)(T2T3T0) or (T4T3)(T2T2)(T2T3T0),
the patterns are also grammatical.
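
The optional cross-domain variants for (30), including the two mentioned in footnote 26, can be enumerated mechanically. The following Python sketch is illustrative only. The function names and the integer encoding of tones (0 for the neutral tone) are our own assumptions, and T3S inside a domain is simplified to "every T3 that precedes an underlying T3 becomes T2," which does not model step-by-step incorporation.

```python
from itertools import product

def within_domain_t3s(domain):
    """Simplified T3S inside one domain: T3 -> T2 before an underlying T3."""
    return [2 if t == 3 and i + 1 < len(domain) and domain[i + 1] == 3 else t
            for i, t in enumerate(domain)]

def cross_domain_variants(underlying_domains):
    """List every surface pattern obtained by optionally applying T3S
    across each domain boundary where it could apply."""
    surface = [within_domain_t3s(d) for d in underlying_domains]
    # A boundary is an optional site when a domain-final T3 precedes
    # a domain whose first syllable is underlyingly T3.
    sites = [i for i in range(len(surface) - 1)
             if surface[i][-1] == 3 and underlying_domains[i + 1][0] == 3]
    variants = []
    for choice in product([False, True], repeat=len(sites)):
        variant = [list(d) for d in surface]
        for site, applied in zip(sites, choice):
            if applied:
                variant[site][-1] = 2
        variants.append(variant)
    return variants

# Underlying tones of (30), grouped into the domains of ST1: (Na wo)(keyi)(mai gou ma)
for variant in cross_domain_variants([[4, 3], [3, 3], [3, 3, 0]]):
    print(variant)
```

With two optional sites, the sketch yields four patterns: ST1 (no optional T3S), the two single-application patterns of footnote 26, and ST2, which Adult EE used.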
We have just discussed the application and non-application of T3S across the subject-predicate
boundary in sentences with a subject pronoun. Grouping the subject pronoun
with the following syllable is not well-formed in terms of syntax because such a unit is not a
syntactic constituent. However, in terms of prosody, it is preferred that a monosyllabic subject
pronoun join a neighboring syllable to form a larger foot, so that it does not stand by itself as a
degenerate foot. Violating syntactic well-formedness in the parsing satisfies prosody, whereas
violating prosodic well-formedness in the parsing satisfies syntax. Neither choice is perfect. Both
“syntax-over-prosody” and “prosody-over-syntax” choices are attested in the participants in this
study.
Earlier we saw an example of five adjacent T3* in (30), the longest sequence of adjacent T3*
and the only such case in our data. One question we might ask is: does the number of adjacent T3*
in the sequence containing the subject pronoun affect the cliticization of the subject pronoun? The
sentences with T3-subject pronouns followed by at least one T3 were extracted. All participants
except Child CH (4;5) produced some such sentences. Table 4.8 shows the frequency of cliticization
of the subject pronoun with two, three, and four adjacent T3*.


Table 4.8 Study 1: Frequency (%) and number of cliticizations of subject pronouns in two, three,
and four adjacent T3*

Number of adjacent T3*    Two                  Three                  Four
Participants              (T3-pronoun + T3)    (T3-pronoun + T3T3)    (T3-pronoun + T3T3T3)
CH (4;5)                  - (Ø)                - (Ø)                  - (Ø)
LI (4;6)                  100 (3/3)            100 (2/2)              - (Ø)
IU (4;6)                  100 (4/4)            33.33 (1/3)            - (Ø)
ES (5;5)                  100 (6/6)            33.33 (1/3)            - (Ø)
GK (5;9)                  100 (2/2)            0 (0/3)                0 (0/1)
BR (6;6)                  66.67 (2/3)          80.00 (4/5)            - (Ø)
ER (6;6)                  100 (1/1)            0 (0/1)                100 (2/2)
Adult CZ                  100 (19/19)          0 (0/4)                - (Ø)
Adult LU                  100 (33/33)          66.67 (2/3)            - (Ø)
Adult CL                  100 (7/7)            100 (1/1)              - (Ø)
Adult EE                  100 (7/7)            0 (0/4)                - (Ø)
Adult TT                  100 (7/7)            20.00 (1/5)            33.33 (1/3)

When there are only two adjacent T3*, adults always cliticized the subject pronoun, and
children behaved very similarly. There was only one case in which a child, BR (6;6), did not
cliticize the subject pronoun. This same child cliticized the subject pronoun in the other two
cases where there were two adjacent T3*.
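
The counts in Table 4.8 can be tallied per column to compare sequence lengths directly. The Python sketch below is a reader's aid under stated assumptions: the dictionary layout and the function name are hypothetical, and pooling counts across participants is an illustrative summary that is not an analysis reported in this study.

```python
# (cliticized, total) per participant, copied from Table 4.8.
# None marks an empty cell (Ø). Columns: two, three, four adjacent T3*.
TABLE_4_8 = {
    "CH (4;5)": (None, None, None),
    "LI (4;6)": ((3, 3), (2, 2), None),
    "IU (4;6)": ((4, 4), (1, 3), None),
    "ES (5;5)": ((6, 6), (1, 3), None),
    "GK (5;9)": ((2, 2), (0, 3), (0, 1)),
    "BR (6;6)": ((2, 3), (4, 5), None),
    "ER (6;6)": ((1, 1), (0, 1), (2, 2)),
    "Adult CZ": ((19, 19), (0, 4), None),
    "Adult LU": ((33, 33), (2, 3), None),
    "Adult CL": ((7, 7), (1, 1), None),
    "Adult EE": ((7, 7), (0, 4), None),
    "Adult TT": ((7, 7), (1, 5), (1, 3)),
}

def pooled(column):
    """Sum cliticized and total counts over all participants for one column."""
    cells = [row[column] for row in TABLE_4_8.values() if row[column] is not None]
    return sum(c for c, _ in cells), sum(t for _, t in cells)

for column, label in enumerate(["two", "three", "four"]):
    cliticized, total = pooled(column)
    print(f"{label} adjacent T3*: {cliticized}/{total} "
          f"({100 * cliticized / total:.2f}%)")
```

Because several cells contain only one to five tokens, the pooled figures should be read with the same caution as the individual percentages.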
Few sentences begin with a T3-subject pronoun followed by two or three
T3* (i.e. three or four adjacent T3*), as shown in Table 4.8. Despite the small
number of sequences of three or four adjacent T3* in the child and adult speech we collected, the
data seem to indicate that the subject pronoun is much less consistently cliticized where there are
three or four adjacent T3* than where only two adjacent T3* occur. The sentences in (31) and (32)
are samples with the subject pronoun in three and four adjacent T3* respectively.


(31) Three adjacent T3*
a.   Subject pronoun undergoes T3S
     [[Wo    [xiang  [liang  zhi]]]  ye]     (LI 4;6)
     I       want    two     CL      PRT     ‘I want two (of the animals)!’
     3       3       3       1       0       UT
     3       3       (3      1)      0       Word: no T3S
     (2      3)      (3      1)      0       Phrase: disyllabic foot, T3S
     (2      3)      (3      1       0)      Phrase: incorporation, no T3S, ST1
     (2      2)      (3      1       0)      Phrase: T3S across domains; ST2 (used)

b.   Subject pronoun does not undergo T3S
     [Ni     [hen    chao]]     (GK 5;9)
     you     very    noisy      ‘You are very noisy.’
     3       3       3          UT
     3       3       3          Word: no T3S
     3       (2      3)         Phrase: disyllabic foot for smallest domain; T3S
     (3      2       3)         Phrase: incorporation, No T3S; ST1 (used)
     (2      2       3)         Larger domain in fast speech; ST2

(32) Four adjacent T3*
a.   Subject pronoun undergoes T3S
     [Wo     [ye     [xiang  [xuan   [san    zhi]]]]]   (ER 6;6)
     I       also    want    choose  three   CL         ‘I also want to choose three (animals).’
     3       3       3       3       1       1          UT
     3       3       3       3       (1      1)         Word: no T3S
     (2      3)      (2      3)      (1      1)         Phrase: disyllabic feet, T3S, ST1
     (2      2)      (2      3)      (1      1)         Phrase: T3S across domains, ST2 (used)

b.   Subject pronoun does not undergo T3S
     [Ni     [[gei   [wo]]   [liang  kuai]]]    (GK 5;9)
     you     give    me      two     dollar     ‘Give me two dollars.’
     3       3       3       3       4          UT
     3       3       3       (3      4)         Word: no T3S
     (2      3)      3       (3      4)         Phrase: disyllabic foot, T3S
     (2      2       3)      (3      4)         Phrase: incorporation, T3S; ST1
     (2      2       2)      (3      4)         Phrase: T3S across domains; ST2
     (3)     (2      2       3       4)         ST3 (used)


The sentences in (31a) and (32a) show that the subject pronoun is cliticized to the following
syllable and undergoes T3S, whereas the sentences in (31b) and (32b) present cases where the
subject pronoun did not undergo T3S. In (31b), the subject pronoun surfacing with its underlying
tone can be accounted for: after the subject pronoun has been incorporated into the
disyllabic foot that follows it, there are no more adjacent T3*, so T3S need not apply. Is it
possible that only children leave the subject pronoun as a degenerate foot, but not adults? Such
parsing was also found in adults, as in (33).
(33) [Wo   [qing    nimen]     [dao        [wo     jia]   [he-he         [cha]]]   ba]    (Adult CZ)
     I     invite   you (pl.)  arrive/go   I/my    home   drink-drink    tea       PRT
     ‘Why don’t you come over to my place for a cup of tea?’
     3     3        30         4           3       1      1 1            2         0      UT
     3     3        (30)       4           (3      1)     1 1            2         0      Word: no T3S
     (2    3)       (30)       4           (3      1)     (1 1)          (2        0)     Phrase: disyllabic foot, T3S
     (2    3)       (30)       (4          3       1)     (1 1)          (2        0)     Phrase: incorporation; no T3S; ST1
     (2    2)       (30)       (4          3       1)     (1 1)          (2        0)     Phrase: T3S across domains; ST2
     (3)   (2       30)        (4          3       1)     (1 1)          (2        0)     ST3 (used)
ST3 in (32b) and (33) seems to show better prosody-syntax alignment, similar to the patterns
used in (25) and (27).
We hypothesized that there would be variability in T3S application due to different parsing
strategies. In the various sentences presented in (24) – (33) which involve T3-subject pronouns, we
have seen variability in how the monosyllabic subject pronoun is parsed. The attested
surface patterns indicate that sometimes the subject pronoun is parsed with its following syllable
to form a foot, and sometimes it stands alone as a degenerate foot. Leaving the subject pronoun
as a degenerate foot possibly reflects a different parsing strategy (e.g. cyclic parsing or
subject-predicate parsing). A subject pronoun is not always cliticized to its following domain, and it
appears that it is much less consistently cliticized where there are three or four adjacent T3* than
where only two adjacent T3* occur. Although T3S variability is not limited to cases with subject
pronouns, the sentences with subject pronouns that are available to us reveal
T3S variability in both children and adults. Our third hypothesis (H3: Variability in T3S
application is expected due to the various types of parsing strategies that are available) is
confirmed.
Lastly, it is worth looking into the application and non-application of T3S in the context of only
two adjacent T3*. We know that when two adjacent T3* belong to different prosodic domains,
T3S application is optional across the two domains. The number of cases where T3S does not
have to apply when there are two adjacent T3* which belong to different prosodic domains is
listed in Table 4.9. The application and non-application of T3S in sequences of two T3* are
presented in two categories: within constituents and across constituents.
Table 4.9 Study 1: Two adjacent T3* that belong to different prosodic domains

                Within constituents               Across constituents
                T3S applied     T3S not applied   T3S applied     T3S not applied
Participants    T3T3 → T2T3     T3T3 → T3T3       T3T3 → T2T3     T3T3 → T3T3       Total
CH (4;5)        0               0                 2               0                 2
LI (4;6)        0               0                 0               0                 0
IU (4;6)        1               0                 0               0                 1
ES (5;5)        2               0                 1               1                 4
GK (5;9)        8               1                 0               1                 10
BR (6;6)        0               0                 2               0                 2
ER (6;6)        0               0                 0               0                 0
Adult CZ        11              7                 1               1                 20
Adult LU        10              3                 0               0                 13
Adult CL        8               2                 2               0                 12
Adult EE        10              2                 2               1                 15
Adult TT        3               1                 3               1                 8

(The numbers in the table refer to the number of cases of application and non-application within
constituents and across constituents.)
As we see in Table 4.9, there are few sentences in the child data that contain two adjacent
T3* that belong to different prosodic domains. GK (5;9) and ES (5;5) are the only two children
that have more than two such cases. ES (5;5) and GK (5;9) are also the only two children that
show evidence of application and non-application of T3S in a sequence of two T3* that belong to
different prosodic domains. Without more such T3S environments produced by other children,
whether or not they know that T3S application is optional in two T3* that belong to different
prosodic domains is unknown.
Each of the adults shows both T3S application and non-application in the environment of two
adjacent T3* which belong to different prosodic domains. We found that T3S tends to be applied,
rather than not applied, across prosodic domains both within constituents and across constituents.
Since the child data were too sparse, we focus on the adult data now. Table 4.10 shows the
percentages of each adult’s application and non-application of T3S within constituents and
across constituents as well as the average percentage for the adults as a group.
Table 4.10 Study 1: Adjacent T3* that belong to two prosodic domains (%)

                Within constituents                   Across constituents
Adult           T3S applied       T3S not applied     T3S applied       T3S not applied
participants    T3T3 → T2)(T3     T3T3 → T3)(T3       T3T3 → T2)(T3     T3T3 → T3)(T3
Adult CZ        61.11 (11/18)     38.89 (7/18)        50 (1/2)          50 (1/2)
Adult LU        n/a               n/a                 76.92 (10/13)     23.08 (3/13)
Adult CL        100 (2/2)         0 (0/2)             80 (8/10)         20 (2/10)
Adult EE        66.67 (2/3)       33.33 (1/3)         83.33 (10/12)     16.67 (2/12)
Adult TT        75 (3/4)          25 (1/4)            75 (3/4)          25 (1/4)
Average         73.68 (42/57)     26.32 (15/57)       80 (8/10)         20 (2/10)

There are many more cases of adjacent T3* that belong to different prosodic domains within
constituents than across constituents. In a sequence of two adjacent T3*, both within constituents
(e.g. [(T1T3)(T3T4)]) and across constituents (e.g. [(T1T3)][(T3T4)]), the chance of T3S being
applied is much higher than the chance of it not being applied.
Within constituents, application of T3S is about three times as frequent as non-application
(73.68% vs. 26.32%). This shows that adults prefer to apply T3S in a sequence of two T3*
although such application is optional (T3S across domains). Across constituents, application
of T3S is four times as frequent as non-application (80% vs. 20%). This shows that even if the two
T3* belong to different prosodic domains, and the two domains are not in the same syntactic
constituent, adults still prefer applying T3S to not applying it.
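
The percentages and ratios in this comparison follow directly from the pooled counts in the Average row of Table 4.10. The short Python sketch below reproduces the arithmetic. The helper name `pct` and the dictionary layout are our own, and the counts are taken from the Average row as printed.

```python
def pct(count, total):
    """Percentage rounded to two decimals, as reported in Table 4.10."""
    return round(100 * count / total, 2)

# (applied, not applied) counts from the Average row of Table 4.10
POOLED = {"within constituents": (42, 15), "across constituents": (8, 2)}

for context, (applied, not_applied) in POOLED.items():
    total = applied + not_applied
    ratio = applied / not_applied  # how many applications per non-application
    print(f"{context}: {pct(applied, total)}% applied, "
          f"{pct(not_applied, total)}% not applied, ratio {ratio:.1f}")
```

The ratio of 2.8 within constituents is what the text rounds to "about three times."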
Taken together, in a sequence of two T3* that belong to two prosodic domains, adults prefer
to apply T3S both within constituents and across constituents. On average, about one out of
every three to four times, the adults did not apply T3S in the context where two adjacent T3*
belong to different prosodic domains. Consequently, in the input for this environment, children
receive more instances of application of T3S than of non-application. We would expect the child
data to reflect the adults’ preference, but more child data are needed to determine whether this
is true.
The investigation of the adult data revealed that adults exhibited variability, as they did not
consistently apply or not apply T3S. Variability that arises from optional T3S across domains, as
well as from the cliticization of pronouns and the different parsing strategies in sentences that
contain subject pronouns presented earlier, gives us a good amount of evidence to support our third
hypothesis (H3: Variability in T3S application is expected due to the various types of strategies
that can be used).
4.6 Discussion
In this section, we will discuss our findings in detail in several aspects.
T3S application at different levels
Children aged 4 – 6 exhibited a high rate of correct T3S applications in spontaneous speech.
Despite low counts of T3S instances in some children, T3S application within words,
within constituents, and across constituents was attested in all child and adult participants.


Due to the lack of data on cyclic T3S application within words and insufficient data on non-cyclic
T3S application at the Phrase level, no conclusion can be drawn regarding whether or not
children indeed know to use cyclic and non-cyclic strategies at the Word level and at the Phrase
level respectively. To test cyclicity at the Word level, lexical items which are composed of only
two adjacent T3* do not serve as evidence, as these items always surface as T2T3, which
children hear and produce. Novel NPs or compound nouns with multiple layers of structure are
needed. They can give us more informative evidence on how children parse syllables in their
T3S application.
As for T3S application at the Phrase level, we found a small number of sentences (see (18),
(19) and (20)) that support children’s knowledge of the non-cyclic strategy. However, evidence
of only several sentences is inadequate to make the claim. More importantly, we found that the
surface patterns in some sentences of children and adults did not match the non-cyclic parsing
strategy at the Phrase level in the Word-and-Phrase level Model. These are sentences that begin
with a subject pronoun which appears to stand alone as a degenerate foot rather than being
parsed with its following syllable in forming a larger prosodic domain. A different parsing
strategy of subject-predicate parsing or syntax-based parsing may account for these sentences.
Subject pronouns
Chen (2000:402-403) discussed object pronouns being treated as clitics in T3S application.
There is no mention of the status of subject pronouns in the T3S literature. A pronoun
cliticizes because of its prosodically weak nature. In that case, we would expect a subject
pronoun to form a foot with its following syllable, but this did not always occur in our spontaneous
speech data. This may indicate that a Mandarin subject pronoun may behave as a typical clitic
and cliticize, or may act like a regular noun and stand alone in its own domain in the subject
position.
One may ask what motivates a pronoun not to join its following syllable in forming a foot,
but to be in a degenerate foot by itself. A possibility is that, on the one hand, a prosodically
weak element like a pronoun should cliticize, but, on the other hand, there is a strong
subject-predicate boundary (see some discussion in §4.2) that the subject pronoun prefers not to
cross. Crossing the boundary to cliticize honors prosody (since a degenerate foot is not preferred),
whereas not crossing it and remaining a degenerate foot honors syntax (it maintains the strong
subject-predicate boundary). If speakers were in fact maintaining the subject-predicate boundary,
then there is an additional parsing for the structure in (34).
(34) Subject pronoun + give + recipient
     [ni/wo   [gei    ni/wo]]
     you/I    give    you/me     ‘I give you/you give me.’
     3        3       3          UT
     3        3       3          Word: no T3S
     3        (2      3)         Phrase: disyllabic foot, T3S
     Two possible parsings in the next step:
     Incorporation of the subject pronoun:
     (3       2       3)         Phrase: Incorporation, no T3S; ST1
     No incorporation of the subject pronoun:
     (3)      (2      3)         Phrase: No incorporation, no T3S; ST2
                                 (an additional possible parsing)
     (2       2       3)         Larger domain in fast speech, T3S; ST3


As we see in (34), the surface pattern ST1 T3T2T3 is predicted to be (T3T2T3) in one
domain, and the other possible parsing suggested is (T3)(T2T3) in two domains. Since the
parsings (T3T2T3) and (T3)(T2T3) produce the same sequence T3T2T3, which parsing
was actually used in production is unclear and might not be easy to determine. Although the
three-syllable sentence in (34) can easily be parsed as a three-syllable prosodic domain in one
step, the absence of ST3 (T2T2T3) in our data might reveal other factors (such as the resistance
of the subject pronoun to undergoing T3S in certain contexts). Speakers might even use T2T2T3 if
they were asked to speak faster; however, this needs to be tested.
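
The three derivations in (34) can be traced step by step. The Python sketch below is a minimal illustration, not a formal statement of the Word-and-Phrase level Model. The function names and the integer tone encoding are hypothetical, and T3S inside a domain is simplified to "a T3 becomes T2 before an underlying T3."

```python
def t3s_in_domain(tones):
    """Inside one domain, a T3 becomes T2 when the next syllable is underlyingly T3."""
    return [2 if t == 3 and i + 1 < len(tones) and tones[i + 1] == 3 else t
            for i, t in enumerate(tones)]

def incorporate_left(stray, foot):
    """Attach a stray syllable to an already-derived foot on its right.
    The stray syllable undergoes T3S only if the foot begins with a surface T3."""
    return [2 if stray == 3 and foot[0] == 3 else stray] + foot

ut = [3, 3, 3]                        # ni/wo gei ni/wo, all underlyingly T3
foot = t3s_in_domain(ut[1:])          # disyllabic foot (gei ni/wo) built first
st1 = incorporate_left(ut[0], foot)   # pronoun incorporated: one domain
st2 = [ut[0]] + foot                  # pronoun left as a degenerate foot: two domains
st3 = t3s_in_domain(ut)               # one larger domain, as in fast speech

print("ST1:", st1)
print("ST2:", st2)
print("ST3:", st3)
print("ST1 and ST2 have the same tone string:", st1 == st2)
```

Because ST1 and ST2 flatten to the same tone string, surface tones alone cannot show which of the two parsings a speaker used.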
We now look at one of the sentences in our data (presented in (27), repeated here in (35))
which clearly shows the subject pronoun being parsed as a degenerate foot.
(35) [Ni     [[gei   [wo]]   [liang  kuai]]]    (GK 5;9)
     you     give    me      two     dollar     ‘Give me two dollars.’
     3       3       3       3       4          UT
     3       3       3       (3      4)         Word: no T3S
     (2      3)      3       (3      4)         Phrase: disyllabic foot, T3S
     (2      2       3)      (3      4)         Phrase: incorporation, T3S; ST1
     (2      2       2)      (3      4)         Phrase: T3S across prosodic domains; ST2
     (3)     (2      2)      (3      4)         ST3 (used)

According to the Word-and-Phrase level Model, at the Phrase level disyllabic feet are parsed
from left to right, so in (35), the initial syllable ni ‘you’ has no choice but to undergo T3S, as it
would be parsed with its following syllable, which is also a T3. The pattern used by GK (5;9),
however, is a subject-predicate pattern: ni ‘you’ stands alone as a degenerate foot without
undergoing T3S. The pattern is possibly a case of cyclic application like the example in (26), or
the child preferred to separate the subject from the predicate and produced the subject-predicate
pattern. A pattern like this needs to be addressed, as it is a phenomenon we found in adults as
well as children.
If the subject pronoun can stand alone as a degenerate foot in (35), it should not be very
surprising that such parsing can occur in (34) as well. In (34), the parsing (T3)(T2T3) may
indicate that the alignment of syntactic boundaries and prosodic domains is well attended to (that
is, the parsing of the subject pronoun being a degenerate foot in (34) and (35) is a better
alignment of both syntactic constituents and prosodic domains).


To summarize, in building the prosodic domains within which T3S applies, syntax-dependency
and well-formedness of the foot structure are both crucial, but there is more to how
speakers parse the syllables into feet. Based on the data in our study, these speakers seem to
attend to the alignment of prosodic domains and syntactic constituents, sometimes at the cost
of leaving the subject pronoun as a degenerate foot. In dealing with subject pronouns, variable
strategies (cliticization and non-cliticization) were found in the participants of this study. If a
Mandarin subject pronoun does have a dual status (as a clitic that cliticizes or as a non-clitic that
does not), then it follows that the cliticization or non-cliticization of subject pronouns is one
of the sources of T3S variability.
In short, the subject-predicate boundary and the alignment of prosody and syntax may play
important roles in how speakers build the prosodic domains, and may cause subject pronouns to
behave inconsistently, sometimes as clitics and sometimes not. Further investigation will be
needed to test this assumption. Future studies should examine the frequency with which a subject
pronoun behaves as a clitic or non-clitic in both child and adult production.
T3S application across the subject-predicate boundary
Regarding T3S variability, for particular sentences, sometimes one surface pattern appears to
be stronger than another which was absent in our data, as in the sentence Wo gei ni ‘I give you’
or Ni gei wo ‘you give me’ (see (24)). In this three-T3 sequence four participants (Child IU 4;6,
Child GK 5;9, Adult LU, and Adult EE) produced the pattern T3T2T3, and the other possible
pattern T2T2T3 was absent.
In the environment of more than two adjacent T3*, cases where T3S applies across a
subject-predicate boundary often include a subject pronoun followed by an adverb (e.g. ye ‘also’) or an
auxiliary (e.g. xiang ‘want’), both of which, being monosyllabic, form a disyllabic foot with the
preceding pronoun. Crucially, this foot is commonly followed by a disyllabic lexical item
beginning with a T3 (e.g. dakai T3T1 ‘open,’ xihuan T3T1 ‘like,’ keyi T3T3 ‘can (aux)’). Given
that T3S applies at the Word level first, the lexical items are dealt with before T3S applies at the
sentence level. It is therefore understandable that the subject pronoun and the monosyllabic
T3-syllable that follows it are parsed as a disyllabic foot. Without this, the two syllables would be
left as degenerate feet. Successive degenerate feet are banned when the two syllables can form a
perfect disyllabic foot, even though such a prosodic domain crosses the subject-predicate boundary.
Put differently, if the subject-predicate boundary can be maintained (that is, if there is such an
option), speakers tend to maintain the boundary. But if maintaining the boundary causes prosodic
ungrammaticality, T3S is applied across the boundary.
We have mainly discussed T3S application in cases that contain T3-pronouns in the subject
position. Whether or not T3S applies across the subject-predicate boundary appears to depend
largely on whether or not the speakers have an option. If the sentence is grammatical regardless
of whether T3S applies across the subject-predicate boundary, speakers tend to honor
the boundary by not applying T3S across it. If, however, not
applying T3S across the boundary would result in an unacceptable T3S output, they apply T3S
across the boundary.
Optional T3S application in two adjacent T3* that belong to two prosodic domains
When two adjacent T3* belong to different prosodic domains, T3S application is optional.
Both application and non-application (See Table 4.10) were found in all adults and two children
(ES 5;5 and GK 5;9), with application more frequent than non-application.
On average, for adults, application of T3S is 73.68% and non-application is 26.32% within
constituents. Across constituents, the application and non-application of T3S are 80% and 20%
respectively. Application of T3S is preferred both within constituents and across constituents. In
any sequence of two T3* that belong to two different domains, approximately one out of three to
four times there will be a non-application of T3S. In the input to child language acquisition, the
non-application of T3S in the environment of only two adjacent T3* belonging to two prosodic
domains can be noise that makes the data less transparent. With T3S variability in adult speech,
children do not have unambiguous data.
Optional T3S across prosodic domains in sequences with multiple T3*
Regarding optional T3S application across prosodic domains when there are more than two
adjacent T3*, it was difficult to compare the behavior of children and adults without adequate
and systematic data, which cannot be controlled for in natural speech studies. A sentence
produced by four participants in (23), repeated in (36), offers some helpful information for
comparison.
(36) [[Gei   [wo]]   [wu-shi     kuai]]     ‘Give me fifty dollars.’
     Give    me      five-ten    dollar
     3       3       3 2         4          UT
     3       3       (3 2)       4          Word: no T3S
     3       3       (3 2        4)         Word: incorporation, no T3S
     (2      3)      (3 2        4)         Phrase: disyllabic foot, T3S; ST1 (GK 5;9)
     (2      2)      (3 2        4)         Phrase: optional T3S across domains; ST2
                                            (IU 4;6, Adult LU, Adult EE)

In (36), both adult speakers applied T3S across two prosodic domains. One child did and one
child did not. With few sentences produced by children and adults in our data that are
syntactically identical or similar and with multiple adjacent T3* (at least three adjacent T3*), no
further comparisons can be made across child and adult participants in this regard. The T3S
variants arise in (36) because of optional T3S across prosodic domains.


Application or non-application of T3S within constituents and across constituents
On average, adults apply T3S within constituents and across constituents at fairly similar rates
(within constituents: 92.98% vs. across constituents: 84.21%; an 8.77% difference), but children
appear to apply T3S more within constituents than across constituents (within constituents: 99.36%
vs. across constituents: 70%; a 29.36% difference). That is, adults apply T3S only slightly more
frequently within constituents than across constituents.
Unlike adults, children may be more limited to what is in the immediate environment and
looking across constituents may not be as easy for them as for adults. That children apply T3S
within constituents more than across constituents may indicate that they differentiate the two.
Although our child data show that T3S is applied more frequently within constituents than across
constituents by approximately 30%, the amount of child data was relatively small, so that
whether or not children do distinguish “within constituents” from “across constituents” and apply
T3S more in the former than in the latter needs to be confirmed when more child data becomes
available.
The new falling-rising (T3-T2) sequence in Taiwan Mandarin (Yeh 2010)
In child-directed speech, reduplications are common. Yeh (2010) reports a new falling-rising
(T3-T2) sequence in some vocabulary, mostly kinship terms. This new pattern is relevant to our
study of T3S application. I will briefly describe Yeh’s (2010) study and then present examples
from our natural speech study that provide supporting evidence for this new form and show the
new pattern’s interaction with the T3S rule.
Yeh (2010) conducted an experiment to compare the phonetic naturalness of Rising-Falling
(T2T3) and Falling-Rising (T3T2) tonal patterns. Participants were asked to read the stimuli
created from the base syllables for six kinship terms. An example Yeh (2010) provided was the
syllable jie, taken from jie jie 姊姊 ‘sister.’ In writing, there are different characters for this
same syllable: jie 姊 (T3) ‘sister’, jie 解 (T3) ‘to solve, to untie’, jie 結 (T2) ‘a knot.’ These
words are used to form disyllabic stimuli. As participants read the stimuli in a carrier sentence,
the tonal production would differ depending on what characters they read: (i) jie jie 姊姊 ‘elder
sister’ T3T3 → T3T0 (the Neutral tone sandhi), (ii) jie jie 解解 ‘to solve’ T3T3 → T2T3 (Tone 3
sandhi), (iii) jie jie 姊姊 ‘elder sister’ T3T3 → T3T2 (the new form), and (iv) jie jie 解結 ‘to
untie a knot’ T3T2 (UT=ST). For (iii), where the new form is expected, the participants were
reminded to produce it in a childlike manner (as a child would produce it).
Yeh (2010) says that the results of the experimental study show that the new T3T2 pattern,
the opposite pitch contour of the T2T3 pattern which is derived from the T3S rule, has
significantly fewer pitch changes than the T2T3 pattern, and appears to be more natural
phonetically (with ‘phonetic naturalness’ being defined as fewer pitch changes, longer duration
of rising (T2) syllables and shorter duration of falling (T3) syllables).
The new pattern T3T2 is relevant to our current study because it applies to kinship terms
frequently used in child-directed speech. As Yeh (2010) points out, in the following examples
the “original pattern” changes to “the new pattern”: mama ‘mother’ (T1T0 → T3T2), jiejie
‘sister’ (T3T0 → T3T2), shushu ‘uncle’ (T2T0 → T3T2), didi ‘(younger) brother’ (T4T0 → T3T2).


Notice that the initial syllable in the new form is T3, regardless of its original underlying tone.
The reduplication of certain non-kinship nouns, such as gou ‘dog’ (T3), gives gou-gou ‘dog, doggie’
the surface form T3T2, rather than the T2T3 that T3S would predict. In addition to the T3-T2
sequences which Yeh’s (2010) experimental study focuses on, shenme ‘what’ is one of the
limited number of lexical items which have undergone tonal changes and surface as T3-T2
sequences. Unlike other vocabulary that has the new form of T3T2 only, shenme ‘what’
interestingly also has T3T0 as a new form. Therefore, for Taiwan Mandarin speakers, as shown
in (37), there are three underlying tonal patterns altogether for shenme ‘what’: T2T0 (original
lexical tone), T3T2 (one new form), and T3T0 (the other new form). The new tonal patterns, T3T2
and T3T0, are marked as new forms in (37).
(37) shenme    ‘what’
     20        UT1 (original form)
     32        UT2 (New form 1 in Taiwan Mandarin)
     30        UT3 (New form 2 in Taiwan Mandarin)

While the reduplication of kinship terms tends to be more restricted to child-directed speech,
the new forms of shenme ‘what’ in (37) are widely used across age groups in Taiwan, and both
forms were attested in our child and adult data. In (37), the new forms are marked as T3T2 and
T3T0 underlyingly rather than as surface patterns because speakers did not use the original
underlying tones T2T0 when this lexical item interacts with an adjacent T3. Instead, the new
forms T3T0 and T3T2 were used as underlying (i.e., lexical) forms as we see in (38) and (39)
respectively. In these cases of shenme ‘what’ preceded by a T3 syllable, T3S is triggered.
(38) [Kan    [ni     [yao       [mai    shenme]]]]    (GK 5;9)
     see     you     want to    buy     what          ‘See what you want to buy.’
a.   4       3       4          3       20            UT (shenme ‘what’ in original form)
     4       3       4          3       (20)          Word: no T3S
     (4      3)      (4         3)      (20)          Phrase: no T3S; ST1

b.   4       3       4          3       30            UT (shenme ‘what’ in new form)
     4       3       4          3       (30)          Word: no T3S
     (4      3)      (4         3)      (30)          Phrase: no T3S
     (4      3)      (4         2)      (30)          Phrase: T3S across domains; ST2 (used)

(39) [Yang   [shenme]]     (Adult TT)
     raise   what          ‘Raise what?’
a.   3       20            UT (shenme ‘what’ in original form)
     3       (20)          Word: no T3S
     (3      20)           Phrase: incorporation, no T3S; ST1

b.   3       (32)          Word: no T3S
     (2      32)           Phrase: incorporation, T3S; ST2 (used)

In (38) and (39), T3S applies because the new forms for shenme ‘what’ (T3T0 and T3T2)
begin with a T3, and therefore, create a T3-sequence that triggers T3S. Whether or not a Taiwan
Mandarin speaker has shenme ‘what’ with the new form of T3T2 or T3T0, both forms begin with
a T3 and trigger T3S when following an underlying T3. T3S application in these cases cannot be
accounted for without assuming the new lexical tones of shenme ‘what.’ The majority of current
Taiwan Mandarin speakers appear to use T3T2 or T3T0 as their lexical tones for shenme ‘what,’
rather than the original lexical tones, T2T0. It should be noted that the predicted surface forms
and the attested surface patterns resulting from the distinct new forms of T3T2 or T3T0 are all
grammatical.
To summarize, the tonal changes in shenme ‘what’ in Taiwan Mandarin speakers explain
why T3S is applied in cases where T3S seems to be unnecessary if we simply look at the original
lexical tones. T3S application in cases like (38) and (39) is counted as correct. It should be
noted that in this current study, all the children and caretakers are native speakers of Taiwan
Mandarin. Mandarin speakers of other regions may not have the new forms of certain vocabulary
that has been discussed earlier. Mandarin speakers from other regions may also have different
grammatical judgments regarding what counts as a grammatical T3S output. The findings of this
study apply to Taiwan Mandarin speakers and may not apply to Mandarin speakers of other
regions.

4.7 Conclusion
In this section, I summarize the findings of the present study with respect to T3S in
spontaneous speech.
Three levels of T3S application were examined: within words, within constituents, and across
constituents. T3S application at all three levels was attested in all child and adult participants.
Due to the lack of multi-layer structures at the Word level produced by both children and
adults, no conclusion can be drawn regarding cyclic T3S application within words. At the Phrase
level, we found cyclic and non-cyclic T3S application. But there were also cases that could not
be accounted for by the non-cyclic parsing strategy at this level. It appeared that children and
adults do not always parse disyllabic feet from left to right at the Phrase level, as we saw in the
case of a subject pronoun parsed as a degenerate foot, rather than parsed with the following
syllable. We suggest that the subject-predicate boundary and a better alignment of syntactic
constituents and prosodic domains are the possible sources of participants’ use of the parsing
strategy.
The number of adjacent T3* appears to affect the choice of maintaining the subject-predicate
boundary or applying T3S across the boundary to form a binary foot. Based on our spontaneous
speech data, when a T3-subject pronoun is followed by another T3, the subject pronoun often
surfaces as a sandhi tone, T2, as the result of being parsed with the following word regardless of
the word category (e.g. an adverb, an auxiliary, or a verb). Subject pronouns are much less
likely to cliticize in sequences where there are three or four adjacent T3* than in sequences
where there are only two.
In a sentence beginning with a monosyllabic T3 pronoun followed by a VP which contains a
monosyllabic T3 verb and a monosyllabic T3 pronoun, the attested pattern in both children and
adults is T3T2T3, even though T2T2T3 is a possible pattern as well. Both patterns are predicted
by the Word-and-Phrase level Model. The absence of the T2T2T3 pattern is interesting because,
presumably, in a three-syllable sequence, parsing all three syllables in one domain is relatively
easy. In longer sentences which also have a monosyllabic pronoun followed by two or more than
two T3*, we found the subject pronoun is sometimes parsed as a degenerate foot, and sometimes
it forms a prosodic domain with its following syllable(s). It seems that a monosyllabic subject
pronoun has a dual status, behaving like a clitic sometimes, and at other times behaving like a
monosyllabic noun that is not prosodically weak and can stand alone as a degenerate foot. In
summary, the behavior of a subject pronoun in T3S is intriguing. It does not consistently join
other syllable(s) in forming prosodic domains. Its flexibility as to what kind of prosodic domain
it is in (whether it stands alone in a monosyllabic foot or joins other syllables in forming a
prosodic domain) is still to be examined.
Optionality is one of the sources of T3S variation. Where optional T3S is allowed, variation
is attested across two prosodic domains: within and across constituents. Different parsing
strategies that result in different T3S surface patterns were attested. For instance, in the context
of adjacent T3* belonging to different prosodic domains both within constituents and across
constituents, adults apply T3S about three to four times as frequently as not. The shifting
between application and non-application of T3S produces variation. Because of insufficient child
data for this context in our investigation, it is unknown whether or not children behave similarly
to adults in this regard.
It is a great challenge to obtain the desired T3S contexts in various syntactic structures across
child and adult speakers. Nevertheless, the limited data in this study and the analysis we
conducted provide some information on T3S in child-parent interactions and fill in some gaps in
the T3S acquisition literature. More mother-child natural speech data can be collected in future
studies to confirm the findings of this study and to answer questions we are unable to answer
given the limits of the data we collected.
As T3S data are quite limited in natural speech, we seek also to investigate T3S application
in children and adults through a series of experiments. Presented in the next three chapters are
T3S experimental studies of non-cyclic T3S application in flat structures, cyclic T3S application
in NPs, and cyclic and non-cyclic T3S strategies at the sentence level.

CHAPTER 5
FLAT STRUCTURES
5.0 Introduction
It has been established in previous chapters that T3S heavily depends on syntax. Most
utterances people produce, short phrases or long sentences, are in the form of hierarchical
syntactic structures. There are times, however, when there is no internal structure, such as a
string of digits in phone numbers or a foreign proper noun translated into the target language.
“Louisiana,” for instance, is translated as “lùyìxīānnà” in Mandarin Chinese. No syllable in the
sequence of five syllables is in a syntactic position higher than others. We refer to such
structures, which lack internal structure, as “flat structures.”
T3S requires setting up the prosodic domains within which T3S applies. Typically, syntax
and prosody both play vital roles in the building of such domains. In a flat structure, all the
syllables in the string are in principle at the same level.
A flat structure serves as a perfect opportunity to investigate T3S with the focus on the
prosodic facets only. What does T3S depend on when there is “no syntax” for it to refer to?
Exactly how T3S application relies on prosody in building the T3S domain is what this study
seeks to answer. We focus on three major areas in T3S application in flat structures: (i) binary
parsing, (ii) the incorporation of an unparsed syllable, and (iii) directionality of foot building. In
addition, we are interested in children’s developmental pattern in acquisition of T3S in flat
structures and what it tells us about children’s way of grouping syllables into larger units.
The organization of this chapter is as follows. In Section 5.1, central issues regarding how
T3S is applied in flat structures are discussed in detail, based on what we know from previous
theoretical studies. Some findings relevant to flat structures from an experimental study are also
reviewed and discussed. In Section 5.2, we present our research questions, followed by our
hypotheses and predictions. Section 5.3 presents the design of our experiment on flat structures,
the results, and the discussion. Section 5.4 concludes this chapter.
5.1 T3S in Flat structures
We know that both syntax and prosody are crucial elements in T3S application. In a flat
structure, with no obvious internal syntactic structure, one solution for T3S application is to
depend solely on prosody.
A disyllabic foot is the preferred foot structure in Mandarin Chinese, and such a binary foot
constitutes the basic T3S domain within which T3S must apply (Lin 2007:205-206). Exactly how
is the string of syllables grouped to form one or more prosodic domains? We test how children
and adults parse syllables into feet in flat structures. We ask the following questions:
(i)   Is binary parsing the main foot-building strategy in flat structures?
(ii)  Is an unfooted syllable incorporated into a neighboring foot in cases where there is an odd
      number of syllables?
(iii) What is the directionality of foot-building in flat structures?
(iv)  Is there a developmental pattern in children’s acquisition of T3S?
5.1.1 Previous theoretical studies on T3S in flat structures
According to the Word-and-Phrase level Model, syllables in flat structures are parsed from left
to right into binary feet, and any leftover syllable is then incorporated into the neighboring foot
(Chen 2000:368; Shih 1986; Shih 1997). The Stress-foot Model takes the same view: in
polysyllabic names and digits, disyllabic feet are built from left to right (Duanmu 2000/2007;
Duanmu 2004:70). There is no mention of incorporation of an unfooted syllable in Duanmu
(2000/2007). Examples (1) and (2) show T3S application in flat structures, from Chen (2000) and
Lin (2007).
(1)  Flat structure: translation of ‘Somalia’ (Chen 2000:369)
     Suo    ma     li     ya      ‘Somalia’
     3      3      3      3       UT
     (2     3)     (2     3)      ST
    *(3)    (2     2      3)
    *(2     2      3)     (3)
    *(2     3)     (3)    (3)

(2)  Flat structure: four T3-digits (Lin 2007:206)
     jiu    jiu    jiu    jiu     ‘9999’
     3      3      3      3       UT
     (2     3)     (2     3)      ST

In (1), binary parsing from left to right predicts the surface pattern (23)(23). It is not clear
whether the large domain parsing (2223) is regarded as a possible surface pattern by
Chen (2000). In (2), we see that the sequence of four digits has the same predicted parse of two
binary feet as the four-syllable translation of ‘Somalia.’ In fast speech, a larger domain can be
created when syllables are parsed into feet (Chen 2000; Lin 2007; Shih 1986; Shih 1997).
Therefore, (2223) is also a possible output. In (3), we see that the two adjacent T3-digits belong
to different domains.
(3)  Flat structure: two T3-digits that belong to two different domains (Duanmu 2000/2007:239)
     qi     wu     wu     qi      ‘7557’
     1      3      3      1       UT
     (1     3)     (3     1)      ST1 (No T3S application across domains)
     (1     2)     (3     1)      ST2 (T3S applies across domains)
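
The two surface patterns in (3) can be enumerated mechanically. The following is a minimal sketch, not part of the original analysis: the function name `cross_domain_variants` and the list-of-feet representation are hypothetical, and the sketch assumes that T3S across a foot boundary is optional whenever a foot-final T3 is directly followed by a foot-initial T3 on the surface, as in ‘7557’.

```python
from itertools import product

T3, T2 = 3, 2

def cross_domain_variants(feet):
    """Enumerate surface patterns when T3S across foot boundaries is optional.

    `feet` is a list of feet, each a list of tone numbers after any
    foot-internal T3S has applied. Assumption of this sketch: at every
    boundary where a foot-final T3 meets a foot-initial T3, sandhi may
    or may not apply.
    """
    # Indices of feet whose final T3 is followed by a foot-initial T3.
    boundaries = [i for i in range(len(feet) - 1)
                  if feet[i][-1] == T3 and feet[i + 1][0] == T3]
    variants = []
    for choice in product([False, True], repeat=len(boundaries)):
        new_feet = [list(foot) for foot in feet]
        for do_sandhi, i in zip(choice, boundaries):
            if do_sandhi:
                new_feet[i][-1] = T2  # the foot-final T3 surfaces as T2
        variants.append("".join("(" + "".join(map(str, foot)) + ")"
                                for foot in new_feet))
    return variants

# qi wu wu qi '7557', parsed as (1 3)(3 1)
print(cross_domain_variants([[1, 3], [3, 1]]))
```

For ‘7557’ the sketch yields the non-application pattern first and the application pattern second, corresponding to ST1 and ST2. It does not cover cases where the following syllable is an underlying T3 that has already surfaced as T2.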
In (3), there are two possible surface patterns, ST1 and ST2, because the two T3-digits belong
to different prosodic domains, and T3S does not have to apply, though it can (Duanmu
2000/2007:239). Next, let us look at how the surface pattern is derived in a five-digit sequence.
In (4), we see a case with an odd number of syllables, where binary parsing leaves one syllable
unparsed.
(4)  Flat structure: five digits (Lin 2007:206)
     jiu    jiu    jiu    jiu    jiu     ‘99999’
     3      3      3      3      3       UT
     (2     3)     (2     3)     3       Disyllabic feet, T3S
     (2     3)     (2     2      3)      Incorporation, T3S; ST

In (4), binary feet are built from left to right, followed by incorporation of the unfooted
syllable to its neighboring foot. The prediction is (23)(223), a binary foot followed by a ternary
foot. Lin (2007) points out that if the directionality is from right to left, then the unfooted
syllable would have been on the left edge (Lin 2007:206). In Chen’s (2000) OT (Optimality
Theory, Prince & Smolensky 1993/2004) analysis of a flat structure composed of five
consecutive digits wu ‘five’, (23)(223) is the optimal output, while (223)(23), (3)(23)(23), and
(23)(23)(3) are not (Chen 2000:368). Notice that the right-to-left binary parsing followed by
incorporation, which results in (223)(23), is one of the non-optimal outputs. Even a seemingly
good output (3)(23)(23) which does not violate two adjacent T3* is ungrammatical due to
violation of left-to-right parsing.
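
The left-to-right derivation described for (4) can be made concrete with a short sketch. This is an illustration under stated assumptions rather than an implementation of any of the cited models: the function names `apply_t3s` and `derive_left_to_right` are hypothetical, all syllables are assumed to be underlying T3, and only the normal-speech derivation (binary feet first, then incorporation) is modeled.

```python
T3, T2 = 3, 2

def apply_t3s(tones, start, end):
    """Apply T3S left to right inside the domain tones[start:end], in place.

    A T3 changes to T2 when the next syllable in the same domain is T3.
    """
    for i in range(start, end - 1):
        if tones[i] == T3 and tones[i + 1] == T3:
            tones[i] = T2

def derive_left_to_right(n):
    """Derive the surface pattern for n underlying T3 syllables (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tones = [T3] * n
    # Step 1: build disyllabic feet from the left edge; apply T3S in each.
    feet = [(i, i + 2) for i in range(0, n - 1, 2)]
    for start, end in feet:
        apply_t3s(tones, start, end)
    # Step 2: incorporate a leftover final syllable into the preceding
    # foot and reapply T3S inside the enlarged foot.
    if n % 2 == 1 and feet:
        start, _ = feet[-1]
        feet[-1] = (start, n)
        apply_t3s(tones, start, n)
    elif not feet:
        feet = [(0, n)]  # a lone syllable forms a degenerate foot
    return "".join("(" + "".join(str(t) for t in tones[s:e]) + ")"
                   for s, e in feet)

for n in (2, 3, 4, 5):
    print(n, derive_left_to_right(n))
```

Under these assumptions the sketch returns (23), (223), (23)(23), and (23)(223) for two to five syllables. The larger-domain parses associated with fast speech are outside its scope.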
Shih (1997) also regards (223)(23) as an ungrammatical pattern (Shih 1997:98). She suggests,
“In the absence of any existing structure, such as a list of digits or nonsense syllables, prosodic
reconstructing proceeds from left to right. The directionality is shown in domains containing odd
number syllables: it is the last foot that accommodates an extra member” (Shih 1997:98). In
other words, for Shih (1997), the ternary foot in the flat structure signifies that an unfooted
syllable is accommodated.
According to the Word-and-Phrase level Model (Chen 2000; Shih 1986; Shih 1997), fast
speech has larger domains, so that (22223) is potentially a grammatical surface pattern. Our
experiment investigates how T3S is applied in normal speech rather than in fast speech.
Nevertheless, this larger domain parsing (22223) will be considered a possible output as well.

Table 5.1 summarizes what are grammatical and ungrammatical patterns in sample flat
structures with some examples that were presented earlier in this section. The predictions are
based on the Word-and-Phrase level Model (Chen 2000; Shih 1986; Shih 1997).
Table 5.1 List of grammatical and ungrammatical patterns in flat structures
Flat structures   jiu jiu jiu   Suo ma li ya   jiu jiu jiu jiu   qi wu wu qi   jiu jiu jiu jiu jiu
                  ‘999’         ‘Somalia’      ‘9999’            ‘7557’        ‘99999’
UT                333           3333           3333              1331          33333
Surface           (223)         (23)(23)       (23)(23)          (13)(31)      (23)(223)
pattern(s)                                     (2223)            (12)(31)      (22223)
Ungrammatical     *(23)(3)      *(3)(223)      *(3)(223)         *(1221)       *(223)(23)
pattern(s)        *(3)(23)      *(223)(3)      *(223)(3)                       *(3)(23)(23)
                                *(23)(3)(3)    *(23)(3)(3)                     *(23)(23)(2)
                                                                               *(23)(23)(3)
                                                                               *(3)(223)(3)

In the next section, I will review an experimental study that tested how T3S is applied in flat
structures.
5.1.2 Kuo et al.’s (2007) study
Kuo et al. (2007) present an acoustic experimental study in which participants were asked to
produce sentences at three speech rates: slow, normal, and fast. Sequences of two, three, and four
digits were embedded sentence-initially in carrier sentences which are question and answer pairs.
An example of a question and answer pair is given in (5). According to Kuo et al. (2007:231), T3S
applies in the digits first, and then T3S applies across the syntactic boundary, which is reflected
in the derivations in (5). I provide a more detailed word-for-word gloss for the sentences for the
purpose of clarity and to illustrate the information relevant to T3S. All the possible patterns
that are derived through the Word-and-Phrase level Model (Chen 2000; Shih 1986; Shih 1997) are
added so that the attested patterns can be compared with the predicted patterns.

(5)  Four-syllable flat structures in carrier sentences (Kuo et al. 2007:214-217)
a.   Q: Liang  liang  liang  liang  you       mei you²⁷
        two    two    two    two    there is  there is not   ‘Is there two-two-two-two?’²⁸
        3      3      3      3      3         2 3            UT
        (2     3)     (2     3)     (3        2 3)           Disyllabic feet, T3S; ST1²⁹
        (2     2)     (2     3)     (3        2 3)           T3S across domains, T3S; ST2
        (2     2)     (2     2)     (3        2 3)           T3S across domains, T3S; ST3
        (2     2      2      2      3         2 3)           Larger domain in fast speech, T3S; ST4

b.   A: Liang  liang  liang  liang  you
        two    two    two    two    there is                 ‘There is two-two-two-two.’
        3      3      3      3      3                        UT
        (2     3)     (2     3)     3                        Disyllabic feet, T3S
        (2     3)     (2     2      3)                       Incorporation, T3S; ST1
        (2     2)     (2     2      3)                       T3S across domains, T3S; ST2
        (2     2      2      2      3)                       Larger domain in fast speech, T3S; ST3

Notice that the last two syllables in (5a) are irrelevant for T3S application because the
negation particle mei ‘not’ is a T2. Upon reaching this syllable, the series of multiple consecutive
T3* discontinues. Since the last two syllables in (5a) are irrelevant, they were excluded from the
reports of the findings.

27 Presentation of these experimental sentences is different from the Kuo et al. (2007) paper in
order to provide the relevant information for the discussion in this section. An example of the
original presentation of the testing sentence of four T3-digits embedded in the carrier sentence is
below, where “L” indicates the Low tone (T3) and “R” indicates the Rising tone (T2):
LiangL liangL liangL liangL youL meiR youL?
‘Is there X (X= number sequence, which is ‘2’ in this case)?’ (Kuo et al. 2007:214).
For consistency in this dissertation, numerals are used instead of letters. In addition, prosodic
domains for the attested surface forms have been added, so that we can see the speakers’ parsing
strategies more easily.
28 “Is there X (X range from 2 – 4 digits)?” is the translation provided by the authors in the Kuo
et al. (2007) study. I think that the structure tested in this study corresponds to a topicalized
structure, with “X” being moved from the sentence-final position to the sentence-initial
position. If this is correct, “X, is there (it)?” may be a more appropriate translation. The
structure is relevant because topicalization potentially affects how the sentence is parsed.
Nevertheless, the original translation is reported as is.
29 You mei you (you ‘there is’ + mei you ‘there is not’) ‘Is there…?’ is parsed in a prosodic
domain in one step here for simplification. The derivational steps are [T3 [T2T3]] → T3 (T2T3)
→ (T3T2T3).
Within the sequence of adjacent T3*, all syllables except for the final syllable (you ‘there is’)
were extracted for phonetic analysis (you ‘there is’ is the final syllable in the T3-sequence and
always surfaced as a T3, so this syllable was not included in the phonetic analysis) (Kuo et al.
2007). In their report of the surface patterns of T3-sequences, you ‘there is’ is presented as L
(Low tone) in parentheses to distinguish it from the digit-sequence preceding it (e.g. RRRR(L))
for indicating T2T2T2T2T3, with the first four syllables being digits and the final syllable being
a non-digit.
It should be emphasized that the underlined portions (indicating consecutive T3*) in the
question and answer pair in (5a) and (5b) are identical (i.e. the syllables of the whole sentence in
(5b) are identical to the first five syllables in (5a)). Kuo et al. (2007) reported that most subjects
broke the sequence of five T3* into two spans, (T2T3)(T2T2T3)³⁰, indicating a strong binarity
effect. In addition, speakers who had the parsing (T2T3)(T2T2T3) for (5) at the slow speed
consistently used the same parsing at the fast speed. One speaker had a large domain parsing of
(T2T2T2T2T3) at both the slow and the fast speech rate (Kuo et al. 2007:215-218).
The example from Kuo et al. (2007) shown in (5) has a sequence of four digits, but sequences
of two and three digits were also tested in the same carrier sentences. In those cases, where two
or three digits are followed by you ‘there is,’ Kuo et al. (2007) reported that, starting with the
slow speed, all subjects show a “left-to-right sweep,” resulting in all digits’ underlying T3*
changing to the sandhi tone (T2), while you ‘there is’ stays intact as it is the last syllable with
a T3 in the sequence (2007:215). That is, in a sequence of three adjacent T3* and four adjacent
T3*, (T2T2T3) and (T2T2T2T3) are the attested patterns.

30 The boxed sequence of tones indicates the sequence of digits, followed by an unboxed T3,
which is an existential verb you ‘there is.’
In addition, one of the robust findings of this study is that the derived T2 and the underlying
T2 differ in that the former is slightly lower in pitch though they have the same shape, and
therefore, these two are acoustically distinct (Kuo et al. 2007:222). There are some points worth
mentioning.
First, the results show that both (T2T3)(T2T2T3) and (T2T2T2T2T3) are attested at slow
and fast speech rates in the first five syllables in (5), with the former being the dominant pattern.
This challenges the widely accepted view that one large domain occurs only at a fast speech rate,
while multiple domains occur at normal or slower speech rates. Although a prosodic domain
can grow larger as the speech rate increases, it does not have to. The relation between the size of
prosodic domains and the speech rate is not one of cause and effect.
Second, for a sequence of three T3* (two T3-digits followed by you ‘there is’), after the two
digits are parsed, it is relatively easy for adults to incorporate a third T3 into the binary foot that
has been formed. Potentially, under the interpretation of the structure as a topicalized sentence
(see footnote 28), a major syntactic boundary between the two digits and you ‘there is’ should be
considered, and in that case, T3S does not have to apply. The fact that most participants of the
study did apply T3S indicates that for a three-T3 sequence, applying T3S across the domains is
preferred, either because the sentences are too short, or because there is an effect of constantly
repeating the same carrier string.
Third, two syllables constitute a binary foot. A slightly larger, three-syllable foot is referred
to as a super foot (Chen 2000; Lin 2007; Shih 1986; Shih 1997). As such a foot is only
a little larger than a typical binary foot, it is not surprising that for a sequence of three T3* (two
T3-digits followed by you ‘there is,’ a T3), all the syllables are grouped together to form one foot,
or prosodic domain. The prosodic domain appears to be stretched further to four syllables in
a three-digit sequence followed by you ‘there is,’ a T3, since Kuo et al. (2007) reported the
pattern to be T2T2T2T3. In Kuo et al. (2007), a four-syllable domain appears to be an upper
limit beyond which syllables are divided into multiple domains, as we see in (5). For a sentence
with many consecutive T3* (five and beyond), although it is theoretically possible, speakers
may not necessarily use the “left-to-right sweep” of changing all the T3* into the sandhi tone
except for the last one in the sequence. Rather, dividing the longer sequence of T3* into more
than one domain appears to be preferred by the participants in this experiment (by breaking the
sequence of five T3* into two spans (T2T3)(T2T2T3), where the boxed tones refer to a flat
structure followed by a T3 verb you ‘there is’).
Lastly, Kuo et al. (2007) acknowledge in the endnotes that in the five-T3 sequence in (5), the
first four T3* are in an unstructured environment, and the last T3, you ‘is there,’ is outside this
environment. Kuo and colleagues therefore suggest a two-step derivation, with the first step
dealing with digits only; in the second step, T3S applies across a syntactic boundary (Kuo et al.
2007:231): [[3333][323]] → [(2223)][(323)] → [[(2222)][(323)]]. Given that the first four
syllables are digits, and can potentially be divided into two binary feet, followed by you mei you
(T3T2T3) ‘is there?’, it is possible to have two binary feet in [four T3-digits], followed by [you
mei you ‘is there?’ (T3T2T3)]. It would be interesting to see whether or not this possible pattern,
[(23)(23)][(323)] for (5a), is acceptable to or produced by other speakers, although it is not
attested in the seven subjects in Kuo et al. (2007).

5.1.3 Re-thinking the linguistic environment for investigating flat structures
In the Kuo et al. (2007) study, two, three, and four digits are embedded in the sentence-initial
position of carrier sentences which are question and answer pairs. In other words, the flat
structure (the adjacent digits) is part of a sentence, which is not a flat structure. Would
parsing of the flat structure be affected by the carrier sentences? Depending on what follows the
flat structure, we see a different number of possible surface patterns in (5), repeated in (6) below
for convenience.
(6)
a.   Q: Liang  liang  liang  liang  you       mei you
        two    two    two    two    there is  there is not   ‘Is there two-two-two-two?’
        3      3      3      3      3         2 3            UT
        (2     3)     (2     3)     (3        2 3)           Disyllabic feet, T3S; ST1
        (2     2)     (2     3)     (3        2 3)           T3S across domains, T3S; ST2
        (2     2)     (2     2)     (3        2 3)           T3S across domains, T3S; ST3
        (2     2      2      2      3         2 3)           Larger domain in fast speech, T3S; ST4

b.   A: Liang  liang  liang  liang  you
        two    two    two    two    there is                 ‘There is two-two-two-two.’
        3      3      3      3      3                        UT
        (2     3)     (2     3)     3                        Disyllabic feet, T3S
        (2     3)     (2     2      3)                       Incorporation, T3S; ST1
        (2     2)     (2     2      3)                       T3S across domains, T3S; ST2
        (2     2      2      2      3)                       Larger domain in fast speech, T3S; ST3

There are more possible surface patterns in (6a) than in (6b). This is mainly because, in
the question, you mei you ‘Is there …?’ is a three-syllable unit that could be a foot by itself,
whereas in the answer, you ‘there is’ is monosyllabic and may be too light to form a foot by
itself; consequently, it is incorporated into the preceding foot formed by digits. In short, in the
identical T3-sequences in (6a) and (6b), the last T3 in the sequence (you ‘there is’) must join its
preceding foot in (6b), but not in (6a), because the carrier sentences differ in the number of
syllables.
There is no mention of whether or not there are parsing differences between the T3-sequences
extracted from the question and those extracted from the answer. If the effect of the two different
carrier sentence types is not considered, the reported tendency of speakers to parse one way or
the other may be biased. Although it may not be as relevant to the question asked by Kuo et al.
(2007), how speakers parse the T3-sequence in (6a) and (6b) would provide valuable information
for those interested in larger domain parsing. To sum up, despite the fact that the T3-sequences
extracted from question and answer pairs are identical, how the sentence-initial sequence of
digits is parsed is potentially affected by what follows it.
5.2 Research questions, hypotheses, and predictions
Our study investigates how T3S is applied in pure flat structures of two, three, and five digits.
Our purpose is to learn how flat structures are parsed by children and adults. Since a flat
structure has no internal structure, is a left-to-right binary parsing strategy used in building T3S
domains? We test three major factors in T3S application in flat structures: binary parsing,
incorporation of an unfooted syllable, and directionality of foot-building.
5.2.1 Research questions and hypotheses
Our research questions are repeated below in (7) for convenience, followed by the
four hypotheses our experimental study tests.
(7)  Research questions:
a.   Is binary parsing the main foot-building strategy in flat structures?
b.   Is an unfooted syllable incorporated into a neighboring foot in cases where there is an odd
     number of syllables?
c.   What is the directionality of foot-building in flat structures?
d.   Is there a developmental pattern in children’s acquisition of T3S? Is T3S acquired at an
     early age? When is T3S acquisition completed?
As mentioned earlier, in Mandarin, a disyllabic foot is the preferred foot structure, and such a
binary foot constitutes the basic T3S domain within which T3S obligatorily applies (Lin
2007:205), so a disyllabic foot makes a perfect foot structure. We hypothesize that syllables are
grouped according to the binary parsing strategy, and test the hypothesis in two-, three-, and
five-syllable flat structures. Based on the Word-and-Phrase level Model (Chen 2000; Shih 1986;
Shih 1997), we hypothesize that if there is an unparsed syllable at the end, it is incorporated into
a neighboring foot. In addition, following the existing T3S models, it is hypothesized that the
directionality is from left to right.
Regarding children’s acquisition of T3S, no existing T3S studies investigate children’s T3S
acquisition in flat structures. Various T3S acquisition studies were reviewed in the previous
chapter. Contrary to the findings of these studies, we found in our pilot study (Wang 2008) that
children can change a T3 to a T2 when it is followed by another T3 early on, but it takes time for
children to perform in an adult-like way. We hypothesize that T3S acquisition begins early, but
develops with age. Our four major hypotheses are summarized in (8).
(8)  Four hypotheses
Binary parsing (H1): Binary parsing precedes other parsing strategies. Binary feet are built
iteratively until no more binary feet can be built (i.e. zero or one syllable is left at the end).
Incorporation (H2): If there is an unparsed syllable, it is incorporated into a neighboring foot.
Directionality L to R (H3): Binary feet are formed from left to right. That is, every two
syllables form a binary foot, going from left to right.
Developmental Hypothesis (H4): Mastery of T3S develops with age. It is not acquired
instantaneously at an early age.
5.2.2 Predictions of T3S application in flat structures
With binary parsing, an even number of syllables can be evenly divided into disyllabic feet, but
an odd number of syllables cannot. Directionality of foot building is crucial since left-to-right
and right-to-left parsing strategies predict different surface patterns for an odd number of
syllables in flat structures. Shih (1986; 1997) proposes left-to-right prosodic parsing in flat
structures, and this view is supported in later studies (Chen 2000:368; Duanmu 2000/2007:238;
Lin 2007:206). In the subsections that follow, predictions are made for flat structures composed
of two, three, and five syllables.
5.2.2.1 A two-syllable flat structure
For a structure that consists of two T3-digits such as “five-five,” the prediction for T3S
application is in (9).
(9)  A two-syllable flat structure
     σσ
     (σσ)           Binary parsing
     Prediction of T3S application:
     wu     wu
     five   five    ‘five-five’
     3      3       UT
     (2     3)      ST
Two syllables form a perfect binary foot within which T3S applies. Non-application of T3S
will result in an ungrammatical surface form.
5.2.2.2 A three-syllable flat structure
In cases where there are three syllables, according to the Word-and-Phrase level Model (Chen
2000; Shih 1986; Shih 1997), after a disyllabic foot has been parsed from left to right, the
leftover syllable is predicted to be incorporated into the disyllabic foot, as we see in (10a).
According to the same model, the derived output (323) in (10b) is ungrammatical because binary
parsing should proceed from left to right, not from right to left.
(10) A three-syllable flat structure
a.   σσσ
     (σσ)σ          Binary parsing from left to right
     (σσσ)          Incorporation
     Prediction of T3S application:
     wu     wu     wu
     five   five   five    ‘five-five-five’
     3      3      3       UT
     (2     3)     3       T3S
     (2     2      3)      T3S; ST

b.   σσσ
     σ(σσ)          Binary parsing from right to left
     (σσσ)          Incorporation
     Prediction of T3S application:
     wu     wu     wu
     five   five   five    ‘five-five-five’
     3      3      3       UT
     3      (2     3)      T3S
    *(3     2      3)      No T3S; ST

c.   σσσ
     (σσσ)          A three-syllable foot built in one step (larger domain in fast speech)
     Prediction of T3S application:
     wu     wu     wu
     five   five   five    ‘five-five-five’
     3      3      3       UT
     (2     2      3)      T3S; ST
If a three-syllable foot is built in one step, as we see in (10c), without the step of
incorporation, T3S applies from left to right within that foot. The surface pattern in (10c) is
the same as that of (10a). That is to say, (T2T2T3) can surface from either source. Therefore, a
three-syllable sequence will not allow us to determine by which parsing strategy the surface
pattern (T2T2T3) is obtained. Given that a three-syllable sequence is fairly short and might
easily be parsed as a three-syllable foot in one step, a surface form of (T2T2T3) may in fact be
generated through such parsing.
To summarize, we expect that in a flat structure that consists of three syllables, a threesyllable prosodic domain will be built. If the surface pattern is T2T2T3, it could be derived
through left-to-right parsing followed by incorporation, or it could be by way of setting up one
domain in one step. If the surface pattern is T3T2T3, then it is derived from right-to-left binary
parsing followed by incorporation. In order to test directionality, it is necessary that we extend
the number of syllables.
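The three derivations in (10) can be traced mechanically. The Python sketch below is illustrative only: the function names (`apply_t3s`, `derive_three`) and the encoding of tones as integers are assumptions made for this example and are not part of the Word-and-Phrase level model. It assumes that T3S applies from left to right inside a foot, and applies again inside the ternary foot created by incorporation.

```python
def apply_t3s(foot):
    """Apply T3S left to right inside one foot: a T3 becomes T2 before another T3."""
    out = list(foot)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def derive_three(strategy):
    """Return the surface tones of three underlying T3 syllables.

    strategy: "LR"       binary parsing from left to right, then incorporation (10a)
              "RL"       binary parsing from right to left, then incorporation (10b)
              "one-step" a single three-syllable foot built in one step (10c)
    """
    if strategy == "one-step":
        return apply_t3s([3, 3, 3])
    binary = apply_t3s([3, 3])          # T3S inside the binary foot: (2 3)
    if strategy == "LR":
        return apply_t3s(binary + [3])  # (2 3) 3 -> (2 2 3)
    if strategy == "RL":
        return apply_t3s([3] + binary)  # 3 (2 3) -> (3 2 3): no adjacent T3s remain
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("LR", "RL", "one-step"):
    print(strategy, derive_three(strategy))
```

Under these assumptions the left-to-right and one-step derivations return the same tones, which is the ambiguity described above; only the right-to-left derivation yields a distinct surface pattern.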
5.2.2.3 A five-syllable flat structure
How do we know when and how an unfooted syllable is incorporated into a neighboring foot?
As previously mentioned, the binary foot that the unparsed syllable is incorporated into signifies
the directionality (Chen 2000; Lin 2007; Shih 1986; Shih 1997). A four-syllable flat structure
does not allow us to test directionality since binary parsing from either direction will predict the
same surface output (T2T3)(T2T3). A five-syllable flat structure, much like the three-syllable
structure, is ideal for testing directionality. For now, let us assume that a binary parsing strategy
is used, and that when there is a leftover syllable, as in the case of an odd number of syllables, it
is incorporated into a neighboring foot. For an odd number of syllables, there should be multiple
binary feet plus one ternary foot, which is the result of incorporation, at either edge as illustrated
in (11).

(11) Odd number of syllables composed of multiple binary feet and one ternary foot
a.
Left-to-right parsing
σσσσσσ……………………………………………….σσσ
(Binary foot)(Binary foot)(Binary foot) … (Ternary foot)
b.

Right-to-left parsing
σσσ………………………………………………σσσσσσ
(Ternary foot)… (Binary foot)(Binary foot)(Binary foot)
The derivations for a left-to-right parsing strategy and a right-to-left parsing strategy are in
(11a) and (11b) respectively. The ternary foot is on the right edge in left-to-right parsing, and on
the left edge in the right-to-left parsing. We now turn to a five-syllable flat structure.
(12) A five-syllable flat structure
a.  σσσσσ
    (σσ)(σσ)σ        Binary parsing from left to right
    (σσ)(σσσ)        Incorporation
    Prediction of T3S application:
    (T2T3)(T2T3)T3       T3S
    (T2T3)(T2T2T3)       T3S; ST
b.  σσσσσ
    σ(σσ)(σσ)        Binary parsing from right to left
    (σσσ)(σσ)        Incorporation
    Prediction of T3S application:
    T3(T2T3)(T2T3)       T3S
    *(T3T2T3)(T2T3)      No T3S; ST
c.  σσσσσ
    (σσσσσ)          five-syllable foot built in one step (larger domain in fast speech)
    Prediction of T3S application:
    T3T3T3T3T3           UT
    (T2T2T2T2T3)         T3S; ST
If foot-building is from left to right, the leftover syllable should be on the right edge, and
then it is incorporated into a neighboring foot to form a ternary foot as in (12a). Hypothesis H3
Directionality L to R will be supported in this case. However, if foot-building is from right to left,
the leftover syllable should be on the left edge, and subsequently it is incorporated into the
neighboring foot to form a ternary foot as in (12b). Since at this point there are no adjacent T3s,
T3S does not apply. If this is the case, Hypothesis H3 Directionality L to R will be rejected. It should
be noted that (T2T2T3)(T2T3), the reversal of two feet in the predicted surface pattern
(T2T3)(T2T2T3), is ungrammatical according to the Word-and-Phrase level model.
As an unparsed syllable should be incorporated into a neighboring foot, the surface pattern
(T2T3)(T2T2T3) implies a left-to-right binary parsing, and the surface pattern (T3T2T3)(T2T3)
implies a right-to-left binary parsing. A five-syllable flat structure reveals the directionality when
there are two domains.
An additional surface pattern (T2T2T2T2T3) is possible. It shows that all five syllables
are parsed in one step as in (12c). This output neither supports nor rejects Hypothesis H3
Directionality L to R because the hypothesis concerns the directionality of parsing binary feet.
Parsing into one large domain, therefore, cannot be used to test this hypothesis.
However, whether or not there is a bias of using this larger domain parsing among different age
groups can be compared. We expect that it is easier for adults than children to have the larger
domain parsing since adults can process a larger amount of information at one time. Among
children, we expect that older children have more of the larger domain parsing than younger
children.
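The reasoning in this section about which sequence lengths can reveal directionality can be checked with a short Python sketch. It is a minimal illustration under stated assumptions: the helper names (`apply_t3s`, `show`, `binary_then_incorporate`, `one_step`) are hypothetical, tones are encoded as integers, and T3S is assumed to apply left to right within each foot, once when the binary feet are built and once more in the foot that receives the leftover syllable.

```python
def apply_t3s(foot):
    """Apply T3S left to right inside one foot: a T3 becomes T2 before another T3."""
    out = list(foot)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def show(feet):
    """Render feet such as [[2, 3], [2, 2, 3]] as '(23)(223)'."""
    return "".join("(" + "".join(str(t) for t in foot) + ")" for foot in feet)


def binary_then_incorporate(n, direction):
    """Predicted surface pattern for n underlying T3s under binary parsing."""
    if direction not in ("LR", "RL"):
        raise ValueError(f"unknown direction: {direction}")
    if n < 2:
        raise ValueError("at least two syllables are needed to build a binary foot")
    feet = [apply_t3s([3, 3]) for _ in range(n // 2)]
    if n % 2:  # one leftover syllable joins the foot next to it
        if direction == "LR":
            feet[-1] = apply_t3s(feet[-1] + [3])   # leftover at the right edge
        else:
            feet[0] = apply_t3s([3] + feet[0])     # leftover at the left edge
    return show(feet)


def one_step(n):
    """Predicted surface pattern when all n syllables form one large domain."""
    return show([apply_t3s([3] * n)])


for n in (3, 4, 5):
    lr = binary_then_incorporate(n, "LR")
    rl = binary_then_incorporate(n, "RL")
    big = one_step(n)
    # A length tests H3 only if the left-to-right pattern is unlike both alternatives.
    verdict = "tests H3" if lr != rl and lr != big else "ambiguous"
    print(n, lr, rl, big, verdict)
```

With three syllables the left-to-right pattern coincides with the one-step pattern, and with four syllables the two directions coincide; five is the shortest length at which all three predictions differ.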
5.3 Study 2: Flat structures
5.3.1 Method
5.3.1.1 Subjects
Sixty-six subjects were recruited in Taichung, Taiwan for this study. There are three age
groups: three-year-olds, five-year-olds, and adults. Table 5.2 shows the distribution of the
participants.
Table 5.2 Study 2: Distribution of the subjects
Age groups      N     Age range      Mean    Standard deviation
3-year-olds     19    3;4 – 3;11     4;4     2.42 (mo.)
5-year-olds     27    5;1 – 5;11     6;3     3.05 (mo.)
adults          20

5.3.1.2 Procedure
All children were tested individually in a quiet classroom in the kindergarten or in the home
of the child. Adult subjects were also tested individually in a quiet room. The elicited production
task lasted approximately 12 minutes for children, and 6 minutes for adults.
Children were told that they were going to play a game. Some stuffed animals were
introduced to the children in the beginning in order to create a more friendly game-like setting.
The stuffed animals then were set aside on the table as if they were watching the child and the
experimenter play the game they were about to play. Each subject sat in front of a laptop
computer which displayed a large colored digit. As the digit showed on the screen, the
experimenter asked the child what it was. This was to make sure that the child knew the digit and
could say it with the underlying tone correctly. The task was to say a digit two times, three times,
and then five times. Simple as this may sound to adults, it is not necessarily easy for 3-year-olds,
or even 5-year-olds, especially when a digit must be repeated five times. Keeping track of how
many times the digit has been said and how many times it still needs to be said may place an
extra burden on them. The procedure below was followed in order to remove this burden.
Figure 5.1 Flat structures: A child’s hand, (a) – (c) for two, three, and five digits respectively
a. for two digits
b. for three digits
c. for five digits

The experimenter said, “What’s this? (pointing to the digit on the screen)” After the child
gave the answer, she was told to hold out one hand just like the experimenter showed her, with
five fingers up straight. Then the experimenter gently bent down three of her fingers, leaving two
up (See Figure 5.1 (a)) and said, “You say it (pointing to the digit on the screen) when I tap your
fingers, okay?” As two fingers were up, the child said the digit each time one of the two fingers
was tapped by the experimenter’s index finger.
After the child had said the digit two times, the experimenter held out one hand again, with
five fingers up straight, and asked the child to do the same. She now gently bent down two of the
child’s fingers, leaving three fingers up (See Figure 5.1 (b)). The child was told to say the same
digit, which was still on the computer screen, when the experimenter tapped her fingers. As three
fingers were up, the child said the digit when each of the three fingers was tapped.
Finally, the experimenter once again held out one hand, with five fingers up straight. The child
followed. There was no need to bend down the child’s fingers this time. With her five little
fingers up straight (See Figure 5.1 (c)), the experimenter asked, “Are you ready? I am going to
tap (your fingers) now.” The experimenter proceeded when the child was ready. The child said
the digit five times in this final round. Each child was familiarized with the task in a practice
session before proceeding to the experiment. (See Appendix B for Mandarin experimental
prompts and materials.)
For adults, it could be easily understood that the task was to say the digit shown on the screen
two, three, and five times. Adults saw the digit on the computer screen as well, but they were
instructed to say the digit two times, then three times, and finally five times. There was no
need for them to hold out a hand and say the digit as their fingers were tapped by the experimenter,
as was done with children. All subjects’ responses were recorded on a Marantz PMD660 with an
Audio-technica miniature clip-on microphone (AT831B Cardioid Condenser Lavalier
microphone). (A second digital recorder, a Sony ICD-P530F, was used in case of technical
problems.)
5.3.1.3 Design
An elicited repetition task (Crain & Thornton 2000; McDaniel et al. 1998) is used in this
experiment. Digits are used in the task. Digits from 0 to 9 are all single syllables in Mandarin.
Except for “5” and “9”, which are in Tone 3, all the rest of the digits are in Tone 1, Tone 2, or
Tone 4 (i.e. non-T3s). T3-digits “5” and “9” were used as the test items, and non-T3 digits were
used as the control items and in the practice session.
In the control items, the surface tones and the underlying tones are the same because non-T3s
are not affected by T3S. In the test items, T3S will apply according to how the string of syllables
is parsed. Surface tones will differ from underlying tones due to T3S application. In (13) below,
we keep only the parsing information and predicted outputs, which are based on the
Word-and-Phrase level model (Chen 2000; Shih 1986; Shih 1997). Detailed derivations of the predicted
patterns have been presented and discussed in Section 5.2.2.
(13) Flat Structures in two, three, and five syllables
a.  two syllables
    σσ             T3T3
    (σσ)           (T2T3)               Binary parsing from L to R
b.  three syllables
    σσσ            T3T3T3
    (σσ)σ          (T2T3)T3             Binary parsing from L to R
    (σσσ)          (T2T2T3)             Incorporation
    Or
    T3T3T3         (T2T2T3)
c.  five syllables
    σσσσσ
    (σσ)(σσ)σ      (T2T3)(T2T3)T3       Binary parsing from L to R
    (σσ)(σσσ)      (T2T3)(T2T2T3)       Incorporation
    Or
                   (T2T2T2T2T3)
For two-syllable items, the surface pattern (T2T3) is the same whether it is derived through
disyllabic foot-building or through larger domain parsing in fast speech. For three-syllable items, the
surface pattern (T2T2T3) can result from disyllabic parsing followed by incorporation, or from
the larger domain parsing. In the five-syllable item, however, the predicted larger domain parsing,
(T2T2T2T2T3), differs from the pattern derived from the step-by-step binary parsing from left
to right followed by incorporation of the unfooted syllable as we see in (13c).
5.3.1.4 Materials
A sample control item and a sample test item are in Figure 5.2 (a) and (b) respectively. A
complete list of the experimental materials as well as the instructions in Mandarin is in Appendix
B.

Figure 5.2 Study 2: Saying a digit two, three, and five times
a. Sample control item: a non-T3 digit “3” (Tone 1)
   (i)   two times
         san     san
         three   three
         ‘three-three’
   (ii)  three times
         san     san     san
         three   three   three
         ‘three-three-three’
   (iii) five times
         san     san     san     san     san
         three   three   three   three   three
         ‘three-three-three-three-three’
b. Sample test item: a T3-digit “9”
   (i)   two times
         jiu     jiu
         nine    nine
         ‘nine-nine’
   (ii)  three times
         jiu     jiu     jiu
         nine    nine    nine
         ‘nine-nine-nine’
   (iii) five times
         jiu     jiu     jiu     jiu     jiu
         nine    nine    nine    nine    nine
         ‘nine-nine-nine-nine-nine’

5.3.1.5 Coding
Two native speakers transcribed the data and coded the answers. Numbers 1, 2, 3 and 4 were
used in transcribing the four lexical tones, T1, T2, T3, and T4, respectively. Data were coded so
as to preserve as much of the information in subjects’ responses as possible. The coding categories
are in (14).
(14) Coding categories for data analysis:
a.  Included in the analysis:
    i.   Correct application of T3S without missing any syllables (missing syllables are indicated
         by underscores in the coding).
    ii.  Incorrect application of T3S without missing any syllables
b.  Excluded from the analysis:
    i.   No answer: Saying ‘I don’t know’ or being silent without giving an answer.
    ii.  Non-target answers: Not saying the digit with the targeted number of times (e.g. saying the
         digit four times when it should be said three times).
    iii. Pauses: Pauses between two T3s.
Answers with pauses between T3s were excluded from the analysis because a pause destroys
the T3S environment. For the control items, since T3S does not apply to non-T3 digits,
the surface tones are the same as the underlying tones, and T3S application is irrelevant.
Sample answers for a T3-digit “9” and how they fit in the coding categories are listed in
Table 5.3.

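Table 5.3 Study 2: Sample answers and their coding categories for data analysis
A T3-digit “9” (sample answers show tonal information only)

a. Two times: jiu jiu (nine nine) ‘nine-nine’; UT: 3 3
   Sample answer                     Included in the analysis?   Correct or incorrect
   i.    (23)                        yes                         correct
   ii.   (33)                        yes                         incorrect
   iii.  (22)                        yes                         incorrect
   iv.   (32)                        yes                         incorrect
   v.    (34)                        yes                         incorrect

b. Three times: jiu jiu jiu (nine nine nine) ‘nine-nine-nine’; UT: 3 3 3
   Sample answer                     Included in the analysis?   Correct or incorrect
   i.    (223)                       yes                         correct
   ii.   (323)                       yes                         incorrect
   iii.  (333)                       yes                         incorrect
   iv.   (233)                       yes                         incorrect
   v.    (222)                       yes                         incorrect
   vi.   (232)                       yes                         incorrect

c. Five times: jiu jiu jiu jiu jiu (nine nine nine nine nine) ‘nine-nine-nine-nine-nine’; UT: 3 3 3 3 3
   Sample answer                     Included in the analysis?   Correct or incorrect
   i.    (23)(223)                   yes                         correct
   ii.   (22223)                     yes                         correct
   iii.  (223)(23) 31                yes                         correct
   iv.   (33333)                     yes                         incorrect
   v.    (23)(333)                   yes                         incorrect
   vi.   (22222)                     yes                         incorrect
   vii.  (223)(22)                   yes                         incorrect
   viii. (23)(23)(2)                 yes                         incorrect
   ix.   (23)(23)(3)                 yes                         incorrect
   x.    (3)(23)(33)                 yes                         incorrect
   xi.   (23)p(23)p(23)p(23)p(23)    no                          n/a
         (p = pause)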

31 Although this pattern is not predicted by the Word-and-Phrase level model, it was treated as a
correct pattern because the pattern was attested in adults.

For statistical analysis, the various error patterns were further coded and placed under two basic
error categories, with the first category overriding the second: (i) Over-application:
over-application of T3S at the right edge of the domain, resulting in a T2 on the final digit in the
sequence, and (ii) Under-application: non-application of T3S as well as under-application of T3S
in one or more syllables. To know T3S is to know when to
apply the rule as well as when not to apply it. The first category, “Over-application”, captures the
T3S errors made by subjects when they failed to “stop applying T3S” at the rightmost digit in the
prosodic domain when the preceding digit(s) had undergone T3S. Errors of this type include
*(T2T2), *(T2T2T2), and *(T2T2T2T2T2), but are not limited to these.
Another common T3S error is under-application, namely, T3S is not applied when it should be.
Examples of under-application errors include *(T3T3), *(T2T3T3), and *(T2T3)(T2T3T3). Among
all the errors, there was a single error, *(T3T3)(T2T3T2), produced by a 3-year-old, that fit the
descriptions of both error categories. As it was decided that “Over-application” overrides
“Under-application” in our coding for error types, it was coded as an over-application error, rather
than creating a third error category for this single “mixed” error type.
It is worth emphasizing that of all the errors made by the participants, there was only one
error, *(T3T4) (for a sequence of two T3-digits), that involves a tone other than T2 or T3.
This error is treated as “under-application” as it is a case of not applying T3S when it should be
applied. Incorrect answers in Table 5.3 are used in Table 5.4 to show how errors were categorized
into the two error types for our error analyses.
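The coding scheme just described can be summarized as a small decision procedure. The Python sketch below is only an illustration: the names `CORRECT` and `code_response` are hypothetical, responses are represented as tone strings with foot brackets removed, and the rule "any included incorrect answer that does not end in T2 counts as under-application" is an assumption inferred from the examples given in the text, including the treatment of *(T3T4).

```python
# Tone strings with foot brackets removed, e.g. "(23)(223)" -> "23223".
# The five-syllable set includes (223)(23), which was treated as correct
# because it was attested in adults.
CORRECT = {
    2: {"23"},
    3: {"223"},
    5: {"23223", "22223", "22323"},
}


def code_response(tones):
    """Code one response as 'correct', 'O' (over-application) or 'U' (under-application)."""
    if len(tones) not in CORRECT:
        raise ValueError("non-target answer: excluded from the analysis")
    if tones in CORRECT[len(tones)]:
        return "correct"
    # Over-application overrides under-application: a T2 on the final digit.
    if tones[-1] == "2":
        return "O"
    return "U"


for answer in ("23", "22", "34", "232", "23233", "33232"):
    print(answer, code_response(answer))
```

The last answer, 33232, is the single "mixed" error: it contains an unchanged T3 before another T3 but also ends in T2, so the override makes it an over-application error.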

Table 5.4 Study 2: Sample T3S errors and their coding categories for error analysis
A T3-digit “9” (sample errors show tonal information only)

a. Two times: jiu jiu (nine nine) ‘nine-nine’; UT: 3 3
   Sample error         Over-application (O) or Under-application (U)
   i.    (33)           U
   ii.   (22)           O
   iii.  (32)           O
   iv.   (34)           U

b. Three times: jiu jiu jiu (nine nine nine) ‘nine-nine-nine’; UT: 3 3 3
   Sample error         Over-application (O) or Under-application (U)
   i.    (323)          U
   ii.   (333)          U
   iii.  (233)          U
   iv.   (222)          O
   v.    (232)          O

c. Five times: jiu jiu jiu jiu jiu (nine nine nine nine nine) ‘nine-nine-nine-nine-nine’; UT: 3 3 3 3 3
   Sample error         Over-application (O) or Under-application (U)
   i.    (33333)        U
   ii.   (23)(333)      U
   iii.  (22222)        O
   iv.   (223)(22)      O
   v.    (23)(23)(2)    O
   vi.   (23)(23)(3)    U
   vii.  (3)(23)(33)    U
5.3.2 Results
In this section, we first report the answers that were excluded from the analysis. No items
were excluded for 5-year-olds or adults. For 3-year-olds, the numbers of excluded control
items and test items are summarized in Table 5.5 and Table 5.6 respectively.
Table 5.5 Study 2: Control items, 3-year-olds’ data excluded from the analysis
               σσ     σσσ    σσσσσ    Excluded total
No answer      5      5      1        11
Non-target     0      0      0
Pauses         0      0      0
Table 5.6 Study 2: Test items, 3-year-olds’ data excluded from the analysis
               σσ     σσσ    σσσσσ    Excluded total
No answer      2      1      1        6
Non-target     1      0      0
Pauses         0      0      1
Next, we turn to the results. We found that both child groups did well in the control items.
Unlike adults, however, children’s correct rates dropped dramatically in the test items. In what
follows, the results for the control items and the test items are presented and discussed.
5.3.2.1 Overall correct rates in control items and test items
All the subjects did perfectly in the control items with two, three, and five syllables.
Table 5.7 Study 2: Control items (non-T3 digits)
Number of syllables    σσ             σσσ            σσσσσ
Age                    % (N)          % (N)          % (N)
3                      100 (33/33)    100 (33/33)    100 (37/37)
5                      100 (54/54)    100 (54/54)    100 (54/54)
A                      100 (40/40)    100 (40/40)    100 (40/40)

The fact that even the 3-year-olds did not have any difficulties in the control items suggests
that the task itself was not beyond what 3- and 5-year-olds could accomplish. More specifically,
even saying a non-T3 digit five times, the maximum in the experiment, appeared to be
easy for children. For T3-digits, while adults did well in the test items (97.50% correct in two,
three, and five syllables), children’s correct rates dropped dramatically as we see in Figure 5.3.
Figure 5.3 Study 2: Total correct rates in control and test items by age groups
[Bar chart “Correct rates in Flat Structures”: correct rate (%) by age group (3, 5, A) for σσ, σσσ,
and σσσσσ items. Control items: 100 for every age group and length. Test items (σσ, σσσ, σσσσσ):
3-year-olds 28.57, 27.03, 22.22; 5-year-olds 59.26, 66.67, 68.52; adults 97.50, 97.50, 97.50.]

Since children did perfectly in the control items, the drop in correct rates in both child groups
can be attributed to T3S. There is only one acceptable surface pattern for each of the two-syllable
and three-syllable test items, (T2T3) and (T2T2T3) respectively. These
predicted patterns match what adults produced, and are attested in children as well.
For five-syllable items, two predicted patterns are (T2T3)(T2T2T3) and (T2T2T2T2T3),
with the former pattern attested in adults only, and the latter pattern attested in both child groups
and adults. An additional pattern (T2T2T3)(T2T3) was found in adults as well as children. Even
though it is not a predicted pattern, it is considered as a grammatical pattern in our analysis
mainly because it was attested in adults. Total correct rates for each age group are calculated by
adding up the correct rates of all possible correct patterns within each age group. The
information on the frequency of individual correct patterns by age group is in Table 5.8. Figure
5.4 – Figure 5.6 in the next section present the same information in bar charts.
Table 5.8 Study 2: Test items (T3 digits)
Syllable    σσ         σσσ        σσσσσ
Age         (23)       (223)      (22223)    (23)(223)    (223)(23)    Total
3           28.57%     27.03%     16.67%     0%           5.56%        22.22%
            (10/35)    (10/37)    (6/36)     (0/36)       (2/36)
5           59.26%     66.67%     61.11%     0%           7.41%        68.52%
            (32/54)    (36/54)    (33/54)    (0/54)       (4/54)
Adults      97.50%     97.50%     70.00%     22.50%       5.00%        97.50%
            (39/40)    (39/40)    (28/40)    (9/40)       (2/40)
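The five-syllable totals in Table 5.8 are the sum of the three correct patterns within each age group. The short Python sketch below reproduces that arithmetic from the counts in the table; the dictionary layout and the function name `total_correct_rate` are assumptions made for this illustration.

```python
# Counts of each correct five-syllable pattern and the number of included
# responses (n) per age group, copied from Table 5.8.
FIVE_SYLLABLE = {
    "3-year-olds": {"n": 36, "(22223)": 6, "(23)(223)": 0, "(223)(23)": 2},
    "5-year-olds": {"n": 54, "(22223)": 33, "(23)(223)": 0, "(223)(23)": 4},
    "adults": {"n": 40, "(22223)": 28, "(23)(223)": 9, "(223)(23)": 2},
}


def total_correct_rate(row):
    """Total correct rate (%) = all correct patterns added up, divided by n."""
    correct = sum(count for pattern, count in row.items() if pattern != "n")
    return 100 * correct / row["n"]


for group, row in FIVE_SYLLABLE.items():
    print(f"{group}: {total_correct_rate(row):.2f}%")
```

The resulting percentages agree with the Total column (22.22%, 68.52%, and 97.50%).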

5.3.2.2 Surface patterns in flat structures
Two T3-digits
Logistic regression analyses (see Appendix H) were conducted for correct responses as well
as incorrect responses in flat structures. In what follows, the results for two-, three-, and
five-digit items will be presented separately.
The results show that age is significant (chi square = 46.067, p < .001 with df = 2). For the
correct surface pattern T2T3 relative to errors, both 3-year-olds and 5-year-olds are significantly
different from adults in T3S application in two T3-digits, and they are less likely than adults to
produce the correct surface pattern T2T3 (3-year-olds: Odds Ratio (OR) = .010, p < .001;
5-year-olds: OR = .037, p = .002). There is a significant difference between 3-year-olds and
5-year-olds (OR = .275, p = .006).
Figure 5.4 Study 2: Correct rates of two T3-digits by age group
[Bar chart “Flat structure: two digits”: correct rate (%) of pattern (23) by age group.
3-year-olds: 28.57; 5-year-olds: 59.26; adults: 97.50.]

Even though it is a small domain with only two digits, 3-year-olds had considerable difficulty.
Even 5-year-olds had a correct rate of only about 60%. For the adults, the only error, produced
by one adult, was *T2T2, a case of over-application.
Three T3-digits
The results show that age is significant (chi square = 48.539, p < .001 with df = 2). For
correct surface pattern T2T2T3 relative to errors, both 3-year-olds and 5-year-olds are
significantly different from adults in T3S application in three T3-digits (3-year-olds: OR = .009,
p < .001; 5-year-olds: OR = .051, p = .005). There is a significant difference between 3-year-olds
and 5-year-olds (OR = .185, p < .001).
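The odds ratios for correct responses can be related back to the raw counts. The dissertation's values come from the logistic regression analyses in Appendix H; the Python sketch below is only a cross-check that computes crude odds ratios from the correct/total counts in Table 5.8, which is what a model with age group as its single predictor would be expected to yield. The function names are hypothetical, and the adult counts are taken as 39 correct out of 40, as the reported 97.50% implies.

```python
def odds(correct, total):
    """Odds of a correct response versus an error."""
    return correct / (total - correct)


def odds_ratio(group, reference):
    """Odds ratio of `group` relative to `reference`; each is (correct, total)."""
    return odds(*group) / odds(*reference)


# (correct, total) per age group: "3" and "5" are the child groups, "A" the adults.
two_digits = {"3": (10, 35), "5": (32, 54), "A": (39, 40)}
three_digits = {"3": (10, 37), "5": (36, 54), "A": (39, 40)}

for label, counts in (("two digits", two_digits), ("three digits", three_digits)):
    print(label)
    print("  3-year-olds vs adults:      %.3f" % odds_ratio(counts["3"], counts["A"]))
    print("  5-year-olds vs adults:      %.3f" % odds_ratio(counts["5"], counts["A"]))
    print("  3-year-olds vs 5-year-olds: %.3f" % odds_ratio(counts["3"], counts["5"]))
```

An odds ratio below 1 means the first group's odds of producing the correct pattern are lower than the reference group's; for example, a value near .01 for 3-year-olds versus adults means the children's odds are about one hundredth of the adults'.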
Figure 5.5 Study 2: Correct rates of three T3-digits by age group
[Bar chart “Flat structure: three digits”: correct rate (%) of pattern (223) by age group.
3-year-olds: 27.03; 5-year-olds: 66.67; adults: 97.50.]

For the three-digit sequence, 3-year-olds had considerable difficulty. Five-year-olds had a correct
rate below 70%. For the adults, the only error, produced by one adult, was *T2T2T2,
a case of over-application.
Five T3-digits
The results show that age is significant (chi square = 71.132, p < .001 with df = 6). For five
T3-digits, three surface patterns were attested in adults — larger domain parsing (22223),
Binary-Ternary parsing (23)(223), and Ternary-Binary parsing (223)(23). The last pattern,
Ternary-Binary parsing, is not predicted by the Word-and-Phrase level model, but was attested in
all age groups with a low frequency (3-year-olds: 5.56%, 5-year-olds: 7.41%, and adults: 5%).
For both 3-year-olds and 5-year-olds, two surface patterns were attested — larger domain
parsing (22223), and Ternary-Binary parsing (223)(23). The Word-and-Phrase level model
predicts left-to-right Binary parsing followed by incorporation of the unfooted syllable, which
results in Binary-Ternary parsing (23)(223). Interestingly, this is the surface pattern that is missing in
both child groups. The only error in the adult group, *T2T2T2T2T2, was produced by the same
individual who had over-application errors in the two- and three-digit items.
Larger domain parsing—(22223)
For larger domain parsing (22223) relative to errors, 3-year-olds and 5-year-olds are found to
be significantly different from adults, and both child groups are less likely than adults to have
larger domain parsing (3-year-olds: OR = .008, p < .001; 5-year-olds: OR = .069, p = .012).
Three-year-olds and 5-year-olds are significantly different (OR = .110, p < .001).
Ternary-Binary parsing—(223)(23)
For Ternary-Binary parsing—(223)(23) relative to errors, 3-year-olds are found to be
significantly different from adults (OR = .036, p = .020) while 5-year-olds are not (OR = .118, p
= .112). The two child groups are not significantly different from each other (OR = .304, p
= .195). Figure 5.6 shows the distribution of surface patterns in the five-syllable item by age.
Figure 5.6 Study 2: Correct rates of five T3-digits by age group
[Bar chart “Flat structure: five digits”: rate (%) of each correct pattern by age group.
(223)(23): 3-year-olds 5.56, 5-year-olds 7.41, adults 5; (23)(223): 3-year-olds 0, 5-year-olds 0,
adults 22.50; (22223): 3-year-olds 16.67, 5-year-olds 61.11, adults 70.]

The most common surface pattern in all age groups is the larger domain parsing. Five-year-olds’
total correct rate is below 70%, far from adult-like. The correct rate of about 20%
shows that 3-year-olds had considerable difficulty with T3S in the test items. The Binary-Ternary
parsing is missing in both child groups.
5.3.2.3 Errors in children
Since the adult correct rates for the two-, three-, and five-syllable flat structures are 97.50%
(39/40), the T3S error analysis is focused on children’s errors by comparing 3-year-olds’ errors
to 5-year-olds’.
Children’s T3S errors were categorized under “over-application” or “under-application” as
stated earlier. Do younger children and older children’s errors tend to be one way or another? Or
does one error type occur more frequently than the other type in children? Figure 5.7 shows
children’s error rates by type.
Children’s error rates by type
Figure 5.7 Study 2: Children’s error rates by type in flat structures
[Bar chart “Children's error rates by type in flat structures”: error rate (%) by error type and
number of syllables. 3-year-olds: σσ over-application 45.71, under-application 25.71; σσσ
over-application 43.24, under-application 29.73; σσσσσ over-application 58.33, under-application
19.44. 5-year-olds: σσ over-application 29.63, under-application 12.96; σσσ over-application
16.67, under-application 16.67; σσσσσ over-application 12.96, under-application 18.52.]
In Figure 5.7, a developmental trend can be clearly seen. The error rates decrease with age,
regardless of the error type. Three-year-olds are prone to make over-application errors.
Five-year-olds’ T3S errors do not show a strong tendency toward either over- or under-application
in three- and five-syllable items. In the two-syllable item, however, they tend to over-apply the
T3S rule.
Logistic regression analyses were conducted for children’s error types in flat structures. The
results for two-, three-, and five-digits are as follows.
Two T3-digits
The independent variable age is significant (chi square = 7.447, p = .024 with df = 2). For
both error types relative to the correct surface pattern (T2T3), 3-year-olds are significantly
different from 5-year-olds (Over-application: OR = 3.100, p = .026; Under-application: OR = 3.986,
p = .026). The odds ratios indicate that the odds of over-applying the T3S rule are about three
times as high for 3-year-olds as for 5-year-olds, and the odds of under-applying it about four
times as high.
Three T3-digits
Age is still significant in three T3-digits (chi square = 14.592, p = .001 with df = 2). For
both error types relative to the correct surface pattern (T2T2T3), 3-year-olds are significantly
different from 5-year-olds (Over-application: OR = 6.400, p = .001; Under-application: OR =
4.400, p = .010). The odds ratios indicate that the odds of over-applying the T3S rule are roughly
6.4 times as high for 3-year-olds as for 5-year-olds, and the odds of under-applying it roughly
4.4 times as high.
Five T3-digits
The results show that age is significant (chi square = 24.496, p < .001 with df = 2). For the
error type Over-application, 3-year-olds are significantly different from 5-year-olds (OR =
13.875, p < .001). The odds ratio indicates that the odds of over-applying the T3S rule are roughly
14 times as high for 3-year-olds as for 5-year-olds. The two child groups are not significantly
different in the other error type, Under-application (OR = 3.237, p = .062).
5.3.3 Checking hypotheses
The adult grammar is ultimately what children will arrive at. The adult T3S patterns attested
in this study are compared against the surface patterns predicted by the Word-and-Phrase level
model. The Binary parsing hypothesis (H1) and Incorporation Hypothesis (H2) are supported by
adults’ answers of (T2T3) and (T2T2T3) in the two and three T3-digit items respectively, just as
predicted. No adults produced two T3s in the two-syllable items. The fact that, apart from a single
over-application error (*T2T2), (T2T3) was the only response in the adult group indicates that a
binary foot is formed for two syllables. For
three-syllable items, if we had the answer type (T2T3)(T3), it would be evidence against our
Incorporation Hypothesis (H2), which states that an unfooted syllable should be incorporated into a
neighboring foot, but it was not attested in adults. As mentioned previously, the (T2T2T3) pattern
has two possible sources: (i) larger domain parsing or (ii) step-by-step parsing (binary parsing
followed by incorporation of the unfooted syllable). We cannot be completely certain that the
pattern (T2T2T3) here is a result of incorporation. Nevertheless, the Incorporation Hypothesis (H2)
was also tested in the five-syllable items, where there was no ambiguity. We will return to this in
the later discussion.
For testing H3, Directionality L to R, five T3-digits were used. Even though we cannot use
three T3-digits to test directionality partially because we were unable to disambiguate the
sources of (T2T2T3), the unattested pattern (T3T2T3) in adults sheds some light. (T3T2T3) is a
pattern that results from right-to-left parsing, followed by incorporation of the first syllable that
is unfooted. As (T3T2T3) never surfaced in the adult data, at this point, we do not see any
evidence of right-to-left parsing. To confirm whether or not right-to-left parsing is indeed never
used in adults, we now turn to the results of five-digit items.
In five T3-digits, the larger domain parsing (T2T2T2T2T3) is the dominant pattern across
age groups (adults: 70.00%, 5-year-olds: 61.11%, and 3-year-olds: 16.67%). The non-fast speech
pattern (T2T3)(T2T2T3) that the Word-and-Phrase level model predicts was attested only in the
adult group, at 22.50%. Not a single child produced this pattern. The fact that (T2T3)(T2T2T3)
was attested, but not (T3T2T3)(T2T3), gives strong evidence that it was left-to-right parsing,
rather than right-to-left parsing. Hypothesis H3 Directionality L to R is supported by the adult
data. In addition, (T2T3)(T2T2T3) confirms Incorporation Hypothesis (H2) that an unfooted
syllable is incorporated into a neighboring foot.
Interestingly, an unpredicted pattern (T2T2T3)(T2T3) was attested across all age groups,
with a small percentage (between 5% - 8%) in each age group. We will return to discuss this
pattern in more detail in Section 5.3.4.
Lastly, even though there were a lot of T3S errors in children, it was clear that 3-year-olds
can change a T3 to a T2 when followed by another T3, and they had correct rates between 20% and
30% for two, three, and five T3-digits. Five-year-olds had correct rates of about 60% - 70% for
the two, three, and five T3-digits. The results roughly translate to an increase of about 40
percentage points in children’s correct rates for T3S in flat structures in two years’ time, from
age 3 to age 5. Five-year-olds are still in the process of mastering the use of the T3S rule and
still do not have adult-like performance. Hypothesis H4 Developmental Hypothesis is supported by our
experimental results.

5.3.4 Discussion
This section is divided into three subsections. First, observations of correct patterns attested
will be discussed in detail. Next, we turn our attention to T3S errors in children. Both attested
and unattested errors will be looked into in order to identify any existing patterns in children’s
errors. Discussion on what can be learned from the Kuo et al. (2007) study as well as our current
study will conclude this section.
5.3.4.1 Correct surface patterns
(T2T3) and (T2T2T3) are the predicted and attested patterns in two T3-digits and three T3-digits
respectively. For five T3-digits, both child groups have two patterns, the larger domain
parsing (T2T2T2T2T3) and an unpredicted pattern (T2T2T3)(T2T3) which was also attested in
adults. The predicted pattern (T2T3)(T2T2T3) was attested in adults, but not in children. A
summary table of the discrepancies is in Table 5.9.
Table 5.9 Summary of discrepancies between attested and predicted patterns in a 5-syllable flat
structure
σσσσσ               Predicted patterns                      Unpredicted patterns
                    ST1                 ST2                 ST3
T3T3T3T3T3 (UT)     (T2T3)(T2T2T3)      (T2T2T2T2T3)        (T2T2T3)(T2T3)
3-year-olds         ×                   √                   √
5-year-olds         ×                   √                   √
adults              √                   √                   √
(Attested: √, unattested: ×; shaded cells show the discrepancies between predicted patterns and
attested items.)
ST1 (T2T3)(T2T2T3): Children do not have the predicted pattern and this is not the most
frequent pattern.
ST2 (T2T2T2T2T3): It is a fast speech pattern according to the Word-and-Phrase level model,
but in our experimental setting where fast speech was not required, it was the most commonly
used pattern across age groups. Such results suggest that ST2 does not necessarily occur only in
fast speech. The claim is supported by the results in the Kuo et al. (2007) study where larger
domain parsing was attested in slow, normal, and fast speech rates.
ST3 (T2T2T3)(T2T3): Neither left-to-right nor right-to-left parsing can account for this pattern.
Unless the pattern is regarded as ungrammatical, the Word-and-Phrase level model will need to be
modified to accommodate it. A possible explanation is that both binary and ternary feet are
available in flat structures. In other words, once the speaker knows that the total number of
syllables is five, the string is divided into a binary foot and a ternary foot, both of which are
available before the first syllable is produced. If a binary foot is picked first, the pattern
(T2T3)(T2T2T3) surfaces. If a ternary foot is picked first, the pattern (T2T2T3)(T2T3) surfaces.
The ability to divide a string of five syllables into a binary foot and a ternary foot may be
intuitive and automatic, as “five” is not that great a number. As the number grows, speakers
probably depend on some orderly way of parsing the string of syllables. Table 5.10 explores some
other possibilities for parsing an odd number of syllables in flat structures, beyond the standard
view of left-to-right binary parsing.
Table 5.10 Possible parsing of odd number of syllables if left-to-right parsing is not the only
option
Number of syllables   five       seven          nine               eleven
                      σσσσσ      σσσσσσσ        σσσσσσσσσ          σσσσσσσσσσσ
a. Left-to-right      (σσ)(σσσ)  (σσ)(σσ)(σσσ)  (σσ)(σσ)(σσ)(σσσ)  (σσ)(σσ)(σσ)(σσ)(σσσ)
   parsing (predicted)
b. Right-to-left      (σσσ)(σσ)  (σσσ)(σσ)(σσ)  (σσσ)(σσ)(σσ)(σσ)  (σσσ)(σσ)(σσ)(σσ)(σσ)
   parsing (not predicted)
c. Ternary parsing    n/a        n/a            (σσσ)(σσσ)(σσσ)    n/a
   when it is possible
d. Bi-directional     n/a        (σσ)(σσσ)(σσ)  (σσ)(σσ)(σσσ)(σσ)  (σσ)(σσ)(σσ)(σσσ)(σσ)
   parsing                                      (σσ)(σσσ)(σσ)(σσ)  (σσ)(σσ)(σσσ)(σσ)(σσ)
                                                                   (σσ)(σσσ)(σσ)(σσ)(σσ)

In (a) and (b) in Table 5.10, a ternary foot is on the right edge for left-to-right parsing, and on
the left edge for right-to-left parsing. In (c), a string of nine syllables is potentially good for
ternary parsing, with three evenly divided ternary feet. It is not clear whether the bi-directional
parsing in (d) actually occurs, as it requires parsing to proceed in opposite directions at the
same time.
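
The parsing options in Table 5.10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a parse is represented as a list of foot sizes, n is at least 2, the bi-directional option is read as one ternary foot flanked by binary feet on both sides, and all function names are hypothetical rather than part of any T3S model.

```python
def left_to_right(n):
    """Binary feet built left to right; a leftover syllable joins the last foot."""
    feet = [2] * (n // 2)
    if n % 2:
        feet[-1] += 1
    return feet


def right_to_left(n):
    """Binary feet built right to left; a leftover syllable joins the first foot."""
    feet = [2] * (n // 2)
    if n % 2:
        feet[0] += 1
    return feet


def ternary(n):
    """Ternary feet only; returns None unless n is a multiple of three."""
    return [3] * (n // 3) if n % 3 == 0 else None


def bidirectional(n):
    """For odd n: one ternary foot with at least one binary foot on each side."""
    k = n // 2  # total number of feet when n is odd
    return [[2] * i + [3] + [2] * (k - 1 - i) for i in range(1, k - 1)]


def render(feet):
    """Turn a list of foot sizes into the bracketed notation used in the table."""
    return "".join("(" + "σ" * size + ")" for size in feet)


for n in (5, 7, 9, 11):
    print(n, render(left_to_right(n)), render(right_to_left(n)),
          [render(p) for p in bidirectional(n)])
```

For five syllables the bi-directional list is empty, which corresponds to the n/a cell in row (d).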
Even though binary parsing gives a perfect foot structure, it may not necessarily always
precede other types of feet. In our daily life, we sometimes have a string of digits to read out, as
in the case of phone numbers, social security numbers, and credit card numbers. In these cases,
hyphens to break up the string are placed in the written form. For a seven-digit phone number, it
is normally in the form of XXX-XXXX. For social security numbers, it is XXX-XX-XXXX.
And for credit cards, we commonly see XXXX-XXXX-XXXX-XXXX. Notice that each chunk
of the digits in these examples is composed of two, three, or four digits. This may indicate that
for five digits and beyond, dividing the sequence into multiple units helps us process the
information more easily. Is it possible that a three-digit unit and a four-digit unit are as accessible
as a two-digit unit? I believe the answer is “very likely,” given the additional supporting
evidence surveyed below.
First, Cowan et al. (2007) also point out the presentation of phone numbers in the form of
### - ####, and suggest that there exists some rapid grouping process that helps retrieve the
digits by reducing the number of chunks. They report that in the Ericsson et al. (1980) study, in
which an individual was trained to increase his digit span up to 80 digits within a year, the
person learned to repeat about 20 digits within months. This was said to be achieved by grouping
3 or 4 digits into new chunks, with the chunks later grouped further into super-chunks (Cowan et
al. 2007). This provides supporting evidence that a three- or four-digit chunk is a legitimate
chunk. It is robust and active in the initial digit-grouping.

Second, consider again the case of phone numbers and social security numbers. In these
series of digits, a three-digit unit precedes the rest of the units, which are composed of two
and/or four digits. That a three-digit chunk may precede chunks formed by two or four digits is
contrary to the notion of incorporation of the unfooted syllable on the right edge when
left-to-right parsing is followed. One may argue that foot-building has to do with syllables, and
not all digits have the same number of syllables. Such an argument is reasonable. “Zero” and
“seven” are the only two digits that are disyllabic among digits 0 – 9 in English. Without any
hyphens in digits in written form, the number of syllables may have an effect on how the digits
are parsed. However, in Mandarin, digits 0 – 9 are all monosyllabic, so there is no such concern.
The surface pattern (T2T2T3)(T2T3) attested in our data is not as unusual as one might think if
we take into consideration that grouping digits in units of two or three is common. The ternary
foot in this case is then possibly the result of a digit-parsing strategy, rather than the product
of a binary foot followed by incorporation of an unfooted syllable.
While binary parsing works very well with an even number of syllables, such as two or four,
ternary parsing is not very useful in such environments. However, if the even number happens to
be a multiple of three, then ternary parsing is as available as binary parsing. A six-syllable
string, for example, may be parsed into three binary feet ((σσ)(σσ)(σσ)) or into two ternary feet
((σσσ)(σσσ)). This does not mean that any parsing strategy can apply at random. There should be a
certain order, such as honoring the “left-to-right” directionality or using a binary or ternary
foot only when the context allows.
Before we close this section, it should be pointed out that the Ternary-Binary parsing
(T2T2T3)(T2T3) may be a pattern used only by Taiwan speakers, both children and adults. Only when
such parsing strategies are found in Mandarin speakers from other regions could we be more
certain that ternary feet are indeed available for parsing digits. Future studies can test whether
this pattern is also found in adult speakers in other regions where Mandarin is spoken.
5.3.4.2 T3S Errors
The T3S rule involves T2 and T3 only; the other two lexical tones, T1 and T4, are irrelevant.
Among all the adults, there were three error tokens of over-application from the same individual
(*T2T2, *T2T2T2, *T2T2T2T2T2). When children make T3S mistakes, what is the nature of
the mistakes they make? We found in children’s answers a great variety of T3S errors, which is a
rich source through which we can have a peek into what they do when they parse and produce
flat structures.
Of all the responses in the test items, only one error was found to involve a non-T3 tone:
*T3T4, which was produced by a 3-year-old for saying two T3-digits (target answer: T2T3). One
possibility is that the child knew that T3T3 was ungrammatical after the first syllable had been
produced. In order to meet the requirement of “no adjacent T3s” in the T3S rule, one thing that
could be done was to change the tone of the second syllable because it was too late to change the
first syllable. If this was what happened, it actually indicates that the child knew that two T3s
standing next to each other is bad, and he used his own repair strategy.
Except for this single error made by one child, all the T3S errors children made involve T2
and/or T3. We now turn to these errors in the context of two, three and five digits.

Two T3-digits
If we assume that children know that only T2 and T3 are involved in their T3S application,
then for a sequence of two T3-digits, there are four (two slots with two possible tones = 2²
combinations) different combinations with only T2 or T3 in each slot as shown in (15).
(15) Four possible combinations of T2 and T3 in two T3-digits
*T2T2
T2T3

*T3T2
*T3T3
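
The counting behind (15) extends to any number of slots. The following is a minimal sketch, assuming surface patterns are written as strings of the digits 2 and 3; the function names are hypothetical and serve only as illustration.

```python
from itertools import product


def tone_combinations(n):
    """All strings of length n in which every slot is either T2 or T3."""
    return ["".join(tones) for tones in product("23", repeat=n)]


def single_domain_target(n):
    """Surface form when all n T3 syllables are parsed in a single domain."""
    return "2" * (n - 1) + "3"


for n in (2, 3, 5):
    combos = tone_combinations(n)
    # Number of combinations is 2 ** n; the single-domain target is one of them.
    print(n, len(combos), single_domain_target(n) in combos)
```

With two slots this gives the four combinations listed in (15); three and five slots give eight and 32 combinations respectively.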

Only one surface pattern is grammatical, T2T3. All the other three combinations (*T2T2,
*T3T3, and *T3T2) were found in our child data. To compare the frequency of each error type, the
number of tokens of each error type is divided by the total number of error tokens. The
frequencies are calculated separately for 3-year-olds and for 5-year-olds. We found that *T2T2 is
the most common (3-year-olds: 61.54%; 5-year-olds: 60.87%), followed by *T3T3 (3-year-olds:
34.62%; 5-year-olds: 30.43%) in both child groups. *T3T2 is less common (8.70%) and was attested
in the 5-year-olds only. It appears that in a two-syllable flat structure, both age groups are more
prone to over-apply the T3S rule than to under-apply it.
*T2T2 and *T3T2 both meet the requirement of “no adjacent T3s,” and in this regard, errors of
this type show that the child has some, though perhaps incomplete, knowledge of T3S: children
know that there are alternations between T2 and T3, but they do not know the right time to use one
tone or the other. For *T3T2, it may be that the child realized that T3S should have been applied
only after the first syllable had been produced in the underlying tone, T3. To avoid two adjacent
T3s, the second syllable is changed to a T2. This is possibly a repair strategy used by the child.

*T3T3 violates T3S, and children who made this type of error may or may not be aware of it.
It is possible that the grammar was in place, but the child did not produce it correctly, and did not
attempt to repair the error after it was produced.
Three T3-digits
For three T3-digits, there are eight (three slots with two possible tones = 2³ combinations)
different combinations of T2 and T3 in the sequence as in (16).
(16) Eight possible combinations of T2 and T3 in three T3-digits
*T2T2T2
T2T2T3
*T2T3T2
*T2T3T3

*T3T2T2
*T3T2T3
*T3T3T2 (not attested)
*T3T3T3

Except for *T3T3T2, all patterns were attested in this study. The predicted surface pattern
T2T2T3 is attested in all age groups. Children, but not adults, have the pattern T3T2T3.
The pattern T3T2T3 does not contain what T3S prohibits, namely two adjacent T3s in the domain. If
binary feet are built from left to right, followed by incorporation of the unfooted syllable,
T3T2T3 should not have surfaced. Two possible explanations for T3T2T3 are the following:
(i)  Syllables are parsed from right to left, followed by incorporation of the unfooted syllable.
     T3T3T3 → T3 (T2T3) → (T3T2T3)
(ii) The directionality is from left to right, but first, the leftmost syllable is somehow parsed
     as a degenerate foot, and then the other two syllables are parsed as a binary foot.
     T3T3T3 → (T3)T3T3 → (T3)(T2T3)
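
The two derivations above, together with the predicted left-to-right derivation, can be traced with a small sketch. It assumes that T3S applies only inside a foot, that tones are coded as the integers 2 and 3, and that incorporation simply attaches the leftover syllable to the already derived foot before T3S is checked again; the helper name apply_t3s is hypothetical.

```python
def apply_t3s(foot):
    """Within one foot, a T3 directly followed by another T3 surfaces as T2."""
    return [
        2 if tone == 3 and i + 1 < len(foot) and foot[i + 1] == 3 else tone
        for i, tone in enumerate(foot)
    ]


# (i) Right-to-left: T3 (T3T3) -> T3 (T2T3) -> (T3T2T3)
inner = apply_t3s([3, 3])
route_i = apply_t3s([3] + inner)  # leftover syllable incorporated on the left

# (ii) Degenerate foot first: (T3)(T3T3) -> (T3)(T2T3)
route_ii = apply_t3s([3]) + apply_t3s([3, 3])

# Predicted left-to-right parse: (T3T3) T3 -> (T2T3) T3 -> (T2T2T3)
predicted = apply_t3s(apply_t3s([3, 3]) + [3])

print(route_i, route_ii, predicted)
```

Both routes (i) and (ii) end in the same surface string, so the surface pattern alone does not distinguish the two explanations.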

For 3-year-olds, *T2T2T2 is the most common (55.56%), followed by *T3T3T3 (25.93%),
and the other error types are much less common (below 6%). For 5-year-olds, *T3T3T3 is the
most common (31.58%), followed by *T2T2T2 (21.05%) and *T2T3T2 (21.05%), and the other
types of errors are less common (below 6%).
The error type *T2T3T2 is as common as *T2T2T2 in 5-year-olds. As previously suggested
for the errors *T3T2 and *T3T4 in two T3-digits, *T2T3T2 may be a child’s repair strategy for
three T3-digits in order to avoid two adjacent T3s after the second syllable had been produced in
T3. It was one way out, even though it was not perfect. To the child, it could be a better choice
than *T2T3T3. Another possible source of *T2T3T2 is the child’s expectation of alternations
between T2 and T3. Children may be familiar with binary foot building, and a
string of multiple binary feet such as (T2T3)(T2T3)(T2T3)(T2T3) has a good rhythm in the
alternation of the tones. In a three-digit sequence, the child had to end it (probably unexpectedly)
at the third digit as she realized there were no more digits after that. Later in the five-digit items,
we will again examine the error type of alternations between T2 and T3.
Five T3-digits
For five-digit items, the Binary-Ternary parsing, (T2T3)(T2T2T3), is missing in both child
groups. Predicted by the Word-and-Phrase level model, the pattern is obtained by binary
parsing from left to right followed by incorporation of the unfooted syllable on the right edge. In
adults, 22.50% of the correct answers are of this pattern, with the larger domain parsing
(T2T2T2T2T3) being the dominant pattern. Ternary-Binary parsing, (T2T2T3)(T2T3), is the
least common in adults, at only 5%. As adult speech is the language input for children, and if
the frequency of each surface pattern plays a role, it is intriguing that children have the most
and the least frequent patterns attested in adults, but not the second most common pattern.
For five T3-digits, there are 32 (five slots with two possible tones = 2⁵ combinations)
different combinations of T2 and T3 in the sequence. Without any knowledge of how to parse
feet and how to incorporate an unparsed syllable, the chance is 1/32 (3.125%) to correctly choose
a particular surface pattern that is desired. The fact that children did much better than 3.125%
shows that they had some knowledge of T3S. It was not just by chance that they applied T3S
correctly when they did. Even when they did not apply T3S correctly, their errors reveal a
substantial amount of information. We now take a closer look at what we can learn from
children’s errors.
We know that there are 32 combinations of T2 and T3 in a 5-syllable sequence. With three
correct surface patterns, we have 29 patterns left, and all of these are errors. In our study, only 11
error patterns were attested. To better understand why some error patterns surface, while others
do not, let us remember what T3S requires, and what the consequences are if the requirements
were not met. Simply put, what makes a bad pattern bad? Following what T3S requires, the
result should meet each point listed in (17), regardless of how many feet are parsed in a flat
structure.
(17) Summary of characteristics of expected surface patterns when T3S is correctly applied
a. Initial syllable is a T2.
b. The final syllable is a T3.
c. No adjacent T3s within the same domain

Not correctly applying T3S naturally will not generate the expected result listed in (17). A
summary of what is helpful in categorizing possible error types is in (18), which is the opposite
of (17).
(18) Summary of characteristics of ungrammatical patterns when T3S is not correctly applied
a. *Initial syllable is a T3
b. *Final syllable is a T2
c. *Adjacent T3s within the same domain

Notice that an ungrammatical pattern does not have to have all of (18a), (18b), and (18c).
With just one of them, the pattern is ungrammatical. If some error patterns surface, while others
do not, maybe some errors are better than others. A convenient way to help us better understand
the attested and unattested error patterns is to use the concept of violation in Optimality Theory
(Prince & Smolensky 1993/2004), and regard (18a), (18b), and (18c) each as one violation of
T3S. We assume that an error pattern with one violation is better than another error pattern with
two violations, which in turn is better than yet another error pattern with three violations.
Without further complicating the picture, violations in (18a) – (18c) are treated as equally bad
(they are neither ranked nor weighted).
Table 5.11 lists all 32 combinations of T2 and T3 in a five-syllable structure. The patterns
are further divided into grammatical and ungrammatical patterns, attested and unattested patterns,
and other sub-categories. The digits in Table 5.11 indicate the surface tones (e.g. T2T2T2T2T3
is represented as 22223.)

Table 5.11 Study 2: 32 combinations of T2 and T3 in a five-digit sequence
Characteristics of ungrammatical patterns:
  a. *Initial syllable in T3 (*Initial T3)
  b. *Final syllable in T2 (*Final T2)
  c. *Adjacent T3s (*T3T3)

Attested (14 patterns)
  Grammatical (3 patterns)
    In children and adults:
      22223
      22323
    Only in adults:
      23223
  Ungrammatical (11 patterns)
    In children and adults:
      *22222 (one violation)
    Only in children:
      In both child groups:
        *22322 (one violation)
        *22333 (one violation)
        *33333 (one violation)
      In 3-year-olds only:
        *22232 (one violation)
        *23333 (one violation)
        *33233 (two violations)
        *33232 (three violations)
      In 5-year-olds only:
        *23232 (one violation)
        *23233 (one violation)
        *32333 (two violations)

Unattested (18 patterns)
  Ungrammatical (18 patterns), not in children or adults:
    Three violations (4 patterns):
      *Initial T3, *Final T2, *T3T3:  *32332, *33222, *33322, *33332
    Two violations (9 patterns):
      *Initial T3 and *Final T2:      *32222, *32322, *32232
      *Final T2 and *T3T3:            *23322, *22332, *23332
      *Initial T3 and *T3T3:          *33223, *33323, *32233
    One violation (5 patterns):
      *Final T2:                      *23222
      *Initial T3:                    *32223, *32323
      *T3T3:                          *23323, *22233
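
The violation counts used in Table 5.11 can be approximated mechanically. The sketch below applies the three criteria in (18) literally and treats the whole five-syllable string as a single domain, which is an assumption of this illustration and not a claim about the model; the function name is hypothetical. On this literal reading *33333 incurs both *Initial T3 and *T3T3, so the tallies need not coincide exactly with the groupings in the table, which lists that pattern under one violation.

```python
from collections import Counter
from itertools import product


def violations(pattern):
    """Return the criteria from (18) that a surface pattern such as '22322' violates."""
    found = []
    if pattern[0] == "3":
        found.append("*Initial T3")
    if pattern[-1] == "2":
        found.append("*Final T2")
    if "33" in pattern:  # adjacent T3s, with the whole string taken as one domain
        found.append("*T3T3")
    return found


patterns = ["".join(p) for p in product("23", repeat=5)]
tally = Counter(len(violations(p)) for p in patterns)
grammatical = [p for p in patterns if not violations(p)]

print(dict(tally))
print(grammatical)
```

The patterns with no violation under this reading are exactly the three grammatical surface patterns 22223, 22323, and 23223.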

From Table 5.11, we know that the errors 3-year-olds made include one, two, and three
violations. For 5-year-olds, their errors include one violation and two violations. The only
adult error, *T2T2T2T2T2, was an error of one violation. This suggests that the younger the
speaker, the higher the number of violations that appears to be tolerated.
An interesting discovery is that our child subjects knew what kinds of errors are “better
errors” to make. That is, when they made a T3S error, errors with fewer violations had a better
chance of being picked than errors with more violations. This point is demonstrated by the
percentage of possible error patterns that actually surfaced in our participants, shown in Table
5.12. The numbers of possible patterns and attested patterns used for calculating the “survival
rate of error patterns” are obtained from Table 5.11.
Table 5.12 Study 2: Percentages of attested and unattested error patterns in children
Errors by          Number of  Number of   Total number  Percentage of      Percentage of  Total
number of T3S      attested   unattested  of possible   attested patterns  unattested
violations         error      error       error         (“survival rate”)  patterns
                   patterns   patterns    patterns
Calculation        (A)        (B)         (C = A + B)   (D = A/C)          (E = B/C)      (D + E)
Three violations   1          4           5             20.00%             80.00%         100%
Two violations     2          9           11            18.18%             81.82%         100%
One violation      8          5           13            61.54%             38.46%         100%
Total              11         18          29            n/a                n/a            n/a

Of the 13 possible error patterns of one violation, eight surfaced (61.54%). Error patterns of
two or three violations surfaced at a much lower rate, 18.18% and 20.00% respectively. In other
words, error patterns with more than one violation were less likely to be attested. This indicates
that even though children made T3S mistakes, the mistakes were not just random mistakes.
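
The “survival rate” arithmetic in Table 5.12 can be checked with a short sketch. The counts (A) and (B) are the ones given in the table; the variable and function names are hypothetical.

```python
# (attested, unattested) error patterns per number of violations, from Table 5.12
counts = {
    "three violations": (1, 4),
    "two violations": (2, 9),
    "one violation": (8, 5),
}


def survival_rate(attested, unattested):
    """Percentage of possible error patterns that were attested: D = A / (A + B)."""
    return 100 * attested / (attested + unattested)


for label, (attested, unattested) in counts.items():
    d = survival_rate(attested, unattested)
    # E is the complement of D, so D + E is always 100%.
    print(f"{label}: D = {d:.2f}%, E = {100 - d:.2f}%")
```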
Are there particular error types that children are prone to make? For 3-year-olds, the most
common error type is *T2T2T2T2T2 (60.71%), followed by *T3T3T3T3T3 (14.29%). All the
other error types are of low frequency (0% - 7.14%). For 5-year-olds, the most common error
type is *T3T3T3T3T3 (41.18%), followed by *T2T2T2T2T2 (17.65%) and *T2T3T2T3T2
(17.65%). The results echo what we saw previously in the three-digit items. Based on the
frequency of children’s error types, the profiles we have for 3-year-olds and 5-year-olds can be
summarized in (19) and (20) respectively.
(19) Three-year-olds
Using sandhi tone for all syllables (60.71%) is a better strategy than using underlying
tones for all the syllables (14.29%).
(20) Five-year-olds
a. Using underlying tones for all the syllables (41.18%) is a better strategy than using sandhi
   tone for all the syllables (17.65%).
b. Alternation strategy (alternating between T2 and T3 in a string of syllables, 17.65%) is as
   good as using sandhi tone for all the syllables (17.65%).

While *T2T3T2T3T2 was one of the favored strategies in 5-year-olds, this pattern is
non-existent in 3-year-olds. At age 3, children might not have noticed the alternation strategy
that 5-year-olds have noticed. At this young age, they also might be better at maintaining the
same tone in a sequence (e.g. *T2T2T2T2T2 or *T3T3T3T3T3). At age 5, children not only have
noticed the option of the alternation strategy (*T2T3T2T3T2), they are also more mature in terms
of articulatory development and can manage alternations better than younger children could.
*T2T3T2T3T2 is not the only error pattern of alternations. *T3T2T3T2T3 is also a pattern that
alternates between T2 and T3. However, it never surfaces. The absence of this pattern provides
indirect evidence that the parsing is from left to right, rather than from right to left.
Of the eleven attested error patterns, 72.73% (8/11) are the result of one violation, 18.18%
(2/11) are the result of two violations, and only 9.09% (1/11) is the result of three violations
(which occurred in 3-year-olds only). It is evident that children’s error patterns were attended
to, rather than produced carelessly. Their production of T3S in flat structures, even when the
attempts did not succeed, was governed by the grammar of T3S that was still maturing.
About 80% of the possible error patterns with two or three violations did not appear in
children’s production. This suggests that these children were sensitive to the degree of
“badness.” Their errors of one violation may be bad, but errors of one violation are closer to the
target than other errors that are relatively worse.
5.3.4.3 General discussion
This current study as well as the Kuo et al. (2007) study provide much needed empirical data
for us to better understand T3S in flat structures. Both studies support the areas where
predictions made by T3S theories match the empirical evidence. In a number of areas where
predictions did not perfectly match the experimental data, the findings raise issues that need to be
addressed in future work. The findings of the Kuo et al. (2007) study and the current study are
checked against predictions made by the Word-and-Phrase level model. The summary is in
Table 5.13.

Table 5.13 Checking empirical data against theoretical predictions
T3S application in flat structures predicted by the Word-and-Phrase level model, checked against
(i) Kuo et al. (2007): flat structure (2, 3, and 4 digits) embedded sentence-initially (followed by
a T3 you ‘there is’) in a carrier sentence, and (ii) this current study: flat structures (2, 3, and
5 digits).
a. Binary parsing
   Kuo et al. (2007):   Evident in sequences of five adjacent T3s, but not very clear in three
                        and four adjacent T3s
   This current study:  Yes
b. Incorporation of the unfooted syllable
   Kuo et al. (2007):   Yes
   This current study:  Yes
c. Directionality: from left to right
   Kuo et al. (2007):   Yes
   This current study:  Yes, but there was an extra pattern in five-syllable items that could not
                        be accounted for by left-to-right binary parsing.
d. Larger domain parsing in fast speech
   Kuo et al. (2007):   Larger domain parsing occurred in slow, normal, and fast speech rates.
   This current study:  Larger domain parsing occurred in normal speech (the experimental
                        setting).

One may wonder whether or not the experimental design affects the results in Table 5.13. In
the next two subsections, this possibility is briefly discussed.
T3S in flat structures in the Kuo et al. (2007) study
Two main areas the Kuo et al. (2007) study investigates are (i) the phonetic nature of the
derived T2 and a true T2, and (ii) how T3S is applied in flat structures. Embedding a flat
structure in a carrier sentence does not have an effect on (i), but may have an effect on (ii). When
the number of adjacent T3s is small, the string of T3s may be prone to be parsed in one domain.
However, as the number of adjacent T3s grows, separating the syllables into multiple prosodic
domains is inevitable.
Schematically, the sentence in (21) represents a flat structure composed of an odd number of
syllables (digits) being embedded sentence-initially.

(21) Odd number of syllables in a flat structure embedded in a sentence
      Liang  liang  …  liang  liang  liang  you       mei  you?
      two    two    …  two    two    two    there is  there is not
      ‘Is there two-two-…-two-two-two?’
      3      3      …  3      3      3      3         2    3      UT
a.    (2     3)     …  (2     2      3)     (3        2    3)     ST1
b.    (2     3)     …  (2     2      2)     (3        2    3)     ST2
c.   *(2     3)     …  (2     3)     (2     3)        (2   3)     ungrammatical

In the flat structure, syllables are parsed in disyllabic feet from left to right, leaving the last
digit unparsed. Then this unparsed syllable is incorporated into the disyllabic foot that precedes
it. The result is ST1 in (21a). In (21b), when T3S applies across the last two prosodic domains,
ST2 results. In (21c), it appears that disyllabic feet are formed nicely; however, the right-most
digit in the sequence is “detached” from the rest of the digits when it joins the following
syllable in the non-flat structure to form a disyllabic foot. The extraction of this syllable out of
the flat structure is most likely responsible for the ungrammaticality of the potentially perfect
binary parsing. The dangling unfooted digit at the edge of the sequence should join the members
of its own kind (i.e. digits), rather than being “given away” to a syllable in the non-flat
structure, even if that allows the formation of a perfect binary foot. The ungrammaticality of
(21c) strongly indicates that T3S is dependent on syntax even in this case, where both a
structure-less flat structure and a structured carrier sentence are present.
To eliminate the effect a carrier sentence may have on the T3S application in a flat structure,
the syllable(s) immediately preceding and/or following the flat structure may be restricted to
non-T3 tones (T1, T2, or T4). In that case, the T3S application in the flat structure will not be
affected by a neighboring T3 in the carrier sentence.
How does T3S apply in a flat structure that is in a sentence when one or more T3s are on
either side, or even both sides, of the flat structure? The flat structure can be in
sentence-initial, sentence-medial, or sentence-final position, so shifting its location will allow
us to learn how the flat structure interacts with neighboring syllable(s) from outside the flat
structure. Manipulating the location of the flat structure, as well as the number of adjacent T3s
within and outside the flat structure, may help us understand more of the nature of T3S
application. In the literature, discussion of flat structures is generally restricted to pure flat
structures, so the Kuo et al. (2007) study initiates an area in T3S that had not been previously
explored.
T3S in flat structures in the current study
Our experimental work focuses on flat structures of two, three, and five digits. The existing T3S
literature on flat structures commonly agrees on left-to-right binary parsing, followed by
incorporation (Chen 2000; Lin 2007; Shih 1986; Shih 1997). We provide data from both
children and adults in their application of T3S. While the surface patterns for two- and
three-digit flat structures matched the predicted patterns, in the five-digit flat structure, in
addition to the two patterns (T2T2T2T2T3 and T2T3T2T2T3) predicted by the Word-and-Phrase level
model, a third pattern was attested (T2T2T3T2T3). For future studies, the number of digits can
be expanded to test whether or not there is indeed an alternative parsing besides the conventional
parsing predicted by the T3S models.
In our study, 5-year-olds are not yet adult-like. Future studies can examine children of a wider
age range, preferably including children beyond five years old. This will allow us to learn
approximately at what age children become adult-like. In addition, the tasks were purposefully
kept simple in our study because the youngest age group was three years of age. For future
studies, if it is appropriate for the participants’ ages, the series of digits can be made longer,
or more complicated with different digits in the same string (e.g. instead of 555555, use 595959
or 555999). T3S in an odd number of digits (e.g. 3, 5, 7, 9 digits) and an even number of digits
(e.g. 2, 4, 6, 8 digits) can be compared. Do subjects consistently use the same strategy for an
odd number of digits, or for an even number of digits? For example, is binary parsing used
consistently in an even number of digits? In cases where incorporation is predicted in an odd
number of digits, is it found in speakers’ production? Where both binary parsing and ternary
parsing are possible, as in a 6-digit sequence, is only binary parsing used? These are
some appealing questions future research on T3S in flat structures can ask.
For the unexpected pattern (T2T2T3)(T2T3) attested in the experiment, without further
evidence, we could only offer a plausible explanation that both binary feet and ternary feet are
robust. In the case of five syllables, to divide the string into two domains, it is either “2 + 3” or
“3 + 2,” and since “five” is not a large number, the calculation could happen in the speaker’s
mind instantly or automatically. It would be very interesting to see what speakers do in a
seven-syllable flat structure, which is ideal for testing whether binary parsing goes before
ternary parsing in the beginning. A study using seven-digit sequences may have to minimize the
possible bias of phone-number reading. A bias that comes from a habitual way of grouping 3 digits
followed by 4 digits may not be very easy to eliminate in such a case.
A nine-digit sequence potentially has even more possibilities for how syllables can be
chunked. The pattern predicted by the Word-and-Phrase level model is three binary feet followed
by a ternary foot (2+2+2+3), with a total of four feet. Is it possible that the sequence is
divided into three ternary feet (3+3+3)? An even number of digits can be equally interesting, and
it does not always have to be divided into binary feet. An example is the 10-digit cell phone
numbers in Taiwan: the sequence is typically broken down as XXXX-XXX-XXX, which has
only three feet. With a long string of digits, it is reasonable to maximize the number of digits a
domain can accommodate, and yet for each domain, the load has to be manageable for the
speaker. This may in fact be a better option than following binary parsing and having five small
feet identical in size. Linking T3S in flat structures to how shorter and longer sequences of
digits are divided is an area still to be explored.
5.4 Conclusions
For the control items, 3-year-olds, 5-year-olds, and adults all performed perfectly (100% correct
rate), showing that they all understood the task. In the test items, adults had a 97.50% correct
rate in two-, three-, and five-digit flat structures. Binary parsing and incorporation of the
unfooted syllable were supported by our data. Nevertheless, in the five-syllable item, an
unpredicted pattern was attested in all age groups.
We also found that in the five-syllable item, the larger domain parsing pattern was the dominant
pattern in all age groups. This suggests that larger domain parsing is not restricted to fast
speech.
In our study, 3-year-olds’ correct rates in the test items in two, three, and five syllables were
between 20% and 30%, while 5-year-olds’ were between 60% and 70%. A developmental
pattern is clear. At age 3, children have the knowledge of changing a T3 to a T2 when followed
by another T3 in flat structures, but at age 5, children still are not adult-like. An investigation of
carefully sorting out children’s T3S errors proved very interesting. It is true that children had
difficulties with T3S application, especially the 3-year-olds, but they were not just making errors
randomly. Even when they made errors, those errors were the “better” kind of errors. The correct
patterns they produced and the errors they made were governed by certain principles or restricted
by a range of constraints.

CHAPTER 6
NPS AND EVIDENCE FOR A SYNTACTIC PARSING
6.0 Introduction
In the previous chapter, we examined contexts without internal structures that require non-cyclic T3S application. As most utterances humans produce have internal structures, when T3S
is applied, most likely, it is a case of T3S application in a structured phrase or sentence. In this
chapter, we focus specifically on short NPs, a context in which T3S should be applied cyclically.
In the next chapter, we will look at sentences where a mixture of cyclic and non-cyclic strategies
is needed.
At the level of NPs, prosodic domains for T3S have to be built from the innermost
constituents outwards. T3S outputs depend on the syntactic structures because the prosodic
parsing is built based on it. Within NPs, a speaker’s T3S surface patterns reflect how the T3S
structure was built. The rest of the chapter is organized as follows. I begin in section 6.1 with
some linguistic background on cyclic T3S applications at the Word level, and then I show how it
works in various noun compounds and NPs. Section 6.2 presents research questions, hypotheses,
and predictions. Our experimental study on NPs is in Section 6.3, the major section of this
chapter. The results and detailed discussions are also included in this section. Finally, Section 6.4
concludes this chapter with a summary of the findings.
6.1 Linguistic background
6.1.1 Cyclic T3S application at the Word Level
According to the Word-and-Phrase level model (Chen 2000; Shih 1986; Shih 1997), T3S is
cyclic at the Word level, which includes simple nouns, compound nouns and complex nouns.
Complex nouns refer to ‘modifier + noun,’ such as xiao laoshu ‘small mouse’ (Lin 2007:207).
Lin further clarifies that, although a noun with a modifier is often treated as a noun phrase
syntactically, in this T3S model, compound nouns and complex nouns [modifier + noun], along with
simple nouns, are treated as words instead of phrases (Lin 2007:207). ‘Modifier + noun’ and
‘verb + resultative complement’³² constructions are taken as lexical, instead of phrasal,
constructions because they behave like integral lexical items³³ phonologically (Chen 2000:387).
Verbs are typically treated at the phrase level except for the verb compounds (e.g. xizao ‘take a
bath/shower’ and xunzhao ‘to look for’) and the ‘verb + resultative complement’ construction
mentioned earlier. We restrict the contexts for testing cyclic T3S application to NPs in order
to keep the experimental task simple for children. In the rest of the chapter, we investigate cyclic
T3S application in NPs only (including compound nouns), and compound verbs will not be
discussed further. It should be noted that all the examples in this chapter are examples of T3S
application at the Word level. Therefore, in the presentation of the examples, the distinction
between the Word level and the Phrase level will not be included. In the next chapter, where
examples at the sentence level are presented, this distinction will be made for clarity.
³² An example of the structure of “verb + resultative complement” is as follows, where wan
‘finish’ is a resultative complement indicating the state/result of the action.
[chi wan]
eat finish ‘done eating’
³³ Chen (2000:387) indicates that “modifier + noun” and “verb + resultative” constructions
behave like integral lexical items. This means the two elements of each of the structures above
are grouped together. The two elements will be parsed in the same prosodic domain in T3S
application.

6.1.2 Compound Nouns and NPs
In this section, cyclic T3S application in compounds and NPs of various structures and lengths
is exemplified in (1) – (8). The focus is on cyclic foot-building; the possibility
of larger domain parsing in fast speech is not our focus here. Thus, the fast speech pattern is not
included in the following examples, which are predicted by the Word-and-Phrase level Model. Cyclic
application in (1) and (2) gives different surface patterns.
(1)    σσ        σ
     [[laohu]    wei]     ‘tiger tail’
       tiger     tail
       3 3       3        UT
      (2 3)      3        T3S
      (2 2       3)       Incorporation; T3S, ST
     *(3 2       3)

(2)   σ          σσ
     [zhi       [laohu]]  ‘a tiger made of paper; paper tiger’
      paper      tiger
      3          3 3      UT
      3         (2 3)     T3S
     (3          2 3)     Incorporation; No T3S, ST
    ?(2          2 3)

In (1) and (2), we see two three-syllable noun-noun compounds. In (1), a disyllabic
noun is followed by a monosyllabic noun. T3S first applies in the innermost constituent, the
disyllabic noun. In the second cycle, when the monosyllabic noun is incorporated, T3S applies
again, resulting in the T2T2T3 surface pattern. In (2), a monosyllabic noun is followed by a
disyllabic noun. Again, T3S applies to the innermost constituent, the disyllabic noun. When the
monosyllabic noun preceding it is incorporated in the next cycle, T3S does not apply because
there are no adjacent T3* now. The pattern (T2T2T3) in (2) may be grammatical to some native
speakers, but not others, though such a pattern can be derived in the Word-and-Phrase level model
through larger domain parsing in fast speech. The results of the current study will provide some
answers to this question.

Next, let us look at phrases in (3) – (6) which are all composed of four syllables. Their
internal structures differ, however. We will see that T3S starts in the innermost constituents in
these NPs consistently.
(3)  [xiao    [[lao-hu]    wei]]    ‘small tiger tail’
      small     tiger      tail
      3         3 3        3        UT
      3        (2 3)       3        T3S
      3        (2 2        3)       Incorporation; T3S
     (3         2 2        3)       Incorporation; No T3S, ST
    *(2         3)(2       3)

(4)  [xiao    [[duan tui]  gou]]    ‘(a) small short-legged dog’
      small     short leg  dog
      3         3    3     3        UT
      3        (2    3)    3        T3S
      3        (2    2     3)       Incorporation; T3S
     (3         2    2     3)       Incorporation; No T3S, ST
    *(2         3)  (2     3)

In (3), the innermost constituent is laohu ‘tiger,’ which is in phrase-medial position. Two
subsequent steps produce the surface pattern T3T2T2T3. The NP in (4) is a mixed-branching NP,
a structure that is less discussed in the T3S literature. Typically, left-branching
and right-branching structures are used to contrast cyclicity in previous T3S studies. In
(4), the second and the third syllables form a foot, duan tui ‘short-legged,’ which modifies gou
‘dog.’ A ternary foot for duan tui gou ‘short-legged dog’ forms in the next cycle, where T3S
applies to tui ‘leg’ when the rightmost syllable gou ‘dog’ is incorporated. The final step is to
incorporate the initial syllable xiao ‘small,’ and T3S does not apply in this cycle. The NPs in (3)
and (4) have the same surface pattern, though they differ slightly in their internal structures
in that laohu ‘tiger’ is a disyllabic lexical item whereas duan tui ‘short-legged’ consists of two
monosyllabic lexical items. The following examples illustrate different patterns.
(5)  [[hai    di]       [hai    cao]]    ‘sea-bottom seaweed’
       sea    bottom     sea    grass
       3      3          3      3        UT
      (2      3)        (2      3)       T3S, ST
     *(3      2          2      3)

(6)  [xiao    [zi       [hai    cao]]]   ‘small purple seaweed’
      small    purple    sea    grass
      3        3         3      3        UT
      3        3        (2      3)       T3S
      3       (3         2      3)       Incorporation; No T3S
     (2        3         2      3)       Incorporation; T3S, ST
    *(3        2         2      3)

In (5), the two constituents are equally embedded. Two binary feet are parsed at
the same time, and T3S applies simultaneously within these two feet, producing the surface
pattern T2T3T2T3. In (6), notice that the surface pattern is also T2T3T2T3, yet it is not
derived in the same way. As the phrase in (6) is a right-branching structure, T3S applies
first in the innermost constituent haicao ‘seaweed,’ and then proceeds one layer higher to
incorporate the second syllable zi ‘purple,’ where T3S does not apply since there are no adjacent
T3* at this point. Finally, xiao ‘small,’ the topmost layer, is incorporated, and T3S applies again.
Here we see that, though the same surface pattern may come from different syntactic structures,
the surface pattern for (3) and (4) cannot be used in (5) and (6), and vice versa. In this
experiment, two types of structures ((4) and (6)) were tested. We now turn to two more examples,
the 5-syllable phrases in (7) and (8). Although the patterns in (7) and (8) were not tested, they
further demonstrate cyclic T3S in NPs that have different internal structures.
(7)  [[Bei    hai]     [xiao    [hai    gou]]]   ‘small North Sea fur seals’
       north  sea       small    sea    dog
       3      3         3        3      3        UT
      (2      3)        3       (2      3)       T3S
      (2      3)       (3        2      3)       Incorporation, No T3S, ST1
      (2      2)       (3        2      3)       Optional T3S across domains, ST2
     *(3      2         3)      (2      3)

(8)  [ai        [[duan      pao]   [xuan     shou]]]   ‘(a) short sprinter’
      short       short     run     select    hand
      (height)    (length)                    (a trained athlete)
      3           3         3       3         3        UT
      3          (2         3)     (2         3)       T3S
     (3           2         3)     (2         3)       Incorporation; No T3S, ST
    *(2           3)       (3       2         3)
    *(2           2)       (3       2         3)

In (7), the five-syllable NP is composed of a two-syllable and a three-syllable constituent,
with the latter having one more layer than the former. T3S applies cyclically in bei hai ‘North
Sea’ and xiao hai gou ‘small fur seal,’ resulting in (T2T3)(T3T2T3) on the surface. Notice that
the second and the third syllables both surface as T3, but the two adjacent T3* belong to
different prosodic domains and the surface pattern is grammatical. When optional T3S applies
across the domains, another pattern, (T2T2)(T3T2T3), surfaces. The two grammatical patterns in
(7) are ungrammatical in (8), which also has five syllables. In (8), duan pao ‘short race’
modifies xuanshou ‘(a) trained athlete,’ and T3S applies separately in these two disyllabic feet
first. Ai ‘short (in height)’ modifies duan pao xuan shou ‘sprinter,’ and when it is incorporated
into the following foot, no T3S applies. The surface pattern is (T3T2T3)(T2T3) for the NP in (8).
The structural difference between (7) and (8) explains why a surface pattern that is grammatical
in one is ungrammatical in the other. Their T3S patterns reflect their syntactic structures.
To sum up, (1) – (8) clearly show that cyclicity is strictly followed in T3S application in
compounds and NPs: if that were not true, the NPs of three, four,
and five syllables in (1) – (8) would exhibit the same surface pattern for the same number of syllables.
In other words, (T2T2T3), (T2T3)(T2T3), and (T2T3)(T2T2T3) would have been found
consistently in three-, four-, and five-syllable structures respectively, regardless of the internal
structural differences. In short, T3S in NPs is sensitive to morpho-syntax. Prosody-based
left-to-right parsing, which works in flat structures, is out of the picture in structured NPs. In the next
section, based on what we know about cyclic T3S application in NPs, research questions are
raised, followed by hypotheses and predictions for the experimental study on NPs.
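
The cyclic derivations in (1) – (8) can be sketched procedurally. The Python sketch below is illustrative only and is not part of the original analysis: the function name `cyclic_t3s`, the nested-list encoding of the bracketed structures, and the `cross_domain` flag are assumptions introduced here. It treats T3S as obligatory only when a single syllable is incorporated, and as optional when two polysyllabic domains meet, which is a simplification based on (7). Larger domain parsing in fast speech is ignored.

```python
def cyclic_t3s(node, cross_domain=False):
    """Derive surface tones bottom-up from a nested list of underlying tones.

    A syllable is an int (1-4); a constituent is a list of syllables or
    constituents. `cross_domain` (an assumption of this sketch) switches on
    the optional T3S between two polysyllabic domains, as in ST2 of (7).
    """
    if isinstance(node, int):
        return [node]
    # Innermost constituents are resolved first (the cyclic part).
    parts = [cyclic_t3s(child, cross_domain) for child in node]
    out = list(parts[0])
    for part in parts[1:]:
        # Incorporation: one side of the junction is a single syllable.
        incorporation = len(out) == 1 or len(part) == 1
        if out[-1] == 3 and part[0] == 3 and (incorporation or cross_domain):
            out[-1] = 2  # T3 -> T2 before another T3
        out.extend(part)
    return out


print(cyclic_t3s([[3, 3], 3]))        # structure of (1) 'tiger tail'
print(cyclic_t3s([3, [3, 3]]))        # structure of (2) 'paper tiger'
print(cyclic_t3s([3, [[3, 3], 3]]))   # structure of (4), mixed-branching
print(cyclic_t3s([3, [3, [3, 3]]]))   # structure of (6), right-branching
```

Because inner constituents are resolved before outer ones, a T2 created in an earlier cycle removes the context for T3S in a later cycle, which is why (2) and (4) keep an initial T3.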
6.2 Research questions, hypotheses, and predictions
These experiments are designed to investigate children’s and adults’ T3S application in NPs.
Three-syllable compound nouns and four-syllable NPs which have different internal structures
are used to test children’s parsing strategies.
6.2.1 Research questions and Hypotheses
Our major research questions are as follows.

(9) Research questions
a. Do children know to apply T3S cyclically in NPs?
b. Does structural complexity affect the parsing strategies they use?

Whether or not children refer to syntax when they build T3S domains can be inferred from
the surface patterns they produce. Responses reflecting cyclic parsing provide evidence that
syntax is referred to. In contrast, responses reflecting non-cyclic parsing, even when the
branching of the NPs differs, will be evidence that syntax is not referred to. We assume that
the R-branching (right-branching) structure is less complicated than the M-branching (mixed-branching)
structure (i.e. branching in one direction is easier than branching in more than one direction).
Our hypotheses are in (10).
(10) Two hypotheses
NP Cyclic Parsing Hypothesis (H1): Children know how to use a cyclic bottom-up parsing
strategy in NPs.
Strategy Shift Hypothesis (H2): When structures increase in complexity, children may default
to prosodic parsing and ignore syntax.
6.2.2 Predictions of T3S application in NPs
In this section, the predictions for three-syllable compound nouns and four-syllable NPs are
presented. Our focus is on the cyclic parsing process. The fast speech pattern, obtained
through a larger domain within which T3S applies from left to right, is excluded from the
predictions: our experimental setting does not require fast speech, and as normal speech
is used, we expect normal cyclic parsing.

6.2.2.1 Three-syllable compound nouns
The surface patterns in (11) and (12) below differ because of their morphosyntactic
differences. In (11), T3S applies in the innermost unit, the first noun, which has two syllables.
When the second noun is incorporated in the next cycle, T3S applies again. The surface pattern is
(T2T2T3). In (12), T3S applies first to the innermost unit, the second noun, which has two
syllables. When the first noun is incorporated in the next cycle, T3S is inapplicable.
(11) A three-syllable [[σσ] σ] compound noun
     [[laoshu]   bi]     ‘mouse-pen’ (a pen that looks/shapes like a mouse)
       mouse     pen
       3 3       3       UT
      (2 3)      3       T3S
      (2 2       3)      Incorporation, T3S; ST
(12) A three-syllable [σ [σσ]] compound noun
     [zhi     [haima]]   ‘paper seahorse’ (a seahorse that is made of paper)
      paper    seahorse
      3        3 3       UT
      3       (2 3)      T3S
     (3        2 3)      Incorporation, No T3S; ST
If children make no reference to syntactic properties of the novel compound nouns, we
expect to see (T2T2T3) for both structures ([[σσ] σ] and [σ [σσ]]). However, if children use
syntactic properties in building feet, then the two different structures [[σσ] σ] and [σ [σσ]] should
have (T2T2T3) and (T3T2T3) respectively through the cyclic parsing strategy.
6.2.2.2 Four-syllable noun phrases
Now we look at predictions for four-syllable NPs. Right-branching and left-branching
structures are commonly used for contrasting the different T3S surface patterns in the two
structures. Mixed-branching structures, however, are less discussed. In (13) – (15), all three
structures are presented. Due to the relatively low occurrence of left-branching NPs in Mandarin
and difficulties in finding suitable left-branching examples for children, right-branching and
mixed-branching structures are used for testing cyclic T3S application in NPs. Our purpose in
this experiment is to test whether or not children are sensitive to syntax in applying T3S in NPs,
so using two different structures sufficiently meets our needs.
(13) A four-syllable right-branching ([σ [σ [σσ]]]) NP
     [xiao    [zi       [haima]]]    ‘(a) small purple seahorse’
      small    purple    seahorse
      3        3         3 3         UT
      3        3        (2 3)        T3S
      3       (3         2 3)        Incorporation, No T3S
     (2        3         2 3)        Incorporation, T3S; ST
(14) A four-syllable left-branching ([[[σσ] σ] σ]) NP
     [[[zhanlan]    guan]   zhang]   (Chen 2000:383)
        exhibition  hall    director ‘exhibition hall director’
        3 3         3       3        UT
       (2 3)        3       3        T3S
       (2 2         3)      3        Incorporation, T3S
       (2 2         2       3)       Incorporation, T3S; ST
(15) A four-syllable mixed-branching ([σ [[σσ] σ]]) NP
     [xiao    [[duan   tui]   ma]]   ‘(a) small short-legged horse’
      small     short  leg    horse
      3         3      3      3      UT
      3        (2      3)     3      T3S
      3        (2      2      3)     Incorporation, T3S
     (3         2      2      3)     Incorporation, No T3S; ST
In (13) – (15), T3S always begins with the innermost constituent and then proceeds outwards
cyclically, taking one layer at a time as a syllable is incorporated into the foot that has been
built. A right-branching structure in (13) results in the “alternating pattern,” (T2T3)(T2T3),
which alternates between T2 and T3. A left-branching structure in (14) begins T3S at the left
edge of the structure. One syllable is incorporated at a time and T3S applies each time since there
are adjacent T3* upon each incorporation. The surface pattern is (T2T2T2T3). In (15), the
innermost constituent is in the middle. When the final syllable is incorporated into the binary
foot, T3S applies, but when the initial syllable is incorporated into the ternary foot, T3S does not
apply since there are no adjacent T3*. The surface pattern is (T3T2T2T3).
To summarize, different surface patterns are expected depending on the internal structures of the
compounds or NPs. For a three-syllable noun-noun compound whose first noun is disyllabic,
(T2T2T3) is expected. If it is the second noun that is disyllabic, (T3T2T3) is expected. If
children do not refer to structural differences, then the left-to-right prosodic parsing, which results
in (T2T2T3), will surface for both structures. For four-syllable NPs, (T2T3T2T3) and
(T3T2T2T3) are expected for a right-branching NP and a mixed-branching NP respectively
through the cyclic bottom-up parsing strategy. If no reference is made to syntax, (T2T3T2T3),
obtained by left-to-right binary parsing, will surface for both structures. A summary of the
predicted patterns for the structures tested is in Table 6.1.
Table 6.1 Study 3: Predicted patterns for the structures tested
              Three-syllable compound nouns     Four-syllable NPs
Structures    [[σσ]σ]          [σ[σσ]]          [σ[σ[σσ]]]        [σ[[σσ]σ]]
              N-N compound     N-N compound     R-branching NP    M-branching NP
UT            333              333              3333              3333
ST            223              323              2323              3223
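
The contrast between the two strategies summarized in Table 6.1 can be sketched as follows. This Python sketch is illustrative and not part of the original study: `flat_t3s` is a hypothetical helper that implements left-to-right binary footing for strings of underlying T3 syllables only, incorporating a leftover final syllable into the preceding foot. The cyclic predictions are copied from Table 6.1 rather than computed.

```python
def flat_t3s(n):
    """Surface tones for n underlying T3 syllables under left-to-right
    binary footing (the non-cyclic strategy), as a string such as '223'."""
    if n < 2:
        return "3" * n
    foot_sizes = [2] * (n // 2)
    if n % 2:
        foot_sizes[-1] += 1  # incorporate the unfooted final syllable
    # Inside a foot, every T3 followed by another T3 surfaces as T2.
    return "".join("2" * (size - 1) + "3" for size in foot_sizes)


# Cyclic (syntax-sensitive) predictions, copied from Table 6.1.
CYCLIC = {
    "[[σσ]σ]": "223",
    "[σ[σσ]]": "323",
    "[σ[σ[σσ]]]": "2323",
    "[σ[[σσ]σ]]": "3223",
}

for structure, cyclic in CYCLIC.items():
    flat = flat_t3s(len(cyclic))
    verdict = "same" if flat == cyclic else "differs"
    print(structure, "cyclic:", cyclic, "flat:", flat, verdict)
```

Under these assumptions, only [σ[σσ]] and [σ[[σσ]σ]] separate the two strategies; for [[σσ]σ] and the right-branching NP, cyclic and left-to-right parsing yield the same surface pattern.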

6.3 Experiment 2: NPs
Experiment 2 consists of elicitation of novel three-syllable compound nouns and four-syllable
NPs, shown in (16) and (17) respectively. T3S applies cyclically in both structures.
(16) Two structures tested in three-syllable compound nouns
a.   [[σ   σ]   σ]
       3   3    3      UT
      (2   3)   3
      (2   2    3)     ST

b.   [σ   [σ   σ]]
      3    3   3       UT
      3   (2   3)
     (3    2   3)      ST

(17) Two structures tested in four-syllable NPs
a.   [σ   [σ   [σ   σ]]]
      3    3    3   3      UT
      3    3   (2   3)
      3   (3    2   3)
     (2    3    2   3)     ST

b.   [σ   [[σ   σ]   σ]]
      3     3   3    3     UT
      3    (2   3)   3
      3    (2   2    3)
     (3     2   2    3)    ST

6.3.1 Method
6.3.1.1 Subjects
One hundred fourteen subjects were recruited in Taichung, Taiwan for this study. There are
five age groups: three-, four-, five-, and six-year-olds, and adults. Table 6.2 shows the
distribution of the participants.
Table 6.2 Study 3: Distribution of the subjects
Age groups     N     Age range      Mean    Standard deviation
3-year-olds    24    3;1 – 3;11     3;7     2.49 (mo.)
4-year-olds    20    4;1 – 4;9      4;3     2.40 (mo.)
5-year-olds    27    5;0 – 5;11     5;3     3.05 (mo.)
6-year-olds    23    6;0 – 6;11     6;6     3.04 (mo.)
adults         20

6.3.1.2 Procedure
All child subjects were tested in a quiet classroom in the kindergarten or in the home of the
child. Adult subjects were tested in a quiet room. The elicited production task lasted
approximately 15-20 minutes for children, and 10 minutes for adults. Children were told that
they were going to look at pictures on the computer and play a game. Each subject sat in front of
a laptop computer which displayed slides of pictures. All data was recorded by a Marantz PMD
660 with an Audio-technica miniature clip-on microphone.

6.3.1.3 Design
An elicited production task (Crain & Thornton 2000; McDaniel et al. 1998) is used in this
experiment. The procedures used to elicit three-syllable compound nouns and four-syllable NPs
are similar, with the latter more complicated than the former. Compared to the three-syllable
compound nouns, which require putting two nouns together, four-syllable items have more
‘building blocks’ and more layers to be attended to in building the novel NPs. As a result, more
pictures were used to elicit four-syllable NPs than three-syllable compound nouns.
T3S does not apply in the control items which have no adjacent T3* underlyingly; therefore,
the surface tones are the same as the underlying tones. In the test items, T3S will apply through
the building of prosodic domains based on the morphosyntactic structure of the compound nouns
or NPs. Surface tones should reflect the internal structure of the compound noun or NP. The
structures tested and their derivations of T3S application are presented separately below. A list of
controls and tests is in Appendix C.
Three-syllable compound nouns
The two three-syllable structures tested are in (18). We keep only the parsing information and
predicted outputs, which are based on the Word-and-Phrase level Model (Chen 2000; Shih 1986;
Shih 1997).
(18) Two structures in three-syllable compound nouns
a.   noun-noun compound
      σσ      σ
      3 3     3      UT
     (2 3)    3      T3S
     (2 2     3)     Incorporation, T3S; ST

b.   noun-noun compound
      σ       σσ
      3       3 3    UT
      3      (2 3)   T3S
     (3       2 3)   Incorporation, No T3S; ST

Four-syllable NPs
Two four-syllable structures tested are in (19).
(19) Two structures in four-syllable NPs
a.   Right-branching NP
      σ    σ    σ    σ
      3    3    3    3     UT
      3    3   (2    3)    T3S
      3   (3    2    3)    Incorporation, No T3S
     (2    3    2    3)    Incorporation, T3S; ST

b.   Mixed-branching NP
      σ    σ    σ    σ
      3    3    3    3     UT
      3   (2    3)   3     T3S
      3   (2    2    3)    Incorporation, T3S
     (3    2    2    3)    Incorporation, No T3S; ST

6.3.1.4 Materials
We showed each child pictures of animals and objects in order to elicit novel compound
nouns. Animals or objects that are typically known to children were used, but the combinations
of the nouns in the compounds were novel. Each picture was shown individually, one at a time, on
a separate PowerPoint slide. Sample experimental materials are provided for three-syllable
compound nouns and four-syllable NPs in Figure 6.1 and Figure 6.2 respectively. Both figures
include a sample control item and a test item for both structures tested. A complete list of the
experimental material as well as the instructions in Mandarin is in Appendix D.

Figure 6.1 Study 3: Sample materials in three-syllable compound nouns

Structures   Control                                    Test
[[σσ]σ]      a. [[binggan]  niao]                       b. [[shuiguo]  niao]
                  cookie    bird   ‘cookie-bird’             fruit     bird   ‘fruit-bird’
                  3 1       3      UT                        3 3       3      UT
                  3 1       3      ST                        2 2       3      ST

[σ[σσ]]      c. [shui   [daxiang]]                      d. [shui   [laohu]]
                 water   elephant  ‘water-elephant’         water   tiger     ‘water-tiger’
                 3       4 4       UT                       3       3 3       UT
                 3       4 4       ST                       3       2 3       ST

For instance, to elicit a novel compound that has the structure [[σσ]σ] (Figure 6.1 (b))
with all three syllables in T3, [[T3T3] T3]: [[shuiguo] niao] ‘fruit-bird,’ the experimenter first
showed a picture of a bird that looks very happy when seeing cookies (Figure 6.1 (a)). As this
picture was shown to the child, the experimenter told the child, “Look at this bird. He’s so happy
to see the cookies. He loves eating cookies. Let’s call it a cookie-bird.” This was to model how
the compound noun was to be built. Then the experimenter showed another picture that also had
a bird in it (Figure 6.1 (b): a different bird, which looks very happy when he sees fruits). The
experimenter asked (pointing at the fruits), “What are these?” to make sure the child knew the
name of the item (shuiguo ‘fruit’). She continued (pointing to the bird), “He loves fruits,
so we call it …” The child was expected to build a compound noun for this test item. The same
procedure was used for all the test items and the control items.
As with the three-syllable compound nouns, we showed each child pictures of animals and
objects in order to elicit novel four-syllable NPs. Similarly, the animals, objects, and adjectives
used are typically known to children. Novel four-syllable NPs with right-branching and
mixed-branching structures were created with adjectives (size, color, etc.) and nouns. The experimenter
made sure that the child knew how to say the individual items before the intended NP was built.
In order to elicit the intended adjectives, a pair of contrasting ideas was presented (e.g.,
a big elephant vs. a small elephant, a green frog vs. a white frog, and a tall bird vs. a short bird).
The procedure used for eliciting novel four-syllable NPs is similar to that for three-syllable
compounds. What is different is that there are more layers in the four-syllable NPs, and one layer
was elicited at a time, starting with the innermost unit. So for the right-branching structure
[σ[σ[σσ]]], we began with the last two syllables, followed by the second syllable, and finally
added the outermost layer, which is the first syllable. For the mixed-branching structure
[σ[[σσ]σ]], we began with the two middle syllables, followed by the last syllable, and finally
added the outermost layer, which is the first syllable. Sample materials used for four-syllable
NPs are in Figure 6.2.

Figure 6.2 Study 3: Sample materials in four-syllable NPs

Structure [σ [σ [σσ]]]
a. Control                                  b. Test
   [xiao   [hong   [mianyang]]]                [xiao   [zi      [haima]]]
   ‘(a) small red sheep’                       ‘(a) small purple seahorse’
    small   red     sheep                       small   purple   seahorse
    3       2       2 2       UT                3       3        3 3        UT
    3       2       2 2       ST                2       3        2 3        ST

Structure [σ [[σσ] σ]]
c. Control                                  d. Test
   [xiao   [[chang  bi]     xiang]]            [xiao   [[duan   tui]   ma]]
   ‘(a) small long-trunked elephant’           ‘(a) small short-legged horse’
    small    long   trunk   elephant            small    short  leg    horse
    3        2      2       4        UT         3        3      3      3      UT
    3        2      2       4        ST         3        2      2      3      ST

6.3.1.5 Coding
Two native speakers transcribed the data and coded the answers. After the transcriptions were
completed, the test items were coded for statistical analysis. Numbers 1, 2, 3 and 4 were used in
transcribing the four lexical tones, T1, T2, T3, and T4, respectively. Data were coded in a way that
preserved as much information as possible from subjects’ responses. Our target answers in this study
are three or four syllables long in compound nouns and NPs respectively. Children do not always give
the desired answers; they may miss a syllable or give an extra syllable, for example. An
underscore is used to indicate a missing syllable in the subject’s response. Extra words
were transcribed as said. The coding categories are in (20).
(20) Coding categories for data analysis:
a.   Included in the analysis: Correct or incorrect application of T3S without missing any
     syllables and with syllables in the correct word order.
b.   Excluded from the analysis:
     i.   No answer: Saying ‘I don’t know’ or being silent without giving an answer.
     ii.  Non-target answers: Saying something else, such as adding additional word(s) or missing
          word(s), which results in non-target answers.
     iii. Word order: Scrambling word orders, which did not fit the intended template of the N-N
          compound or NPs.
     iv.  Pauses: Pauses between two T3*.

For the analysis of children’s T3S application, the data were used only if the responses fit the
exact number of target words and were in the desired order. Responses with additional words,
insufficient words, or wrong word orders were excluded from the analysis. This was because the
environment that was created to trigger T3S was altered, and the condition for T3S application
changed as a result. In short, all the data used for analyzing T3S were 100% correct in terms of
syntax.
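
As a rough illustration of how the coding categories in (20) partition the transcribed responses, consider the sketch below. It is not the procedure used by the coders: the function name `code_response` and the list-of-symbols encoding are assumptions made here, with tone digits '1' to '4', '_' for a missing syllable, and 'p' for a pause, following the transcription symbols used in this chapter. As a simplification, any pause leads to exclusion, and word-order errors are not handled because they cannot be detected from tones alone.

```python
def code_response(symbols, target_length):
    """Decide whether a transcribed response enters the T3S analysis.

    `symbols` is a list such as ['2', '2', '3'] or ['3', 'p', '3', '2', '3'].
    Returns 'included' or the reason for exclusion (labels follow (20)).
    """
    if not symbols:
        return "no answer"
    if "p" in symbols:
        # Simplification of this sketch: every pause is treated as excluding.
        return "pause"
    if "_" in symbols or len(symbols) != target_length:
        return "non-target answer"
    return "included"


print(code_response(list("223"), 3))
print(code_response(list("23_"), 3))
print(code_response(["3", "p", "3", "2", "3"], 4))
```

Only responses coded as included would then be compared with the predicted surface pattern.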
T3S does not apply in the control items, so T3S coding is irrelevant for them.
Sample answers for the test items in three-syllable compound nouns and four-syllable NPs and
how they fit in the coding categories are listed in Table 6.3.
Table 6.3 Study 3: Sample answers and their coding categories for analysis of T3S application

Three-syllable compound nouns (a – b)      Sample answers         Included in     Correct or
and four-syllable NPs (c – d)              (surface tones)        the analysis?   incorrect

a. [[shuiguo]  niao]                       i.    (223)
     fruit     bird    ‘fruit-bird’        ii.   (233)
     3 3       3       UT                  iii.  (232)
     2 2       3       ST                  iv.   (222)
                                           v.    (323)
                                           vi.   (23_)                            n/a
                                           vii.  nouns reversed                   n/a

b. [shui    [laohu]]                       i.    (323)
    water    tiger     ‘water-tiger’       ii.   (223)³⁴
    3        3 3       UT                  iii.  (_23)                            n/a
    3        2 3       ST                  iv.   nouns reversed                   n/a

c. [xiao   [zi      [haima]]]              i.    (2323)
   ‘(a) small purple seahorse’             ii.   (3223)
    small   purple   seahorse              iii.  (3323)
    3       3        3 3       UT          iv.   (3123)
    2       3        2 3       ST          v.    (3_23)                           n/a
                                           vi.   (_323)                           n/a
                                           vii.  (3)p(3)p(23)                     n/a
                                                 (p = pause)
                                           viii. Scrambling errors                n/a

d. [xiao   [[duan   tui]   ma]]            i.    (3223)
   ‘(a) small short-legged horse’          ii.   (2223)
    small    short  leg    horse           iii.  (3323)
    3        3      3      3       UT      iv.   (2323)
    3        2      2      3       ST      v.    (3_23)                           n/a
                                           vi.   (32_3)                           n/a
                                           vii.  (_2_3)                           n/a
                                           viii. (3)p(323)                        n/a
                                           ix.   Scrambling errors                n/a

(Note: Underscores refer to missing syllables. Correct or incorrect is based on the surface
patterns predicted by existing T3S models as well as whether or not such a pattern is attested in
adults.)

³⁴ As briefly mentioned in Section 2.3.1, the pattern (T2T2T3) may be grammatical to some native
speakers, but not others, though such a pattern can be derived in the Word-and-Phrase level model.
This pattern was regarded as incorrect, mainly based on the adult production in this study. Adults
never produced this pattern.
For statistical analysis, various error patterns were further coded and placed under three basic
error categories in (21).
(21) Error categories
a.   Under-application (U): Not applying T3S to one or more syllables when needed. An
     example of such a response in the structure [T3 [[T3T3] T3]] is (*T3T3T2T3).
b.   Mis-application (M): This category includes “over-application” and “wrong
     application.” Over-application refers to applying T3S to a syllable when not needed in
     the final syllable position, such as the answer (*T2T2T2) for [[T3T3] T3]. Wrong application
     refers to applying T3S to two adjacent syllables wrongly, as we see in the response
     (*T2T3T2T3) for the structure [T3 [[T3T3] T3]]. In this case, even though there are no
     adjacent T3*, it is an ungrammatical pattern.
c.   Other (O): Errors that do not fit the descriptions of the previous two categories are placed
     under the “Other” category. An example is *T4T2T3 for [T3 [T3T3]].

The first category, “Under-application,” captures the T3S errors made by subjects when they
failed to apply T3S when necessary. Unlike in flat structures, we found pure “over-application”
errors to be very rare. More specifically, over-application in the final syllable was very rare in
compound nouns and NPs. Therefore, this type of error was combined with the “Mis-application”
category, which is a common error type in children across age groups. Errors
categorized as “Mis-application” errors most often involved two syllables, with the application
reversed (applying T3S to a syllable that should not undergo T3S, or not applying T3S to a syllable
that should undergo it). The “Other” category covers all other errors, such as
producing a tone other than T2 or T3 in the response. Incorrect answers in Table 6.3 are
extracted and used in Table 6.4 to present how errors were categorized into the three error
types for our error analyses.
Table 6.4 Study 3: Sample T3S errors and their coding categories for error analysis

Three-syllable compound nouns (a – b)      Sample errors (only tonal     Under-application (U)
and four-syllable NPs (c – d)              information is included       Mis-application (M)
                                           here)                         Other (O)

a. [[shuiguo]  niao]                       i.   (233)                    U
     fruit     bird    ‘fruit-bird’        ii.  (232)                    M
     3 3       3       UT                  iii. (222)                    M
     2 2       3       ST                  iv.  (323)                    M

b. [shui    [laohu]]                       i.   (333)                    U
    water    tiger     ‘water-tiger’       ii.  (223)                    M
    3        3 3       UT
    3        2 3       ST

c. [xiao   [zi      [haima]]]              i.   (3223)                   M
   ‘(a) small purple seahorse’             ii.  (3323)                   U
    small   purple   seahorse              iii. (3123)                   O
    3       3        3 3       UT
    2       3        2 3       ST

d. [xiao   [[duan   tui]   ma]]            i.   (3323)                   U
   ‘(a) small short-legged horse’          ii.  (2323)                   M
    small    short  leg    horse
    3        3      3      3       UT
    3        2      2      3       ST
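
The three error categories can be approximated by a simple surface heuristic. The sketch below is an assumption-laden illustration rather than the coding procedure used in this study: the function name `categorize_error` and its rule (any tone other than T2 or T3 gives O; surviving adjacent T3s give U; any other mismatch gives M) were inferred here from the sample errors in (21) and Table 6.4, and they reproduce those samples, but they may not extend to every response in the data.

```python
def categorize_error(response, target):
    """Heuristic error type for an included response.

    `response` and `target` are strings of tone digits, e.g. '233' and '223'.
    Returns None for a correct response, otherwise 'U', 'M', or 'O'.
    """
    if response == target:
        return None  # matches the predicted surface tones
    if any(tone not in "23" for tone in response):
        return "O"   # a tone other than T2 or T3 was produced
    if "33" in response:
        return "U"   # adjacent T3s survive: T3S failed to apply
    return "M"       # T3S applied, but to the wrong syllable(s)


samples = [("233", "223"), ("323", "223"), ("3323", "2323"), ("3123", "2323")]
for response, target in samples:
    print(response, target, categorize_error(response, target))
```

The heuristic assumes that the target itself contains no adjacent T3s, which holds for the four structures tested here.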

6.3.2 Results
In this section, we first report the answers that were excluded from the analysis. Numbers of
excluded control items and test items in three-syllable compound nouns and four syllable NPs
are in Table 6.5 – Table 6.8.

Table 6.5 Study 3: Three-syllable compound nouns (Control items): data excluded from the
analysis
             [σ [σσ]]                                           [[σσ] σ]
Age group    No       Non-     Diff. word   Pauses   Excluded   No       Non-     Diff. word   Pauses   Excluded
             answer   target   order                 total      answer   target   order                 total
3            7        7        4            0        18         2        5        0            0        7
4            2        7        11           0        20         1        3        0            0        4
5            1        6        4            0        11         1        5        0            0        6
6            1        7        1            0        9          0        1        0            0        1
A            0        2        0            0        2          0        0        0            0        0

Table 6.6 Study 3: Three-syllable compound nouns (Test items): data excluded from the
analysis
             [σ [σσ]]                                           [[σσ] σ]
Age group    No       Non-     Diff. word   Pauses   Excluded   No       Non-     Diff. word   Pauses   Excluded
             answer   target   order                 total      answer   target   order                 total
3            7        12       5            0        24         1        7        0            0        8
4            0        4        12           0        16         1        5        0            0        6
5            2        6        3            0        11         0        2        0            0        2
6            0        5        1            0        6          0        0        1            0        1
A            0        0        0            0        0          0        0        0            0        0

Table 6.7 Study 3: Four-syllable NPs (Control items): data excluded from the
analysis
             Right-branching                                    Mixed-branching
Age group    No       Non-     Diff. word   Pauses   Excluded   No       Non-     Diff. word   Pauses   Excluded
             answer   target   order                 total      answer   target   order                 total
3            3        11       0            0        14         6        25       0            0        31
4            0        9        0            0        9          4        15       0            0        19
5            4        1        0            0        5          3        9        2            0        14
6            1        0        0            0        1          0        4        2            0        6
A            0        0        0            0        0          0        0        0            0        0

Table 6.8 Study 3: Four-syllable NPs (Test items): data excluded from the
analysis
             Right-branching                                    Mixed-branching
Age group    No       Non-     Diff. word   Pauses   Excluded   No       Non-     Diff. word   Pauses   Excluded
             answer   target   order                 total      answer   target   order                 total
3            3        9        0            2        14         6        24       1            2        33
4            0        5        0            0        5          2        20       1            0        23
5            1        2        0            2        5          1        15       1            6        23
6            0        1        0            0        1          0        12       2            1        15
A            0        0        0            0        0          0        0        0            0        0

Next, we turn to the results. We found that children across age groups did make reference to
syntax when applying T3S to compounds and NPs. Unlike adults, however, children showed less
consistency in cyclic T3S application at the Word level. In what follows, the results for
three-syllable compound nouns and four-syllable NPs are presented and discussed in separate
subsections.
6.3.2.1 Three-syllable compound nouns
The task for three-syllable compound nouns involves placing two nouns together to create a
novel compound noun. The NP Cyclic Parsing Hypothesis (H1) was tested to see whether or not
children know to use a cyclic bottom-up parsing strategy in compound nouns.
In the three-syllable compound nouns, adults and children across age groups had a 100%
correct rate in the control items (Table 6.9). This indicates that the task of building a noun-noun
compound by placing two nouns together was an easy task for children across age groups. For
the test items, adults still had a 100% correct rate.

Table 6.9 Study 3: Correct rate (%) in three-syllable compound nouns: Control items (no T3S)
Structure     [σ [σσ]]       [[σσ] σ]
3-year-olds   100 (30/30)    100 (41/41)
4-year-olds   100 (20/20)    100 (36/36)
5-year-olds   100 (43/43)    100 (48/48)
6-year-olds   100 (37/37)    100 (45/45)
adults        100 (38/38)    100 (40/40)

We know that children did well in the control items which were without T3S applications.
Did they do as well in the test items which required T3S applications? Now we turn to the test
items in three-syllable compound nouns.
Table 6.10 Study 3: Correct rate (%) in three-syllable compound nouns: Test items (with T3S)
Structure     [σ [σσ]]                        [[σσ] σ]
Pattern       *(223)         (323)            (223)            *(323)
3-year-olds   4.17 (1/24)    91.67 (22/24)    92.50 (37/40)    2.50 (1/40)
4-year-olds   0 (0/24)       100 (24/24)      91.18 (31/34)    0 (0/34)
5-year-olds   0 (0/43)       100 (43/43)      96.15 (50/52)    0 (0/52)
6-year-olds   2.50 (1/40)    97.50 (39/40)    97.78 (44/45)    0 (0/45)
adults        0 (0/40)       100 (40/40)      100 (40/40)      0 (0/40)
In Table 6.10, the ungrammatical pattern *(T3T2T3) for the [[σσ] σ] structure is uncontroversial.
As mentioned earlier, some native speakers may find the surface pattern (T2T2T3) in the [σ [σσ]]
structure grammatical or acceptable, while others may consider it ungrammatical. This pattern
never surfaced in our adult data. For now, we treat this pattern as ungrammatical, mainly based
on the adult data. The pattern was rarely produced by children, which may suggest that
children also judge this pattern to be ungrammatical.
Will the decision to treat (T2T2T3) as ungrammatical affect later analysis? There were
only two tokens of (T2T2T3) in the [σ [σσ]] structure, one produced by a 3-year-old and one by a
6-year-old, so the effect of treating this pattern as grammatical or ungrammatical was minimal.
Either way, these two age groups were adult-like and were not different from adults.

Figure 6.3 Study 3: Correct rates in three-syllable compound nouns by age
[Figure: Three-syllable compound nouns. Percentage of responses (0 – 100) by age group (3, 4, 5, 6,
A = adults), in two panels.]
Panel [σ[σσ]] ([water [tiger]]), ages 3, 4, 5, 6, A:
    *223: 4.17, 0, 0, 2.50, 0          323: 91.67, 100, 100, 97.50, 100
Panel [[σσ]σ] ([[fruit] bird]), ages 3, 4, 5, 6, A:
    223: 92.50, 91.18, 96.15, 97.78, 100          *323: 2.50, 0, 0, 0, 0

As described in Section 6.3.1.5, errors fall under three categories: Under-application,
Mis-application, and Other. As children did very well in three-syllable compound nouns, there were
few errors. All three error types were attested, with Under-application being the most common
(one token in the 3-year-old group, three tokens in the 4-year-old group, one token each in the
5- and 6-year-old groups), followed by Mis-application. The error type Other, in which a tone
other than T2 or T3 (so either T1 or T4) is used in the surface pattern, is rare.
A two-proportion z-test (α = .05, two-tailed) was conducted to determine whether there is a
significant difference between any two age groups. The Null hypothesis is that there is no
difference between any two age groups. The results show that no two age groups are
significantly different from each other in the test items in the three-syllable compound nouns.
Since all the p-values are greater than the significance level (0.05), the Null hypothesis that no
significant difference exists between any two age groups cannot be rejected ([[σσ] σ] compound
noun: 3 (3-year-olds) & A (adults): p = 0.0767, 4 (4-year-olds) & A: p = 0.1848, 5 (5-year-olds)
& A: p = 0.5933, 6 (6-year-olds) & A: p = 0.9522, 3 & 4: p = 0.8259, 3 & 5: p = 0.7627, 3 & 6:
p = 0.5261, 4 & 5: p = 0.6227, 4 & 6: p = 0.4197, 5 & 6: p = 0.8997; [σ [σσ]] compound noun:
3 & A: p = 0.2661, 6 & A: p = 1.000, 3 & 4: p = 0.4703, 3 & 5: p = 0.2412, 3 & 6: p = 0.6477, 4
& 6: p = 0.7949, 5 & 6: p = 0.9713).
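For readers who wish to see how such a comparison is computed, the following is a minimal sketch of a pooled two-proportion z-test, applied to the 3-year-old and adult counts for the (223) pattern in [[σσ] σ] compounds (37/40 vs. 40/40, from Table 6.10). The function name is our own, and the pooled-variance form is an assumption: the text does not say which variant or software was used, so the output need not match the reported p-values to the last digit.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test. Returns (z, two-tailed p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are at 0% or both at 100%: the statistic is undefined.
        return float("nan"), float("nan")
    z = (p1 - p2) / se
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# 3-year-olds (37/40) vs. adults (40/40), pattern (223) in [[σσ] σ] compounds
z, p = two_proportion_z_test(37, 40, 40, 40)
print(f"z = {z:.3f}, p = {p:.4f}")
```

When both groups have identical rates of 0% or 100%, the pooled standard error is zero and no test statistic can be computed; the sketch returns NaN in that case.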
NP Cyclic Parsing Hypothesis (H1) is confirmed by the results. Children across age groups
and adults did differentiate the structural differences ([[σσ] σ] vs. [σ [σσ]]) and did apply T3S
cyclically. None of the child groups are significantly different from adults in their T3S
application in the three-syllable compounds with different internal structures. In Figure 6.3, we
see that children clearly are sensitive to the internal structure of the compound nouns and apply
T3S accordingly.
To summarize, children’s application of T3S in the three-syllable compound nouns was not
found to be significantly different from adults’. They referred to the internal structure when they
applied T3S.
6.3.2.2 Four-syllable NPs
Four-syllable R-branching NPs ([σ[σ[σσ]]]) and M-branching NPs ([σ[[σσ]σ]]) are predicted to
surface with different patterns, (T2T3T2T3) and (T3T2T2T3) respectively. Presumably, M-branching
(more than one branching direction) is more complicated than R-branching (one branching
direction). First, we tested the NP Cyclic Parsing Hypothesis (H1) to see whether or not children
know to use a cyclic bottom-up parsing strategy in building novel NPs. Secondly, we tested the
Strategy Shift Hypothesis (H2) to see if children always use the cyclic bottom-up parsing
strategy, or if they may ignore syntax and shift to non-cyclic left-to-right parsing when the
structure becomes more complicated.
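The two parsing strategies contrasted here can be illustrated with a small sketch. It is a simplification for exposition, not the full Word-and-Phrase level Model: bracketed structures are written as nested Python lists of tone numbers, the function names are our own, and the treatment of a stray final syllable in the non-cyclic parse (adjoining it to the preceding foot) is an assumption. The sketch has been checked only against the patterns cited in this chapter.

```python
def cyclic_t3s(node):
    """Apply T3S bottom-up over a bracketed structure.

    `node` is either a tone (an int from 1 to 4) or a list of sub-constituents.
    """
    if isinstance(node, int):
        return [node]
    parts = [cyclic_t3s(child) for child in node]
    # On each cycle, T3S can apply only across the juncture between sisters.
    for left, right in zip(parts, parts[1:]):
        if left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return [tone for part in parts for tone in part]


def noncyclic_t3s(tones):
    """Left-to-right binary footing that ignores the syntactic bracketing."""
    feet = [list(tones[i:i + 2]) for i in range(0, len(tones), 2)]
    if len(feet) > 1 and len(feet[-1]) == 1:
        stray = feet.pop()       # assumption: a stray syllable joins the preceding foot
        feet[-1].extend(stray)
    surface = []
    for foot in feet:
        # Within a foot, a T3 that precedes an underlying T3 surfaces as T2.
        changed = [2 if t == 3 and nxt == 3 else t for t, nxt in zip(foot, foot[1:])]
        surface.append(changed + [foot[-1]])
    # Between feet, T3S applies only if two T3s are still adjacent.
    for left, right in zip(surface, surface[1:]):
        if left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return [tone for foot in surface for tone in foot]


R_BRANCHING = [3, [3, [3, 3]]]    # [σ[σ[σσ]]], all syllables underlyingly T3
M_BRANCHING = [3, [[3, 3], 3]]    # [σ[[σσ]σ]]

print(cyclic_t3s(R_BRANCHING))        # [2, 3, 2, 3]
print(cyclic_t3s(M_BRANCHING))        # [3, 2, 2, 3]
print(noncyclic_t3s([3, 3, 3, 3]))    # [2, 3, 2, 3] for either structure
```

The cyclic parse yields distinct patterns for the two structures, whereas the non-cyclic parse yields (T2T3T2T3) for both, which is why *(T2T3T2T3) in an M-branching NP is diagnostic of a strategy shift. The same cyclic function gives (T3T2T3) for [σ[σσ]] and (T2T2T3) for [[σσ]σ] in the three-syllable compounds.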
In four-syllable NPs, all age groups had a 100% correct rate in the control items in both
right-branching (R-branching) and mixed-branching (M-branching) structures (Table 6.11). The fact
that children did perfectly in the control items shows that they could build the structure and
produce the novel NPs effortlessly when T3S was not involved.
Table 6.11 Study 3: Correct rate (%) in four-syllable NPs: Control items (no T3S)
Structure     Right-branching    Mixed-branching
3-year-olds   100 (34/34)        100 (17/17)
4-year-olds   100 (31/31)        100 (21/21)
5-year-olds   100 (49/49)        100 (40/40)
6-year-olds   100 (45/45)        100 (40/40)
adults        100 (40/40)        100 (40/40)

Figure 6.4 shows that when T3S is required, we see a very different picture. In the test items,
which have T3S, the correct rates dropped slightly in the R-branching structure, and more
dramatically in the M-branching structure. Even adults did not reach the 100% correct rate. The
distribution of different T3S patterns in subjects’ responses is in Table 6.12, with the primary
pattern in bold type.

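Figure 6.4 Study 3: Correct rates in four-syllable NPs by age
[Figure: Correct rates in four-syllable NPs. Percentage correct (0 – 100) by age group (3, 4, 5, 6,
A = adults), control vs. test items, in two panels.]
Panel Right-branching, ages 3, 4, 5, 6, A:
    control: 100, 100, 100, 100, 100          test: 50, 57.14, 85.71, 84.44, 92.50
Panel Mixed-branching, ages 3, 4, 5, 6, A:
    control: 100, 100, 100, 100, 100          test: 53.33, 58.82, 64.52, 54.84, 77.50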

Table 6.12 Study 3: Correct rate (%) in four-syllable NPs: Test items (with T3S)
Structure     Right-branching                                 Mixed-branching
Pattern       (2323)           *(3223)         (2223)         *(2323)          (3223)           (2223)
3-year-olds   50.00 (17/34)    11.76 (4/34)    0 (0/34)       20 (3/15)        46.67 (7/15)     6.67 (1/15)
4-year-olds   57.14 (20/35)    5.71 (2/35)     0 (0/35)       23.53 (4/17)     52.94 (9/17)     5.88 (1/17)
5-year-olds   85.71 (42/49)    0 (0/49)        0 (0/49)       25.81 (8/31)     41.94 (13/31)    22.58 (7/31)
6-year-olds   84.44 (38/45)    6.67 (3/45)     0 (0/45)       38.71 (12/31)    48.39 (15/31)    6.45 (2/31)
adults        92.50 (37/40)    0 (0/40)        0 (0/40)       12.50 (5/40)     77.50 (31/40)    0 (0/40)
Note: The primary patterns are (2323) for right-branching NPs and (3223) for mixed-branching NPs.
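The test-item correct rates plotted in Figure 6.4 can be recovered from the counts in Table 6.12. The sketch below assumes, as an inference from the arithmetic rather than an explicit statement in the text, that a right-branching response counts as correct when it is (2323), and that a mixed-branching response counts as correct when it is either (3223) or (2223). The variable names are our own.

```python
# Counts copied from Table 6.12.
# Right-branching: (tokens of (2323), total responses)
RIGHT = {"3": (17, 34), "4": (20, 35), "5": (42, 49), "6": (38, 45), "A": (37, 40)}
# Mixed-branching: (tokens of (3223), tokens of (2223), total responses)
MIXED = {"3": (7, 1, 15), "4": (9, 1, 17), "5": (13, 7, 31), "6": (15, 2, 31), "A": (31, 0, 40)}


def percent(count, total):
    """Percentage rounded to two decimals, as in the tables."""
    return round(100 * count / total, 2)


right_rates = {age: percent(hits, n) for age, (hits, n) in RIGHT.items()}
mixed_rates = {age: percent(a + b, n) for age, (a, b, n) in MIXED.items()}

print(right_rates)
print(mixed_rates)
```

Under this assumption the computed values agree with the test-item series of Figure 6.4 for both structures.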

Logistic regression analyses were conducted for the responses in four-syllable test items. The
results show that the independent variables age as well as structural branching as a set are
significant (chi square = 202.819, p < .001 with df = 15).

Pattern (2323): Grammatical for R-branching, ungrammatical for M-branching
The use of pattern (2323) relative to errors differs significantly between R-branching NPs and
M-branching NPs (Odds Ratio (OR) = 2.567, p = .013). The OR value indicates that the odds of this
pattern are about 2.5 times higher in an R-branching NP than in an M-branching structure.
For the surface pattern (2323) relative to errors, both 3- and 4-year-olds (3-year-olds: OR
= .168, p = .001; 4-year-olds: OR = .222, p = .005) are significantly different from adults, but
5- and 6-year-olds are not (5-year-olds: OR = .802, p = .684; 6-year-olds: OR = 1.382, p = .590).
Pattern (3223): Grammatical for M-branching, ungrammatical for R-branching
The use of pattern (3223) relative to errors in R-branching NPs and that in M-branching NPs
is also significantly different (OR = .058, p < .001). The OR value indicates that an M-branching
NP is about 17 times more likely to have this pattern than an R-branching structure.
For the surface pattern (3223) relative to errors, again both 3- and 4-year-olds (3-year-olds:
OR = .261, p = .037; 4-year-olds: OR = .233, p = .023) are significantly different from adults, but
5- and 6-year-olds are not (5-year-olds: OR = .328, p = .080; 6-year-olds: OR = .687, p = .575).
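The "about 17 times" reading of OR = .058 follows from taking the reciprocal of the odds ratio, which swaps the comparison category. A minimal sketch of that arithmetic is given below; the function name is our own, and the values are the ORs reported above for the effect of branching structure. Note that an odds ratio compares odds, not probabilities, so "times more likely" should be read in that sense.

```python
def reciprocal_odds_ratio(odds_ratio):
    """Re-express an odds ratio with the comparison category swapped."""
    return 1.0 / odds_ratio


# OR values as reported in the text for the effect of branching structure.
reported = {"pattern (2323)": 2.567, "pattern (3223)": 0.058}
for label, value in reported.items():
    print(f"{label}: OR = {value}, reciprocal = {reciprocal_odds_ratio(value):.1f}")
```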
Pattern (2223): Grammatical for M-branching and R-branching
The use of pattern (2223) relative to errors in R-branching NPs and that in M-branching NPs
are not significantly different (OR = 8.728E-10, p = .998). This pattern surfaced only in
M-branching NPs, and only in children.
For the surface pattern (2223) relative to errors, 3-, 4-, and 5-year-olds (3-year-olds: OR =
1.214E8, p < .001; 4-year-olds: OR = 1.032E8, p < .001; 5-year-olds: OR = 8.100E8, p < .001)
are significantly different from adults.
One of our research questions was whether or not children are sensitive to structural
differences in their T3S application. Both 5- and 6-year-olds are adult-like in distinguishing the
two surface patterns (2323) and (3223) according to the internal structures, but 3- and
4-year-olds are not. For four-syllable NPs, the NP Cyclic Parsing Hypothesis (H1) is confirmed in
older children (5- and 6-year-olds), but rejected in younger children (3- and 4-year-olds).
Error types in four-syllable NPs
We now turn to the T3S errors. As shown in Table 6.13, the most common error type is
Under-application in the R-branching and Mis-application in the M-branching NPs.
Table 6.13 Study 3: Error types (%) in four-syllable NPs
Structure     Right-branching                                  Mixed-branching
Error type    Under-           Mis-            Other           Under-          Mis-             Other
              application      application                     application     application
3-year-olds   29.41 (10/34)    11.76 (4/34)    8.82 (3/34)     26.67 (4/15)    20 (3/15)        0 (0/15)
4-year-olds   31.43 (11/35)    5.71 (2/35)     5.71 (2/35)     17.65 (3/17)    23.53 (4/17)     0 (0/17)
5-year-olds   12.24 (6/49)     0 (0/49)        2.04 (1/49)     9.68 (3/31)     25.81 (8/31)     0 (0/31)
6-year-olds   4.44 (2/45)      8.89 (4/45)     2.22 (1/45)     6.45 (2/31)     38.71 (12/31)    0 (0/31)
adults        7.50 (3/40)      0 (0/40)        0 (0/40)        10 (4/40)       12.50 (5/40)     0 (0/40)

For R-branching NPs, all three error types, Under-application, Mis-application, and Other,
were attested in children, whereas only Under-application was found in adults (Figure 6.5).

Figure 6.5 Study 3: Errors in four-syllable right-branching NPs by age
[Figure: Errors in four-syllable right-branching NPs (small purple seahorse). Percentage of
responses (0 – 100) by age group (3, 4, 5, 6, A = adults) for each error type.]
    other:             8.82, 5.71, 2.04, 2.22, 0
    mis-application:   11.76, 5.71, 0, 8.89, 0
    under-application: 29.41, 31.43, 12.24, 4.44, 7.50

The error rates decrease with age, and even adults had under-application errors. While 3- and
4-year-olds’ error rates are high, between 40% and 50%, by age five or six they drop to about
15%. The error type Other is attested in all child groups in the R-branching NPs, but not in the
M-branching NPs. For M-branching NPs, only the Under-application and Mis-application error
types were attested, in all child groups and in adults (Figure 6.6).
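The overall right-branching error rates quoted above are the sums of the three error types in Table 6.13. As a quick check, the sketch below totals the counts per age group; the dictionary layout and names are our own.

```python
# Right-branching error counts from Table 6.13:
# (under-application, mis-application, other, total responses)
R_ERRORS = {
    "3": (10, 4, 3, 34),
    "4": (11, 2, 2, 35),
    "5": (6, 0, 1, 49),
    "6": (2, 4, 1, 45),
    "A": (3, 0, 0, 40),
}

# Overall error rate per age group, in percent, rounded to two decimals.
total_error_rate = {
    age: round(100 * (under + mis + other) / n, 2)
    for age, (under, mis, other, n) in R_ERRORS.items()
}
print(total_error_rate)
```

The totals fall between 40% and 50% for the 3- and 4-year-olds and at roughly 15% for the 5- and 6-year-olds, in line with the description in the text.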

Figure 6.6 Study 3: Errors in four-syllable mixed-branching NPs by age
[Figure: Errors in four-syllable mixed-branching NPs (small short-legged horse). Percentage of
responses (0 – 100) by age group (3, 4, 5, 6, A = adults) for each error type.]
    other:             0, 0, 0, 0, 0
    mis-application:   20, 23.53, 25.81, 38.71, 12.50
    under-application: 26.67, 17.65, 9.68, 6.45, 10

The overall error rates are higher in the M-branching than the R-branching NPs across age
groups. Children’s under-application errors decrease by age. Mis-application errors, however,
increase by age. All the mis-application errors were the (T2T3T2T3) pattern. Unlike children,
adults have about the same proportion for both error types.
A logistic regression analysis was performed for the errors found in four-syllable test items.
The results show that the independent variables age and branching structure as a set are
statistically significant (chi square = 73.794, p < .001 with df = 15). This shows that the
independent variables as a set reliably distinguished among the error patterns.
Under-application errors
For the error type Under-application relative to correct responses in the four-syllable NPs,
the branching structure of the NP was not found to be significant (OR = .926, p = .827). The OR
value indicates that this error type occurs with about the same frequency in R-branching and in
M-branching NPs; that is, Under-application errors were not more frequent in one branching type
than in the other.
For the error type Under-application relative to correct responses, both 3- and 4-year-olds
(3-year-olds: OR = 5.534, p = .001; 4-year-olds: OR = 4.602, p = .003) are significantly different
from adults, but 5- and 6-year-olds are not (5-year-olds: OR = 1.424, p = .509; 6-year-olds: OR
= .715, p = .609).
Mis-application errors
For the error type Mis-application relative to correct responses in the four-syllable NPs, the
branching structure of the NP was found to be significant (OR = .147, p < .001). The OR value
indicates that the odds of this error type are roughly seven times higher in the M-branching NP
than in the R-branching NP.
For the error type Mis-application relative to correct responses, 3-, 4-, and 6-year-olds
(3-year-olds: OR = 5.780, p = .008; 4-year-olds: OR = 3.872, p = .043; 6-year-olds: OR = 5.219, p
= .003) are significantly different from adults, but 5-year-olds are not (5-year-olds: OR = 2.189,
p = .202).
Other errors
The error type that is categorized as Other refers to using a tone other than T2 or T3 in the
surface pattern. For the error type Other relative to correct responses in the four-syllable NPs,
there were only seven tokens in the R-branching NPs and none in the M-branching NPs. The
frequency of this error type in children is extremely low, and none of the child groups is
found to be significantly different from adults (3-year-olds: OR = 1.817E8, p = .997; 4-year-olds:
OR = 1.049E8, p = .997; 5-year-olds: OR = 2.788E7, p = .998; 6-year-olds: OR = 3.013E7, p
= .998).
One of the two hypotheses was the Strategy Shift Hypothesis (H2), which asks whether children
always use the cyclic bottom-up parsing strategy regardless of structural complexity, or whether
they may shift to non-cyclic left-to-right parsing when the structure becomes more complex.
Simply put, do children make T3S errors which are based on non-cyclic left-to-right parsing
in the M-branching NP ([σ[[σσ]σ]])? We found that mis-application errors occurred more in
M-branching NPs than in R-branching NPs, suggesting that the more complicated structure is the
source of the shift of strategies. In addition, all the mis-application errors in the M-branching
NPs were of the (T2T3T2T3) pattern, which results from a left-to-right non-cyclic parsing
strategy. The Strategy Shift Hypothesis (H2) is confirmed in all children except for the
5-year-olds, who were not statistically different from adults. In terms of under-application,
both 5- and 6-year-olds are adult-like, but 3- and 4-year-olds are not.
6.3.3 Discussion
In this section, T3S applications in three-syllable compound nouns and four-syllable NPs will
be discussed separately.
6.3.3.1 Three-syllable compound nouns
All child groups were adult-like in [σ [σσ]] and [[σσ]σ] compound nouns. Children do refer
to the morphosyntactic structure of the novel compound nouns in their T3S application. They
apply T3S cyclically in the three-syllable compound nouns.
For the pattern (T2T2T3) in the [σ [σσ]] compound noun, we know that it could be derived
in the Word-and-Phrase level model and that some speakers find it acceptable while others do
not. If this pattern is derived through the larger domain parsing, we would expect to find the
pattern in some adults, given that they are more likely to process a larger load of information at
a time and, hence, to parse all three syllables in one domain. In fact, not only did adults never
use the pattern, almost no children did, either. The only two tokens of the (T2T2T3) pattern were
found in a 3-year-old and a 6-year-old. The fact that children and adults in this study rarely used
the larger domain parsing, even in a relatively small domain of three syllables, may suggest that
the optional larger domain parsing does not apply in certain contexts, such as in the case of
[σ [σσ]]. It appears that for the [T3 [T3T3]] structure, once the inner constituent has undergone
T3S, no further T3S needs to apply to the first syllable, situated in the outer layer of the
structure. Future studies can further investigate this issue.
6.3.3.2 Four-syllable NPs
First of all, children across age groups all had a 100% correct rate in the control items (which
are without T3S) in the R-branching and in the M-branching NPs. The overall error rates in the test
items are higher for the M-branching than the R-branching NPs, indicating that T3S was more
difficult for them in the M-branching structure.
Children’s under-application errors decrease with age in both R-branching and M-branching NPs,
showing a developmental trend. The decrease in under-application points to more
awareness of applying T3S when necessary and, therefore, fewer under-application errors.
Mis-application errors did not show a clear trend in the R-branching NP, but such errors
increase with age in the M-branching NP. All the mis-application errors in M-branching NPs
were the *(T2T3T2T3) pattern, which was the result of the prosodic left-to-right binary parsing.
Such a strategy was possibly used because the cyclic parsing in the M-branching NP had become
a little too complicated for the children. In other words, when faced with complicated structures,
children’s ability to apply T3S cyclically in the NP may be weakened, and they may ignore the
internal structure and default to the non-cyclic prosodic parsing from left to right. Their choice
of switching to the default binary parse, as opposed to not applying T3S at all, may suggest that
they would rather apply T3S incorrectly than not apply it at all. Namely, in their grammar,
non-cyclic T3S application (even if it is incorrect) is better than not applying T3S at all.
Interestingly, some adults had this incorrect binary parsing (12.50%) in the M-branching NP as
well. It is not clear why adults did that. One possibility is that the M-branching NP items came
at the end of the experiment, so that participants might have been tired and paid less attention,
which could be true for children as well.
For NPs that are different in internal structure but the same in syllable number, the
non-cyclic parsing strategy results in the same surface pattern, while the cyclic parsing strategy
results in different surface patterns. The fact that (T2T3T2T3) and (T3T2T2T3), the predicted
patterns for the R-branching and the M-branching NPs respectively, were the most frequent
patterns for the two structures within each age group indicates that participants did refer to
syntax and apply T3S cyclically, although children are still maturing in consistent cyclic T3S
application in NPs. A majority of the T3S errors in the M-branching items were *(T2T3T2T3), the
result of non-cyclic left-to-right parsing, where the structure [σ[[σσ]σ]] predicts (T3T2T2T3).
This indicates that even though children have the cyclic parsing strategy, they may shift to the
non-cyclic parsing strategy when the syntactic structure becomes more complicated.
The larger domain parsing (T2T2T2T3) was predicted for both the M-branching and the
R-branching NPs. In our experimental data, the pattern was found in the M-branching NPs but not in
the R-branching NPs. In addition, in the M-branching NPs, all child groups produced the
(T2T2T2T3) pattern, but adults did not, though adults were presumably better at
processing a larger domain. The larger domain parsing (T2T2T3) in the [σ[σσ]] compound
nouns was not attested in adults (and was found in only two children out of 94 children). The
disfavored larger domain parsing in the three-syllable [σ[σσ]] compound noun and the
four-syllable M-branching NP [σ[[σσ]σ]] may suggest that for compound nouns and NPs, speakers
prefer the cyclic pattern, which in a way preserves the internal structure of the compound nouns
or NPs.
6.4 Conclusions
Our research questions for this experiment were:
1. Do children know to apply T3S cyclically in NPs?
2. Does structural complexity affect the parsing strategies they use?
Cyclic T3S application in three-syllable compound nouns ([σ[σσ]] and [[σσ]σ]) and four-syllable
NPs (R-branching [σ[σ[σσ]]] and M-branching [σ[[σσ]σ]]) was used to test two
hypotheses, the NP Cyclic Parsing Hypothesis and the Strategy Shift Hypothesis. Our results
show that children refer to syntax in building prosodic domains and apply T3S cyclically. The NP
Cyclic Parsing Hypothesis is confirmed.
We also found that when the structure became more complicated, as in the M-branching
structure, children sometimes shifted to the non-cyclic binary parsing from left to right without
referring to the syntactic structure. R-branching NPs were not difficult for children, but many of
the children had trouble with the M-branching NPs. All the mis-application errors in the
M-branching NPs ([σ[[σσ]σ]]) were *T2T3T2T3, resulting from a non-cyclic parsing strategy, whereas
cyclic application would give T3T2T2T3. This confirms the Strategy Shift Hypothesis.
Children across age groups were adult-like in their T3S application in the three-syllable
compound nouns. In the four-syllable NPs, R-branching was easier than M-branching. T3S
application in four-syllable M-branching structure was the most challenging task across age
groups. In short, although children show evidence of cyclic parsing in compound nouns and NPs,
they might shift to the non-cyclic parsing when the structure increases in complexity.

CHAPTER 7
T3S IN SENTENCES
7.0 Introduction
Previous studies suggest early acquisition of T3S (Jeng 1979; Jeng 1985; Li & Thompson
1977; Zhu 2002; Zhu & Dodd 2000). The previous three chapters present a Natural Speech study
as well as two experimental studies which tested non-cyclic and cyclic parsing strategies in flat
structures and in NPs respectively. In this chapter, we test T3S application at the sentence level
where an integration of cyclic and non-cyclic strategies is required. We test specifically the
factors that potentially affect the application of T3S, including the length of the sentence, the
number of adjacent T3* embedded in the sentence, the noun-pronoun distinction, and the
syntactic differences in the sentences. How the participants, children and adults, parse syllables
into prosodic domains for T3S application in sentences will provide valuable information on
children’s acquisition of T3S and T3S theories. In this chapter, two repetition studies are
presented.
Both Study 4 and Study 5 are elicited production studies. The composition of the
experimental sentences in Study 4 and that in Study 5 are identical. The only difference is in the
audio recordings of the stimuli, one with tonal manipulation and one without. In Study 4 Natural
Speech Repetition (henceforth NSR), the audio recordings were of natural speech (one of the
surface patterns was used for the recordings) whereas in Study 5 Robot Talk Repetition
(henceforth RTR), manipulated speech which had the T3S effect removed was used (more detail
will be presented in Section 7.3.2.4). Simply put, NSR is with T3S and RTR is without T3S. In
what follows, research questions, hypotheses and predictions for both studies will be presented.

7.1 Research questions, hypotheses and predictions for both studies
7.1.1 Research questions
In Study 4 NSR and Study 5 RTR, specific research questions regarding T3S application at
the sentence level take into consideration the complexity of sentences, the matches and
mismatches between syntax and prosody and properties of the subject.
Complexity: Does complexity—the number of adjacent T3*, the length of sentences (total
number of syllables)— play a role in T3S application?
The mapping of syntax and prosody: Even though foot-building in T3S application depends
heavily on syntax, we know that T3S domains are not isomorphic to syntactic domains. What
role does the alignment of syntax and prosody play in T3S application? How does mapping
between syntax and prosody affect T3S application in children and adults?
The distinction between subject pronoun (pro) and subject NP: Do pronouns behave
differently from NPs in T3S application, and more specifically, in the subject position? Previous
studies suggest that functional words, including prepositions, object pronouns, classifiers, are
prosodically weak and can cliticize to a preceding syllable (Chen 2000:400-403; Lin 2007:216;
Shih 1986; Shih 1997; Zhang 1997:307-308).
7.1.2 Hypotheses and predictions for both studies
Two experiments were designed in a way that the following two questions can be answered.
1. Are children able to repeat [35] when they hear a sentence where T3S has been applied?
2. Are children able to actively apply T3S when they hear a sentence where T3S has not
   been applied?

[35] There is always the possibility that children, especially older children, repeat the
sentence as a frozen chunk without knowing the rule. However, most studies (e.g. Brown and
Bellugi, 1964, McDaniel et al., 1998) show that it is very hard to repeat without using grammar.

Study 4 NSR tests the repetition of sentences where T3S is correctly applied. Study 5 RTR
tests application of T3S in the repetition of sentences where the T3S effect was removed. In
other words, Study 4 NSR tests “passive” repetition and Study 5 RTR tests “active” application
of T3S. Both studies used elicited production data. As we can expect, the task in Study 4 will be
easier than that in Study 5 because the “work” of T3S application is done in the former, but not
the latter. Participants hear natural speech (with T3S) in Study 4 NSR, but unnatural speech
(without T3S) in Study 5 RTR where they need to work on the T3S application themselves.
We now turn to the specific hypotheses. First, applying T3S to two, five, or six consecutive
T3*, for instance, naturally requires a different workload. The same goes for the length of
sentences. The more consecutive T3* and/or the longer the sentence, the more complex the task
of T3S application. In H1, we define complexity in terms of number of adjacent T3* and total
number of syllables in the sentence. Our Null Hypothesis (H0) is that the number of T3* and
number of syllables will not affect T3S application.
(1) Complexity Hypothesis (H1):
    a. Number of adjacent T3*: The more adjacent T3*, the more complex the task.
    b. Length of sentences: The more syllables, the more complex the task.
Predictions:
    a. Everything else held constant, a sentence with no adjacent T3* (therefore, no T3S is
       required) is easier than a sentence with some adjacent T3*, which in turn is easier than a
       sentence with all the syllables in T3.
    b. A sentence with fewer syllables is easier than a sentence with more syllables.

Secondly, based on the Word-and-Phrase level Model, T3S domains are built with partial
reference to syntax (cyclic application at the Word level and non-cyclic application at the Phrase
level), and the T3S domains and the syntactic domains are not isomorphic (Chen 2000: Ch 9;
Shih 1986; Shih 1997). If there is only one prosodic domain for the whole sentence, there is no
issue with the syntax-prosody misalignment because the left and right edges of the syntactic
domain map perfectly to the left and right edges of the prosodic domain. However, if there are
two or more prosodic domains, they may or may not align with syntactic boundaries. Since
T3S partially depends on syntax, let us focus on the major syntactic boundary, the subject-predicate boundary. Except for very short sentences, a typical sentence is broken into more than
one prosodic domain. We hypothesize that prosodic boundaries are prone to match the major
syntactic boundary, namely, the subject-predicate boundary. Our Null Hypothesis (H0) is that
there is no clear relationship between syntactic domains and prosodic domains (i.e. T3S
domains). We test not only whether or not a relationship between syntax and prosody exists in
T3S application, but also whether or not the alignment of syntax and prosody respects syntactic
boundaries.
(2) Syntax-prosody Alignment Hypothesis (H2): Prosodic boundaries tend to match the major
    syntactic boundaries.
Prediction: A parsing which results in a good match between syntax and prosody will occur more
frequently than a mismatch between syntax and prosody.
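To make the notion of a match concrete, the sketch below classifies a prosodic parse by whether one of its domain boundaries falls at the subject-predicate boundary. It is an illustration under our own assumptions, not the coding procedure used in the studies: a parse is represented as a list of domain lengths in syllables, the subject is identified only by its length in syllables, and the function names and the example parses are hypothetical.

```python
from itertools import accumulate


def internal_boundaries(domain_lengths):
    """Syllable positions at which one prosodic domain ends and the next begins."""
    return set(accumulate(domain_lengths[:-1]))


def classify_parse(domain_lengths, subject_syllables):
    """Classify a parse relative to the subject-predicate boundary."""
    if len(domain_lengths) == 1:
        # One domain for the whole sentence: no alignment issue arises.
        return "single domain"
    if subject_syllables in internal_boundaries(domain_lengths):
        return "match"
    return "mismatch"


# Hypothetical six-syllable sentence with a two-syllable subject.
print(classify_parse([2, 4], 2))   # a domain boundary falls after the subject
print(classify_parse([3, 3], 2))   # the only boundary falls inside the predicate
print(classify_parse([6], 2))      # the whole sentence is one domain
```

Under H2, parses of the first kind would be expected to occur more often than parses of the second kind.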
7.2 Study 4: NSR = Natural Speech (with Sandhi) Repetition
In this experiment, we test children in their repetition of structurally different sentences
where T3S has been applied. The goal is not only to see if children are able to repeat sentences
where T3S has been correctly applied, but also to find out whether or not they use the pattern
they hear. This is because there are multiple surface patterns, and they may hear one pattern but
choose to repeat with their own pattern.
7.2.1 Method
7.2.1.1 Subjects
Thirty-two subjects participated in this study. Twenty-one children, age 4;0 – 6;8, were
recruited in Taichung, Taiwan. They were divided into two age groups, four-year-olds and
six-year-olds. In the six-year-old group, only two children (5;9 and 5;10 respectively) were under
6;0. Eleven adults [36] participated in the study. All of them were native speakers of Mandarin
Chinese from Taiwan, studying at Michigan State University. Table 7.1 shows the distribution of
the subjects.
Table 7.1 Study 4: Distribution of the subjects
Age groups    N     Age range     Mean    Standard deviation
4-year-olds   11    4;0 – 4;10    4;4     3.68 (mo.)
6-year-olds   10    5;9 – 6;8     6;3     3.68 (mo.)
adults        10

7.2.1.2 Procedure
All children were tested in a quiet classroom in the kindergarten or in the home of the child.
The adult subjects were tested at their homes or in a quiet room. The elicited production task
lasted approximately 10 minutes for adults, and 15 – 20 minutes for children. Each child was
presented with a Robot and a beanie bear Xiaoli. The experimenter told the child that they were
about to play a game. She said, “Look, this is a Robot, and this is Bear Xiaoli. Robot says
something to Bear Xiaoli, but Bear Xiaoli cannot hear her. Could you help her by listening to what
the Robot is saying? After you hear it, you tell Xiaoli, okay? Do you want to play the game?” (See
Appendix E for Mandarin experimental prompts and materials.) The subjects heard the audio
recordings through headphones (Philips SHP2000) and saw accompanying pictures from a laptop
computer, and then repeated what they heard.

[36] An adult subject, after completing the experiment, mentioned to the experimenter that her
friends often commented that she “talked weird.” The data of this adult subject was excluded in
order to ensure, as best we could, that the adult data reflect the norm of adult speech. Her
scores in 4w4σ and 4w6σ control and test items were 100% correct. In the Pro-5w6σ and NP-5w6σ
control items, the answers were also 100% correct. Her scores for the Pro-5w6σ and NP-5w6σ test
items were 50% correct and 0% correct respectively.

Each PowerPoint slide consisted of the images of the Robot, Bear Xiaoli and/or other animals
doing various activities. The purpose of showing the images along with the audio recordings is to
assist children in their understanding of the recordings. All the images were obtained from
Google images (http://www.google.com/imghp?hl=en&tab=wi) and by photographing real
objects (Robot and Bear Xiaoli). All subjects’ responses were recorded on a Marantz PMD660
with an Audio-technica miniature clip-on microphone (AT831B Cardioid Condenser Lavalier
microphone). (A second digital recorder, a Sony ICD-P530F, was used in case of technical
problems.)
7.2.1.3 Design
An elicited repetition task (Crain & Thornton 2000; McDaniel et al. 1998) is used in this
experiment. The four structures tested in the study are in (1). Throughout the chapter, sentences
will be identified by the following labels for the four structures: 4w4σ (four words, four
syllables), 4w6σ (four words, 6 syllables), PRO-5w6σ (subject pronoun, 5 words, 6 syllables)
and NP-5w6σ (subject NP, 5 words, 6 syllables). Based on the Word-and-Phrase level Model,
the predicted patterns for each structure are listed in (3) (see Appendix G for derivations for test
items). In general, at the Word level (including simple nouns, compound nouns and NPs), T3S
applies cyclically, and at the Phrase level, T3S applies non-cyclically.

(3)  The four structures in the experimental sentences and T3S patterns expected

a.   4w4σ:
     [Wo    [xiang   [mai    [bi]]]]
     I      want     buy     pen        ‘I want to buy pens.’
     3      3        3       3          UT
     3      3        3       3          Word: no T3S
     3      3        (2      3)         Phrase: disyllabic foot for the smallest domain, T3S
     (2     3)       (2      3)         Phrase: Disyllabic foot for the remaining syllables, T3S; ST1
     (2     2        2       3)         Larger domain in fast speech; ST2

b.   4w6σ:
     [[haima]   [xiang   [zhao      [shuimu]]]]
     seahorse   want     look for   jellyfish     ‘Seahorse wants to look for Jellyfish.’
     33         3        3          33            UT
     (23)       3        3          (23)          Word: two disyllabic feet, T3S
     (23)       (2       3)         (23)          Phrase: disyllabic foot, T3S; ST1
     (22        2        3)         (23)          Larger domain in fast speech; ST2

c.   Pro-5w6σ:
     [Ni    [xiang   [yang    [xiao    [laohu]]]]]
     you    want     raise    small    tiger        ‘You want to have/raise (a) small tiger.’
     3      3        3        3        33           UT
     3      3        3        3        (23)         Word: T3S
     3      3        3        (3       23)          Word: Incorporation, no T3S
     (2     3)       3        (3       23)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)       (3       23)          Phrase: Incorporation, T3S; ST1
     (2     2        2)       (3       23)          Optional: T3S across domains; ST2
ii.  Rightward incorporation:
     (2     3)       (2       3        23)          Phrase: Incorporation, T3S; ST3

d.   NP-5w6σ:
     [Ma    [xiang   [zhao      [xiao    [haigou]]]]]
     horse  want     look for   small    fur-seal     ‘Horse wants to look for the small fur-seal.’
     3      3        3          3        33           UT
     3      3        3          3        (23)         Word: T3S
     3      3        3          (3       23)          Word: Incorporation, no T3S
     (2     3)       3          (3       23)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)         (3       23)          Phrase: Incorporation, T3S; ST1
     (2     2        2)         (3       23)          Optional: T3S across domains; ST2
ii.  Rightward incorporation:
     (2     3)       (2         3        23)          Phrase: Incorporation, T3S; ST3


4w4σ in (3a) and 4w6σ in (3b) are both composed of four words, but they differ in the total
number of syllables. We test whether or not the length of sentences has an effect by comparing
these two sentence groups. In addition, Syntax-Prosody Alignment hypothesis H2 will be tested
as well.
Pro-5w6σ and NP-5w6σ are used to test all the hypotheses except for H1b, which concerns
the effect of sentence length (PRO-5w6σ and NP-5w6σ are both six syllables long,
so this hypothesis cannot be tested with them). PRO-5w6σ in (3c) and NP-5w6σ in (3d) differ only in
whether the subject is a pronoun (henceforth subject pro) or an NP. We test whether
or not there is a difference in T3S application when the subject is an NP and when it is a pronoun.
As mentioned earlier, previous studies suggest that an object pronoun cliticizes leftwards to
the preceding syllable (Chen 2000; Lin 2007; Shih 1986; Shih 1997; Zhang 1997). What was left
unexplored in previous studies was whether or not subject pros behave similarly in cliticization.
There is a problem, however. Being in the initial position of the sentence, a subject pro cannot
cliticize leftwards to the preceding syllable since nothing precedes it. It will be interesting to
know whether the subject pro stays as a degenerate foot in its own prosodic domain, or whether it
joins the following syllable(s) in forming a prosodic domain even though there is a major
syntactic boundary between subject and predicate. Positioning pronouns in the subject position in
Study 4 and Study 5 allows us to test not only a pronoun’s well-known property of being
prosodically weak, but also the directionality of its cliticization, which has not received much
attention.
What we also are interested in is the directionality of incorporation of an unparsed syllable.
An example in (3c) is repeated in (4).
(4)  PRO-5w6σ:
     [Ni    [xiang   [yang    [xiao    [laohu]]]]]
     you    want     raise    small    tiger        ‘You want to have/raise (a) small tiger.’
     3      3        3        3        33           UT
     3      3        3        3        (23)         Word: T3S
     3      3        3        (3       23)          Word: Incorporation, no T3S
     (2     3)       3        (3       23)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)       (3       23)          Phrase: Incorporation, T3S; ST1
     (2     2        2)       (3       23)          Optional: T3S across domains; ST2
ii.  Rightward incorporation:
     (2     3)       (2       3        23)          Phrase: Incorporation, T3S; ST3

In (4), at the Word level, the innermost constituent laohu ‘tiger’ is parsed, and T3S applies.
Xiao ‘small’ is then incorporated, and T3S does not apply. At the Phrase level, a disyllabic foot
is formed from left to right, and T3S applies. Now we have one remaining syllable that we need
to incorporate into a neighboring domain. We are faced with two choices: incorporating
rightwards or incorporating leftwards. There is no specification of the directionality in the Word-and-Phrase Model, so let us take a closer look at the two options we have at this point.

a.   Incorporating leftwards: The unparsed syllable is incorporated into the disyllabic domain
     obtained at the Phrase level (i.e. the first two syllables). There is no reference to syntax.

b.   Incorporating rightwards: The unparsed syllable is incorporated into the ternary domain
     obtained at the Word level. There is reference to syntax.

If Option a (incorporating leftwards) is taken, the remaining unparsed syllable is incorporated
into the preceding domain, and T3S applies within it. We have ST1 (T2T2T3)(T3T2T3). If
the optional rule of T3S across domains also applies, we have ST2 (T2T2T2)(T3T2T3).
If Option b (incorporating rightwards) is taken, the remaining unparsed syllable is
incorporated into the following domain, and T3S applies within it. We have ST3
(T2T3)(T2T3T2T3).
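
The two incorporation options can be made concrete with a small sketch. It is an illustration only, not part of the Word-and-Phrase level Model itself: the function names (`apply_t3s`, `incorporate`, `t3s_across`) are hypothetical, tones are coded as integers, and T3S inside a domain is simplified to a single left-to-right pass over the current tones. The starting point is the state reached in (4) after the Phrase-level disyllabic foot, (2 3) 3 (3 2 3).

```python
def apply_t3s(tones):
    """One left-to-right pass: a T3 directly before another T3 becomes T2.
    (Simplifying assumption; adequate for the short domains in (4).)"""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


def incorporate(left, stray, right, direction):
    """Attach the unparsed syllable to the preceding or following domain,
    then apply T3S inside the enlarged domain."""
    if direction == "left":
        return apply_t3s(left + [stray]), list(right)
    if direction == "right":
        return list(left), apply_t3s([stray] + right)
    raise ValueError("direction must be 'left' or 'right'")


def t3s_across(left, right):
    """Optional rule: T3S across the boundary between two domains."""
    left = list(left)
    if left[-1] == 3 and right[0] == 3:
        left[-1] = 2
    return left, list(right)


# State in (4) after the Phrase-level disyllabic foot: (2 3) 3 (3 2 3)
left, stray, right = [2, 3], 3, [3, 2, 3]

st1 = incorporate(left, stray, right, "left")    # leftward incorporation
st2 = t3s_across(*st1)                           # plus optional T3S across domains
st3 = incorporate(left, stray, right, "right")   # rightward incorporation

print(st1, st2, st3)
```

Under these assumptions the three results correspond to ST1, ST2, and ST3 in (4).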
It should be noted that the subject-predicate pattern is also a possible T3S pattern. Let us look
at a sentence shown earlier in (3d), repeated here in (5).
(5)  NP-5w6σ:
     [Ma    [xiang   [zhao      [xiao    [haigou]]]]]
     horse  want     look for   small    fur-seal     ‘Horse wants to look for the small fur-seal.’
     3      3        3          3        33           UT
     3      3        3          3        (23)         Word: T3S
     3      3        3          (3       23)          Word: Incorporation, no T3S
     (2     3)       3          (3       23)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)         (3       23)          Phrase: Incorporation, T3S; ST1
     (2     2        2)         (3       23)          Optional: T3S across domains; ST2
ii.  Rightward incorporation:
     (2     3)       (2         3        23)          Phrase: Incorporation, T3S; ST3
     Alternative parsing:
     (3) %  (2       3)         (3       23)          ST4
         ↑
         intonational break

In (5), at the Word level, the innermost constituents are parsed. T3S applies in the object
noun, but T3S is not applicable in the subject noun. Next, still at the Word level, the adjective
xiao ‘small’ is incorporated into its following domain, and T3S does not apply. At the Phrase level, a
disyllabic foot is formed from left to right for the first two syllables, and T3S applies, followed
by the incorporation of the unparsed syllable zhao ‘look for,’ and T3S applies again.
The additional pattern (T3)(T2T3)(T3T2T3) occurs when there is an intonational break (the
convention is to indicate it with “%”) and/or there is an emphasis/focus on the subject (Chen
2000: 379-380, 411-413). It is not clear whether or not there is an intonational break and/or if
there is an emphasis or focus on the subject in the data collected for Study 4 and Study 5. It is
also not clear if this additional pattern is specific to Taiwan Mandarin speakers. Future
investigation will be needed to answer the questions in this regard.
For a T3S application to occur, there has to be a T3S environment to trigger it. Whenever
there are adjacent T3*, the environment for triggering T3S application is potentially created.
Therefore, such an environment plays a crucial role in our experimental design. The level of
difficulty can vary depending on how many adjacent T3* there are in a sentence. A sentence with
no adjacent T3*, and hence zero T3S applications, is expected to be the easiest since it carries no
T3S workload. These are the control items, which require no T3S applications. It should be noted
that control items sometimes contain a T3; however, it is not preceded or followed by another T3,
so T3S is never triggered.
For the test items, there are two conditions. The first condition is the T3S environment of
three adjacent T3*. The second condition is the T3S environment of maximal adjacent T3*
allowed for that sentence. The maximal number of T3* that is allowed for (3a) is four because
the sentence is four syllables long. Sentences (3b) – (3d) each have six syllables, so the maximal
number of T3* allowed for them is six.
For easy reference to the two conditions in the test items (three T3* and maximal T3*) as
well as the condition of “no adjacent T3*” in the control items, we use the simplified terms in (6).
These labels will be used in the rest of the chapter.
(6)  Three conditions: No adjacent T3*, three T3*, and all T3*

a.   No adjacent T3*: No adjacent T3*, and therefore T3S is not applicable (the control items).
b.   Three T3*: Three adjacent T3* embedded in the sentences.
c.   All T3*: Maximal adjacent T3* allowed for that sentence (four adjacent T3* for (3a) and
     six adjacent T3* for (3b) – (3d)).

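
The three conditions in (6) depend only on the longest run of adjacent underlying T3 in a sentence. The sketch below illustrates this; the function names `longest_t3_run` and `condition` are hypothetical, and underlying tones are coded as a list of integers, one per syllable.

```python
def longest_t3_run(tones):
    """Length of the longest run of adjacent T3 in the underlying tones."""
    best = run = 0
    for tone in tones:
        run = run + 1 if tone == 3 else 0
        best = max(best, run)
    return best


def condition(tones):
    """Map a sentence's underlying tones to one of the labels in (6)."""
    run = longest_t3_run(tones)
    if run <= 1:
        return "No adjacent T3*"   # control items: T3S is never triggered
    if run == len(tones):
        return "All T3*"           # every syllable is underlyingly T3
    if run == 3:
        return "Three T3*"
    return "other"                 # not used in the Study 4 design


print(condition([1, 3, 4, 2, 1, 2]))   # sample control item in Figure 7.1 (a)
print(condition([3, 3, 3, 4, 1, 1]))   # sample three-T3 item in Figure 7.1 (b)
print(condition([3, 3, 3, 3, 3, 3]))   # sample all-T3 item in Figure 7.1 (c)
```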
Not only is the number of adjacent T3* important; the location of the T3-sequence needs to
be specified as well. For the control items, no specification is required as there are no adjacent
T3*. For the all-T3 test items, each syllable in the sentence bears a T3
underlyingly, so specifying where the T3-sequence is
located is unnecessary. In short, the location of the T3-sequence is unambiguous in the
control items and in the test items where each syllable is in T3.
The specification of the location of T3* is necessary in cases where three T3* are embedded
in the sentence. We specify the location of T3* with bold type in (7).

(7)  Location of adjacent T3* for the three-T3 condition (shown in bold type)

a.   4w4σ: [σpronoun [σwant [σverb [σNP]]]]
     [Ni    [xiang   [mai    [hua]]]]
     you    want     buy     flower     ‘You want to buy flowers.’
     3      3        3       1          UT
     3      3        3       1          Word: no T3S
     3      3        (3      1)         Phrase: disyllabic foot for the smallest domain, no T3S
     (2     3)       (3      1)         Phrase: Disyllabic foot for the remaining syllables, T3S; ST1
     (2     2        3       1)         Larger domain in fast speech, ST2

b.   4w6σ: [σσNP [σwant [σverb [σσNP]]]]
     [[banma]   [xiang   [zhao      [xiongmao]]]]
     zebra      want     look for   panda bear     ‘Zebra wants to look for Panda bear.’
     13         3        3          21             UT
     (13)       3        3          (21)           Word: two disyllabic feet, T3S
     (13)       (2       3)         (21)           Phrase: disyllabic foot, T3S; ST1
     (12        2        3)         (21)           Larger domain in fast speech, T3S; ST2

c.   PRO-5w6σ: [σpronoun [σwant [σverb [σadj [σσNP]]]]]
     [Wo    [xiang   [zhao      [da    [hema]]]]]
     I      want     look for   big    hippo        ‘I want to look for (a) big hippo.’
     3      3        3          4      23           UT
     3      3        3          4      (23)         Word: T3S
     3      3        3          (4     23)          Word: Incorporation, no T3S
     (2     3)       3          (4     23)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)         (4     23)          Phrase: Incorporation, T3S; ST1
ii.  Rightward incorporation:
     (2     3)       (3         4      23)          Phrase: Incorporation, T3S; ST2
     (2     2)       (3         4      23)          Optional: T3S across domains; ST3

d.   NP-5w6σ: [σNP [σwant [σverb [σadj [σσNP]]]]]
     [Gou   [xiang   [zhao      [da    [xingxing]]]]]
     dog    want     look for   big    gorilla      ‘Dog wants to look for (the) big gorilla.’
     3      3        3          4      11           UT
     3      3        3          4      (11)         Word: T3S
     3      3        3          (4     11)          Word: Incorporation, no T3S
     (2     3)       3          (4     11)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)         (4     11)          Phrase: Incorporation, T3S; ST1
ii.  Rightward incorporation:
     (2     3)       (3         4      11)          Phrase: Incorporation, T3S; ST2
     (2     2)       (3         4      11)          Optional: T3S across domains; ST3

To summarize the experimental design, the control items are those that need no T3S
applications (no adjacent T3*). The test items have two conditions—three T3* and all T3*.
Table 7.2 lists the number of tokens of controls and tests for each sentence type.
Table 7.2 Study 4: Tokens for test and control items
                                                    Control        Test
                                                    No adjacent    Three    All T3*
Conditions                                          T3*            T3*      4 T3*    6 T3*
4-syllable sentences:
  4w4σ: [σpronoun[σwant[σverb[σNP]]]]               2              2        2        n/a
6-syllable sentences:
  4w6σ: [σσNP[σwant[σverb[σσNP]]]]                  2              2        n/a      2
  PRO-5w6σ: [σpronoun[σwant[σverb[σAdj[σσNP]]]]]    2              2        n/a      2
  NP-5w6σ: [σNP[σwant[σverb[σAdj[σσNP]]]]]          2              2        n/a      2


From Table 7.2, we can see that for each sentence structure, there are six tokens, with two
each for No adjacent T3*, Three-T3* and All T3*. There are 24 tokens in all (4 sentence types ×
6 tokens each sentence type).

In this experiment, we test whether or not children are able to repeat correctly the sentences
that have various numbers of T3S applications. Due to the T3S variability, more than one T3S
pattern is possible. Upon hearing a sentence in natural speech, are they able to repeat it? If so,
will they repeat the T3S pattern that they hear or will they repeat with a different surface pattern?
For the audio recording of these experimental sentences, we selected the pattern where more
T3* undergo T3S. By using the pattern that has more derived T2s, rather than fewer derived
T2s, we are testing to what extent children can retain a pattern that may or may not have been
acquired. We are more interested in how much the children can do (that is, their potential) than
in how little they can do. For instance, in (8) below, we see various possibilities in the
surface form.
(8)
     [Ma    [xiang   [zhao      [xiao    [haigou]]]]]
     horse  want     look for   small    fur-seal     ‘Horse wants to look for the small fur-seal.’
     3      3        3          3        33           UT
     3      3        3          3        (23)         Word: T3S
     3      3        3          (3       23)          Word: Incorporation, no T3S
     (2     3)       3          (3       23)          Phrase: Disyllabic foot from left to right, T3S

     Directionality for the Incorporation in the next step:
i.   Leftward incorporation:
     (2     2        3)         (3       23)          Phrase: Incorporation, T3S; ST1
     (2     2        2)         (3       23)          Optional: T3S across domains; ST2
ii.  Rightward incorporation:
     (2     3)       (2         3        23)          Phrase: Incorporation, T3S; ST3
     Alternative parsing:
     (3) %  (2       3)         (3       23)          ST4
         ↑
         intonational break

In (8), ST1 is the pattern derived first. Although both ST1 and ST2 are possible patterns, the
level of difficulty in producing them varies. Our reasoning is that the more derived T2s, the
more work (the higher the demand). It follows that ST2, which has four derived T2s, is more difficult
than ST1 and ST3, which have three derived T2s each. ST4, which has two derived T2s, is assumed to be
the least difficult. A female native speaker recorded the selected surface pattern for each sentence
with a professional digital recorder (Marantz PMD660). These recorded sentences were put on the
PowerPoint slides, where children heard the sentences with accompanying pictures.
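
The difficulty ranking above rests on counting derived T2s, that is, syllables that are T3 underlyingly but surface as T2. A minimal sketch of that count for the four surface patterns in (8) follows; the helper name `derived_t2_count` is hypothetical, and the tone sequences are copied from (8) with the domain brackets removed.

```python
def derived_t2_count(underlying, surface):
    """Number of syllables that are underlyingly T3 but surface as T2."""
    return sum(1 for u, s in zip(underlying, surface) if u == 3 and s == 2)


underlying = [3, 3, 3, 3, 3, 3]          # UT of the all-T3 sentence in (8)
surface_patterns = {
    "ST1": [2, 2, 3, 3, 2, 3],           # leftward incorporation
    "ST2": [2, 2, 2, 3, 2, 3],           # plus optional T3S across domains
    "ST3": [2, 3, 2, 3, 2, 3],           # rightward incorporation
    "ST4": [3, 2, 3, 3, 2, 3],           # alternative parsing with a break
}

counts = {name: derived_t2_count(underlying, st)
          for name, st in surface_patterns.items()}

# Rank from most to fewest derived T2s (assumed most to least demanding).
ranking = sorted(counts, key=counts.get, reverse=True)
print(counts, ranking)
```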
7.2.1.4 Materials
In the 24 experimental sentences, 16 are test items, and 8 are control items. A sample
sentence each for the control items (no adjacent T3*), three-T3 test items, and all-T3 test items
is given in Figure 7.1 (a) - (c) respectively. The underlying tones and the surface pattern selected for
the audio recording are included in these sample sentences. (See Appendix E for the full set
of 16 test items and 8 control items.)

Figure 7.1 Study 4: Sample materials
a. Sample control item: no adjacent T3*
   Audio recording:
   [Zhu   [xiang   [bian     [lan    [jingyu]]]]]
   pig    want     become    blue    whale
   ‘Pig wants to become (a) blue whale.’
   1      3        4         2       12          UT
   1      3        4         2       12          ST used

b. Sample test item: three T3*
   Audio recording:
   [Gou   [xiang   [zhao      [da    [xingxing]]]]]
   dog    want     look for   big    gorilla
   ‘Dog wants to look for (the) big gorilla.’
   3      3        3          4      11          UT
   2      2        3          4      11          ST used

c. Sample test item: all T3*
   Audio recording:
   [Ma    [xiang   [zhao      [xiao    [haigou]]]]]
   horse  want     look for   small    fur-seal
   ‘Horse wants to look for (the) small fur-seal.’
   3      3        3          3        33        UT
   2      2        2          3        23        ST used

7.2.1.5 Coding
One native speaker transcribed the data and coded the answers. Numbers 1, 2, 3 and 4 were
used for the four lexical tones, T1, T2, T3, and T4 respectively. Data were coded in a way to
preserve the most available information in subjects’ responses. The coding categories are in (9).

(9)  Coding categories for data analysis:

a.   Included in the analysis:
     i.   Correct application of T3S without missing any syllables (missing syllables are indicated
          by underscores in the coding).
     ii.  Incorrect application of T3S without missing any syllables
     iii. Answers with one or two missing syllables

b.   Excluded in the analysis:
     i.   No answer: Saying “I don’t know” or being silent without giving an answer.
     ii.  Non-target answers: Saying something else, such as adding additional words to the
          sentences or replacing the name of an animal, which results in non-target answers.
     iii. Pauses: Pauses between two T3*.
     iv.  Other: Missing three or more syllables.

When there are missing syllables in the subjects’ answers, it is often one or two syllables in
the medial position that are left out. Only when the number of missing syllables does not go
beyond two syllables are the answers included in the analysis. Answers with pauses between T3*
are excluded from the analysis because a pause destroys the T3S environments created. Sample
answers and how they fit in the coding categories are in Table 7.3.
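
The inclusion and exclusion criteria in (9) can be sketched as a small decision procedure. This is only an illustration of the criteria, not the coding procedure actually used: the function name `code_response` and the string encoding of responses (tones separated by spaces, `_` for a missing syllable, `pause` for a pause, `None` for no answer, `"non-target"` for a non-target answer) are assumptions made here. Whether an included answer is a correct or incorrect application of T3S is a separate judgment that the sketch does not attempt.

```python
def code_response(response):
    """Return (decision, reason) for one transcribed response."""
    if response is None:
        return ("excluded", "no answer")
    if response == "non-target":
        return ("excluded", "non-target answer")

    tokens = response.split()

    # A pause between two T3 destroys the T3S environment.
    for i, token in enumerate(tokens):
        if token == "pause" and 0 < i < len(tokens) - 1:
            if tokens[i - 1] == "3" and tokens[i + 1] == "3":
                return ("excluded", "pause between T3*")

    missing = tokens.count("_")
    if missing >= 3:
        return ("excluded", "three or more missing syllables")
    if missing >= 1:
        return ("included", "one or two missing syllables")
    return ("included", "no missing syllables")


# Sample answers taken from Table 7.3
for answer in ["2 3 2 3", "2 _ 2 3", "_ _ 2 3", "_ _ _ 3", "3 pause 3 2 3", None]:
    print(answer, code_response(answer))
```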

Table 7.3 Study 4: Sample answers and their coding categories for data analysis
Sample sentence:
[Wo   [xiang   [mai   bi]]]
I     want     buy    pen       ‘I want to buy pens.’
3     3        3      3         UT
(2    3)       (2     3)        ST1
(2    2        2      3)        ST2   The pattern the subjects heard.

Sample answers                       Include in the     Correct or
(Only tones are listed.)             analysis?          incorrect
2   3   2   3                        Yes
2   2   2   3                        Yes
3   3   2   3                        Yes
2   3   3   3                        Yes
2   _   2   3                        Yes
_   _   2   3                        Yes
_   _   _   3                        No                 n/a
3 pause 3   2   3                    No                 n/a
Silence or saying “I don’t know.”    No                 n/a
Saying something else:               No                 n/a
  [ta   [mai   [zhege]]]
  he    buy    this       ‘He buys this.’
  1     3      45


After the responses were coded as shown in Table 7.3, a second step of coding was needed
for all the correct answers. We know that when a participant’s answer is correct, there are
different ways for it to be correct since there are multiple surface patterns. For the purpose of
analyzing correct patterns produced by the participants, all the correct responses were further
categorized.
It should be noted that we categorize the correct responses in terms of the type of feet
shown in the surface patterns. For instance, for a six-T3 sequence, the pattern
(T2T3)(T2T3T2T3) (see rightward incorporation in (3c) and (3d)) is identified as an alternating
pattern (alternating between T2 and T3), and (T2T2T3)(T3T2T3) as a ternary pattern. Since a pure
non-cyclic strategy that produces (T2T3)(T2T3)(T2T3) yields the same surface tones as the predicted pattern
(T2T3)(T2T3T2T3) through rightward incorporation, and we do not know which parsing the
speakers use, such a sequence alternating between T2 and T3 will be labeled as the “alternating
pattern.” It should be emphasized that the “alternating” and “ternary” patterns here do not mean
that the string of syllables is parsed non-cyclically from left to right in two-syllable or three-syllable domains. The definitions of the categories used are in (10), followed by sample correct
responses and their corresponding coding categories in Table 7.4.
(10) Categories for correct responses and their definitions:

a.   The alternating pattern: The syllables alternate between T2 and T3, or are parsed in binary
     feet.
b.   The ternary pattern: The pattern is composed of ternary feet.
c.   The opt rule pattern (the optional rule pattern): The pattern is derived through one of the
     optional rules in the Word-and-Phrase level Model. That is, either through optional T3S
     across prosodic domains, or through the larger domain parsing in fast speech.
d.   The subject-predicate pattern: The parsing of the subject is separated from the predicate.

For (10a), in the four-syllable sentences, the alternating pattern is T2T3T2T3, and in the six-syllable sentences, it is T2T3T2T3T2T3. The ternary pattern in (10b) is relevant in the 6-syllable sentences, but not in the 4-syllable sentences. In our 6-syllable sentences with a
monosyllabic NP or a monosyllabic pronoun (see the structure in (3c) and (3d)), the ternary
pattern always separates the verb and the object. The opt rule pattern in (10c) is derived when
T3S applies across two domains or through larger domain parsing in fast speech. The subject-predicate pattern in (10d) is derived where there is an intonational break and/or there is an
emphasis or focus on the subject, according to Chen (2000:379-380, 411-413). Sample correct
answers and how they fit in each category in (10) are summarized in Table 7.4.
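
As an illustration of how the categories in (10) attach to surface tone strings, the sketch below uses a lookup table. It covers only the all-T3 4w4σ sentence in Table 7.3 and the all-T3 six-syllable sentences with a monosyllabic subject in Table 7.4; the names `ALL_T3_PATTERNS` and `categorize` are hypothetical, and a lookup table is a simplification chosen here, not a description of how the coding was carried out.

```python
# Surface tone strings (brackets removed) mapped to the categories in (10).
ALL_T3_PATTERNS = {
    # four-syllable sentences, e.g. Table 7.3
    (2, 3, 2, 3): "alternating",
    (2, 2, 2, 3): "opt rule",
    # six-syllable sentences with a monosyllabic subject, e.g. Table 7.4
    (2, 3, 2, 3, 2, 3): "alternating",
    (2, 2, 3, 3, 2, 3): "ternary",
    (2, 2, 2, 3, 2, 3): "opt rule",
    (3, 2, 3, 3, 2, 3): "subject-predicate",
    (3, 2, 2, 3, 2, 3): "subject-predicate",
}


def categorize(surface):
    """Return the category of a correct response, or None if it is not listed."""
    return ALL_T3_PATTERNS.get(tuple(surface))


print(categorize([2, 3, 2, 3, 2, 3]))
print(categorize([2, 2, 3, 3, 2, 3]))
print(categorize([3, 3, 2, 3]))       # not a listed correct pattern
```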

Table 7.4 Study 4: Sample correct responses and their coding categories
[Ni    [xiang   [yang         [xiao    [laoshu]]]]]
you    want     raise/have    small    mouse
‘You want to raise/have (a) small mouse.’

     you    want    raise/have    small   mouse              Categories for correct responses
     3      3       3             3       33      UT
     2      2       2             3       23      ST used
a.   (2     3)      (2            3       23)     ST1        The alternating pattern
b.   (2     2       3)            (3      23)     ST2        The ternary pattern
c.   (2     2       2)            (3      23)     ST3        The opt rule pattern
d.   (3)    (2      3)            (3      23)     ST4        The subject-predicate pattern
e.   (3)    (2      2)            (3      23)     ST5        The subject-predicate pattern

7.2.2 Results and discussion for control items in NSR
Table 7.5 shows the number of items included and excluded in control items. Table 7.6
shows the number of the excluded control items by coding categories (see §7.2.1.5).
Table 7.5 Study 4: Number of items included (I) and excluded (E) in control items
Number of items   4w4σ              4w6σ              PRO-5w6σ          NP-5w6σ
                  I    E    total   I    E    total   I    E    total   I    E    total
4-year-olds       20   2    22      19   3    22      17   5    22      16   6    22
6-year-olds       20   0    20      20   0    20      19   1    20      20   0    20
adults            20   0    20      20   0    20      19   1    20      20   0    20

Table 7.6 Study 4: Control items (data excluded from the analysis)
              No answer   Non-target   Pauses   Missing syllables   Excluded total
4w4σ
  4           0           2            0        0                   2
  6           0           0            0        0                   0
  A           0           0            0        0                   0
4w6σ
  4           0           3            0        0                   3
  6           0           0            0        0                   0
  A           0           0            0        0                   0
PRO-5w6σ
  4           0           5            0        0                   5
  6           0           1            0        0                   1
  A           0           1            0        0                   1
NP-5w6σ
  4           0           6            0        0                   6
  6           0           0            0        0                   0
  A           0           0            0        0                   0

Table 7.7 shows the correct rates for control items.
Table 7.7 Study 4: Correct rates (%) in control items
               4w4σ          4w6σ          PRO-5w6σ      NP-5w6σ
4-year-olds    100 (20/20)   100 (19/19)   100 (17/17)   81.25 (13/16)
6-year-olds    100 (20/20)   100 (20/20)   100 (19/19)   100 (20/20)
adults         100 (20/20)   100 (20/20)   100 (19/19)   100 (20/20)

Figure 7.2 shows the correct rates in the control items by age.
Figure 7.2 Study 4: Correct rates in the control items by age
[Bar chart: correct rates (%, vertical axis 0–100) in control items for 4-year-olds, 6-year-olds, and adults. 4w4σ: 100, 100, 100; 4w6σ: 100, 100, 100; PRO-5w6σ: 100, 100, 100; NP-5w6σ: 81.25, 100, 100.]

Recall that for the control items, there are no adjacent T3*, so no T3S is required. That is,
surface tones are the same as underlying tones. Unlike the test items, where a correct answer
can take any of several T3S surface patterns, the control items are simple: the mapping between
underlying tones and surface tones is one to one, rather than one to many.
The correct rates for four sentence types (4w4σ, 4w6σ, PRO-5w6σ, and NP-5w6σ) by age
group (4-year-olds, 6-year-olds, and adults) were calculated. Our null hypothesis is that there is
no difference between any two age groups (i.e. 4-year-olds = 6-year-olds, 4-year-olds = adults,
and 6-year-olds = adults).
Overall, the correct rates for all the age groups in the controls of all sentence types (4w4σ,
4w6σ, PRO-5w6σ, and NP-5w6σ) are very high. Even the youngest group, 4-year-olds, did very
well except for a lower correct rate at 81.25% in NP-5w6σ items. Six-year-olds are adult-like in
having a 100% correct rate in all four sentence types. These results suggest that the length and
the structures of the sentences were not beyond their capability.
The results show that all age groups did perfectly (100% correct) in three sentence types
(4w4σ, 4w6σ, and PRO-5w6σ). Adults and 6-year-olds did perfectly in NP-5w6σ, but 4-year-olds had a correct rate of 81.25%. A two-proportion z-test (α = .05, two-tailed) was used to
determine whether or not the difference was significant. Four-year-olds are not significantly
different from 6-year-olds (p = .156), and they are not significantly different from adults, either
(p = .156).
The task of repetition of these sentences in natural speech was an easy task for children. We
will see in the next section how children did in the test items where there is T3S effect.
7.2.3 Results and discussion for test items in NSR
Table 7.8 and Table 7.9 summarize by coding categories (see §7.2.1.5) the number of
excluded data from “4w4σ and 4w6σ test items” and “PRO-5w6σ and NP-5w6σ test items”
respectively.

Table 7.8 Study 4: Test items (4w4σ and 4w6σ), data excluded from the analysis
                     No answer   Non-target   Pauses   Missing syllables   Excluded total
4w4σ: three T3*
  4                  0           4            0        0                   4
  6                  0           0            0        0                   0
  A                  0           0            0        0                   0
4w4σ: all T3*
  4                  0           6            0        0                   6
  6                  0           0            0        0                   0
  A                  0           0            0        0                   0
4w6σ: three T3*
  4                  0           4            1        0                   5
  6                  0           0            1        0                   1
  A                  0           0            0        0                   0
4w6σ: all T3*
  4                  0           5            0        0                   5
  6                  0           0            0        0                   0
  A                  0           1            0        0                   1

Table 7.9 Study 4: Test items (PRO-5w6σ and NP-5w6σ), data excluded from the analysis
                        No answer   Non-target   Pauses   Missing syllables   Excluded total
PRO-5w6σ: three T3*
  4                     0           4            0        0                   4
  6                     0           1            0        0                   1
  A                     0           0            0        0                   0
PRO-5w6σ: all T3*
  4                     0           5            0        0                   5
  6                     0           0            0        0                   0
  A                     0           0            0        0                   0
NP-5w6σ: three T3*
  4                     0           2            0        0                   2
  6                     0           0            0        0                   0
  A                     0           1            0        0                   1
NP-5w6σ: all T3*
  4                     0           5            0        0                   5
  6                     0           0            0        0                   0
  A                     0           1            0        0                   1

In the next subsections, the statistical results and discussion for two sentence pairs, 4w4σ and
4w6σ and PRO-5w6σ and NP-5w6σ, will be presented separately.
7.2.3.1 Results for 4w4σ and 4w6σ items
Table 7.10 shows number of items by pattern in 4w4σ and 4w6σ test items and Figure 7.3
shows the correct rates and the distribution of T3S patterns in 4w4σ and 4w6σ sentences by age.

Table 7.10 Study 4: Number of items by pattern in 4w4σ and 4w6σ test items
                    Subject-predicate   Alternating   Opt rule
                    pattern             pattern       pattern     total
4w4σ: three T3*
  4                 0                   0             18          18
  6                 1                   0             19          20
  adults            0                   0             20          20
4w4σ: all T3*
  4                 0                   12            3           16
  6                 0                   12            8           20
  adults            0                   9             11          20
4w6σ: three T3*
  4                 0                   8             6           17
  6                 0                   10            8           19
  adults            0                   15            4           20
4w6σ: all T3*
  4                 0                   6             4           17
  6                 1                   12            4           20
  adults            0                   18            1           19
(4= 4-year-olds, 6= 6-year-olds; the numbers of individual patterns + number of errors = total)
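
The percentages reported for these items follow directly from the counts in Table 7.10: each pattern's rate is its count divided by the total of included items, and the correct rate is the sum of the pattern counts over that total. The sketch below shows the arithmetic for the 4w6σ items; the dictionary layout and the helper names `percent` and `correct_rate` are illustrative choices, and only counts from Table 7.10 are used.

```python
def percent(count, total):
    """Percentage rounded to two decimals, as in the reported rates."""
    return round(100 * count / total, 2)


# (subject-predicate, alternating, opt rule, total), copied from Table 7.10
counts_4w6 = {
    ("three T3*", "4-year-olds"): (0, 8, 6, 17),
    ("three T3*", "6-year-olds"): (0, 10, 8, 19),
    ("three T3*", "adults"): (0, 15, 4, 20),
    ("all T3*", "4-year-olds"): (0, 6, 4, 17),
    ("all T3*", "6-year-olds"): (1, 12, 4, 20),
    ("all T3*", "adults"): (0, 18, 1, 19),
}


def correct_rate(cell):
    """Correct rate: all attested correct patterns over the included items."""
    subj_pred, alternating, opt_rule, total = cell
    return percent(subj_pred + alternating + opt_rule, total)


for key, cell in counts_4w6.items():
    _, alternating, opt_rule, total = cell
    print(key, percent(alternating, total), percent(opt_rule, total),
          correct_rate(cell))
```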

Figure 7.3 Study 4: Correct rates in 4w4σ and 4w6σ sentences
[Bar charts, one panel per item type; vertical axis 0–100 (%). Values (%) by age group:]
                        4-year-olds   6-year-olds   adults
a. 4w4σ: three T3*
   Subj-pred            0             5             0
   Alternating          0             0             0
   Opt rule             100           95            100
b. 4w4σ: all T3*
   Subj-pred            0             0             0
   Alternating          75            60            45
   Opt rule             18.75         40            55
c. 4w6σ: three T3*
   Alternating          47.06         52.63         75
   Opt rule             35.29         42.11         20
d. 4w6σ: all T3*
   Subj-pred            0             5             0
   Alternating          35.29         60            94.74
   Opt rule             23.53         20            5.26

For the opt rule pattern, the number of syllables matters. The frequency of the opt rule pattern
is higher in four-syllable sentences than in six-syllable sentences.
Interestingly, the alternating pattern was not attested in three-T3 4w4σ items (Figure 7.3 (a)),
though it was attested in all-T3 4w4σ, three-T3 4w6σ, and all-T3 4w6σ items (Figure 7.3 (b) –
(d)). The alternating pattern decreases with age in all-T3 4w4σ items (Figure 7.3 (b)), but
increases with age in all-T3 4w6σ items (Figure 7.3 (d)). The subject-predicate pattern was rarely
used in all three age groups. The ternary pattern was never used.
A logistic regression analysis was performed. The results show that the independent variables
(age, number of adjacent T3*, and number of syllables) as a set are statistically significant (chi
square = 120.288, p < .001 with df = 15). This indicates that the independent variables as a set
reliably distinguished among the response patterns.
The opt rule pattern
The Wald criterion shows that the number of T3* and the number of syllables are found to be
statistically significant (p < .001 for both) for the opt rule pattern relative to errors. Everything
else held constant, three-T3 items are more likely than all-T3 items (Odds Ratio (OR) = 14.180,
p < .001), and 4-syllable items more likely than 6-syllable items (OR = 99.982, p < .001) to have
the opt rule pattern. Four-year-olds are significantly different from adults (OR = .064, p = .015),
but 6-year-olds are not (OR = .311, p =.325). The OR values indicate that adults are roughly 16
times more likely than 4-year-olds and 3 times more likely than 6-year-olds to use the opt rule
pattern.
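
The "roughly 16 times" and "3 times" figures are the reciprocals of the reported odds ratios, since an OR below 1 for children relative to adults corresponds to an OR of 1/OR for adults relative to children. The short sketch below spells out that arithmetic; the helper name `inverse_odds_ratio` is hypothetical, and strictly speaking the resulting ratios compare odds rather than probabilities.

```python
def inverse_odds_ratio(odds_ratio):
    """Odds ratio with the comparison reversed (adults relative to children)."""
    return 1 / odds_ratio


adults_vs_4 = inverse_odds_ratio(0.064)   # reported OR, 4-year-olds vs. adults
adults_vs_6 = inverse_odds_ratio(0.311)   # reported OR, 6-year-olds vs. adults

print(round(adults_vs_4, 1), round(adults_vs_6, 1))
```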
The alternating pattern
For the alternating pattern relative to errors, the number of syllables is statistically significant
(OR = 8.873, p = .044), and the OR value indicates that a four-syllable sentence is more likely to
have the alternating pattern than a six-syllable sentence. For the alternating pattern, 4-year-olds
are significantly different from adults (OR = .052, p = .006), and 6-year-olds are not significantly
different from adults (OR = .210, p = .174).

The subject-predicate pattern
For the subject-predicate pattern relative to errors, the number of T3* is not statistically
significant (OR = .338, p = 4.672), and neither is the number of syllables (OR = 27.412, p = .072).
7.2.3.2 Discussion for 4w4σ and 4w6σ items
First of all, regarding the number of T3*, in the 4w4σ sentences, children did equally well in
the three-T3 and all-T3 items. In the 4w6σ items, however, while children did fairly well in the
three-T3 items (4-year-olds: 82.35% and 6-year-olds: 94.74%), their correct rates dropped in the
all-T3 items (4-year-olds: 58.82% and 6-year-olds: 85%). Complexity Hypothesis (H1a), Number
of adjacent T3* (the more adjacent T3*, the more complex the task), is supported.
Regarding the length of sentences, both 4-year-olds and 6-year-olds did better in the 4-syllable sentences than in the 6-syllable sentences. Complexity Hypothesis (H1b), Length of
sentences (the more syllables, the more complex the task), is supported.
Regarding syntax-prosody alignment, it was predicted that parsing which results in a good
match between syntax and prosody will occur more frequently than a mismatch between syntax
and prosody. In the 4w4σ sentences, the use of the alternating pattern decreases with age. In the
4w6σ sentences, its use increases with age. It should be noted that in the latter, the alternating
pattern gives a good syntax-prosody mapping. The results support the Syntax-prosody Alignment
Hypothesis (H2)— Prosodic boundaries tend to match the major syntactic boundaries.
Participants did not always use the pattern they heard. The opt rule pattern was the pattern
presented in these test sentences, so its percentage was very likely boosted to some extent,
since participants could simply hear a pattern and repeat it back. Even so, in some test items
participants’ own preference for a particular pattern was so strong that the pattern they heard
was produced by only a small percentage of people.
Since the subject-predicate pattern was rarely used (only 5% in three-T3 4w4σ items and 5% in
all-T3 4w6σ items, both in 6-year-olds), we will focus on the alternating pattern, which some
participants in all age groups produced in place of the opt rule pattern that
they heard. Let us first look at the alternating pattern for three-T3 4w4σ and three-T3 4w6σ
sentences shown in (11). A T3 is indicated by “3” and a non-T3 is indicated by an X. In the
following derivations, all the patterns derived are possible. The boxed pattern is the pattern used
in the stimuli.
(11) The alternating pattern in 4w4σ and 4w6σ sentences (three T3*)
a. 4w4σ: [σpronoun[σwant[σverb[σNP]]]]
    3     3     3     X       UT
    3     3     3     X       Word: not applicable
    2     3    (3     X)      Phrase: disyllabic foot for the smallest domain, no T3S
   (2     3)   (3     X)      Phrase: disyllabic foot, T3S; ST1
   (2     2     3     X)      Larger domain in fast speech; ST2
b. 4w6σ: [σσNP[σwant[σverb[σσNP]]]]
    X3    3     3     XX      UT
   (X3)   3     3    (XX)     Word: not applicable
   (X3)  (2     3)   (XX)     Phrase: disyllabic foot, T3S; ST1
   (X2    2     3)   (XX)     Larger domain in fast speech; ST2
In (11a), no foot has been formed at the Word level, so at the Phrase level, a disyllabic foot is
formed for the smallest domain, and T3S does not apply. Also at the Phrase level, the subject pro
forms a binary foot with the following syllable. This prosodic domain crosses the subject-predicate
boundary. Notice that the prosodic constituent (the first two syllables) is a non-constituent
syntactically. In addition, the alternating pattern[37] (T2T3)(T3Tx) has two adjacent T3*, although
it is grammatical because the two T3* belong to different feet. Although the first foot in the
alternating pattern is acceptable, when there is another option, speakers seemed to prefer a
pattern without any adjacent T3* over a pattern with adjacent T3* belonging to two domains.
ST2 is derived through larger domain parsing in fast speech. This pattern does not have any
adjacent T3*, and participants favored it.
Now consider (11b). The subject NP and the object NP are parsed at the Word level, and the
auxiliary verb and the verb are parsed non-cyclically at the Phrase level. In this case, no prosodic
domain crosses the subject-predicate boundary, so there is no mismatch between syntax and
prosody at the major syntactic boundary. This may account for the fact that, across the three age
groups, the alternating pattern was used more than the opt rule pattern that participants heard.
Adults’ preference for the alternating pattern is evident (the alternating pattern: 75%, the opt rule
pattern: 20%). The two child groups used the opt rule pattern more than adults did (4-year-olds:
35.29% and 6-year-olds: 42.11%). It is possible that children were more willing to
repeat something as heard, whereas adults subconsciously worked out the pattern that was “faithful”
to the pattern that they would normally use.
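The surface patterns ST1 and ST2 in (11) can be reproduced with a very small procedure once the prosodic domains are given. The Python sketch below is an illustration under simplifying assumptions, not the parsing model of this dissertation: the domains are supplied by hand, T3S applies simultaneously to the underlying tones inside each domain, and cyclic application is not modeled. The function names apply_t3s and show are hypothetical.

```python
def apply_t3s(domains):
    """Apply T3S inside each prosodic domain.

    `domains` is a list of domains; each domain is a list of underlying tones
    written as "3" (T3) or "X" (any non-T3 tone). Within a domain, a T3 that is
    immediately followed by another T3 surfaces as T2. T3S does not apply
    across a domain boundary.
    """
    surface = []
    for domain in domains:
        out = []
        for i, tone in enumerate(domain):
            next_is_t3 = i + 1 < len(domain) and domain[i + 1] == "3"
            out.append("2" if tone == "3" and next_is_t3 else tone)
        surface.append(out)
    return surface


def show(domains):
    """Render domains in the bracketed notation used in (11)."""
    return "".join("(" + " ".join(domain) + ")" for domain in domains)


# (11a) 4w4σ, underlying 3 3 3 X
print(show(apply_t3s([["3", "3"], ["3", "X"]])))    # two disyllabic feet (ST1)
print(show(apply_t3s([["3", "3", "3", "X"]])))      # one larger domain (ST2)

# (11b) 4w6σ, underlying X3 3 3 XX
print(show(apply_t3s([["X", "3"], ["3", "3"], ["X", "X"]])))   # ST1
print(show(apply_t3s([["X", "3", "3", "3"], ["X", "X"]])))     # ST2
```

With two disyllabic feet in (11a), the two adjacent T3* in the output straddle a foot boundary, which is why the pattern is grammatical; with one larger domain, no adjacent T3* remain.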
We now turn to the all-T3 items. In all-T3 4w4σ items, while the use of the alternating
pattern decreases with age (4-year-olds: 75%, 6-year-olds: 60%, and adults: 45%), the use of the
opt rule pattern increases with age (4-year-olds: 18.75%, 6-year-olds: 40%, and adults: 55%).
The alternating pattern has two evenly divided disyllabic feet, and is particularly favored by the
4-year-olds. On the other hand, adults used the opt rule pattern (55%) slightly more than the
alternating pattern (45%). Six-year-olds’ use of the opt rule pattern (40%) and the alternating
pattern (60%) clearly shows the trend of becoming less like 4-year-olds and more like adults.

[37] The alternating pattern includes the pattern that alternates between T2 and T3 as well as the
pattern that consists of the bracketing of binary feet.

In all-T3 4w6σ items, however, the use of the alternating pattern increases with age
(4-year-olds: 35.29%, 6-year-olds: 60%, and adults: 94.74%). Only 5.26% of the adults repeated the
sentence with the opt rule pattern they heard; the remaining 94.74% used the
alternating pattern. Six-year-olds also showed a preference for the alternating pattern (60%)
over the two other patterns (the subject-predicate pattern: 5%, and the opt rule pattern:
20%). The bias toward the alternating pattern in 6-year-olds and adults in this case is very strong.
Six-year-olds and adults used fewer opt rule patterns in all-T3 4w6σ items than in all-T3 4w4σ
items (there is no clear difference in 4-year-olds). As the sentence grew longer, it was harder to
use the opt rule pattern.
7.2.3.3 Results for PRO-5w6σ and NP-5w6σ items
Table 7.11 shows number of items by pattern in PRO-5w6σ and NP-5w6σ test items.
Table 7.11 Study 4: Number of items by pattern in PRO-5w6σ and NP-5w6σ test items

PRO-5w6σ: three T3*
          Subj-pred   Alternating   Ternary   Opt rule   Total
          pattern     pattern       pattern   pattern
4         0           0             17        0          18
6         0           0             19        0          19
adults    0           0             19        0          20

PRO-5w6σ: all T3*
          Subj-pred   Alternating   Ternary   Opt rule   Total
          pattern     pattern       pattern   pattern
4         0           3             4         10         17
6         0           6             5         6          20
adults    0           5             5         7          20

NP-5w6σ: three T3*
          Subj-pred   Alternating   Ternary   Opt rule   Total
          pattern     pattern       pattern   pattern
4         0           0             20        0          20
6         2           0             17        0          20
adults    3           0             16        0          19

NP-5w6σ: all T3*
          Subj-pred   Alternating   Ternary   Opt rule   Total
          pattern     pattern       pattern   pattern
4         0           0             6         10         17
6         2           0             6         10         20
adults    5           0             9         5          19

(4 = 4-year-olds, 6 = 6-year-olds; the numbers of individual patterns + number of errors = total.)

Figure 7.4 Study 4: Correct rates in PRO-5w6σ and NP-5w6σ sentences
[Bar charts, panels a to d; y-axis: percentage (0 to 100). The plotted values (%) are listed below.]

a. PRO-5w6σ: three T3*
              4-year-olds   6-year-olds   adults
Subj-pred     0             0             0
Ternary       94.44         100           95

b. PRO-5w6σ: all T3*
              4-year-olds   6-year-olds   adults
Ternary       23.53         25            25
Alternating   17.65         30            25
Opt rule      58.82         30            35

c. NP-5w6σ: three T3*
              4-year-olds   6-year-olds   adults
Subj-pred     0             10            15.79
Ternary       100           85            84.21

d. NP-5w6σ: all T3*
              4-year-olds   6-year-olds   adults
Subj-pred     0             10            26.32
Ternary       35.29         30            47.37
Opt rule      58.82         50            26.32

Figure 7.4 shows the correct rates and the distribution of T3S patterns for PRO-5w6σ and
NP-5w6σ sentences by age. The subject-predicate pattern was used in NP-5w6σ sentences
(Figure 7.4 (c) and (d)), but not in PRO-5w6σ sentences (Figure 7.4 (a) and (b)). Also,
6-year-olds and adults used this pattern, but 4-year-olds did not. Adults used the subject-predicate
pattern 26.32% of the time in all-T3 NP-5w6σ items, compared to 0% in all-T3 PRO-5w6σ items. This
shows that although a subject noun can stand alone in its own domain, a subject pro cannot.
Notice also that 6-year-olds used this pattern 10% of the time in all-T3 NP-5w6σ items. Although
the percentage is fairly low, their use of the subject-predicate pattern in NP-5w6σ items, but not
in PRO-5w6σ items, indicates that they are aware of the distinction between subject NPs and
subject pros.
The alternating pattern surfaced only in all-T3 PRO-5w6σ sentences (Figure 7.4 (b)), and all
three age groups used this pattern. The ternary pattern occurred more in the three-T3 items
(Figure 7.4 (a) and (c)) than in all-T3 items (Figure 7.4 (b) and (d)). For the all-T3 items,
regardless of PRO-5w6σ or NP-5w6σ sentences, the younger the children, the more likely they
were to use the pattern they heard, the opt rule pattern.
A logistic regression analysis was conducted. The results show that the independent variables
(age, number of adjacent T3*, and subject NP vs. subject pronoun) as a set are statistically
significant (chi square = 180.806, p < .001 with df = 20).
The opt rule pattern
For the opt rule pattern relative to errors, the number of T3* is not statistically significant
(OR = 2.199E-9, p = .997), and neither is the noun-pronoun distinction (OR = .390, p = .211).
Four-year-olds and 6-year-olds are not significantly different from adults (4-year-olds: OR =
3.527, p = .187; 6-year-olds: OR = .942, p = 1.059).
The alternating pattern
For the alternating pattern relative to errors, the number of T3* is not statistically significant
(OR = 2.824E-9, p = .998), and neither is the noun-pronoun distinction (OR = 5.990, p = .150).
Four-year-olds and 6-year-olds are not significantly different from adults (4-year-olds: OR =
1.422, p = .760; 6-year-olds: OR = 1.326, p = .758).


The ternary pattern
For the ternary pattern relative to errors, the number of T3* is statistically significant (OR =
9.174, p = .002). The OR value indicates that, everything else held constant, a three-T3 sentence is
roughly 9 times more likely to have the ternary pattern than a six-T3 sentence. Four-year-olds
and 6-year-olds are not significantly different from adults (4-year-olds: OR = 1.416, p = .708;
6-year-olds: OR = .613, p = .510). The noun-pronoun distinction is not significant (OR = .282,
p = .084).
The subject-predicate pattern
For the subject-predicate pattern relative to errors, the number of T3* is not statistically
significant (OR = 2.997, p = .255), and neither is the noun-pronoun distinction (OR = 3.208E-10,
p = .998).
7.2.3.4 Discussion for PRO-5w6σ and NP-5w6σ items
For PRO-5w6σ and NP-5w6σ items, the number of T3* did not have much effect on
children’s correct rates (approximately 5% to 15% difference). Although overall, the correct rates
are slightly higher in three-T3 items than in all-T3 items, the difference is minimal, and
4-year-olds actually did better in all-T3 PRO-5w6σ items than in three-T3 PRO-5w6σ items. These
results do not provide strong evidence to support H1a (Number of adjacent T3*: the more adjacent
T3*, the more complex the task).
The use of the subject-predicate pattern in NP-5w6σ items by 6-year-olds and adults lends
support to the Syntax-prosody Alignment Hypothesis (H2): prosodic boundaries tend to match
the major syntactic boundaries. The subject-predicate pattern yields a good
mapping between syntax and prosody. However, it should be noted that the ternary pattern, in which
the prosodic domain crosses the subject-predicate boundary, was used more frequently than the
subject-predicate pattern. Although a monosyllabic subject NP can stand alone, 6-year-olds and
adults prefer the ternary pattern to the subject-predicate pattern. While 6-year-olds used the
subject-predicate pattern, 4-year-olds never did. This may indicate that 6-year-olds are more
aware of the syntactic properties of the NP and/or that they are more sensitive to the
subject-predicate boundary than 4-year-olds.
Speakers did not always repeat the surface pattern they heard. In fact, children
and adults often used another surface pattern. In all sentences except for three-T3 PRO-5w6σ items, at
least two surface patterns appeared in speakers’ repetitions. The variability in all-T3 PRO-5w6σ
and all-T3 NP-5w6σ items is especially interesting. There is a sharp contrast: the
alternating pattern was used in PRO-5w6σ items (Figure 7.4 (b)), whereas the subject-predicate
pattern was used in NP-5w6σ items (Figure 7.4 (d)).
The subject-predicate pattern
The subject-predicate pattern was used in NP-5w6σ items, but not in PRO-5w6σ items. Since
a subject NP can stand alone in its own domain, the subject-predicate pattern surfaced as a result.
In addition, 6-year-olds and adults used the subject-predicate pattern, but 4-year-olds did not (three
T3*: 4-year-olds: 0%, 6-year-olds: 10%, and adults: 15.79%; all T3*: 4-year-olds: 0%,
6-year-olds: 10%, and adults: 26.32%). This may indicate that at age 4, the subject-predicate pattern for
sentences with subject NPs is still being acquired, or is yet to be acquired. Six-year-olds used this
pattern only 10% of the time. Compared to 4-year-olds, who never used this pattern, 6-year-olds
seemed to be starting to acquire it. A difference of 26.32% in adults’ use of this pattern between the
all-T3 NP-5w6σ items and the all-T3 PRO-5w6σ items reveals that adults differentiate
monosyllabic nouns from pronouns prosodically and therefore produced the subject-predicate pattern
in sentences with a subject noun, but not in sentences with a subject pronoun.


The alternating pattern
In PRO-5w6σ all-T3 items, the alternating pattern was attested in all three age groups.
Four-year-olds had this pattern in place, though they did not use it as much as 6-year-olds and adults
(4-year-olds: 17.65%, 6-year-olds: 30%, and adults: 25%). The two child groups were aware of
the noun-pronoun distinction. Their use of the alternating pattern was also fairly close to
adults’ in frequency, with 4-year-olds using it slightly less than adults, and 6-year-olds using it
slightly more than adults.
The ternary pattern
The ternary pattern occurred more in the three-T3 items than in all-T3 items, possibly
because there are other strategies (the alternating pattern and the opt rule pattern) available in the
latter. Across age groups, the ternary pattern occurs more in the NP-5w6σ all-T3 items
(4-year-olds: 35.29%, 6-year-olds: 30%, and adults: 47.37%) than in the PRO-5w6σ all-T3 items
(4-year-olds: 23.53%, 6-year-olds: 25%, and adults: 25%). This is probably because
children and adults also used the alternating pattern in the latter, but not in the former.
The opt rule pattern
Echoing the finding that younger children were perhaps more willing to repeat the pattern they
heard, 4-year-olds were again the ones who produced the opt rule pattern most frequently.
This also shows that children did not have trouble repeating the sentences when T3S was
applied correctly.
For the all-T3 items (both subject pro and subject NP)
If we now shift our focus to 6-year-olds’ production in the all-T3 items, they were
moving towards the adult-like distribution of different T3S patterns. In all-T3 PRO-5w6σ items,
the distribution of the three patterns in 6-year-olds is strikingly similar to that of adults (Figure 7.4
(b)). For all-T3 NP-5w6σ items, the developmental pattern is quite clear: the opt
rule pattern decreases with age, but the subject-predicate pattern increases with age (Figure 7.4
(d)). Recall that the subject-predicate pattern was attested only in subject NP items, not in
subject pro items. Combined with the fact that only 6-year-olds and adults had the
subject-predicate pattern, and that 6-year-olds used this pattern less than adults did (and none
of the 4-year-olds used it), this may indicate that 6-year-olds were starting to restrict syntactically
the use of T3S across the major syntactic boundary in subject NP items (6-year-olds: 10% vs.
adults: 26.32%), but not in subject pro items.
To conclude, in Study 4 NSR, children generally did fairly well. Upon hearing a particular
pattern for a sentence, children and adults did not always use the pattern they heard. T3S
variability was evident in the results. This leads us to the next study, Study 5, where participants
had to apply T3S on their own: the underlying tones of the sentences are available, but not the
surface tones. We now turn to our final study, the most complicated and yet the most attractive
and informative in the series.

7.3 Study 5: RTR = Robot Talk (without sandhi) Repetition
7.3.1 Background
In Study 4 NSR, where there were no tonal manipulations, we found that children, both
4-year-olds and 6-year-olds, were able to repeat the sentences without too much difficulty.
Neither children nor adults always repeated the sentences with the same T3S pattern they
heard. Keeping this in mind, we ask the following questions: what do children do when there is no
T3S at all in the input and they have to apply T3S themselves? What about adults?
Will the distribution of the T3S patterns in Study 5 RTR be similar to that in Study 4 NSR?


We are departing from mere repetition in Study 4, and will now take a closer look at
children’s T3S application in Study 5 from a source that only has underlying tones. All the
sentences in Study 5 were identical to those in Study 4. However, the T3S effect was removed in
Study 5. Although speech without the T3S effect can be understood,[38] it might sound unnatural
or even rather strange, like the unnaturalness found in how robots talk. That is why this
experiment was named “Robot Talk Repetition.” The tonal manipulation in RTR was to change
all the sandhi tones (T2s derived from T3* by the T3S rule) to their underlying tone, T3.
One way to think of it is that we gave “processed” (T3S applied) sentences in Study 4, but
“unprocessed” (T3S never applied, as if the T3S rule did not exist in the language) sentences in
Study 5. Will the children still be able to repeat the sentences? If they can, will they use the
T3S patterns attested in Study 4? What surface patterns will they use?
7.3.2 Method
An elicited repetition task (McDaniel et al. 1998; Crain and Thornton 2000) was used in
Study 5. The sentences were the same as those used in Study 4, except that in Study 5 there were
tonal manipulations (detailed examples are given in Section 7.3.2.4).
7.3.2.1 Subjects
Fifty-seven subjects participated in this study. Forty-three children, ages 4;1 – 6;11, were
recruited in Taichung, Taiwan. They were divided into two age groups, four-year-olds and
six-year-olds. Fourteen adult subjects also participated in the study. All of them were native
speakers of Mandarin Chinese from Taiwan. The participants in this study were different from
those in Study 4. Table 7.12 shows the distribution of the participants.
Table 7.12 Study 5: Distribution of the subjects
Age group      N     Age range      Mean    Standard deviation
4-year-olds    20    4;1 – 4;9      4;6     2.40 (mo.)
6-year-olds    23    6;0 – 6;11     6;6     3.04 (mo.)
Adults         14

[38] In my informal inquiry into friends’ experiences of acquiring T3S, one respondent, who is
currently an elementary school teacher, shared that when teaching her pupils about T3S, she
sometimes removes T3S from her speech intentionally and has her students correct her
speech, that is, has them apply T3S for her.

7.3.2.2 Procedure
All children were tested in a quiet classroom in the kindergarten or in the home of the child.
All the adult participants were tested in a quiet room or the home of the participant. The elicited
production task lasted approximately 10 minutes for adults, and 15 – 20 minutes for children.
Each child was presented with a Robot and a beanie bear, Xiaoli. The experimenter told the child that
they were about to play a game called “Robot Talk.” She said, “Look at this Robot. She talks
funny, and the bear doesn’t understand a word she says. The bear Xiaoli understands Child Talk
only, not the Robot Talk. Can you help her? Listen to the Robot Talk, and then tell Xiaoli what
she says, okay? Do you want to play the game?” The subjects heard recordings and saw
accompanying pictures on a laptop computer, and then repeated what they heard.
The recording device and the headphones used in this experiment are the same as those in
Study 4. The images and pictures used in this experiment are identical to those in Study 4.
7.3.2.3 Design
The design of Study 5 is the same as that of Study 4 (See Section 7.2.1.3), except that the
T3S effect was removed in Study 5. The removal of the T3S effect is the only difference between
the two studies. All the underlying T3* surface as T3* without undergoing the T3S rule in Study
5. The manipulation of the tones will be presented in the following section.


7.3.2.4 Materials
There were 24 sentences: 16 test items and 8 control items. Sample sentences are in Figure
7.5. (See Appendix F for the full set of 16 test items and 8 control items.)
Figure 7.5 Study 5: Sample materials
a. Sample control item: no adjacent T3*
[Ta     [xiang   [kan     shu.]]]
 he      want     read    book       ‘He wants to read books.’
 1       3        4       1          UT
 1       3        1       1          Robot Talk [39]

b. Sample test item: three adjacent T3*
[Ni     [xiang   [mai     hua.]]]
 you     want     buy     flower     ‘You want to buy flowers.’
 3       3        3       1          UT
 3       3        3       1          Robot Talk

c. Sample test item: four adjacent T3* (all T3*)
[Wo     [xiang   [mai     bi.]]]
 I       want     buy     pens       ‘I want to buy pens.’
 3       3        3       3          UT
 3       3        3       3          Robot Talk

[39] The tones in italics indicate moderate manipulation of the tone (slight changes in shape
and/or height), rather than a categorical change of the original tone as in the test items, where a
derived T2 is completely changed to the underlying T3.

Since all the sandhi tones (T2 derived from T3 by the T3S rule) were changed to the
underlying T3, the Robot Talk was essentially speech without the T3S rule. A female native
speaker’s natural speech of all 24 sentences was recorded on a professional digital recorder,
a Marantz PMD660. The software PRAAT (Boersma & Weenink 2009) was subsequently used to
manipulate the tones. The manipulation undid all the T2’s that were derived from
T3* by the T3S rule; these derived T2’s were changed to their underlying tone, T3.
Examples of two test items and a control item are in Figure 7.6, Figure 7.7 and Figure 7.8.
Figure 7.6 Study 5: Sample Praat Spectrogram for a test item (three T3*)
[Praat display: 198.9 Hz; total duration 2.03 seconds]
[Ni     [xiang   [mai     hua]]]
 you     want     buy     flower     ‘You want to buy flowers.’
 T3      T3       T3      T1         UT
 T2      T2       T3      T1         ST
 T3      T3       T3      T1         Robot Talk

In Figure 7.6, the last syllable, in T1, is intact. The third syllable also remained intact and
served as the baseline T3 for the preceding two T3*. The first two syllables were manually
manipulated: the bold dots were drawn to various positions in order to change the shape and
pitch of the tones, in our case from T2 (a high rising tone) to T3 (a low dipping tone). The
original thin dots situated higher were the surface tones, the derived T2’s,
in the original recording. The pitch height and shape were altered to conform to the baseline T3,
the third T3 in the three-T3 sequence. After the tonal manipulation, the recording no
longer had the T3S effect (no more derived T2s). The manipulated speech became our Robot
Talk, speech with all underlying tones (Robot Talk = UT) and without the T3S rule.
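At the level of tone categories, the manipulation amounts to a simple mapping from the recorded surface tones back to the underlying tones. The Python sketch below illustrates that mapping only; it does not model the acoustic editing done in PRAAT, and the function name robot_talk and the integer tone coding (1 to 4) are assumptions made for the example.

```python
def robot_talk(underlying, surface):
    """Return the Robot Talk tone sequence for one test sentence.

    Every sandhi tone (a surface T2 whose underlying tone is T3) is reverted
    to T3. All other syllables keep their recorded surface tone, so a lexical
    T2 is left untouched.
    """
    if len(underlying) != len(surface):
        raise ValueError("tone sequences must have the same length")
    return [3 if (ut == 3 and st == 2) else st
            for ut, st in zip(underlying, surface)]


# Figure 7.6: Ni xiang mai hua, UT 3 3 3 1, recorded ST 2 2 3 1
print(robot_talk([3, 3, 3, 1], [2, 2, 3, 1]))

# Figure 7.7: Wo xiang mai bi, UT 3 3 3 3, recorded ST 2 3 2 3
print(robot_talk([3, 3, 3, 3], [2, 3, 2, 3]))
```

In both test items the result equals the underlying tone sequence, which is the sense in which Robot Talk = UT.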
Figure 7.7 Study 5: Sample Praat Spectrogram for a test item (all T3*)
[Praat display: 169.9 Hz; total duration 2.14 seconds]
[Wo     [xiang   [mai     bi]]]
 I       want     buy     pen        ‘I want to buy pens.’
 T3      T3       T3      T3         UT
 T2      T3       T2      T3         ST
 T3      T3       T3      T3         Robot Talk

In Figure 7.7, the last syllable remains intact and served as the baseline T3 for its preceding
three T3*. In the original recording, the first and the third syllables surfaced as T2, a rising tone,
because of the T3S rule. These two syllables were changed to T3, the low dipping tone. The
second syllable did not change to T2 in the original recording, which had the pattern
(T2T3)(T2T3), one of the surface patterns of the sentence. As the last syllable was used as the
baseline T3, the height of the second T3 was also lowered very slightly to match the height of the
fourth syllable, the baseline T3.
Figure 7.8 Study 5: Sample Praat Spectrogram for a control item
[Praat display: 214.2 Hz; total duration 1.97 seconds]
[Ta     [xiang   [kan     shu]]]
 he      want     read    book       ‘He wants to read books.’
 T1      T3       T4      T1         UT
 T1      T3       T4      T1         ST
 T1[40]  T3       T1      T1         Robot Talk

In order to maintain the consistency of the “weirdness” of Robot Talk, some tones in the
control items were manipulated slightly. There was one concern, however. For the control items,
since there were no adjacent T3*, no T3S rule had applied in the first place. Unlike in the test
items, the tonal manipulation for the control items therefore could not consist of changing a
derived T2 to its underlying tone, T3. Although tonal manipulations in the test items did not cause
a change in meaning (i.e. a sentence with or without T3S application carries the same meaning), a
categorical change in tones in the control items was not the best option because tones are
contrastive in Mandarin, so a change in the tone often leads to a change in the meaning. For the
control items, we minimized the possibility of altering the meaning of the words by slightly
changing the shape and/or height of the tones, although a change of one tone to another was not
completely prohibited if it would not lead to confusion and would not change the
meaning of the sentence. For instance, the third syllable in Figure 7.8 was changed from T4, a
falling tone, to T1, a high level tone, and the intended meaning can easily be retrieved because
kan ‘to read, to watch’ can be read as T1 in kan jia ‘to watch the house, to house-sit.’ If this
syllable had been changed from T4 to T3 instead, the meaning of the sentence would have totally
changed. In T3, kan means ‘to chop’ (e.g. a tree), so kan in T3, followed by shu ‘book,’ would
mean ‘to chop the book.’ This kind of tonal manipulation that leads to a change in meaning was
avoided.
As seen in Figure 7.8, besides the third syllable being changed from T4 to T1, the first,
second and last syllables were altered slightly in height or in shape. This created the weirdness
effect of the Robot Talk without sacrificing the intended
meanings of the sentences. The accompanying picture cues also helped the subjects to identify
the intended meaning of the Robot Talk.

[40] The tones in italics indicate minor changes (in shape and/or pitch height) to the tones,
changes that cannot be categorized as any of the four lexical tones.

7.3.2.5 Coding
The coding procedure was the same as that in Study 4 (see Section 7.2.1.5).
7.3.3 Results and discussion for control items in RTR
Table 7.13 shows the number of items included and excluded in control items. Table 7.14
shows the number of excluded control items by coding categories.


Table 7.13 Study 5: Number of items included (I) and excluded (E) in control items
                    4w4σ               4w6σ             PRO-5w6σ            NP-5w6σ
               I    E   total     I    E   total     I    E   total     I    E   total
4-year-olds    38   2   40        23   17  40        29   11  40        26   14  40
6-year-olds    46   0   46        43   3   46        43   3   46        41   5   46
adults         27   1   28        26   2   28        28   0   28        28   0   28

Table 7.14 Study 5: Control items, data excluded from the analysis

4w4σ
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    0           2            0        0                   2
6    0           0            0        0                   0
A    0           1            0        0                   1

4w6σ
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    0           17           0        0                   17
6    0           3            0        0                   3
A    0           2            0        0                   2

PRO-5w6σ
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    0           11           0        0                   11
6    0           3            0        0                   3
A    0           0            0        0                   0

NP-5w6σ
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    0           14           0        0                   14
6    0           5            0        0                   5
A    0           0            0        0                   0

(4 = 4-year-olds, 6 = 6-year-olds, A = adults.)

Table 7.15 shows the correct rates for control items.
Table 7.15 Study 5: Correct rates (%) in control items
               4w4σ             4w6σ             PRO-5w6σ         NP-5w6σ
4-year-olds    86.84 (33/38)    65.22 (15/23)    82.76 (24/29)    80.77 (21/26)
6-year-olds    82.61 (38/46)    90.70 (39/43)    90.70 (39/43)    87.80 (36/41)
adults         100 (27/27)      100 (26/26)      100 (28/28)      100 (28/28)
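Each correct rate in Table 7.15 is the number of correct repetitions divided by the number of included items, that is, with excluded items removed from the denominator. The Python sketch below recomputes the rates from the counts given in parentheses in the table; the names CONTROL_COUNTS and correct_rate are hypothetical and serve only this illustration.

```python
# (correct, included) counts copied from Table 7.15
CONTROL_COUNTS = {
    "4-year-olds": {"4w4σ": (33, 38), "4w6σ": (15, 23),
                    "PRO-5w6σ": (24, 29), "NP-5w6σ": (21, 26)},
    "6-year-olds": {"4w4σ": (38, 46), "4w6σ": (39, 43),
                    "PRO-5w6σ": (39, 43), "NP-5w6σ": (36, 41)},
    "adults": {"4w4σ": (27, 27), "4w6σ": (26, 26),
               "PRO-5w6σ": (28, 28), "NP-5w6σ": (28, 28)},
}


def correct_rate(correct, included):
    """Percentage of included items repeated correctly, to two decimals."""
    if included <= 0:
        raise ValueError("at least one included item is required")
    return round(100.0 * correct / included, 2)


for group, cells in CONTROL_COUNTS.items():
    for sentence_type, (correct, included) in cells.items():
        rate = correct_rate(correct, included)
        print(f"{group:12s} {sentence_type:9s} {rate:6.2f}% ({correct}/{included})")
```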

Figure 7.9 shows the correct rates for the control items by age in Study 5 RTR. Adults
consistently had a 100% correct rate for all four sentence types, as they did in Study 4 NSR. Their
performance was not affected by the manipulation of the tones in the control items. The correct
rates of the control items for both child groups dropped, however.


Figure 7.9 Study 5: Correct rates in control items by age
[Bar chart: correct rates (%) in 4w4σ, 4w6σ, PRO-5w6σ, and NP-5w6σ control items for 4-year-olds, 6-year-olds, and adults; y-axis 0 to 100. The plotted values are those given in Table 7.15.]

The two-proportion z-test (α = .05, two-tailed) was used to test whether there was a
significant difference between any two age groups. The null hypothesis is that there is no
difference between any two age groups in 4w4σ, 4w6σ, PRO-5w6σ, and NP-5w6σ sentences.
4w4σ
No two age groups are significantly different: 4-year-olds and 6-year-olds (p = .818),
4-year-olds and adults (p = .136), 6-year-olds and adults (p = .056).
4w6σ
Four-year-olds are significantly different from 6-year-olds (p = .026) and adults (p = .004).
Six-year-olds and adults are not significantly different (p = .284).
PRO-5w6σ
No two age groups are found to differ significantly: 4-year-olds and 6-year-olds (p = .524),
4-year-olds and adults (p = .080), 6-year-olds and adults (p = .284).
NP-5w6σ
Four-year-olds are not significantly different from 6-year-olds (p = .664), but they are
significantly different from adults (p = .049). Six-year-olds and adults are not significantly
different (p = .148).
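For readers who wish to check the comparisons, a two-proportion z-test can be computed directly from the counts in Table 7.15. The Python sketch below uses the pooled standard error and, by default, a continuity correction. The correction is an assumption: the text does not state which variant of the test was used, although several of the reported p-values appear to be consistent with the corrected version. The function name two_proportion_z_test is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2, continuity=True):
    """Two-tailed z-test for the difference between two proportions.

    x1/n1 and x2/n2 are the numbers of correct items out of included items
    in the two groups. Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are at 0% or both at 100%: no difference to test.
        return 0.0, 1.0
    diff = abs(p1 - p2)
    if continuity:
        # Continuity correction (assumed here, not stated in the text).
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    z = diff / se
    p_value = math.erfc(z / math.sqrt(2))  # two-tailed normal probability
    return z, p_value


# 4w6σ control items: 4-year-olds (15/23) vs. 6-year-olds (39/43)
z, p = two_proportion_z_test(15, 23, 39, 43)
print(f"z = {z:.3f}, p = {p:.3f}")

# 4w4σ control items: 4-year-olds (33/38) vs. adults (27/27)
z, p = two_proportion_z_test(33, 38, 27, 27)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Calling the function with continuity=False gives the uncorrected test, which yields smaller p-values for the same counts.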
7.3.4 Results and discussion for test items in RTR
Table 7.16 and Table 7.17 summarize by coding categories the number of excluded data
from “4w4σ and 4w6σ test items” and “PRO-5w6σ and NP-5w6σ test items” respectively.
Table 7.16 Study 5: Test items (4w4σ and 4w6σ), data excluded from the analysis

4w4σ: three T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    0           7            0        0                   7
6    0           7            0        0                   7
A    0           1            0        0                   1

4w4σ: all T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    0           6            0        0                   6
6    0           3            0        0                   3
A    0           1            0        0                   1

4w6σ: three T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    1           20           0        0                   21
6    0           13           0        0                   13
A    0           1            0        0                   1

4w6σ: all T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    1           15           0        0                   16
6    0           5            0        0                   5
A    0           2            0        0                   2

(4 = 4-year-olds, 6 = 6-year-olds, A = adults.)


Table 7.17 Study 5: Test items (PRO-5w6σ and NP-5w6σ), data excluded from the analysis

PRO-5w6σ: three T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    1           19           0        0                   20
6    1           3            0        0                   4
A    0           1            0        0                   1

PRO-5w6σ: all T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    1           24           1        0                   26
6    1           7            2        0                   10
A    0           0            0        0                   0

NP-5w6σ: three T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    1           10           0        1                   12
6    0           5            1        0                   6
A    0           0            0        0                   0

NP-5w6σ: all T3*
     No answer   Non-target   Pauses   Missing syllables   Excluded total
4    3           11           0        2                   16
6    3           4            0        0                   7
A    0           0            0        0                   0

(4 = 4-year-olds, 6 = 6-year-olds, A = adults.)

The next subsections present the statistical results and discussion for two pairs of sentence
types: 4w4σ and 4w6σ, and PRO-5w6σ and NP-5w6σ.
7.3.4.1 Results for 4w4σ and 4w6σ items
Table 7.18 shows number of items by pattern in 4w4σ and 4w6σ test items.
Table 7.18 Study 5: Number of items by pattern in 4w4σ and 4w6σ test items

4w4σ: three T3*
          Subject-predicate   Alternating   Opt rule   Total
          pattern             pattern       pattern
4         0                   3             1          33
6         2                   0             7          39
adults    0                   0             27         27

4w4σ: all T3*
          Subject-predicate   Alternating   Opt rule   Total
          pattern             pattern       pattern
4         0                   4             0          34
6         2                   11            0          43
adults    0                   24            3          27

4w6σ: three T3*
          Subject-predicate   Alternating   Opt rule   Total
          pattern             pattern       pattern
4         0                   1             1          19
6         0                   12            3          33
adults    0                   22            4          27

4w6σ: all T3*
          Subject-predicate   Alternating   Opt rule   Total
          pattern             pattern       pattern
4         0                   4             0          24
6         0                   15            2          41
adults    0                   26            0          26

(4 = 4-year-olds, 6 = 6-year-olds; the number of individual patterns + number of errors = total.)


Figure 7.10 Study 5: Correct rates in 4w4σ and 4w6σ sentences
[Bar charts, panels a to d; y-axis: percentage (0 to 100). The plotted values (%) are listed below.]

a. 4w4σ: three T3*
              4-year-olds   6-year-olds   adults
Subj-pred     0             5.13          0
Alternating   9.09          0             0
Opt rule      3.03          17.95         100

b. 4w4σ: all T3*
              4-year-olds   6-year-olds   adults
Subj-pred     0             4.65          0
Alternating   11.76         25.58         88.89
Opt rule      0             0             11.11

c. 4w6σ: three T3*
              4-year-olds   6-year-olds   adults
Alternating   5.26          36.36         81.48
Opt rule      5.26          9.09          14.81

d. 4w6σ: all T3*
              4-year-olds   6-year-olds   adults
Alternating   16.67         36.59         100
Opt rule      0             4.88          0

Figure 7.10 shows the correct rates and distribution of the T3S patterns in 4w4σ and 4w6σ by
age. For the 4-syllable items, the opt rule pattern is the preferred pattern in the three-T3 items,
whereas the alternating pattern is the preferred pattern in the all-T3 items. For the 4w6σ items,
both three T3* and all T3*, the alternating pattern is the dominant pattern in all age groups. Even
though 4-year-olds and 6-year-olds are far from adult-like, a developmental trend is visible.
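The percentages reported in Figure 7.10 follow directly from the counts in Table 7.18: each pattern count is divided by the group's total (attested patterns plus errors), and the correct rate is the sum of the attested patterns over that total. The following is a minimal Python sketch of that arithmetic for the 4w4σ three-T3* items; the dictionary layout and function names are illustrative assumptions, not part of the original analysis.

```python
# Counts from Table 7.18, 4w4σ three-T3* items.
# The dictionary layout and the function names are illustrative assumptions.
counts = {
    "4-year-olds": {"subj-pred": 0, "alternating": 3, "opt rule": 1, "total": 33},
    "6-year-olds": {"subj-pred": 2, "alternating": 0, "opt rule": 7, "total": 39},
    "adults":      {"subj-pred": 0, "alternating": 0, "opt rule": 27, "total": 27},
}

def pattern_rates(row):
    """Return each pattern's share of all analyzed items, in percent."""
    total = row["total"]
    return {p: round(100 * n / total, 2) for p, n in row.items() if p != "total"}

def correct_rate(row):
    """Correct rate: all attested (non-error) patterns divided by the total."""
    total = row["total"]
    correct = sum(n for p, n in row.items() if p != "total")
    return round(100 * correct / total, 2)

for group, row in counts.items():
    print(group, pattern_rates(row), correct_rate(row))
```

Whatever remains after subtracting the correct rate from 100 is the share of error responses for that group.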

A logistic regression analysis was performed. The results show that the independent variables
(age, number of adjacent T3*, and number of syllables) as a set are statistically significant (chi
square = 332.374, p < .001 with df = 15). This indicates that the independent variables as a set
reliably distinguished among the response patterns.
The opt rule pattern
The Wald criterion shows the number of T3* (OR = 15.267, p < .001) and the number of
syllables (OR = 4.308, p = .006) are statistically significant for the opt rule pattern relative to
errors. The OR values show that, everything else held constant, three-T3 items are roughly 15
times more likely than all-T3 items, and 4w4σ items are 4 times more likely than 4w6σ items to
have the opt rule pattern. Both 4-year-olds and 6-year-olds are significantly different from adults
(4-year-olds: OR = .000, p < .001, 6-year-olds: OR = .003, p < .001).
The alternating pattern
The number of T3* (OR = .429, p = .014) and the number of syllables (OR = .296, p < .001)
are statistically significant for the alternating pattern relative to errors. The OR values show that,
everything else held constant, all-T3 items are roughly two times more likely than three-T3 items
to have the alternating pattern, and 4w6σ items are three times more likely than 4w4σ items to
have the alternating pattern. Both 4-year-olds and 6-year-olds are significantly different from
adults (4-year-olds: OR = .002, p < .001, 6-year-olds: OR = .005, p < .001).
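An odds ratio below 1 can be read in the opposite direction by taking its reciprocal, which is how the "roughly two times" and "three times" figures above follow from OR = .429 and OR = .296. A small Python sketch of that conversion is given here; the helper name is hypothetical, and because odds ratios compare odds rather than probabilities, "times more likely" should be taken as a loose paraphrase.

```python
def flip_odds_ratio(odds_ratio):
    """Re-express an odds ratio with the comparison and reference categories swapped."""
    return 1.0 / odds_ratio

# Alternating pattern relative to errors (odds ratios reported above).
print(round(flip_odds_ratio(0.429), 2))  # all-T3 vs. three-T3 items -> 2.33
print(round(flip_odds_ratio(0.296), 2))  # 4w6σ vs. 4w4σ items -> 3.38
```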
The subject-predicate pattern
For the subject-predicate pattern relative to errors, no independent variable, including the
number of T3* (OR = 1.217, p = .850), number of syllables (OR = 2.173E7, p = .995), and age
(OR = .015, p = .999), is found to be statistically significant.


7.3.4.2 Discussion for 4w4σ and 4w6σ
Neither 4-year-olds nor 6-year-olds showed much difference between the 4w4σ items and the
4w6σ items. The number of T3* and the length of sentences did not have an effect. The results
do not support the Complexity Hypothesis, either (H1a) Number of adjacent T3* or (H1b) Length of
sentences.
Both 4-year-olds’ and 6-year-olds’ correct rates were very low, although the patterns the 6-year-olds favor reflect those of the adults. In the 4w6σ sentences, the alternating pattern, which
gives a good syntax-prosody mapping, is used by most adults. It was also the most frequent
correct pattern among 6-year-olds. The results support the Syntax-prosody Alignment
Hypothesis (H2): prosodic boundaries tend to match the major syntactic boundaries.
Many of the children’s errors were due to misperceiving the auxiliary verb xiang ‘want’ as
a T4 instead of a T3. Xiang in T4 can mean “is like,” and our experimental sentences still
make sense with this reading. For instance, the original intended meaning of “You want to buy flowers”
is changed to “It’s like you are buying flowers.” The children’s correct rates might have been
understated because of this misperception error (4-year-olds: 63.63% (21/33); 6-year-olds: 43.59% (17/39)). However, no such errors were found in adults.
For three-T3 4w4σ items, all adults used the opt rule pattern, whereas both child groups had
another pattern in addition to the opt rule pattern (4-year-olds: the alternating pattern; 6-year-olds: the subject-predicate pattern).
For all-T3 4w4σ items in RTR, the dominant pattern across all groups is the alternating
pattern. In adults’ responses the alternating pattern was used 88.89% of the time, while the opt
rule (larger domain) pattern was used only 11.11% of the time. Comparing this with all-T3 4w4σ items in NSR,
where adults used the alternating pattern 45% of the time and the opt rule pattern 55% of the time, shows that
adults favor the alternating pattern when they are not provided with any surface pattern for the repetition.
The lower percentage of the alternating pattern in NSR was due to the fact that about
half of the adults repeated with the pattern they heard. Since participants in Study 5 were
provided with the underlying tones, not one of the surface patterns, the distribution of surface
patterns in this study presumably reflects more faithfully the T3S patterns the participants would
use in reality. If this is true, in all-T3 4w4σ items, adults tend to use the alternating pattern far
more than the opt rule pattern. Given that a 4-syllable domain is not very large and should be
relatively easy for adults, adults’ preference for two disyllabic feet for the sentence is
notable. In all-T3 4w6σ items, the alternating pattern was used by
adults 100% of the time. It is also the dominant pattern in 4-year-olds and 6-year-olds. The alternating parsing strategy thus appears to be a robust parsing strategy. Furthermore, the
use of alternating parsing in the 4w6σ items gives a good mapping between syntax and prosody
as in (8), which supports our syntax-prosody alignment hypothesis.
(12) [[haima]    [xiang   [zhao       [shuimu]]]]
     seahorse    want     look for    jellyfish
     33          3        3           33           UT
     (23)        3        3           (23)         Word: two disyllabic feet, T3S
     (23)        (2       3)          (23)         Phrase: disyllabic foot, T3S; ST1
     (22         2        3)          (23)         larger domain in fast speech; ST2
     ‘Seahorse wants to look for Jellyfish.’
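The two surface outcomes in (12) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: tones are integers, each prosodic domain is a list, T3S changes every T3 that directly precedes another T3 inside the same domain, and no T3S applies across domain boundaries. The function name and data layout are hypothetical.

```python
def apply_t3s(domains):
    """Apply Tone 3 Sandhi inside each domain: a T3 before another T3 becomes T2."""
    surface = []
    for domain in domains:
        out = list(domain)
        for i in range(len(domain) - 1):
            # Check the underlying tones, so a run of T3s yields T2 ... T2 T3.
            if domain[i] == 3 and domain[i + 1] == 3:
                out[i] = 2
        surface.append(out)
    return surface

# (12) haima xiang zhao shuimu, underlying tones 33 3 3 33
st1 = apply_t3s([[3, 3], [3, 3], [3, 3]])   # three disyllabic feet
st2 = apply_t3s([[3, 3, 3, 3], [3, 3]])     # larger first domain in fast speech
print(st1)  # [[2, 3], [2, 3], [2, 3]]
print(st2)  # [[2, 2, 2, 3], [2, 3]]
```

Under these assumptions the first parse gives the alternating pattern and the second gives the larger-domain pattern.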

7.3.4.3 Results for PRO-5w6σ and NP-5w6σ items
Table 7.19 shows the number of items by pattern in the PRO-5w6σ and NP-5w6σ test items, and
Figure 7.11 shows the correct rates in PRO-5w6σ and NP-5w6σ sentences.


Table 7.19 Study 5: Number of items by pattern in PRO-5w6σ and NP-5w6σ test items

PRO-5w6σ: three T3*
          Subj-pred  Alternating  Ternary  Opt rule  total
          pattern    pattern      pattern  pattern
4         1          0            0        0         20
6         1          0            20       0         42
adults    1          0            25       0         27

PRO-5w6σ: all T3*
          Subj-pred  Alternating  Ternary  Opt rule  total
          pattern    pattern      pattern  pattern
4         0          1            0        0         14
6         3          8            7        0         36
adults    2          6            11       5         28

NP-5w6σ: three T3*
          Subj-pred  Alternating  Ternary  Opt rule  total
          pattern    pattern      pattern  pattern
4         1          0            1        0         28
6         8          0            21       0         40
adults    10         0            18       0         28

NP-5w6σ: all T3*
          Subj-pred  Alternating  Ternary  Opt rule  total
          pattern    pattern      pattern  pattern
4         2          0            1        0         24
6         9          2            11       3         39
adults    7          0            16       5         28

(4 = 4-year-olds, 6 = 6-year-olds; the numbers of individual patterns + number of errors = total.)


Figure 7.11 Study 5: Correct rates in PRO-5w6σ and NP-5w6σ sentences
[Four bar charts, panels a-d; vertical axis 0-100 (percent). The values shown under each panel are listed below.]

a. PRO-5w6σ: three T3*
              4-year-olds  6-year-olds  adults
Subj-pred     5            2.38         3.70
Ternary       0            47.62        92.59

b. PRO-5w6σ: all T3*
              4-year-olds  6-year-olds  adults
Subj-pred     0            8.33         7.14
Ternary       0            19.44        39.29
Alternating   7.14         22.22        21.43
Opt rule      0            0            17.86

c. NP-5w6σ: three T3*
              4-year-olds  6-year-olds  adults
Subj-pred     3.57         20           35.71
Ternary       3.57         52.50        64.29

d. NP-5w6σ: all T3*
              4-year-olds  6-year-olds  adults
Subj-pred     8.33         23.08        25.00
Ternary       4.17         28.21        57.14
Alternating   0.00         5.13         0.00
Opt rule      0.00         7.69         17.86

Figure 7.11 (a) – (d) show the correct rates and distribution of T3S patterns in the PRO-5w6σ
and the NP-5w6σ items by age. We can see that the subject-predicate pattern is higher in
frequency in the NP-5w6σ items than in the PRO-5w6σ items. In the PRO-5w6σ items and the
NP-5w6σ items, while 6-year-olds had the patterns attested in adults (except for the opt rule
pattern in all-T3 PRO-5w6σ items), 4-year-olds did not. Four-year-olds also had extremely low correct
rates.
Both 6-year-olds and adults did better in the NP-5w6σ items than in the PRO-5w6σ items.
The difference is minimal for 4-year-olds, however. Adults did perfectly in the NP-5w6σ items
(three-T3 and all-T3), but had a slightly lower correct rate of 96.29% in three-T3 PRO-5w6σ
items and an 85.72% correct rate in all-T3 PRO-5w6σ items. For 6-year-olds, the correct rates for
the NP-5w6σ items were about 70% and 60% for the three-T3 and all-T3 conditions respectively,
and the correct rate was about 50% for the PRO-5w6σ items.
A logistic regression analysis was performed. The results show that the independent variables
(age, number of adjacent T3*, and subject NP vs. subject pronoun) as a set are statistically
significant (chi square = 291.799, p < .001 with df = 20). This indicates that the independent
variables as a set reliably distinguished among the response patterns.
The opt rule pattern
The Wald criterion shows that neither the number of T3* (OR = 1.746E-9, p = .998) nor the noun-pronoun distinction (OR = .272, p = .058) is statistically significant for the opt
rule pattern relative to errors. Six-year-olds are significantly different from adults, but 4-year-olds are not (4-year-olds: OR = 9.749E-11, p = .998, 6-year-olds: OR = .021, p < .001).
The alternating pattern
For the alternating pattern relative to errors, the noun-pronoun distinction (OR = 5.021, p
= .045) is statistically significant, but not the number of T3* (OR = 1.312E-9, p = .998). Both 4-year-olds and 6-year-olds are significantly different from adults (4-year-olds: OR = .024, p
= .002, 6-year-olds: OR = .184, p = .021).


The ternary pattern
For the ternary pattern relative to errors, both the number of T3* and the noun-pronoun
distinction are found to be statistically significant (OR = 2.361, p = .010 and OR = .419, p = .010
respectively). Both 4-year-olds and 6-year-olds are significantly different from adults (4-year-olds: OR = .001, p < .001, 6-year-olds: OR = .056, p < .001).
The subject-predicate pattern
For the subject-predicate pattern relative to errors, the noun-pronoun distinction (OR = .108,
p < .001) is statistically significant, but the number of T3* is not (OR = 1.247, p = .586). Both 4-year-olds and 6-year-olds are significantly different from adults (4-year-olds: OR = .006, p
< .001, 6-year-olds: OR = .063, p < .001).
7.3.4.4 Discussion for PRO-5w6σ and NP-5w6σ
Level of difficulty
In PRO-5w6σ and NP-5w6σ sentences, 4-year-olds’ correct rates are under 15%. Whether an item
had three T3* or all T3* did not make a difference to them. For 6-year-olds, all-T3 items were
more difficult than three-T3 items in NP-5w6σ sentences; nevertheless, the effect of the number
of T3* did not show in the PRO-5w6σ sentences. H1a (Number of adjacent T3*: the more
adjacent T3*, the more complex the task) can be neither confirmed nor rejected.
Regarding syntax-prosody alignment, the use of the subject-predicate pattern in the NP-5w6σ
items ranges from about 20% to 36% in 6-year-olds and adults, but stays under 10% in the PRO-5w6σ
items. The results lend support to the Syntax-prosody Alignment Hypothesis (H2): prosodic
boundaries tend to match the major syntactic boundaries. Similar to the results in Study 4 NSR, the
ternary pattern was used more frequently than the subject-predicate pattern.


The observation that the subject pro appeared to be more difficult than subject NP was a
somewhat surprising result. Even though we expected the sentences with subject pro and subject
NP to have different parsing patterns because of their prosodic differences, we did not expect one
to be more difficult than the other.
The patterns attested in PRO-5w6σ and NP-5w6σ items in Study 5 RTR echo those attested
in Study 4 NSR. The sharp contrast between the use of the alternating pattern for PRO-5w6σ (Figure
7.11 (b)) and the use of the subject-predicate pattern for NP-5w6σ (Figure 7.11 (d)) is
familiar from Study 4. Although the correct rates dropped dramatically in
Study 5, the T3S variability remained the same in participants’ responses.
The subject-predicate pattern
T3S cannot be done without syntax, and T3S cannot be done without prosody, either. The
only difference between PRO-5w6σ and NP-5w6σ is the distinction between pro and NP
in the subject position, which was the source of the shift in the distribution of the parsing patterns.
The subject-predicate pattern occurred much more frequently in the NP-5w6σ items than in the
PRO-5w6σ items. Again, this is because a subject NP, being a full noun, can stand alone better in the
subject-predicate pattern.
Unlike in Study 4, where the subject-predicate pattern was not attested in the PRO-5w6σ
(both three-T3 and all-T3) items, we found a small portion of responses with this pattern in
Study 5 RTR. Where children and adults used the subject-predicate pattern in
PRO-5w6σ sentences, this indicates that for these speakers the subject pronoun could stand as a
degenerate foot despite its weak prosodic nature. For them, perhaps maintaining the subject-predicate boundary was more important, even when the subject was a monosyllabic pronoun.


We see that 6-year-olds’ patterns reflect those of adults, although they were still far from
adult-like accuracy in T3S application. All age groups did better in the NP-5w6σ items than in
the PRO-5w6σ items.
The alternating pattern
The alternating pattern was attested in all-T3 PRO-5w6σ items in all three age groups, the same
as the finding in Study 4. Nevertheless, the alternating pattern also showed up in 6-year-olds’
production in the all-T3 NP-5w6σ items, though only at a very low frequency, 5.13%.
The ternary pattern
The ternary pattern was attested in the PRO-5w6σ and the NP-5w6σ (both three-T3 and all-T3) items. Six-year-olds and adults have a fairly high proportion of this pattern. Four-year-olds
were very limited in the use of patterns seen in 6-year-olds and adults. A 6-syllable sentence with
three T3* or six T3* was probably too heavy a workload for 4-year-olds. Even though the
length of six syllables is manageable at their age (since they did well in the control items),
the processing load of T3S clearly adds to the workload.
The opt rule pattern
The opt rule pattern was used only by adults in all-T3 PRO-5w6σ items (4- and 6-year-olds: 0%, adults: 17.86%) and by both 6-year-olds and adults in all-T3 NP-5w6σ items (4-year-olds: 0%, 6-year-olds: 7.69%, and adults: 17.86%). Overall, this pattern is lower in frequency
compared to other patterns. If we compare these results to the results we obtained in Study 4
NSR, the picture is very different: there, all three age groups used the opt rule pattern, and
the 4-year-olds used it the most (all-T3 PRO-5w6σ items: 4-year-olds: 58.82%, 6-year-olds:
30%, and adults: 35%; all-T3 NP-5w6σ items: 4-year-olds: 58.82%, 6-year-olds: 50%, and adults:
26.32%). The comparison reveals the following:

It was not that 4-year-olds were unable to produce the opt rule pattern. We saw in Study 4
NSR that they were the most “willing” group to repeat the sentence with the opt rule pattern
they heard. The fact that this pattern was not attested in 4-year-olds in Study 5 may indicate that
the high percentage of the larger domain parsing in NSR was merely imitative repetition of what they heard.
For 6-year-olds, the much higher percentage of the larger domain parsing in NSR than in
RTR may also be imitative repetition. However, the very small percentage of 7.69% that showed
in all-T3 NP-5w6σ items in RTR provides some evidence that at least some 6-year-olds had the
opt rule pattern in place: they heard underlying tones and had to apply T3S actively, so the
presence of the pattern in these children’s production cannot be imitation.
Also, as with adults, this pattern was among the least frequent patterns in 6-year-olds.
In all-T3 PRO-5w6σ and all-T3 NP-5w6σ items, adults showed consistency in what patterns
they favored or disfavored. In RTR, the opt rule pattern was used infrequently (all-T3 PRO-5w6σ:
17.86%, all-T3 NP-5w6σ: 17.86%), and in NSR, where they heard this pattern, adults repeated with
it 35% and 26.32% of the time in all-T3 PRO-5w6σ and all-T3 NP-5w6σ items respectively. Not
surprisingly, the use of the pattern is higher in frequency when they heard this exact pattern in
NSR. Taking the results in NSR and in RTR together, we see that, in these 6-syllable all-T3 sentences,
regardless of whether the subject is an NP or a pro, adults chose other patterns over the opt rule
pattern.
7.4 General Discussion
Two experiments: NSR and RTR
It may be tempting to think that since the participants heard something totally strange in RTR,
there is a chance that their responses in RTR are not normal. We found that the patterns that were
attested in NSR were also attested in RTR. Even though the designs differ in terms of whether or
not T3S is present, the T3S variability remains the same in subjects’ responses, for both children and
adults. T3S patterns in NSR and RTR were fairly consistent, although it was not surprising
that there was less variability in NSR than in RTR, given that participants heard one surface
pattern, and they could simply repeat with the same pattern, without working out a different
pattern themselves. Another way to look at this is that the subjects were less constrained in
giving whatever T3S pattern they had in RTR, whereas in NSR they might have been “more
constrained” to the pattern they heard, instead of using their own preferred pattern.
Development in the acquisition of T3S
Between the two experiments, NSR and RTR, we had expected that NSR would be easier. As
expected, subjects did better in NSR than in RTR since the former did not require T3S
application, but the latter did. The fact that both child groups performed well in the control
items in both studies shows that neither the length nor the structure of the sentences presented
a challenge to them. Both child groups also did well in the test items in NSR,
which shows that they had no problem repeating a sentence where T3S had been applied
correctly. The dramatic drop in children’s correct rates in RTR was due to the T3S
application that was required. Four-year-olds had a lot of difficulty with the RTR items and were
far from adult-like. Six-year-olds differ from 4-year-olds not only in their much higher correct
rates in T3S application, but also in their increasing awareness and use of multiple T3S surface
patterns and in how their distribution of T3S patterns reflects that of adults. The percentages of
correct T3S patterns in 6-year-olds strongly indicate that they are moving toward
adult-like T3S application. This supports the view that T3S develops over a period of
time, rather than being acquired instantaneously.


The complexity issue: Number of adjacent T3* and the length of sentence
In NSR, the length of sentences (4w4σ and 4w6σ) had an effect on 4-year-olds’ and 6-year-olds’ correct rates. They did better in the 4w4σ sentences than in the 4w6σ sentences. However, in
RTR, such contrast did not exist. It could be that the T3S workload was so heavy in RTR that
whether children were to apply T3S in a 4w4σ sentence or in a 4w6σ sentence did not make very
much difference.
Regarding the number of adjacent T3* in a sentence, overall both child groups did better in
the three-T3 items than in the all-T3 items in NSR. The contrast did not exist in RTR, just as
what we saw in the case with the number of syllables. Fewer adjacent T3* and fewer syllables in
the sentence did not result in higher correct rates in children’s responses in RTR, though they did
in NSR.
Adults’ correct rates remain consistently high (100% or close to 100% in most items in both
NSR and RTR), indicating that the differences in the number of adjacent T3* and the length of
sentence did not have an effect on their responses in either study.
The alternating pattern and the subject-predicate pattern
The alternating pattern in the NP-5w6σ sentences and the subject-predicate pattern in the
PRO-5w6σ sentences may be grammatical to some people, but marginal or ungrammatical to
others. Whether or not an intonational break and/or emphasis/focus plays a role in the
subject-predicate pattern requires further investigation. In addition, the directionality of leftward or
rightward incorporation of an unparsed syllable at the phrase level will need to be examined.
Whether or not these patterns are specific to Mandarin speakers in Taiwan is unknown. Data
from Mandarin speakers of other regions, when collected, will enable us to examine whether
some T3S patterns are used across the regions or are typical of Taiwan
Mandarin speakers.
The distinction between subject pronoun and subject NP
While a subject NP can stand alone in its own domain, a subject pronoun does not tend to. As
a clitic, it is prone to cliticize to a host to form a foot. Because a subject NP can stand alone, we
see higher frequency of the subject-predicate pattern in the subject-NP sentences than in the
subject-pro sentences. This shows that the prosodic properties of the two are very different;
otherwise, we would not have seen the shifts in the percentages of the T3S surface patterns.
Adults’ relatively lower correct rate of approximately 85% in all-T3 PRO-5w6σ items in both
NSR and RTR is quite telling. Note that adults did perfectly (100% correct) in all-T3 NP-5w6σ
items in both NSR and RTR, and these items differ only in the subject being an NP rather than a
pro. The pronoun is clearly the source of the lower correct rate in adults in all-T3
PRO-5w6σ items. There is only one error pattern in adults, *(T2T3T3)(T3T2T3), whereas
children, in addition to this error pattern, commonly produced *(T3T4T3)(T3T2T3) or
omitted one or two of the syllables.
It is intriguing that 6-year-olds’ correct rates in all-T3 PRO-5w6σ items were lower than
those in all-T3 NP-5w6σ items in both NSR and RTR. The surface patterns 6-year-olds produced mirror those adults produced, both in the proportion of each pattern and in the
contrast between all-T3 NP-5w6σ and all-T3 PRO-5w6σ items mentioned above. In both respects,
6-year-olds’ T3S application approximates that of adults.
7.5 Conclusions
In this chapter, two experimental studies of T3S at the sentence level, NSR and RTR, were
presented. As expected, participants did better in NSR (natural speech without tonal
manipulations) than in RTR (with tonal manipulations) because of the lesser workload in the
former. We conclude this chapter with a summary of the findings of the two studies.
In Study 4 NSR, the results show an effect of number of adjacent T3* in children’s correct
rates. Complexity Hypothesis (H1a)— Number of adjacent T3*— is supported. Regarding the
length of sentences, both 4-year-olds and 6-year-olds did better in 4w4σ sentences than in the
4w6σ sentences. Complexity Hypothesis (H1b)— Length of sentences—is supported.
For the PRO-5w6σ and NP-5w6σ items, the number of T3* did not have much effect, and the
results do not provide strong evidence to support (H1a)— Number of adjacent T3*.
The use of the alternating pattern, which gives a good mapping between syntax and prosody
in 4w6σ sentences, increases with age. The subject-predicate pattern was used by 6-year-olds and
adults in the NP-5w6σ items, but not in the PRO-5w6σ items. Taken together, the use of these
patterns produces a good mapping between syntax and prosody, supporting the Syntax-prosody
Alignment Hypothesis (H2).
In Study 5 RTR, the Complexity Hypothesis can be neither confirmed nor rejected: 4-year-olds’ correct rates did not differ by number of T3* or number of syllables, and
although 6-year-olds did better in three-T3 items than in all-T3 items in NP-5w6σ sentences, no
such difference was found in the PRO-5w6σ sentences. Similar to what was found in Study 4 NSR,
both adults and 6-year-olds used the subject-predicate pattern in the NP-5w6σ items (about 20% to 36%),
but under 10% of the time in the PRO-5w6σ items. These results lend support to the Syntax-prosody
Alignment Hypothesis (H2).
In the beginning of the chapter, research questions posed include the following:
1. Does complexity—the number of adjacent T3*, the length of sentences (total number of
syllables)— play a role in T3S application?
In some cases, complexity appears to have an effect (especially in NSR), but in other cases,
the effect does not exist as in RTR. This could be due to the difficulty of the task in RTR.
2. How does mapping between syntax and prosody affect T3S application in children and adults?
There is no clear evidence that 4-year-olds were aware of the mapping between
syntax and prosody. However, 6-year-olds show awareness of this mapping.
Like adults, they used the alternating pattern in 4w6σ sentences and the subject-predicate pattern in the NP-5w6σ items, which give a good mapping between syntax and prosody.
Although 6-year-olds do not yet have adult-like accuracy, they mirror what adults do in
producing a variety of T3S patterns, and often with similar frequency.
Development of T3S acquisition
Regarding development of T3S acquisition, the findings obtained are summarized as follows:
Overall, the correct rates in 6-year-olds are higher than those in 4-year-olds. Though their
performance was not always adult-like, the developmental trend was evident.
Six-year-olds have more T3S variability than 4-year-olds. Unlike 4-year-olds, they had all
the surface patterns that are attested in the adults. This suggests that for 6-year-olds, all the T3S
patterns were in place, though the frequency of the patterns might differ from that in adults.
A crucial piece of information that should not be overlooked is that the increase or
decrease in the usage of particular patterns (compared to 4-year-olds) tells us that 6-year-olds
are on the right track, moving toward the adult-like preference in the T3S surface pattern. In
many cases, the proportions of the T3S surface patterns in 6-year-olds and in adults were
strikingly similar. We have also seen that what appears to be more difficult for adults (in the case
of all-T3 PRO-5w6σ items) was also more difficult for the 6-year-olds.


Distinction between subject NP and subject pronoun
At the beginning of the chapter, one of the research questions posed was whether pronouns behave
differently from NPs in T3S application. The results of the two studies show that the prosodic
difference between a pro and an NP affects speakers’ preferences in how they set up the domains
for T3S application, and consequently, the surface patterns used for the sentences containing a
subject NP or a subject pro differ in an interesting way. While a subject NP can
stand alone and give a subject-predicate pattern, a subject pronoun does so far less
frequently. As a clitic, a pronoun is weak prosodically and is prone to form a prosodic
domain with one or more other syllables; it is therefore less likely to give a subject-predicate pattern.
Lastly, application of T3S is highly complicated. Syntactic and prosodic factors, the number
of adjacent T3*, the length of sentence, the interface of syntax and prosody—alignment between
syntax and prosody—all play a role in how prosodic domains are parsed. There is still a lot to be
discovered. The experiments presented in this chapter did not support the findings of earlier
studies that T3S is mastered early and is almost error-free (Jeng 1979; Jeng 1985; Li &
Thompson 1977; Zhu & Dodd 2000). At age 4, children do not have all the T3S patterns that
adults have. At age six, children have all the T3S surface patterns attested in the adults, and the
frequency of the surface patterns in 6-year-olds is becoming more like that of adults. However,
they do not yet have adult-like accuracy. Future studies can investigate even older children and
find out at what age children become adult-like in their application of T3S in a more complicated
task such as that in RTR.


CHAPTER 8
CONCLUSION
8.0 Introduction
The purpose of the dissertation was to examine how children acquire the syntax-dependent
phonological rule Tone 3 Sandhi. The whole process of setting up prosodic domains within
which T3S applies is complicated because cyclic and non-cyclic parsing strategies are used at
different levels, and the integration of the two strategies is necessary at the sentence level. In
addition, there are optional rules, which create T3S variation. Specifically, we asked whether T3S is
mastered early in children’s acquisition. One of the challenges was that there was no previous
work targeting T3S that we could learn from and compare our findings to. There was also not much
experimental data on T3S in adults, which is relevant and crucial to acquisition studies of T3S, as
adult speech is the language input children receive. Different parsing strategies as well as
optional T3S (i.e. T3S is optional across domains and the fast speech rule) result in T3S variants
whose frequency is still largely unknown. Given what we know from the existing T3S theories
and what needs to be learned in children’s acquisition of T3S, a series of five studies was
conducted, each targeting specific research questions.
The overall primary goal of the dissertation was to investigate children’s acquisition of T3S
in various contexts. More specifically, we examined children’s ability to use a non-cyclic
strategy in flat structures and a cyclic strategy in NPs, as well as the integration of cyclic and non-cyclic strategies in sentences.
In the beginning of the dissertation, questions posed include the following. The results of the
studies answer some, but not all of the questions.


1. Do children know both cyclic and non-cyclic strategies?
The results of the studies show that they do. They could apply T3S non-cyclically in flat
structures and cyclically in NPs.
2. Can children integrate the subject and the VP into a domain where T3S applies?
Yes, they can. In Study 4 and Study 5, a monosyllabic subject forms a domain with the
following syllables, creating the alternating pattern or the ternary pattern.
3. In contexts where there are internal structures, are children aware of syntax and refer to
syntax in their application of T3S? What do they do in their application of T3S in the more
complex structures?
Although they are aware of syntax and are able to apply T3S bottom up, when the structure
becomes more complex (such as in the mixed-branching NPs), they sometimes default to the
prosody-based strategy and apply T3S from left to right.
4. When the cyclic parse and the non-cyclic parse mismatch, what do children do?
In the mixed-branching NPs [σ[[σσ]σ]] where the cyclic parse and the non-cyclic parse
mismatch, they sometimes used the non-cyclic strategy without referring to syntax.
Therefore, they sometimes produced the non-cyclic parsing (σσ)(σσ), instead of the cyclic
parsing σ(σσ)σ → σ(σσσ) → (σσσσ). The former gives the surface pattern of (T2T3)(T2T3)
whereas the latter gives the surface pattern of (T3T2T2T3). In short, in the more complex
structures, such as the mixed-branching structure, children sometimes ignored syntax and
used a prosodic parsing strategy, although a syntactic parsing strategy is required in this case.
5. How do children go from zero T3S to adult-like T3S, and is T3S acquired
early, as indicated in the literature? If not, what does the developmental pattern look like?

The results in the studies do not support the claim that all the “ingredients” in T3S are mastered early.
Children may have the knowledge of the rule T3T3 → T2T3 at an early age; however,
applying T3S in an adult-like fashion takes years. They have to learn to use the cyclic and
non-cyclic strategies, to integrate them, to know how to incorporate an unparsed syllable, to
learn the optional rules, and to know the distinction between NPs and pronouns, etc. Younger
children have difficulties, and even the oldest age group, 6-year-olds, did not have adult-like
mastery of T3S. A clear pattern does show that they were aware of aspects such as the
distinction between an NP and a pronoun in building prosodic domains. A very interesting
finding is that even the frequency of the T3S patterns 6-year-olds produce resembles the
frequency of T3S patterns produced by adults.
6. Do younger children and older children behave the same or differently?
In some ways, they behave alike: both under-apply and over-apply T3S in some
cases. In other ways they differ: older children appear to make a distinction between NPs and
pronouns that younger children are unable to make. In Study 2 (flat structures), a common error
pattern found in 5-year-olds was not found in 3-year-olds. This error pattern results from
alternating between T2 and T3 (*T2T3T2T3T2). This may indicate that 5-year-olds are
acquiring the rhythm (grouping two syllables into a disyllabic foot).
7. Does children’s variability in T3S reflect adults’?
Older children’s variability reflects adults’ but younger children’s variability is quite limited.
I summarize the hypotheses in Section 8.1. Section 8.2 sums up the findings of each study.
Section 8.3 suggests what can be done in future research.

8.1 Hypotheses
Regarding the cyclic and non-cyclic strategies, we asked whether or not children can use the
two strategies separately, and whether or not they can integrate the two in sentences. In addition,
we asked what children do when a prosody-based non-cyclic parse does not match a syntax-based
cyclic parse. Can children integrate the subject and the VP into a domain where T3S
applies? What do children do in their application of T3S in the more complex structures?
Regarding development in T3S acquisition, we asked how children go from zero T3S to
adult-like T3S, and whether or not T3S is acquired early as indicated in the literature. If not,
what does the developmental pattern look like? Do younger children and older children behave
the same or differently? Does children’s variability in T3S reflect adults’ variability? We put
forward the following hypotheses.
(1) Syntax-Prosody Alignment Hypothesis (Gerken 1996)
T3S cases where a left-to-right parse and the phrase-structure-dependent parse produce the same
result will cause less trouble than cases in which left-to-right foot building produces a different
result from foot building based on the syntax. In other words, T3S cases where prosody and syntax
mismatch should be more difficult than T3S cases where prosodic domains and syntactic domains
align well.
(2) Structural Complexity Hypothesis
T3S at the clausal level requires the integration of DPs and compounds into a larger prosodic unit.
We hypothesize that children will take longer to reach adult-level T3S at the sentence level
than at the phrasal level. In particular, it may be difficult for children to integrate the subject and
the VP into a domain where T3S can apply.

(3) Variational Hypothesis (Miller 2007; Pearl 2007; Yang 2002)
If there is more variation in particular types of structures in the input, these structures will
provide evidence for more than one possible analysis, generating a certain amount of noise in the
input. If the input is noisier, we expect that children will require more data to converge on the
adult language because certain outputs may not unambiguously support one or the other
hypothesis.
In what follows, I evaluate how the overall findings obtained across these five studies
confirm, reject, or are unable to confirm or reject the hypotheses.
We hypothesized that T3S cases whose prosodic parsing aligns with syntactic parsing are
easier than T3S cases whose prosodic parsing and syntactic parsing do not align (the
Syntax-Prosody Alignment Hypothesis). This is confirmed by the evidence from four-syllable
R-branching NPs (T2T3T2T3) and M-branching NPs (T3T2T2T3). The former, whose syntax and
prosody align, have overall higher correct rates than the latter, whose syntax and prosody do not
align. Furthermore, in Study 4 and Study 5, a monosyllabic NP is more easily parsed as a
degenerate foot than a monosyllabic pronoun, which appears to be motivated by maintaining the
subject-predicate boundary. Both adults and children (mostly 6-year-olds) used the
subject-predicate pattern, which surfaced as a result of a better alignment between syntax and prosody.
In addition, in the 4w6σ sentences, the alternating pattern was used most frequently across age
groups, and this pattern matches the syntactic and prosodic parsing. These results lend support to
the Syntax-Prosody Alignment Hypothesis.
Regarding structural complexity, the number of adjacent T3* and the number of syllables in the
sentences had an effect in Study 4 NSR, but the effect did not exist in Study 5 RTR, which
required children to actively apply T3S. Children did not perform
very well in T3S in the sentences in Study 5 RTR. It is not clear whether this was due to structural
complexity, to variability of the input, or to the task being too hard for children. Children, however,
were able to integrate the subject and the VP into a domain where T3S can apply. In Study 3
NPs, children did well in the three-syllable compound nouns. Although children in general also
did well in 4-syllable right-branching NPs, they had a lot of difficulty with the mixed-branching
NPs. Taken together, the Structural Complexity Hypothesis cannot be confirmed or rejected at
this point.
There is variability in children and adult production. The longer the string, the more
possibilities of variation. Clearly, there is variability and this may cause delays in acquisition.
Unfortunately we cannot determine if children’s behavior in Study 5 RTR was due to the
variability of the input or the complexity of the task and experimental sentences. It is clear that
children (a) shift strategies, (b) do not have all the adult patterns, and (c) have non-adult patterns.
These observations lead us to conclude that acquisition of T3S is a slow process.
Although T3S variability was found in children and adults and variability may delay
children’s acquisition of T3S, whether or not variability indeed causes the delay will require
further investigation. The task in Study 5 RTR might have been too difficult for children. Study
3, where children were asked to build NPs, might also have been beyond some children’s
capability, especially that of 3- and 4-year-olds. The Variational Hypothesis cannot be confirmed
at this time.
T3S variability creates ambiguity, and although children have the knowledge of cyclic and
non-cyclic strategies, and also know to integrate the two strategies at the sentence level, their
accuracy is clearly not adult-like. Six-year-olds showed that they are on the right track and
approximate adult-like mastery of T3S in (i) T3S variability and (ii) the frequency of the T3S
surface patterns. In short, T3S develops gradually, rather than instantaneously. It takes years for
children to master T3S at an adult-like level.
In the next section, major findings of these studies are summarized.
8.2 Summary and discussion of findings
The studies presented in this dissertation collected natural speech data, elicited production
data, and repetition data from children and adults in Taiwan. The findings do not support the view
that T3S is mastered easily and perfectly at a very early age. The results of these studies indicate
that although even the youngest age group (3-year-olds) knows to change a T3 to a T2, children
across age groups did not have adult-like accuracy. In what follows, I will summarize and briefly discuss
the major findings of each study.
Study 1: Natural Speech
Variability of T3S was attested (resulting from application or non-application of T3S across
prosodic domains). Examples of children’s T3S application were provided.
We found that in children and adults, T3S applies more frequently within constituents than
across constituents. While there is only an 8.77% difference in adults, there is a 29.36% difference
in children. This may indicate that while adults apply T3S in a fairly similar manner within
constituents and across constituents, children differentiate the two more (applying T3S within
constituents much more freely than across constituents).
Although T3S variability was attested in this study, the number of cases of two adjacent T3* is far
greater than that of three, four, and five adjacent T3*. The number of cases where there were multiple
T3* was very small; therefore, a more systematic analysis of T3S application in various syntactic
contexts in children and adults could not be carried out.

Study 2: Flat structures
We tested production of two-, three-, and five-digit sequences. While children performed perfectly
on the control items, 3-year-olds’ correct rates on the test items were between 20% and 30%, and
5-year-olds’ approximately between 60% and 70%, indicating that children still do not
have adult-like mastery in flat structures (adults: 100% correct in controls and 97.50% in the test
items). Under-application and over-application of T3S were attested in both child groups.
In two-, three-, and five-digit sequences, we saw a binary parsing strategy and incorporation of
the unparsed syllable (when applicable). An interesting finding was that in the sequence of five
T3-digits, two patterns were predicted: (i) binary parsing followed by incorporation:
(T2T3)(T2T2T3), and (ii) larger domain parsing: (T2T2T2T2T3), but an additional unpredicted
pattern (T2T2T3)(T2T3) was attested in both child groups as well as adults. Although its
percentage was low (below 10%), it is puzzling that children actually used this pattern but never
used the predicted pattern (T2T3)(T2T2T3), which was attested only in adults (22.50%). In
addition, the major pattern (T2T2T2T2T3), assumed by many studies to occur in fast speech,
was produced in a normal speech setting.
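The two predicted parses for digit sequences can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not part of the experimental materials: tones are represented as integers (3 for T3, 2 for T2), T3S is assumed to change every T3 that is immediately followed by a T3 within the same domain, and the function names (apply_t3s, binary_with_incorporation, larger_domain, show) are hypothetical.

```python
from typing import List

T2, T3 = 2, 3


def apply_t3s(domain: List[int]) -> List[int]:
    """Change each T3 that is directly followed by a T3 in the same domain to T2."""
    out = list(domain)
    for i in range(len(domain) - 1):
        if domain[i] == T3 and domain[i + 1] == T3:
            out[i] = T2
    return out


def binary_with_incorporation(tones: List[int]) -> List[List[int]]:
    """Build disyllabic feet from left to right, apply T3S in each foot,
    then incorporate a leftover final syllable into the preceding foot."""
    paired = len(tones) - len(tones) % 2
    feet = [apply_t3s(tones[i:i + 2]) for i in range(0, paired, 2)]
    if len(tones) % 2 == 1:
        leftover = tones[-1]
        if feet:
            # T3S applies again inside the enlarged foot.
            feet[-1] = apply_t3s(feet[-1] + [leftover])
        else:
            feet = [[leftover]]
    return feet


def larger_domain(tones: List[int]) -> List[List[int]]:
    """Treat the whole sequence as a single domain."""
    return [apply_t3s(tones)]


def show(feet: List[List[int]]) -> str:
    """Render feet in the notation used in the text, e.g. (T2T3)(T2T2T3)."""
    return "".join("(" + "".join(f"T{t}" for t in foot) + ")" for foot in feet)


five_fives = [T3] * 5
print(show(binary_with_incorporation(five_fives)))
print(show(larger_domain(five_fives)))
```

For five T3 digits, the sketch yields (T2T3)(T2T2T3) under binary parsing with incorporation and (T2T2T2T2T3) under larger domain parsing. Neither function generates the unpredicted pattern (T2T2T3)(T2T3).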
Regarding children’s errors, the most common error for 3-year-olds is over-application (*T2T2,
*T2T2T2, *T2T2T2T2T2), indicating that although they know to change a T3
to a T2 when it is followed by another T3, they have difficulty maintaining the underlying tone of
the rightmost digit. Five-year-olds had a relatively smaller proportion of such errors, and
meanwhile another common error type emerged: *T2T3T2 and *T2T3T2T3T2. This is a
rather interesting finding, which indicates a binary process. The tendency to alternate
T2 and T3 in iterative binary feet is found in 5-year-olds, but not in 3-year-olds.

With respect to directionality, (T2T3)(T2T2T3) gives supporting evidence for a left-to-right
parsing strategy. On the other hand, (T2T2T3)(T2T3) gives counter-evidence. One explanation is
that in flat structures, grouping two or three, or even four, digits is robust (e.g. in reading phone
numbers), and since a three-syllable foot is only slightly larger than a disyllabic foot, these two
types of parsing may both be accessible. In other words, children may group two or three digits
together at one time, rather than going through the process of binary parsing followed by
incorporation of the unparsed syllable. Suggestions for how future work can further test parsing
strategies are given in Section 8.3.
Study 3: NPs
The major goal in this study was to test the cyclic parsing strategy in three-syllable
compound nouns ([σ[σσ]] and [[σσ]σ]) and four-syllable NPs (R-branching [σ[σ[σσ]]] and
M-branching [σ[[σσ]σ]]). The findings confirm that children and adults refer to morphosyntax in
building prosodic domains and that T3S applies cyclically. While children did very well in
three-syllable compounds, their correct rates dropped in four-syllable NPs. R-branching NPs appeared
easier for them than M-branching NPs, and crucially, all the mis-application errors in the
M-branching NPs were *T2T3T2T3 (cyclic parsing predicts T3T2T2T3). Such an error results from
left-to-right binary parsing. In other words, there is no reference to syntax in this pattern. The
parsing strategy might have shifted to the default prosodic parsing without reference to
syntax in structurally complex cases. In fact, this error type was attested not only in both child
groups, but also in adults.
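The contrast between the cyclic parse and the left-to-right binary parse can be made concrete with the following Python sketch. It is an illustration only and rests on assumptions of my own: a syntactic structure is encoded as nested lists of underlying tones (3 for T3), T3S applies at each juncture between sister constituents once the inner constituents have been processed, and no optional rule applies. The function names cyclic_t3s and left_to_right_binary are hypothetical.

```python
from typing import List, Union

Tree = Union[int, list]  # a leaf is an underlying tone; a list is a constituent


def cyclic_t3s(node: Tree) -> List[int]:
    """Apply T3S bottom up: inner constituents first, then each juncture
    between sister constituents at the current level."""
    if isinstance(node, int):
        return [node]
    parts = [cyclic_t3s(child) for child in node]
    for left, right in zip(parts, parts[1:]):
        if left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return [tone for part in parts for tone in part]


def left_to_right_binary(tones: List[int]) -> List[int]:
    """Apply T3S inside disyllabic feet built from left to right, ignoring
    syntax (even-length sequences only; no incorporation step)."""
    out = list(tones)
    for i in range(0, len(out) - 1, 2):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out


r_branching = [3, [3, [3, 3]]]   # [σ[σ[σσ]]]
m_branching = [3, [[3, 3], 3]]   # [σ[[σσ]σ]]

print(cyclic_t3s(r_branching))             # cyclic parse, R-branching
print(cyclic_t3s(m_branching))             # cyclic parse, M-branching
print(left_to_right_binary([3, 3, 3, 3]))  # prosody-only parse
```

Under these assumptions the cyclic parse and the binary parse coincide for the R-branching structure (T2T3T2T3) but diverge for the M-branching structure: the cyclic parse gives T3T2T2T3, while the binary parse gives the attested error pattern T2T3T2T3.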

Study 4: Natural Speech Repetition (NSR) and Study 5: Robot Talk Repetition (RTR)
In these two studies identical sentences were used, with the only difference that in NSR, one
of the surface patterns served as the model, and in the RTR, the underlying tones served as the
model. T3S variability was attested in both studies. Moreover, the patterns that appeared in NSR
were also attested in RTR. More patterns were attested in RTR, which was not surprising as
participants were in a sense “freer” to apply T3S rather than being affected by the model surface
form. As the participants of the two studies were different, and the tasks were different, the fact
that certain patterns were consistently used provides strong evidence that the surface patterns
attested in the studies were legitimate rather than accidental (one might otherwise be tempted to
attribute the RTR patterns to the “unnaturalness or weirdness of the speech”).
Overall, 4-year-olds did not perform well in the RTR, where they were required to apply T3S
themselves, and even 6-year-olds were not adult-like. The developmental patterns were clear.
Unlike 4-year-olds, 6-year-olds have all the T3S patterns adults have, although their frequency of
the T3S patterns sometimes differed from that of adults. By comparing the frequency of T3S
patterns across age groups, a clear trend shows that 6-year-olds are approximating their
preference of T3S patterns to that of adults. In fact, we found that the frequency of multiple T3S
patterns of 6-year-olds and adults was strikingly similar.
In NSR, the length of the sentence (4w4σ vs. 4w6σ) had an effect on 4- and 6-year-olds, but in
RTR, this effect no longer existed. Regarding complexity in terms of the number of adjacent
T3*, again, we see an effect in NSR (both child groups did better in the three-T3 items than in
the six-T3 items), but not in RTR. One interpretation is that this is due to the task: the task in
RTR is much harder than that in NSR.

Although the ternary pattern appears in both the subject-NP and the subject-pro sentences,
we found a distinction between them: a monosyllabic subject NP tends to be parsed in its own
domain, that is, as a degenerate foot, maintaining the subject-predicate boundary, whereas a
subject pro does not. This provides strong evidence that a monosyllabic NP and
a monosyllabic pronoun are prosodically different and hence are dealt with differently in the
foot-building process. The patterns and the frequency of the patterns that 6-year-olds produced mirror
those of adults, indicating that they were aware of the distinction between the subject NP and subject
pro, which 4-year-olds did not seem to be aware of.
Finally, the ternary pattern used in the 6-syllable experimental sentences always separates the
verb and the object, as shown in ST1 and ST2 in (4).
(4)  [Ma     [xiang  [deng      [xiao   [mayi]]]]]
     horse   want    wait for   small   ant
     ‘Horse wants to wait for Small Ant.’

      3       3       3          3       33      UT
      3       3       3          3      (23)     Word: T3S
      3       3       3         (3       23)     Word: Incorporation, no T3S
     (2       3)      3         (3       23)     Phrase: Disyllabic foot from left to right, T3S
     (2       2       3)        (3       23)     Phrase: incorporation, T3S; ST1
     (2       2       2)        (3       23)     Phrase: optional T3S across domains; ST2
     Additional patterns attested:
     (3)     (2       3)        (3       23)     subject-predicate pattern; ST3
     (3)     (2       2)        (3       23)     subject-predicate pattern, optional T3S across domains; ST4

In (4), T3S first applies to the NP cyclically at the Word level, which did not appear to be
difficult for children. Then, at the Phrase level, T3S applies non-cyclically, followed by
incorporation of the unparsed syllable. This regular derivation gives ST1, and if T3S optionally
applies across domains, we have ST2. In both cases, the verb and the object are kept separate:
they form a syntactic constituent, but in ST1 and ST2 they are in
two different prosodic domains.
For (4), ST3 and ST4 were attested in children and adults (approximately 25% in adults and
6-year-olds, and below 10% in 4-year-olds) in RTR. It was very telling that while the percentage
of the subject-predicate pattern is fairly high in adults in the subject-NP sentences, the same
pattern was used at a low frequency in the subject-pro sentences in adults and 6-year-olds (and it
was never used by children or adults in NSR). The contrast in the prosodic properties of
monosyllabic NPs and monosyllabic pronouns is apparent. While a monosyllabic NP can stand
alone, a monosyllabic pronoun does not tend to stand alone easily because as a clitic, it has to
cliticize to a host. Although the distinction of subject NPs and subject pronouns has not been
directly addressed in the T3S literature, in the results from this study, 6-year-olds were aware of
the prosodic differences between NPs and pronouns. The distinction of NPs and pronouns in the
application of T3S shown in Study 4 and Study 5 demonstrates a previously unnoticed area that
children attend to in their acquisition of T3S.
The results from these experimental studies (Study 2 – Study 5) strongly indicate that
although children have some ability to use T3S early on and know to change a T3 to a T2 when
followed by another T3, the intricacies of the T3S application develop with time and even the
oldest age group, 6-year-olds, are not adult-like.
8.3 Future research
In Study 1 (Natural Speech), in spite of its limited scope and the few spontaneous
contexts for T3S application, the sentences with a subject pronoun produced by children and adults
that show a subject-predicate pattern deserve more careful study.
With respect to the prediction made for flat structures (binary parsing, followed by
incorporation of an unparsed syllable, if applicable; directionality is left to right), longer
sequences of digits can be tested in the future if they are appropriate to the age range of the
participants. To test directionality, longer odd-number sequences could be used (such as 7, 9, or
11 digits). To test whether or not groupings of three or even four syllables are indeed
alternatives (i.e. binary parsing is not the only strategy), testing multiples of three or four which
are also multiples of two will be crucial. For instance, a sequence of six is two feet if grouping by
three is used, (XXX)(XXX), and three feet if grouping by two is used, (XX)(XX)(XX). Care
should be taken as to what digits are used. In Study 2 (Flat Structures), we used identical digits in
the same sequence. The “formation” of digits will most likely affect how the digits are parsed:
595959 is possibly more likely to be parsed as (59)(59)(59) than as (595)(959), and for the
same reason, 555999 can be easily parsed as (555)(999) instead of (55)(59)(99).
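The competing groupings for a six-digit sequence can be enumerated with a few lines of Python. This is only a sketch for planning stimuli: it splits a digit string into consecutive feet of a fixed size and says nothing about which grouping speakers actually prefer. The helper names group and show are hypothetical.

```python
from typing import List


def group(digits: str, size: int) -> List[str]:
    """Split a digit string into consecutive feet of `size` digits, left to right."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def show(feet: List[str]) -> str:
    """Render feet in parentheses, e.g. (59)(59)(59)."""
    return "".join(f"({foot})" for foot in feet)


for sequence in ("595959", "555999"):
    by_two = group(sequence, 2)
    by_three = group(sequence, 3)
    print(sequence, show(by_two), len(by_two), "feet;", show(by_three), len(by_three), "feet")
```

Listing both groupings side by side makes it easy to see which digit "formations" favor one parse for reasons unrelated to foot size, as with 595959 and 555999.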
Another area of interest is how flat structures interact with non-flat structures. This can be
tested further by embedding flat structures in sentences.
With respect to compound nouns, we tested four-syllable NPs in the R-branching and
M-branching structures. Future work can investigate, for instance, five-syllable NPs with various
structures such as [σ[σ[σ[σσ]]]], [[σσ][[σσ]σ]] and [[[σσ]σ][σσ]] (predicting T3T2T3T2T3,
T2T3T2T2T3, and T2T2T3T2T3 respectively). Nevertheless, structures of such complexity
are very likely to be beyond the capacity of children under age six, as we saw that the four-syllable
M-branching NPs were already very difficult for children (age 3-6) in this study.
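The predicted surface patterns for bracketed structures of this kind can be checked mechanically. The sketch below is an illustration only, under assumptions of my own: every syllable is underlyingly T3, T3S applies at each constituent juncture from the innermost constituent outward, and no optional rule applies. The function name predict is hypothetical.

```python
def predict(structure: str, tone: int = 3) -> str:
    """Read a bracketed structure such as '[σ[[σσ]σ]]' and return the surface
    tones predicted by cyclic T3S, assuming every syllable has tone `tone`."""
    stack = []      # each open constituent holds the tone lists of its parts
    result = []
    for symbol in structure:
        if symbol == "[":
            stack.append([])
        elif symbol == "σ":
            stack[-1].append([tone])
        elif symbol == "]":
            parts = stack.pop()
            # Apply T3S at every juncture between sister constituents.
            for left, right in zip(parts, parts[1:]):
                if left[-1] == 3 and right[0] == 3:
                    left[-1] = 2
            merged = [t for part in parts for t in part]
            if stack:
                stack[-1].append(merged)
            else:
                result = merged
    return "".join(f"T{t}" for t in result)


for structure in ("[σ[σ[σ[σσ]]]]", "[[σσ][[σσ]σ]]", "[[[σσ]σ][σσ]]"):
    print(structure, predict(structure))
```

With these assumptions, the three five-syllable structures come out as T3T2T3T2T3, T2T3T2T2T3, and T2T2T3T2T3, matching the predictions given in the text.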
In the two studies NSR and RTR, we tested four- and six-syllable sentences in a limited
number of structures, with more experimental items per condition. Future research can extend the
length of the sentences and also test other syntactic structures. We tested sentences with different
numbers of adjacent T3*: zero, three, and all (four or six). Future studies can manipulate the
number of adjacent T3* further. If it is age appropriate, embedding flat structures in sentences will be
interesting, as we know very little about how T3S in flat structures interacts with the neighboring
syllables in sentences. The structures and the degree of complexity are less constrained if the
experiments are intended for adult participants, but with child participants, factors such as length
of sentence and structural complexity need to be carefully considered.
All the studies presented in this dissertation were carried out in Taiwan; future studies can
replicate and investigate to see whether or not the results are similar in other regions where
Mandarin is spoken. If not, cross-regional results can be compared and the factors that may
account for regional difference can be further investigated.
Finally, although both T3S models (see Chapter 2) basically have the same coverage for T3S
patterns, there are patterns that were used by participants of these five studies that require further
investigation. It is not clear whether the Word-and-Phrase level Model or the Stress-foot Model
can better account for the empirical data. In accounting for T3S variation, both models have two
optional rules. In the Word-and-Phrase level Model, T3S is optional between domains, and in
fast speech, a larger domain is formed within which T3S applies. In the Stress-foot Model, T3S
is optional between cyclic branches, and T3S is optional before a T2 that is derived from a T3. A
crucial question to ask is whether the optional rules in both models over-generate T3S patterns. T3S
theories can work toward formalizing T3S variability as well as variability due to
regional or social differences, if there are any. An initial attempt at an Optimality Theory (Prince &
Smolensky 1993/2004) analysis of T3S variability, adopting Coetzee’s (2006) model, has been
made (Wang & Lin 2011), which hopefully will arouse interest in analyzing T3S variation in
the OT framework. It is my hope that this dissertation will
inspire many more researchers to conduct studies on children’s acquisition of T3S.

Appendix A
Study 1 Possible frozen chunks (The lexical items which have a sequence of T3T3.)
Table 4.11 List of possible frozen chunks and number of tokens produced by each participant
Lexical items                   Adult     CH     LI  Adult     IU  Adult     ES  Adult     GK  Adult     BR     ER
                                   CZ    4;5    4;6     LU    4;6     CL    5;5     EE    5;9     TT    6;6    6;6
biaoyan ‘perform’                   0      0      0      0      0      0      0      0      0      4      0      0
bu-ru ‘breastfeeding’ [41]          0      0      0      0      0      0      0      0      0      6      0      2
keyi ‘can’ (aux)                   20      2      4      9      1      9      3      6     10     11      3      2
laoban ‘boss; store owner’          0      0      0      1      0      0      0      2      0      0      0      0
laohu ‘tiger’                       2      0      1      7      4      6      1      0      0      2      2      2
laoshu ‘mouse’                      0      0      0      0      0      0      1      0      0      0      0      0
liaojie ‘realize’                   0      0      0      0      0      0      0      1      0      0      0      0
nali ‘where’                       13      1      4     12      9      6      4     18      0      5      0      2
Qiaohu ‘name of a tiger’            0      0      0      0      0      0      1      0      0      0      0      0
suoyi ‘so’                          4      0      1      0      0      1      3      1      1      1      0      0
suoyou ‘all’                        0      0      0      0      0      4      0      0      0      1      0      0

[41] Buru (T3T3) ‘breast-feeding’ and dongwu (T4T4) ‘animals’ (lit. breast-feeding animals)
together mean ‘mammals.’
Table 4.11 (cont’d)
Lexical items                   Adult     CH     LI  Adult     IU  Adult     ES  Adult     GK  Adult     BR     ER
                                   CZ    4;5    4;6     LU    4;6     CL    5;5     EE    5;9     TT    6;6    6;6
xizao ‘bathe; take a shower’        0      0      0      0      0      1      2      0      0      0      0      0
xiaogou ‘doggie’                    1      0      0      0      0      1      0      0      0      0      0      0
yongyuan ‘forever’                  0      0      0      0      0      0      1      0      0      0      0      0
zhiyou ‘only’                       2      0      0      0      0      0      0      0      0      0      0      0

Appendix B
Study 2 Experimental materials
Narration: 我們要玩一個遊戲。“這是什麼(指著電腦螢幕上的數字)?”（小孩回答後，繼續說）“你可以把手手伸出來，跟我一樣嗎(示範把一隻手伸出來,五指伸直)?”（輕輕地慢慢地幫小孩把三根手指彎下去,剩下兩根手指伸直）然後說，“我點你的手指的時候，你就說這個數字(指著電腦螢幕上的數字)，好不好？”
The experimenter: “We are going to play a game. What’s this? (pointing to the digit on the
screen)” After the child gives the answer, ask the child, “Could you hold out one hand just like
me? (The experimenter shows the child by holding out a hand, with five fingers up straight.)”
(Then the experimenter slowly and gently bends down three of the child’s fingers, leaving two
up.) “You say the digit (pointing to the digit on the screen) when I tap your fingers, okay?”
“你可以把手手伸出來，跟我一樣嗎(示範把一隻手伸出來)?”（幫小孩把兩根手指彎下去）然後說，“我點你的手指的時候，你就說這個數字，好不好？”
“Could you hold out one hand just like me? (The experimenter shows the child by holding out a
hand, with five fingers up straight.)” (Then the experimenter slowly and gently bends down two
of the child’s fingers, leaving three up.) “You say the digit (pointing to the digit on the screen)
when I tap your fingers, okay?”
“你可以把手手伸出來，跟我一樣嗎(示範把一隻手伸出來)?”然後說，“我點你的手指的時候，你就說這個數字，準備好了嗎？”
“Could you hold out one hand just like me? (The experimenter shows the child by holding out a
hand, with five fingers up straight.) You say the digit (pointing to the digit on the screen) when I
tap your fingers. Are you ready?”
(Repeated) Figure 5.1 Flat structures: A child’s hand, (a) – (c) for two, three, and five digits
respectively
a. for two digits
b. for three digits
c. for five digits

Figure 5.8 Study 2: List of materials
a. practice: a non-T3 digit “4” (Tone 4)
   i) two times
      si    si
      four  four
      ‘four-four’
   ii) three times
      si    si    si
      four  four  four
      ‘four-four-four’
   iii) five times
      si    si    si    si    si
      four  four  four  four  four
      ‘four-four-four-four-four’
b. control item: a non-T3 digit “2” (Tone 4)
   i) two times
      er    er
      two   two
      ‘two-two’
   ii) three times
      er    er    er
      two   two   two
      ‘two-two-two’
   iii) five times
      er    er    er    er    er
      two   two   two   two   two
      ‘two-two-two-two-two’
c. test item: a T3 digit “5”
   i) two times
      wu    wu
      five  five
      ‘five-five’
   ii) three times
      wu    wu    wu
      five  five  five
      ‘five-five-five’
   iii) five times
      wu    wu    wu    wu    wu
      five  five  five  five  five
      ‘five-five-five-five-five’
d. control item: a non-T3 digit “3” (Tone 1)
   i) two times
      san    san
      three  three
      ‘three-three’
   ii) three times
      san    san    san
      three  three  three
      ‘three-three-three’
   iii) five times
      san    san    san    san    san
      three  three  three  three  three
      ‘three-three-three-three-three’
e. test item: a T3 digit “9”
   i) two times
      jiu   jiu
      nine  nine
      ‘nine-nine’
   ii) three times
      jiu   jiu   jiu
      nine  nine  nine
      ‘nine-nine-nine’
   iii) five times
      jiu   jiu   jiu   jiu   jiu
      nine  nine  nine  nine  nine
      ‘nine-nine-nine-nine-nine’

Appendix C
Study 3 List of test and control items
A. Three-syllable compounds
Table 6.14 Study 3: List of tests and controls in three-syllable compounds
No.  Test (T) or  Materials                                   Underlying  Surface  Syntactic  Translation
     Control (C)  (characters / pinyin / gloss)               tones       tones    structure
1    C            公雞 筆 / gongji bi / cock pen                113         Same     [[σσ]σ]    Cock-pen
2    T            老鼠 筆 / laoshu bi / mouse pen               333         223      [[σσ]σ]    Mouse-pen
3    C            餅乾 鳥 / binggan niao / cookie bird          313         Same     [[σσ]σ]    Cookie-bird
4    T            水果 鳥 / shuiguo niao / fruit bird           333         223      [[σσ]σ]    Fruit-bird
5    C            水 大象 / shui daxiang / water elephant       344         Same     [σ[σσ]]    Water-elephant
6    T            水 老虎 / shui laohu / water tiger            333         323      [σ[σσ]]    Water-tiger
7    C            紙 犀牛 / zhi xiniu / paper rhino             312         Same     [σ[σσ]]    Paper-rhino
8    T            紙 海馬 / zhi haima / paper seahorse          333         323      [σ[σσ]]    Paper-seahorse

B. Four-syllable NPs
Table 6.15 Study 3: List of tests and controls in four-syllable NPs
No.  Test (T)/    Materials                                                            Underlying  Surface  Syntactic    Translation
     Control (C)  (characters / pinyin / gloss)                                        tones       tones    structure
1    C            小 红 绵羊 / xiao hong mianyang / small red sheep                      3222        Same     [σ[σ[σσ]]]   Small red sheep
2    C            小 藍 綿羊 / xiao lan mianyang / small blue sheep                      3222        Same     [σ[σ[σσ]]]   Small blue sheep
3    T            小 紫 海馬 / xiao zi haima / small purple seahorse                     3333        2323     [σ[σ[σσ]]]   Small purple seahorse
4    T            小 紫 雨傘 / xiao zi yusan / small purple umbrella                     3333        2323     [σ[σ[σσ]]]   Small purple umbrella
5    C            小 長鼻 象 / xiao chang bi xiang / small long-trunked elephant         3224        Same     [σ[[σσ]σ]]   Small long-trunked elephant
6    C            綠 大眼 蛙 / lü da yan wa / green big-eyed frog                        4431        Same     [σ[[σσ]σ]]   Green big-eyed frog
7    T            小 短腿 馬 / xiao duan tui ma / small short-legged horse               3333        3223     [σ[[σσ]σ]]   Small short-legged horse
8    T            矮 小眼 鳥 / ai xiao yan niao / short small-eyed bird                  3333        3223     [σ[[σσ]σ]]   Short small-eyed bird

Appendix D
Study 3 Experimental materials
A. Three-syllable Compounds:
In Figure 6.7, target answers for the test items are in bold type, and target answers for the control items are underlined.
Figure 6.7 Study 3: Experimental materials for three-syllable compounds
Pictures (shown to subjects on a laptop
Scripts (for the experimenter)
computer)
1.
你看！這是狐狸 （指著狐狸）。這支
筆長得像狐狸 (huli T2T2 ‘fox’)，我們
叫它‘狐狸筆’(hulibi T2T2T3 ‘fox-pen’,
UT=ST)。
Look at this pen. This is a fox (pointing
to fox). The pen looks like a fox. We
call it a ‘fox-pen.’

2.

這是什麼？(指著筆上方的動物)
What is this? (pointing to the animal at
the top of the pen)
這支筆長得像公雞，那我們叫它什
麼？
This pen looks like a cock, so what do
we call it?

Target answers
(Modeling)

gongji
cock
3 1

‘cock’
UT=ST

gongji bi
cock pen
3 1 3

‘cock-pen’
UT=ST

Figure 6.7 (cont’d)
3.

這是什麼(指著筆上方的動物)?
What is this (pointing to the animal at
the top of the pen)?

laoshu
mouse
3 3
2 3

laoshu bi
對，這支筆長得像老鼠，那我們叫
mouse pen
它…？
The pen looks like a mouse, so we call it 3 3 3
2 2 3
a…
4.

5.

你看這隻鳥 (niao T3 ‘bird’)。他看到
蛋糕好高興。我們叫他‘蛋糕鳥’
(dangao niao T4T1T3 ‘cake-bird’,
UT=ST)。
Look at this bird. He’s so happy to see
the cake. Let’s call it a ‘cake-bird.’

‘mouse’
UT
ST
‘mouse-pen’
UT
ST

(Modeling)

這是什麼？(指著餅乾)
What are these? (pointing to the
cookies)

binggan
cookie
3 1

‘cookies’
UT=ST

他好喜歡餅乾，那我們叫他什麼?
He loves cookies, so what do we call it?

binggan niao
cookie bird
3 1 3

‘cookie-bird’
UT=ST

Figure 6.7 (cont’d)
6.

他好喜歡水果，那我們叫他什麼?
He loves fruit, so we call it a…

7.

8.

這是什麼？
What’s this?

這是什麼？(小孩回答：猴子 houzi
T2T0 ‘monkey’)
What’s this? (Child answers: (A)
monkey.)
對，這隻猴子很特別，他喜歡住在水
裡。我們叫他‘水猴子’(shui houzi
T3T2T0 water-monkey, UT=ST)。
Yes. This monkey is very special. He
loves to live in the water. Let’s call it a
‘water-monkey.’

shuiguo
fruit
3 3
2 3

‘fruit’
UT
ST

shuiguo niao
fruit
bird
3 3 3
2 2 3

‘fruit-bird’
UT
ST

houzi
monkey
‘(a) monkey’
2 0

這是什麼？(指著水果）
What is this? (pointing to the fruit)

UT=ST

(Modeling)

Figure 6.7 (cont’d)
9.

11.

daxiang
elephant
44

‘elephant’
UT=ST

這是什麼？
What’s this?

daxiang
elephant
4 4

‘elephant’
UT=ST

這隻大象也很喜歡住在水裡。我們叫
他什麼？
This elephant also loves to live in the
water. What do we call it?

10.

這是什麼？
What’s this?

shui daxiang
water elephant
3 4 4

‘water-elephant’
UT=ST

laohu
tiger
3 3
2 3

‘(a) tiger’
UT
ST

這是什麼？
What’s this?

Figure 6.7 (cont’d)
12.

laohu
tiger
3 3
2 3

這是什麼？
What’s this?

這隻老虎也很喜歡住在水裡。所以我
們叫他…？
This tiger loves to live in the water, too,
so we call it…

13.

14.

‘(a) tiger’
UT
ST

shui laohu
water tiger
3 3 3
3 2 3

‘water-tiger’
UT
ST

(Modeling)
你看這隻大象(daxiang T4T4
‘elephant’)，他是紙做的。我們叫他
‘紙大象’(zhi daxiang T3T4T4 paperelephant, UT=ST)。
Look at this elephant. It’s made of
paper. Let’s call it a ‘paper-elephant.’

xiniu
rhino
1 2

這是什麼？
What’s this?

他也是紙做的。我們叫他什麼？
It’s also made of paper. What do we call
it?

‘rhino’
UT=ST

zhi xiniu
paper rhino
3 1 2

‘paper-rhino’
UT=ST

Figure 6.7 (cont’d)
15.

haima
seahorse
3 3
2 3

這是什麼？
What’s this?

他也是紙做的。我們叫他…？
It’s made of paper, too, so it’s a …

‘(a) seahorse’
UT
ST

zhi haima
paper seahorse
3 3 3
3 2 3

‘paper-seahorse’
UT
ST

B. Four-syllable NPs
In this experiment, as there are layers in the syntactic structure, one layer is elicited at a time. The final target answer is elicited
through multiple pictures.
In Figure 6.8, target answers for the test items are in bold type, and target answers for the control items are underlined.
Figure 6.8 Study 3: Experimental materials for four-syllable NPs
Pictures (shown to subjects on a laptop Scripts (for the experimenter)
computer)
1.
這是什麼？
What’s this?


Target answers
haima
seahorse
3 3
2 3

‘(a) seahorse’
UT
ST

2.

3.

這是什麼？
What’s this?
(海馬 haima ‘(a) seahorse’
T3T3 T2T3)
他是綠色的。
The color is green.
我們叫他‘綠海馬’
(lü haima T4T3T3 T4T2T3 ‘green
seahorse’)。
Let’s call it a green seahorse.

(Modeling)

haima
seahorse
3 3
2 3

這是什麼？
What’s this?

是什麼顏色？
What color is it?

我們叫他什麼？
What do we call it?


‘(a) seahorse’
UT
ST

zise
purple
3 4

‘purple’
UT=ST

zi haima
purple seahorse ‘(a) purple seahorse’
3 3 3 UT
3 2 3 ST

4.

5.

你看這兩隻海馬，一隻大的，一隻小
的。
Look at these two seahorses. One is big,
the other is small.
我們叫這隻大隻的（指著大隻的）
‘大綠海馬’ (da lü haima T4 T4
T3T3 T4 T4 T2T3 ‘(a) big green
seahorse’)；這隻小隻的‘小綠海馬’
(xiao lü haima T3 T4 T3T3 T3 T4
T2T3 ‘(a) small green seahorse’)。
We call this one (pointing to the big
one) a ‘big green seahorse,’ and this one
(pointing to the small one) a ‘small
green seahorse.’
現在你看這兩隻。大隻的是大紫海馬
(da zi haima T4 T3 T3T3 T4 T3
T2T3)。小的呢（指著小隻的）?
Now look at these two. The big one is a
“big purple seahorse.” What about this
one (pointing to the small one)?

6.

這是什麼？
What are these?
他們的顏色好奇怪，對不對？
They have strange colors, don’t they?


(Modeling)

xiao zi haima
small purple seahorse ‘(a) small purple seahorse’
3 3 33 UT
2 3 23 ST

mianyang
sheep ‘sheep’
2 2 UT=ST

7.

8.

9.

這隻綿羊是紅的，我們叫她…
This sheep is red, so we call it a …

hong mianyang
red sheep ‘(a) red sheep’
2
22
UT=ST

這隻是藍的，我們叫他…
This one is blue, so we call it a …

lan
blue
2

你看這兩隻綿羊，一隻大的，一隻小
的。
Look at these two sheep. One is big, the
other is small.
我們叫這隻大隻的（指著大隻的）
‘大紅綿羊’ (da hong mianyang T4
T2 T2T2 ‘(a) big red sheep’)；這隻小
隻的呢（指著小隻的）?
We call this one (pointing to the big
one) a “big red sheep.”
What about this one? (pointing to the
small one)

mianyang
sheep ‘(a) blue sheep’
22
UT=ST

(Modeling)

xiao hong mianyang
small red sheep ‘(a) small red sheep’
3
2
22
UT=ST

10.

你看這兩隻綿羊。我們叫這隻（指著
大隻的）(da lan mianyang T4 T2 T2T2
‘(a) big blue sheep’); 這隻呢（指著小
隻的）？
Look at these two sheep. We call this
one (pointing to the big one) a “big blue
sheep.” What about this one? (pointing
to the small one)

xiao lan mianyang
small blue sheep ‘(a) small blue sheep’
3 2 22
UT=ST

11.

12.

yusan
umbrella
3 3
2 3

這是什麼？
What’s this?

(Modeling)

這是什麼？
What’s this?
這是藍色的，我們叫它‘藍雨傘’。
The color is blue. Let’s call it a blue
umbrella.


‘umbrella’
UT
ST

13.

yusan
umbrella
3 3
2 3

這是什麼？
What’s this?

是什麼顏色？
What color is it?

我們叫他什麼？
What do we call it?
14.

你看這兩支雨傘，一支大的，一支小
的。我們叫這支大的（指著大的）
‘大藍雨傘’ (da lan yusan T4 T2
T3T3 T4 T2 T2T3 ‘(a) big blue
umbrella’)；這支小的‘小藍雨傘’
(xiao lan yusan T3 T2 T3T3 T3 T2
T2T3 ‘(a) small blue umbrella’)。
Look at these two umbrellas. One is big,
the other is small. We call this one
(pointing to the big one) a ‘big blue
umbrella,’ and this one (pointing to the
small one) a ‘small blue umbrella.’


‘(an) umbrella’
UT
ST

zise
purple
3 4

‘purple’
UT=ST

zi yusan
purple umbrella ‘(a) purple umbrella’
3 3 3 UT
3 2 3 ST
(Modeling)

15.

現在你看這兩支。大的是大紫雨傘
(da zi yusan T4 T3 T3T3 T4 T3
T2T3)。小的呢(指著小支的)?
Now look at these two. The big one is a
“big purple umbrella.” What about this
one (pointing to the small one)?

16.

17.

18.

xiao zi yusan
small purple umbrella ‘(a) small purple umbrella’
3 3 33
2 3 23
ma
horse
3
3

這是什麼？
What’s this?

你看這隻青蛙。他的腿好長哦（指著
青蛙長長的後腿）。我們叫他‘長腿
蛙’ (chang tui wa T2 T3 T1 UT=ST
‘(a) long-legged frog’)。
Look at this frog. His legs (pointing to
the long hind leg) are so long. Let’s call
it a “long-legged frog.”

‘horse’
UT
ST

qingwa
frog
11
11

這是什麼？
What’s this?

‘frog’
UT
ST

(Modeling)

UT
ST

19.

ma
horse
3
3

這是什麼？
What’s this?

他的腿也這麼長（指著馬的長腿），
我們要叫他什麼？
His legs are very long, too (pointing to
the horse’s long legs). What do we call
it?

20.

‘horse’
UT
ST

chang tui
ma
long leg
horse
‘(a) long-legged horse’
long
2
2

leg
3
2

ma
horse
3
3

這是什麼？
What’s this?

他的腿這麼短（指著馬的短腿），我
們要叫他什麼？
His legs are so short (pointing to the
horse’s short legs). What do we call
it?


horse
3
3

UT
ST

‘horse’
UT
ST

duan tui
ma
short leg
horse
‘(a) short-legged horse’
short
3
2

leg
3
2

horse
3
3

UT
ST

21.

22.

你看這兩隻青蛙，一隻大的，一隻小
的。我們叫這隻大的（指著大的）
‘大長腿蛙’ ( da chang tui wa T4 T2
T3 T1 UT=ST ‘(a) big long-legged
frog’)；這隻小的‘小長腿蛙’ (xiao
chang tui wa T3 T2 T3 T1 UT=ST, ‘(a)
small long-legged frog’)。
Look at these two frogs. One is big and
the other is small. We call this one
(pointing to the big one) a “big long-legged
frog” and this one (pointing to
the small one) a “small long-legged
frog.”
現在你看這兩隻。大的是大長腿馬
(da chang tui ma T4 T2 T3T3 T4 T2
T2 T3 ‘(a) big long-legged horse’)。小
的呢(指著小隻的)?
Now look at these two. The big one is a
“big long-legged horse.” What about
this one (pointing to the small one)?


(Modeling)

xiao chang tui
ma
small long leg
horse
‘(a) small long-legged horse’
small long
3
2
3
2

leg
3
2

horse
3
3

UT
ST

23.

現在你看這兩隻。大的是大短腿馬
(da duan tui ma T4 T3 T3T3 T4 T2
T2 T3 ‘(a) big short-legged horse’)。
小的呢(指著小隻的)?
Look at these two. This big one is called
a “big short-legged horse.” What about
this one? (pointing to the small one)

24.

xiao duan tui
ma
small short leg
horse
‘(a) small short-legged horse’
leg
3
2

daxiang
elephant
44
44

這是什麼？
What’s this?

small short
3
3
3
2

‘elephant’
UT
ST

(Modeling)
你看這隻大象。他的鼻子好長哦（指
著大象長長的鼻子）。我們叫他‘長
鼻象’。 (chang bi xiang T2 T2 T4
UT=ST ‘(a) long-trunked elephant’)。
Look at this elephant. His trunk
(pointing to the long trunk) is so long.
Let’s call it a “long-trunked elephant.”


horse
3
3

UT
ST

25.

daxiang
elephant
44
44

這是什麼？
What’s this?

他的鼻子這麼短（指著大象很短的鼻
子）。我們要叫他什麼？
His trunk is so short (pointing at the
elephant’s short trunk). What do we call
it?
26.

現在你看這兩隻。大的是大長鼻象
(da chang bi xiang T4 T2 T2 T4 ‘(a)
big long-trunked elephant’)。小的呢
(指著小隻的)?
Look at these two elephants. The big
one is a “big long-trunked elephant.”
What about this one (pointing to the
small one)?

27.

現在你看這兩隻。大的是大短鼻象
(da duan bi xiang T4 T3 T2 T4 ‘(a) big
short-trunked elephant’)。小的呢(指著
小隻的)?

Look at these two elephants. The big
one is a “big short-trunked elephant.”
What about the small one?


‘elephant’
UT
ST

duan bi
xiang
short trunk elephant
‘(a) short-trunked elephant’
short
3

trunk elephant
2
4

UT=ST

xiao chang bi
xiang
small long trunk elephant
‘(a) small long-trunked elephant’
small long
3
2

trunk elephant
2
4
UT=ST

xiao duan bi
xiang
small short trunk elephant
‘(a) small short-trunked elephant’
small short
3
3
2
3

trunk elephant
2
4
UT
2
4
ST

28.

這是什麼？（指著眼睛）
What are these? (pointing at the eyes)

yanjing
eyes
31

‘eyes’
UT=ST

(Modeling)
你看這隻青蛙。他的眼睛好大哦（指
著青蛙大大的眼睛）。我們叫他‘大
眼蛙’。 (da yan wa T4 T3 T1 UT=ST
‘(a) big-eyed frog’)。

29.

Look at this frog. His eyes are so big.
Let’s call him a “big-eyed frog.”
這是什麼？
What’s this?

他的眼睛好大哦（指著小鳥大大的眼
睛）。我們叫他什麼？

30.

His eyes are also very big (pointing at
the bird’s big eyes), so what do we call
it?
你看這隻小鳥。他的眼睛這麼小（指著
鳥小小的眼睛），我們要叫他什麼？
Now look at this bird. His eyes are so
small (pointing at the bird’s small eyes).
What do we call it?


(xiao)niao
bird
(3)3
(2)3

‘(a) bird’
UT
ST

da
big
4
4

yan
eye
3
2

niao
bird
3
3

xiao
small
3
2

yan
eye
3
2

niao
bird ‘(a) small-eyed bird’
3 UT
3 ST

‘(a) big-eyed bird’
UT
ST

31.

da
big
4
4

yan
eye
3
2

niao
bird
3
3

這一隻(指著小眼睛的那隻)是…
And this is (pointing to the small-eyed
one)…

32.

這兩隻鳥你再看一次。這一隻(指著
大眼睛的那隻) 是…
Let’s look at the two birds one more time.
This is (pointing to the big-eyed one)…

xiao
small
3
2

yan
eye
3
2

niao
bird ‘(a) small-eyed bird’
3 UT
3
ST

你看這兩隻青蛙。一隻白的，一隻綠
的。他們的眼睛好大（指著青蛙大大
的眼睛），這隻白的，我們叫他‘白
大眼蛙’ (bai da yan wa T2 T4 T3 T1
UT=ST ‘(a) white big-eyed frog’)

‘(a) big-eyed bird’
UT
ST

(Modeling)

Look at these two frogs. A white one
and a green one. Their eyes are very big.
This one is white (pointing at the white
one). We call it a “white big-eyed frog.”
這隻綠的我們要叫他什麼（指著綠的
那隻）？
What about this green one (pointing at the
green one)?

lü
da
yan
wa
green big
eye
frog
‘(a) green big-eyed frog’
green big
4
4


eye
3

frog
1

UT=ST

33.

你看這兩隻小鳥。他們的眼睛都好大
（指著小鳥大大的眼睛），一隻高
的，一隻矮的。這隻高的，我們叫他
‘高大眼鳥’ (gao da yan niao T1 T4
T3 T3 UT T1 T4 T2 T3 ST ‘(a) tall
big-eyed bird’)。矮的呢（指著矮的那
隻）？
Look at these two birds. Their eyes are
very big (pointing at the birds’
big eyes). One is tall, and the other is
short. Let’s call the tall one a “tall big-eyed
bird.” What about this short one?

34.

你看這兩隻小鳥。他們的眼睛都好小
（指著小鳥小小的眼睛），一隻高
的，一隻矮的。這隻高的，我們叫他
‘高小眼鳥’ (gao xiao yan niao T1
T3 T3 T3 UT T1 T2 T2 T3 ST ‘(a)
tall small-eyed bird’)。矮的呢（指著
矮的那隻）？
Look at these two birds. They both have
very small eyes (pointing at the birds’
small eyes). One is tall, and the other is
short. Let’s call the tall one a “tall
small-eyed bird.” What about this short
one?


(Modeling)

ai da yan niao
short big eye bird ‘(a) short big-eyed bird’
3 4 3 3 UT
3 4 2 3 ST

(Modeling)

ai xiao yan niao
short small eye bird ‘(a) short small-eyed bird’
3 3 3 3 UT
3 2 2 3 ST

Appendix E
Study 4 (NSR) Experimental Materials
Narration: 我們要玩一個遊戲。你看,這是機器人,這隻熊熊叫做小莉。機器人跟熊熊小莉
說話，熊熊小莉聽不到。你可以幫她聽機器人說什麼嗎？聽到以後告訴小莉，好不好？你
想玩這個遊戲嗎？
We are going to play a game. Look, this is a Robot, and this is Bear Xiaoli. Robot says
something to Bear Xiaoli, but Bear Xiaoli cannot hear her. Could you help her by listening to
what the Robot is saying? After you hear it, you tell Xiaoli, okay? Do you want to play this game?
Figure 7.12 Study 4: List of materials
A1
Ni
you
3
2

xiang
want
3
2

mai
buy
3
3

hua
flower ‘You want to buy flowers.’
1
UT
1
ST used

Wo
I
3
2

xiang
want
3
2

mai
buy
3
2

bi
pens
3
3

Ta
he
1
1

xiang
want
3
3

kan shu
read book
4
1
4
1

A2
‘I want to buy pens.’
UT
ST used

A3


‘He wants to read books.’
UT
ST used

A4
Ni
you
3
2

xiang
want
3
2

xi
wash
3
3

che
car ‘You want to wash (the) car.’
1 UT
1 ST used

Wo
I
3
2

xiang
want
3
2

xi
wash
3
2

ma
horse ‘I want to wash (the) horse.’
3
UT
3
ST

Ta
he
1
1

xiang
want
3
3

chang
sing
4
4

ge
song ‘He wants to sing.’
1
UT
1
ST used

A5

A6

B1
Banma xiang zhao
xiongmao
Zebra want look for panda bear
‘Zebra wants to look for Panda Bear.’
Zebra
13
12

want
3
2

look for panda bear
3
21
UT
3
21
ST used

B2
Haima xiang zhao
shuimu
seahorse want look for jelly fish
‘Seahorse wants to look for Jelly Fish.’
seahorse want
33
3
22
2

look for jelly fish
3
33
UT
3
23
ST used

B3
Guoniu xiang bian
qingwa
snail
want become frog
‘Snail wants to become Frog.’
snail
12
12

want
3
3

become frog
4
11
4
11

UT
ST used

B4
Hema xiang deng
wugui
hippo want wait for turtle
‘Hippo wants to wait for the turtle.’
hippo
23
22

want
3
2

wait for turtle
3
11
3
11

UT
ST used

B5
Laoshu xiang deng
mayi
mouse want wait for ant
‘Mouse wants to wait for Ant.’
mouse want
33
3
22
2

wait for ant
3
33
3
23

UT
ST used

B6
Haibao xiang bian
jingyu
seal
want become whale
‘Seal wants to become Whale.’
seal
34
34

want
3
3

become whale
4
12
4
12

UT
ST used

C1
Wo xiang zhao
da
hema
I
want look for big hippo
‘I want to look for Big Hippo.’
I
3
2

want
3
2


look for big
3
4
3
4

hippo
23
UT
23
ST used

C2
Ni xiang yang
xiao laohu
you want raise
small tiger
‘You want to have (raise) a small Tiger.’
you want
3
3
2
2

raise
3
2

small tiger
3
33
3
23

UT
ST used

C3
Ta xiang chuan
duan qunzi
she want wear
short skirt
‘She wants to wear a short skirt.’
she want
1
3
1
3

wear
1
1

short skirt
3
25
3
25

UT
ST used

C4
Wo xiang zhao
pang xiongmao
I
want look for fat
panda bear
‘I want to look for Fat Panda Bear.’
I
3
2

want
3
2

look for fat
3
4
3
4

panda bear
21
UT
21
ST used

C5
Ni xiang yang
xiao laoshu
you want raise
small mouse
‘You want to have (raise) a small mouse.’
you want
3
3
2
2

raise
3
2

small mouse
3
33
UT
3
23
ST used

C6
Ta xiang ting
hao yinyue
she want listen to good music
‘She wants to listen to good music.’
she want
1
3
1
3


listen to good music
1
3
14
UT
1
3
14
ST used

D1
Gou xiang zhao
da
xingxing
dog want look for big gorilla
‘Dog wants to look for (the) big gorilla.’
dog
3
2

want look for big
3
3
4
2
3
4

gorilla
11
UT
11
ST used

D2
Ma xiang zhao
xiao haigou
horse want look for small fur-seal
‘Horse wants to look for (the) small fur-seal.’
horse want
3
3
2
2

look for small fur-seal
3
3
33
UT
2
3
23
ST used

D3
Niu xiang bian
lü
qingwa
bull want become green frog
‘Bull wants to become (a) green frog.’
bull want
2
3
2
3

become green frog
4
4
11
4
4
11

UT
ST used

D4
Ma xiang deng
da
wugui
horse want wait for big turtle
‘Horse wants to wait for Big Turtle.’
horse want wait for big
3
3
3
4
2
2
3
4

turtle
11
11

UT
ST used

D5
Gou xiang deng
xiao mayi
dog want wait for small ant
‘Dog wants to wait for (the) small ant.’
dog want
3
3
2
2


wait for small ant
3
3
33
2
3
23

UT
ST used

D6
Zhu xiang bian
lan jingyu
pig want become blue whale
‘Pig wants to become (a) blue whale.’
pig
1
1

want
3
3


become blue whale
4
2
12
UT
4
2
12
ST used

Appendix F
Study 5 (RTR) Experimental materials
Narration: 我們要玩一個遊戲，這個遊戲叫“機器人說話”。你看這個機器人。她說話怪
怪的，這隻熊熊都聽不懂。熊熊小莉只聽得懂小朋友說的話,她不懂恐龍說的話。你可以
幫她聽機器人說什麼嗎？聽到以後告訴小莉，好不好？你想玩這個遊戲嗎？
We are going to play a game called “Robot Talk (RT).” Look at this Robot. She talks funny, and
the bear doesn’t understand a word she says. The bear Xiaoli understands only Child Talk, not
Robot Talk. Can you help her? Listen to the Robot Talk, and then tell Xiaoli what she says, okay?
Do you want to play the game?
Figure 7.13 Study 5: List of materials
A1
Ni
you
3

xiang mai
want buy
3
3

hua
flower ‘You want to buy flowers.’
42
1
UT=RT

Wo
I
3

xiang mai
want buy
3
3

bi
pens
3

Ta
he
1

xiang kan shu
want read book
3
4
1

A2
‘I want to buy pens.’
UT=RT

A3

42 RT = Robot Talk (the manipulated speech)

‘He wants to read books.’
UT=RT

A4
Ni
you
3

xiang xi
che
want wash car ‘You want to wash (the) car.’
3
3
1 UT=RT

Wo
I
3

xiang xi
ma
want wash horse ‘I want to wash (the) horse.’
3
3
3
UT=RT

Ta
he
1

xiang chang ge
want sing song ‘He wants to sing.’
3
4
1
UT=RT

A5

A6

B1
Banma xiang zhao
xiongmao
Zebra want look for panda bear
‘Zebra wants to look for Panda Bear.’
Zebra
13

want
3

look for panda bear
3
21
UT=RT

B2
Haima xiang zhao
shuimu
seahorse want look for jelly fish
‘Seahorse wants to look for Jelly Fish.’
seahorse want
33
3


look for jelly fish
3
33
UT=RT

B3
Guoniu xiang bian
qingwa
snail
want become frog
‘Snail wants to become Frog.’
snail
12

want
3

become frog
4
11

UT=RT

B4
Hema xiang deng
wugui
hippo want wait for turtle
‘Hippo wants to wait for the turtle.’
hippo
23

want
3

wait for turtle
3
11

UT=RT

B5
Laoshu xiang deng
mayi
mouse want wait for ant
‘Mouse wants to wait for Ant.’
mouse want
33
3

wait for ant
3
33

UT=RT

B6
Haibao xiang bian
jingyu
seal
want become whale
‘Seal wants to become Whale.’
seal
34

want
3

become whale
4
12

UT=RT

C1
Wo xiang zhao
da
hema
I
want look for big hippo
‘I want to look for Big Hippo.’
I
3

want
3


look for big
3
4

hippo
23
UT=RT

C2
Ni xiang yang
xiao laohu
you want raise
small tiger
‘You want to have (raise) a small Tiger.’
you want
3
3

raise
3

small tiger
3
33

UT=RT

C3
Ta xiang chuan
duan qunzi
she want wear
short skirt
‘She wants to wear a short skirt.’
she want
1
3

wear
1

short skirt
3
25

UT=RT

C4
Wo xiang zhao
pang xiongmao
I
want look for fat
panda bear
‘I want to look for Fat Panda Bear.’
I
3

want
3

look for fat
3
4

panda bear
21
UT=RT

C5
Ni xiang yang
xiao laoshu
you want raise
small mouse
‘You want to have (raise) a small mouse.’
you want
3
3

raise
3

small mouse
3
33
UT=RT

C6
Ta xiang ting
hao yinyue
she want listen to good music
‘She wants to listen to good music.’
she want
1
3


listen to good music
1
3
14
UT=RT

D1
Gou xiang zhao
da
xingxing
dog want look for big gorilla
‘Dog wants to look for (the) big gorilla.’
dog
3

want look for big
3
3
4

gorilla
11
UT=RT

D2
Ma xiang zhao
xiao haigou
horse want look for small fur-seal
‘Horse wants to look for (the) small fur-seal.’
horse want
3
3

look for small fur-seal
3
3
33
UT=RT

D3
Niu xiang bian
lü
qingwa
bull want become green frog
‘Bull wants to become (a) green frog.’
bull want
2
3

become green frog
4
4
11

UT=RT

D4
Ma xiang deng
da
wugui
horse want wait for big turtle
‘Horse wants to wait for Big Turtle.’
horse want wait for big
3
3
3
4

turtle
11

UT=RT

D5
Gou xiang deng
xiao mayi
dog want wait for small ant
‘Dog wants to wait for (the) small ant.’
dog want
3
3


wait for small ant
3
3
33

UT=RT

D6
Zhu xiang bian
lan jingyu
pig want become blue whale
‘Pig wants to become (a) blue whale.’
pig
1

want
3


become blue whale
4
2
12
UT=RT

Appendix G
Predicted surface patterns for test items in Study 4 NSR and Study 5 RTR
1. Four syllables, four words
a. Three T3*
    [Ni    [xiang  [mai   [hua]]]]
     you    want    buy    flower     ‘You want to buy flowers.’
     3      3       3      1          UT
     3      3       3      1          Word: no T3S
     3      3      (3      1)         Phrase: disyllabic foot for the smallest domain, no T3S
    (2      3)     (3      1)         Phrase: disyllabic foot for the remaining syllables, T3S; ST1
    (2      2       3      1)         Larger domain in fast speech; ST2

b. All T3*
    [Wo    [xiang  [mai   [bi]]]]
     I      want    buy    pen        ‘I want to buy pens.’
     3      3       3      3          UT
     3      3       3      3          Word: no T3S
     3      3      (2      3)         Phrase: disyllabic foot for the smallest domain, T3S
    (2      3)     (2      3)         Phrase: disyllabic foot for the remaining syllables, T3S; ST1
    (2      2       2      3)         Larger domain in fast speech; ST2

2. Six syllables, four words
a. Three T3*
    [[banma]   [xiang  [zhao      [xiongmao]]]]
     zebra      want    look for   panda bear   ‘Zebra wants to look for Panda bear.’
     13         3       3          21           UT
    (13)        3       3         (21)          Word: two disyllabic feet, T3S
    (13)       (2       3)        (21)          Phrase: disyllabic foot, T3S; ST1
    (12         2       3)        (21)          Larger domain in fast speech, T3S; ST2

b. All T3*
    [[haima]   [xiang  [zhao      [shuimu]]]]
     seahorse   want    look for   jellyfish    ‘Seahorse wants to look for Jellyfish.’
     33         3       3          33           UT
    (23)        3       3         (23)          Word: two disyllabic feet, T3S
    (23)       (2       3)        (23)          Phrase: disyllabic foot, T3S; ST1
    (22         2       3)        (23)          Larger domain in fast speech, T3S; ST2
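
The flat-domain step in the derivations for items 1 and 2 can be sketched in Python. This is an illustration only, not part of the experimental materials. The names apply_t3s and surface are hypothetical, the parsing into domains is supplied by hand rather than computed from the syntax, and the sketch assumes that within one domain every T3 immediately followed by an underlying T3 surfaces as T2. It does not model the cyclic or incorporation steps needed for the longer items.

```python
def apply_t3s(domain):
    """Apply T3S inside one flat domain (a list of underlying tones).

    Assumption: a T3 becomes T2 whenever the next underlying tone
    in the same domain is also T3; all other tones are unchanged.
    """
    return [
        2 if tone == 3 and i + 1 < len(domain) and domain[i + 1] == 3 else tone
        for i, tone in enumerate(domain)
    ]


def surface(domains):
    """Apply T3S separately inside each hand-parsed domain."""
    return [apply_t3s(domain) for domain in domains]


# Item 1a, 'Ni xiang mai hua' (underlying 3 3 3 1)
st1_1a = surface([[3, 3], [3, 1]])   # two disyllabic feet
st2_1a = surface([[3, 3, 3, 1]])     # one larger domain (fast speech)

# Item 1b, 'Wo xiang mai bi' (underlying 3 3 3 3)
st1_1b = surface([[3, 3], [3, 3]])
st2_1b = surface([[3, 3, 3, 3]])

print(st1_1a, st2_1a, st1_1b, st2_1b)
```

Under these assumptions the four calls give the patterns listed as ST1 and ST2 for items 1a and 1b.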


3. Six syllables, subject pronoun
a. Three T3*
    [Wo    [xiang  [zhao      [da    [hema]]]]]
     I      want    look for   big    hippo     ‘I want to look for (a) big hippo.’
     3      3       3          4      23        UT
     3      3       3          4     (23)       Word: T3S
     3      3       3         (4      23)       Word: Incorporation, no T3S
    (2      3)      3         (4      23)       Phrase: Disyllabic foot from left to right, T3S

    Directionality for the Incorporation in the next step:
    i.  Leftward incorporation:
    (2      2       3)        (4      23)       Phrase: Incorporation, T3S; ST1
    ii. Rightward incorporation:
    (2      3)     (3          4      23)       Phrase: Incorporation, T3S; ST2
    (2      2)     (3          4      23)       Optional: T3S across domains; ST3

b. All T3*
    [Ni    [xiang  [yang      [xiao  [laohu]]]]]
     you    want    raise      small  tiger     ‘You want to have/raise (a) small tiger.’
     3      3       3          3      33        UT
     3      3       3          3     (23)       Word: T3S
     3      3       3         (3      23)       Word: Incorporation, no T3S
    (2      3)      3         (3      23)       Phrase: Disyllabic foot from left to right, T3S

    Directionality for the Incorporation in the next step:
    i.  Leftward incorporation:
    (2      2       3)        (3      23)       Phrase: Incorporation, T3S; ST1
    (2      2       2)        (3      23)       Optional: T3S across domains; ST2
    ii. Rightward incorporation:
    (2      3)     (2          3      23)       Phrase: Incorporation, T3S; ST3


4. Six syllables, subject noun
a. Three T3*
    [Gou   [xiang  [zhao      [da    [xingxing]]]]]
     dog    want    look for   big    gorilla   ‘Dog wants to look for (the) big gorilla.’
     3      3       3          4      11        UT
     3      3       3          4     (11)       Word: T3S
     3      3       3         (4      11)       Word: Incorporation, no T3S
    (2      3)      3         (4      11)       Phrase: Disyllabic foot from left to right, T3S

    Directionality for the Incorporation in the next step:
    i.  Leftward incorporation:
    (2      2       3)        (4      11)       Phrase: Incorporation, T3S; ST1
    ii. Rightward incorporation:
    (2      3)     (3          4      11)       Phrase: Incorporation, T3S; ST2
    (2      2)     (3          4      11)       Optional: T3S across domains; ST3

b. All T3*
    [Ma    [xiang  [zhao      [xiao  [haigou]]]]]
     horse  want    look for   small  fur-seal  ‘Horse wants to look for the small fur-seal.’
     3      3       3          3      33        UT
     3      3       3          3     (23)       Word: T3S
     3      3       3         (3      23)       Word: Incorporation, no T3S
    (2      3)      3         (3      23)       Phrase: Disyllabic foot from left to right, T3S

    Directionality for the Incorporation in the next step:
    i.  Leftward incorporation:
    (2      2       3)        (3      23)       Phrase: Incorporation, T3S; ST1
    (2      2       2)        (3      23)       Optional: T3S across domains; ST2
    ii. Rightward incorporation:
    (2      3)     (2          3      23)       Phrase: Incorporation, T3S; ST3
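
The optional step labeled “T3S across domains” in items 3 and 4 can be illustrated with a short Python sketch. It rests on an assumption, not on a statement in the appendix: a domain-final T3 may surface as T2 when the following domain begins with T3. The function name t3s_across_domains is hypothetical, and the input is a list of surface-tone domains that have already undergone domain-internal T3S.

```python
def t3s_across_domains(domains):
    """Return a copy of the domains with optional cross-domain T3S applied.

    Assumption: the last tone of a domain changes from T3 to T2 when the
    first tone of the following domain is T3. Domains must be non-empty.
    """
    result = [list(domain) for domain in domains]
    for left, right in zip(result, result[1:]):
        if left[-1] == 3 and right[0] == 3:
            left[-1] = 2
    return result


# Item 3a, rightward incorporation: ST2 (2 3)(3 4 23)
st3_3a = t3s_across_domains([[2, 3], [3, 4, 2, 3]])

# Item 3b, leftward incorporation: ST1 (2 2 3)(3 23)
st2_3b = t3s_across_domains([[2, 2, 3], [3, 2, 3]])

# Item 3b, rightward incorporation: (2 3)(2 3 23); the second domain
# starts with T2, so the sketch leaves the pattern unchanged.
st3_3b = t3s_across_domains([[2, 3], [2, 3, 2, 3]])

print(st3_3a, st2_3b, st3_3b)
```

On this assumption the sketch yields an additional pattern only where the appendix lists an optional one, which is consistent with the derivations but does not establish how speakers arrive at them.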

Appendix H
Statistics notes
Multinomial logistic regression and Odds Ratio (OR)
Multinomial logistic regression is a powerful model that can handle outcomes that are ordinal
(ordered categories) or nominal (unordered categories). In this model, given a set of independent
variables, we can generate the predictions of the probabilities of different outcomes.
The odds of an event are the ratio of the probability that the event will occur to the
probability that it will not occur. In this model, the odds are the ratio of the probability of
choosing one outcome category to the probability of choosing the reference category (UCLA:
Academic Technology Services, Statistical Consulting Group). Field (2005/2009:739) illustrates
the Odds Ratio (OR) with the following example.
“Odds ratio is the ratio of the odds of an event occurring in one group compared to another.
So for example, if the odds of dying after writing a glossary are 4, and the odds of dying after
not writing a glossary are .25, then the odds ratio is 4/.25 = 16. This means that if you write a
glossary you are 16 times more likely to die than if you don’t. An odds ratio of 1 would
indicate that the odds of a particular outcome are equal in both groups.”
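
The arithmetic in Field’s example can be reproduced with a minimal Python sketch. The function names odds and odds_ratio are hypothetical, and the probabilities .8 and .2 are chosen here only because they correspond to odds of 4 and .25; they do not come from the source.

```python
def odds(probability):
    """Odds of an event: P(event) / P(no event)."""
    return probability / (1 - probability)


def odds_ratio(odds_group_1, odds_group_2):
    """Ratio of the odds in one group to the odds in another group."""
    return odds_group_1 / odds_group_2


# Field's example: odds of 4 in one group and .25 in the other.
example_or = odds_ratio(4, 0.25)

# The same ratio starting from probabilities (illustrative values).
or_from_probabilities = odds_ratio(odds(0.8), odds(0.2))

print(example_or, round(or_from_probabilities, 2))
```

An odds ratio of 1 would result whenever the two groups have equal odds, for example odds_ratio(4, 4).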
Assume that there are two correct T3S surface patterns, Pattern A and Pattern B. When
speakers produce a T3S phrase or sentence, the response can be Pattern A, Pattern B, or incorrect.
The outcome measure in this study is therefore the speaker’s response: Pattern A, Pattern B, or
an incorrect answer. From the outcomes, we can see how the responses relate to the independent
variables. I will use this study as an example to illustrate the interpretation of the
analysis. In Table 9.1, ‘age’ and ‘structure’ are the two independent variables. Both are
categorical rather than gradient. The variable ‘age’ has three groups: 4-year-olds, 6-year-olds,
and adults. The variable ‘structure’ has two levels, Structure X and Structure Y. I select the
incorrect answer as the reference category.


Table 9.1 An example of the statistics output and the interpretation of the results
Parameter Estimates
(Lower Bound and Upper Bound give the 95% Confidence Interval for Exp(B).)

T3S surface   Parameter            B       Std.    Wald    df  Sig.  Exp(B)  Lower  Upper
patterns (a)                               Error                             Bound  Bound
Pattern A     Intercept           1.130    .480    5.534   1   .019
              age_group=4        -1.504    .531    8.030   1   .005   .222    .079   .629
              age_group=6          .323    .600     .290   1   .590  1.382    .426  4.480
              age_group=adults    0 (b)    .        .      0   .      .       .      .
              structure=X          .943    .381    6.113   1   .013  2.567   1.216  5.421
              structure=Y         0 (b)    .        .      0   .      .       .      .
Pattern B     Intercept           2.319    .485   22.820   1   .000
              age_group=4        -1.458    .639    5.205   1   .023   .233    .067   .814
              age_group=6         -.375    .669     .315   1   .575   .687    .185  2.550
              age_group=adults    0 (b)    .        .      0   .      .       .      .
              structure=X        -2.841    .470   36.564   1   .000   .058    .023   .147
              structure=Y         0 (b)    .        .      0   .      .       .      .

a. The reference category is: Incorrect answer.
b. This parameter is set to zero because it is redundant.
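
The columns of Table 9.1 are related to one another in the way that is standard for logistic regression output: Exp(B) is the exponential of B, the Wald statistic is the square of B divided by its standard error, and the confidence bounds are the exponentials of B plus or minus 1.96 standard errors. The Python sketch below checks this for one row. It assumes that the table was computed in this standard way; the function name summarize_row is hypothetical, and small discrepancies are expected because the table reports rounded values of B and the standard error.

```python
import math


def summarize_row(b, se, z=1.96):
    """Derive Wald, Exp(B), and a 95% interval for Exp(B) from B and its SE."""
    return {
        "wald": (b / se) ** 2,
        "exp_b": math.exp(b),
        "lower": math.exp(b - z * se),
        "upper": math.exp(b + z * se),
    }


# Pattern A, age_group=4: B = -1.504, Std. Error = .531
row = summarize_row(-1.504, 0.531)
for name, value in row.items():
    print(f"{name}: {value:.3f}")
```

The printed values can be compared with the Wald, Exp(B), Lower Bound, and Upper Bound entries of the same row in the table.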
Table 9.1 shows that, with age and structure both in the model, 4-year-olds differ
significantly from adults in the use of Pattern A and of Pattern B relative to incorrect answers
(Pattern A: p = .005; Pattern B: p = .023), but 6-year-olds do not differ significantly from
adults (Pattern A: p = .590; Pattern B: p = .575). The variable ‘structure’ is statistically
significant for Pattern A (p = .013) and for Pattern B (p < .001; see footnote 43), which means
that Structure X and Structure Y differ significantly in the use of Pattern A (and of Pattern B)
relative to incorrect answers. Next, I will show specifically what the values in the Exp(B)
column tell us.
An important concept in this model is comparison, and every comparison involves a
reference category. For instance, Pattern A and Pattern B are both compared to the reference
category “Incorrect answer.” Both child groups are compared to adults. Between the two
structures, Structure Y is selected as the reference category. Reference categories are not fixed:
we could use 4-year-olds as the reference group, but for the purposes of the studies in this
dissertation it is more meaningful to compare the two child groups to the adults (the reference
group). Similarly, we could use Structure X as the reference category if we chose to.

43 Although Table 9.1 shows .000, the value is typically reported as < .001, because it stands
for an extremely small number (such as .00000000173) that the table does not display in full.

The two columns in Table 9.1 that matter most for interpretation are “Sig.” and “Exp(B).”
“Sig.” tells us whether an effect is statistically significant. Typically, a value less than .05
is considered statistically significant. Exp(B) is the Odds Ratio (OR) value. The Exp(B) column
tells us whether an outcome is more likely or less likely in a given group than in the
reference group.
Pattern A:
The use of Pattern A relative to errors differs significantly between Structure X and
Structure Y (Odds Ratio (OR) = 2.567, p = .013). The OR value indicates that Pattern A is about
2.6 times (the exact value is 2.567) more likely in Structure X than in Structure Y.
For the surface Pattern A relative to incorrect answers, 4-year-olds (OR = .222, p = .005) are
significantly different from adults, but 6-year-olds are not (OR = 1.382, p = .590).
Given that adults are the reference group, an OR value smaller than 1 for another age group
indicates that Pattern A is less likely in that age group, and an OR value greater than 1
indicates that Pattern A is more likely in that age group. For instance, the OR value for
4-year-olds is .222, which means that 4-year-olds are less likely than adults to use Pattern A
rather than an incorrect answer. Saying that 4-year-olds are “.222 times as likely” to use this
pattern is hard to process. It is easier to take the reciprocal and say that adults are about 4.5
times (1/.222 = 4.50) more likely to use the pattern. For 6-year-olds, the OR value is greater
than 1, which means that they are more likely (roughly 1.4 times, OR = 1.382) than adults to use
this pattern rather than an incorrect answer, although this difference is not statistically
significant.
Pattern B:
The use of Pattern B relative to errors also differs significantly between Structure X and
Structure Y (OR = .058, p < .001). Notice that the OR value is smaller than 1. This means that
Pattern B is less likely to occur in Structure X than in Structure Y. If we take Structure Y
(the reference category) as “1,” Structure X is .058 (the OR value), and we can say that Pattern B
is approximately 17 times (1/.058 = 17.24) more likely in Structure Y than in Structure X.
For the surface Pattern B relative to incorrect answers, 4-year-olds (OR = .233, p = .023) are
significantly different from adults, but 6-year-olds are not (OR = .687, p = .575). The OR values
of both child groups are smaller than 1, indicating that children are less likely than adults to
use Pattern B rather than an incorrect answer. Adults are about 4.3 times (1/.233 = 4.29) more
likely than 4-year-olds, and about 1.5 times (1/.687 = 1.46) more likely than 6-year-olds, to
use this pattern.
Overall, in addition to information on statistical significance, multinomial logistic
regression generates OR values, which allow for additional interpretation. The OR value encodes
how many times more likely one event is than another, and this figure can be calculated whenever
it is of interest.
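
The reciprocal conversions used in the interpretation above can be checked with a few lines of Python. This is only an arithmetic illustration; the dictionary labels and the name reference_vs_group are hypothetical, and the OR values are those reported in Table 9.1.

```python
# OR values smaller than 1, taken from the Exp(B) column of Table 9.1.
or_values = {
    "Pattern A, 4-year-olds vs. adults": 0.222,
    "Pattern B, Structure X vs. Structure Y": 0.058,
    "Pattern B, 4-year-olds vs. adults": 0.233,
    "Pattern B, 6-year-olds vs. adults": 0.687,
}

# Taking the reciprocal re-expresses each comparison from the point of
# view of the reference category (adults or Structure Y).
reference_vs_group = {
    label: round(1 / value, 2) for label, value in or_values.items()
}

for label, ratio in reference_vs_group.items():
    print(f"{label}: 1/OR = {ratio}")
```

The rounded reciprocals correspond to the figures 4.50, 17.24, 4.29, and 1.46 quoted in the text.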


REFERENCES


Bao, Zhiming (1999). The structure of tone. Oxford: Oxford University Press.
Boersma, Paul & David Weenink (2009). Praat: doing phonetics by computer (Version 5.1. 05)
[Computer program]. Retrieved May 1, 2009.
Brown, Roger (1973). A first language: The early stages. Cambridge, MA: Harvard University
Press.
Brown, Roger & Ursula Bellugi (1964). Three processes in the child's acquisition of syntax.
Harvard Educational Review 34, 133-151.
Chang, Hsing-Wu (1991). Acquisition of Mandarin Chinese: A review of recent research in
Taiwan. Proceedings of National Science Council, ROC. Part C: Humanities and Social
Sciences 1, 110-126.
Chao, Yuen-Ren (1968). A grammar of spoken Chinese. Berkeley, CA: University of California
Press.
Chao, Yuen-Ren (1951). The Cantian Idiolect. University of California Publications in Semitic
Philology 2, 27-44.
Chen-Wilson, Josephine (2003). Book Review: Phonological development in specific contexts:
Studies of Chinese-speaking children. Child Language Teaching and Therapy 19, 107-108.
Chen, Li-Mei & Raymond D. Kent (2009). Development of prosodic patterns in Mandarin-learning
infants. Journal of Child Language 36, 73-84.
Chen, Matthew Y. (2000). Tone sandhi: Patterns across Chinese dialects. Cambridge:
Cambridge University Press.
Cheng, Chin-Chuan (1973). A synchronic phonology of Mandarin Chinese. The Hague: Mouton.
Cheng, Lisa Lai Shen (1987). On the prosodic hierarchy and tone sandhi in Mandarin. Toronto
Working Papers in Linguistics 7, 24-52.
Cinque, Guglielmo (1993). A null theory of phrase and compound stress. Linguistic inquiry 24,
239-297.
Clumeck, Harold (1977). Topics in the acquisition of Mandarin phonology: A case study. Papers
and Reports on Child Language Development 14, 37-73.
Clumeck, Harold (1980). The acquisition of tone. Child phonology 1, 257–275.
Coetzee, Andries W. (2006). Variation as accessing ‘non-optimal’ candidates. Phonology 23,
337-385.

Cowan, Nelson, Candice C. Morey & Zhijian Chen (2007). The legend of the magical number
seven. In S. Della Sala (ed.) Tall tales about the brain: Things we think we know about the
mind, but ain’t so. Oxford: Oxford University Press, 45–59.
Crain, Stephen & Rosalind Thornton (2000). Investigations in Universal Grammar: A guide to
experiments on the acquisition of syntax and semantics. Cambridge, MA: The MIT Press.
Dell, Francois (2004). On recent claims about stress and tone in Beijing Mandarin. Cahiers de
linguistique-Asie orientale 33, 33-63.
Demuth, K., M. Machobane & F. Moloi (2010). Learning How to License Null Noun-Class
Prefixes in Sesotho. Language 85, 864-883.
Demuth, Katherine (1989). Problems in the Acquisition of Grammatical Tone. Papers & Reports
on Child Language Development 28, 81-88.
Demuth, Katherine (1993). Issues in the acquisition of the Sesotho tonal system. Journal of Child
Language 20, 275-301.
Demuth, Katherine (1995). The acquisition of tonal systems. In J. Archibald (ed.) The
Acquisition of Non-linear Phonology. Hillsdale, NJ: Lawrence Erlbaum Associates, 111–
134.
Demuth, Katherine (2001). A prosodic approach to filler syllables. Journal of Child Language 28,
246-249.
Demuth, Katherine (2003). The acquisition of Bantu languages. In D. Nurse & G. Phillipson (eds)
The Bantu Languages. Surrey, UK: Curzon Press, 209-222.
Demuth, Katherine (2007). Sesotho Speech Acquisition. In S. McLeod (ed.) The international
guide to speech acquisition. Clifton Park, NY: Thomson Delmar Learning, 526-538.
Duanmu, San (2000/2007). The Phonology of Standard Chinese. Oxford: Oxford University
Press.
Duanmu, San (2004). Left-headed feet and phrasal stress in Chinese. Cahiers de
linguistique-Asie orientale 33, 65-103.
Erbaugh, Mary S. (1992). The acquisition of Mandarin. The crosslinguistic study of language
acquisition 3, 373-455.
Ericsson, K. Anders, William G. Chase & Steve Faloon (1980). Acquisition of a memory skill.
Science 208, 1181-1182.
Field, Andy P. (2005/2009). Discovering statistics using SPSS. London: SAGE publications Ltd.
Gerken, LouAnn (1996). Prosodic structure in young children's language production. Language
72, 683-712.
Goad, Heather & Meaghen Buckley (2006). Prosodic structure in child French: Evidence for the
Foot. Catalan Journal of Linguistics 5, 109-142.
Guasti, Maria T. (2004). Language acquisition: The growth of grammar. Cambridge, MA: The
MIT Press.
Hirsh-Pasek, Kathy & Roberta M. Golinkoff (1996). The origins of grammar. Cambridge, MA:
The MIT Press.
Hong, Li-Jane (1980). Acquisition of Mandarin phonology: segment versus tone. Lubbock, TX:
Texas Tech University MA thesis.
Jeng, Heng-Hsiung (1979). The acquisition of Chinese phonology in relation to Jakobson’s laws
of irreversible solidarity. Paper presented at the 9th International Congress of Phonetic
Sciences. Copenhagen, Denmark.
Jeng, Heng-Hsiung (1985). A developmentalist view of child phonology. Studies in Language
and Literature 1, 1–29.
Jusczyk, Peter W. & Paul A. Luce (2002). Speech perception and spoken word recognition: Past
and present. Ear and Hearing 23, 1-40.
Kuo, Yu-ching, Yi Xu & Moira Yip (2007). The phonetics and phonology of apparent cases of
iterative tonal change in Standard Chinese. Tones and tunes 2, 211-237.
Li, Charles & Sandra Thompson (1977). The acquisition of tone in Mandarin-speaking children.
Journal of Child Language 4, 185-199.
Li, Paul Jen-Kuei (1978). Child language acquisition of Mandarin phonology. In Robert L.
Cheng, Ying-Che Li & Ting-Chi Tang (eds) Proceedings of Symposium on Chinese
Linguistics. Taipei: Student Book Co., 295-316.
Lin, Hui-shan (2005). Prosodic correspondence in tone sandhi. USTWPL 1, 229-265.
Lin, Yen-Hwei (2007). The sounds of Chinese. Cambridge: Cambridge University Press.
Lleó, Conxita & Katherine Demuth (1999). Prosodic constraints on the emergence of
grammatical morphemes: Crosslinguistic evidence from Germanic and Romance
languages. Paper presented at the 23rd Annual Boston University Conference on
Language Development. Boston.
MacWhinney, Brian (2000). The CHILDES project. Mahwah, New Jersey: Lawrence Erlbaum
Associates, Inc.
McDaniel, Dana, Cecile McKee & Helen Cairns (1998). Methods for assessing children's syntax.
Cambridge, MA: The MIT Press.
Miller, Karen (2007). Variable input and the acquisition of plurality in two varieties of Spanish.
East Lansing, MI: Michigan State University PhD dissertation.
Miller, Karen & Cristina Schmitt (2009). Variable vs. consistent input: comprehension of plural
morphology and verbal agreement in children. In José M. Brucart, Anna Gavarró and
Jaume Solà (eds) Merging Features. 123-138.
Morgan, James L. (1986). From simple input to complex grammar. Cambridge, MA: The MIT
Press.
Morgan, James L. & Jenny R. Saffran (1995). Emerging integration of sequential and
suprasegmental information in preverbal speech segmentation. Child Development 66,
911-936.
Pearl, Lisa S. (2007). Necessary bias in natural language learning. College Park, MD:
University of Maryland PhD dissertation.
Pierrehumbert, Janet B. (2003). Phonetic diversity, statistical learning, and acquisition of
phonology. Language and Speech 46, 115-154.
Prince, Alan & Paul Smolensky (1993). Optimality theory: Constraint interaction in generative
grammar. ms. University of Colorado at Boulder and Rutgers University.
Prince, Alan & Paul Smolensky (2004). Optimality Theory: Constraint interaction in generative
grammar. In J. J. McCarthy (ed.) Optimality Theory in Phonology. Malden, MA:
Blackwell Publishing Ltd, 1-71.
Roeper, Tom (2007). The prism of grammar: How child language illuminates humanism.
Cambridge, MA: The MIT Press.
Shih, Chilin (1986). The prosodic domain of tone sandhi in Chinese. La Jolla, CA: University of
California San Diego PhD dissertation.
Shih, Chilin (1997). Mandarin third tone sandhi and prosodic structure. In N. Smith & J. Wang
(eds) Studies in Chinese phonology. Berlin: Mouton de Gruyter, 81-124.
So, Lydia K.H. & Barbara J. Dodd (1995). The acquisition of phonology by Cantonese-speaking
children. Journal of Child Language 22, 473-495.
UCLA: Academic Technology Services, Statistical Consulting Group. Introduction to SAS.
Retrieved from http://www.ats.ucla.edu/stat/spss/output/mlogit.htm (accessed August 18, 2011).
Wang, Chiung-Yao (2008). Acquisition of Tone 3 Sandhi in Mandarin-speaking Children. ms.
Michigan State University.
Wang, Chiung-Yao & Yen-Hwei Lin (2011). Variation in Mandarin Tone 3 Sandhi: The Case of
Prepositions and Pronouns. Paper presented at the NACCL-23 (The 23rd North
American Conference on Chinese Linguistics). Eugene, Oregon.
Werker, J.F. & S. Curtin (2005). PRIMIR: A developmental framework of infant speech
processing. Language Learning and Development 1, 197-234.
Wong, Puisan, Richard G. Schwartz & James J. Jenkins (2005). Perception and production of
lexical tones by 3-year-old, Mandarin-speaking children. Journal of Speech, Language,
and Hearing Research 48, 1065-1079.
Xu, Debao (1992). Mandarin Tone Sandhi and the Interface Study between Phonology and
Syntax. Urbana, IL: University of Illinois at Urbana-Champaign PhD dissertation.
Xu, Yi (1997). Contextual tonal variations in Mandarin. Journal of Phonetics 25, 61-84.
Yang, Charles D. (2002). Knowledge and learning in natural language. Oxford: Oxford
University Press.
Yeh, Chia-Hsin (2010). Comparison of Phonetic Naturalness between Rising-Falling and
Falling-Rising Tonal Patterns in Taiwan Mandarin. Paper presented at the 5th Speech
Prosody. Chicago.
Zhang, Jie & Yuwen Lai (2010). Testing the role of phonetic knowledge in Mandarin tone
sandhi. Phonology 27, 153-201.
Zhang, Ning (1997). The avoidance of the third tone sandhi in Mandarin Chinese. Journal of East
Asian Linguistics 6, 293-338.
Zhang, Zheng-sheng (1988). Tone and tone sandhi in Chinese. Columbus, OH: Ohio State
University PhD dissertation.
Zhu, Hua (2002). Phonological development in specific contexts: Studies of Chinese-speaking
children. Clevedon, UK: Multilingual Matters Ltd.
Zhu, Hua & Barbara Dodd (2000). The phonological acquisition of Putonghua (modern standard
Chinese). Journal of Child Language 27, 3-42.
