THESIS

LIBRARY
Michigan State University

This is to certify that the dissertation entitled

Localizing Derivational Economy

presented by

Daehee Lee

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Linguistics.

MSU is an Affirmative Action/Equal Opportunity Institution
LOCALIZING DERIVATIONAL ECONOMY IN MINIMALISM

By

Daehee Lee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Linguistics and Germanic, Slavic, Asian and African Languages

1997

ABSTRACT

LOCALIZING DERIVATIONAL ECONOMY IN MINIMALISM

By

Daehee Lee

This thesis attempts to localize derivational economy uniformly and strictly, and to investigate the significance and consequences of local economy, providing natural and unified analyses of the cyclicity of overt derivations, Procrastinate effects, Wh-asymmetries, and Wh-adjunct symmetries in the minimalist program for linguistic theory (Chomsky 1993, 1994, 1995).

First of all, we distinguish between global and local economy, and examine the motivation for global economy on derivations and its conceptual and empirical problems. Then we pursue local economy, in which the derivational economy conditions (the Last Resort Condition, the Minimal Link Condition, and the Earliness Principle) select the most economical operation at each step of a derivation. By localizing derivational economy, we get the following desirable results:

• Economy becomes strictly derivational.
• Computational complexity is significantly reduced by generating only a set of optimal derivations.
• The optimality of a derivation is consistent in the course of the derivation and at the interface levels.
• Derivational economy becomes homogeneous in terms of unviolability and locality.

We also propose that the Procrastinate Principle, which is a stipulative and global condition, should be eliminated and replaced with Earliness. Earliness has the following advantages:

• All the derivational economy conditions become uniformly localized.
• The Last Resort Condition is strengthened so that it can block "no operation".
• The cyclicity of overt computation and Procrastinate effects are derived from one principle, Earliness.

We also hypothesize that multiple features of a target can attract F, and that multiple feature attraction can presumably be parametrized in terms of the number and type of features. Combined with the Minimal Link Condition, multiple feature attraction offers a unified analysis of Wh-asymmetries such as argument-adjunct asymmetries, argument extraction asymmetries, argument-quasi-argument asymmetries, and superiority effects, and of the Wh-adjunct symmetries shown in argument-adjunct asymmetries, pseudo-opacity, and inner island conditions.

Copyright by
Daehee Lee
1997

Dedicated to my father, Sangwook, my mother, Sochool, my wife, Gemhee, and my three sons, Jaeyoung, Jaegook, and Jaesung.

ACKNOWLEDGMENTS

I am very pleased to express my gratitude to the people who have supported me through five and a half years of graduate study and dissertation work. Without their support I could not have completed my graduate study at all.

First and foremost, I thank all the members of my dissertation committee, Alan Munn (chair), Barbara Abbott, Cristina Schmitt, Alan Beretta, and Mutsuko Endo Hudson, for their strong support and guidance. I especially express my deep gratitude to Alan Munn, Barbara Abbott, and Cristina Schmitt.

I have been very fortunate to be a student of Alan Munn. His consistent guidance, insightful comments, challenging questions and criticisms have been invaluable for my research. In spite of his busy schedule, he gladly opened his office to me regularly, listened very patiently and carefully to my small and premature ideas for hours, and provided me with very helpful comments on their predictions and on the directions of the research. His guidance in reading books and articles, and in organizing my ideas, has been incredibly helpful. His constant encouragement has also been an invaluable support.
I am grateful to Cristina Schmitt for her excellent comments and questions on my research. By discussing my work with her, I could make my vague ideas clear. She has never forgotten to encourage me with her good words and her confidence in me.

I am greatly indebted to Barbara Abbott for her continuing guidance and strong encouragement. In my early graduate study she provided me with excellent academic training, and broadened my view of theoretical linguistics. Her comments and questions have been very helpful in improving my research. In classes and seminars she always read aloud for me what she was writing on the blackboard, keeping in mind that, being blind, I could not access the blackboard. One thing that I will never forget is that she herself read some books and articles onto tape for me.

I am grateful to Seok Choong Song and Kazuhiko Fukushima. They gladly read good books and articles and discussed them with me. In addition, I thank all my other professors, Grover Hudson, David Lockwood, Julia Falk, and Carolyn Harford, for their support in my early graduate studies. I am indebted to Jaehyun Han for his guidance in my undergraduate study. Thanks to him I first opened my eyes to theoretical linguistics.

I am also indebted to the Office of Programs for Handicapper Students (OPHS) and Tower Guard for their strong support. Without their reading service I could not have finished my graduate study at all. I give my special gratitude to Michael Hudson, a vision specialist at OPHS, for his friendship, strong confidence in me, constant encouragement, and invaluable information on accessibility on campus.

I also thank my friends at the Department of Linguistics for many delightful conversations, good discussions, their friendship and help. I give my special thanks to Laurie Church, Seung-chae Cheong, Ki Yeol Lee, Ok Sook Park and Dennie Hoopingarner for their great assistance.
I am grateful to the whole congregation of the Korean Lansing United Methodist Church for their fellowship and assistance. My special gratitude goes to Pastor Hyonam Hwang. His good words, prayers and confidence in me have been very encouraging to me.

I am indebted to Deane Blazie and Bryan Blazie at Blazie Engineering for their trust, encouragement, and financial support. They offered me a job in software engineering while I was a graduate student, and generously allowed me to work remotely from the company. When I was busy with academic research, they kindly assigned some of my work to someone else. When I was on leave from my position to finish my research, they continued to provide financial support for me.

I am also truly grateful to my family. My mother, Sochool, my brother, Wonhee, and my sisters, Jungok and Sookja, strongly supported me. Their prayers and pride in me have been especially encouraging.

I will never forget the unselfish love and support of my wife, Gemhee Lee. She read and scanned a tremendous number of books and articles for my study, edited my papers, and drew diagrams and figures for me. I interrupted her work, and even her sleep at midnight, many times to ask her to look up references for me. Her patience and unselfish love have been and will be the greatest source of support.

Finally, I thank my three sons, Jaeyoung, Jaegook and Jaesung, who gave up their mother most of the time when she helped me. They have been my great pleasure.

TABLE OF CONTENTS

0. Outline of the Thesis
1. Introduction to a Minimalist Program for Linguistic Theory
   1.1 The Minimalist Model
   1.2 The Lexicon and the Computational System
      1.2.1 The Lexicon and the Computations: Select
      1.2.2 Merge
      1.2.3 The Status of X'-theory
      1.2.4 Move
      1.2.5 Delete/Erase
   1.3 Bare Output Conditions and the Computations
      1.3.1 Features and Their Interpretability
      1.3.2 Spell-out, and PF and LF Branching
      1.3.3 Feature Checking and Move-F
   1.4 The Economy Principles
      1.4.1 Last Resort
      1.4.2 Minimal Link Condition
      1.4.3 Procrastinate Principle
2. Localizing Derivational Economy
   2.1 Introduction
   2.2 Global Economy: the Motivations and Problems
      2.2.1 A Distinction between Global and Local Economy
      2.2.2 The Last Resort Condition and Global Economy
      2.2.3 The Minimal Link Condition and Global Economy
      2.2.4 The Procrastinate Principle and Global Economy
      2.2.5 Some Problems with Global Economy
   2.3 Localizing Derivational Economy
      2.3.1 Localizing the Last Resort Condition
      2.3.2 Localizing the Minimal Link Condition
      2.3.3 Earliness as a Local Economy Condition
   2.4 Summary
3. Deriving Strict Cycle and Procrastinate Effects
   3.1 Introduction
   3.2 Deriving Strict Cycle
      3.2.1 Previous Analyses and their Problems
         3.2.1.1 Extension Condition
         3.2.1.2 Target-α
         3.2.1.3 Crossing the Number of Nodes
         3.2.1.4 Feature Strength and Cyclicity
         3.2.1.5 Chain Interleaving
      3.2.2 Earliness and the Strict Cycle
   3.3 Procrastinate: the Background and Problems
      3.3.1 Motivations
      3.3.2 Some Problems
   3.4 Deriving Procrastinate Effects
      3.4.1 PF Deletion Analysis
      3.4.2 Earliness and Procrastinate Effects
4. A Unified Analysis of Wh-Asymmetries and Wh-Adjunct Symmetries
   4.1 Introduction
   4.2 Some Types of Wh-asymmetries
      4.2.1 Wh-asymmetries and Pre-minimalist Analyses
      4.2.2 Some Minimalist Analyses
   4.3 A Unified Analysis of Wh-asymmetries
      4.3.1 Feature Specifications of Wh-words and [+Wh] Comps
      4.3.2 Multiple Feature Attraction
      4.3.3 Analysis
         4.3.3.1 Some Basic Assumptions
         4.3.3.2 Argument-Adjunct Asymmetries
         4.3.3.3 Argument Extraction Asymmetries
         4.3.3.4 Argument-Quasi-Argument Asymmetries
         4.3.3.5 Superiority Effects and Some Residues
   4.4 Further Consequences: Some Adjunct Symmetries
5. Conclusion and Further Research
REFERENCES

0. Outline of the Thesis

The spirit of economy has long had an effect on the theory of generative grammar. In the early theory of grammar it was reflected in the simplification of the rule system of language, i.e. the reduction of complex and superfluous rules to simple universal principles. For example, language-particular and construction-specific phrase structure rules were reduced to a simple X'-theory; construction-specific transformational rules were simplified into one generalized rule, Move-α; a variety of descriptive islands on extraction were generalized into the Empty Category Principle; and so on.

The minimalist program for linguistic theory (Chomsky 1993, 1994, 1995) pursues the spirit of economy in two different ways. One is that the language system is so perfect that universal principles cannot overlap in their effects; if they do, some of them may be wrong. In addition, the components of the language system must be virtually conceptually necessary. The existence of D-structure and S-structure was, so to speak, motivated by theory-internal necessity rather than by virtual conceptual necessity. Thus they have been eliminated from the minimalist program.
The other way economy is used in the minimalist program is that the language system is so perfect that a structural description must be derived in an optimal way. If more than one structural description can be derived by the computations, only an optimal derivation will be selected by the language system, blocking nonoptimal derivations. This indicates that economy considerations should have an effect on grammaticality, and hence that some universal principles should reflect economy considerations.

To reflect economy considerations for language analysis in universal grammar, Chomsky (1991, 1993, 1994, 1995) proposes some economy conditions on derivations such as the Greed Principle, the Minimal Link Condition, and the Procrastinate Principle. Since then many studies have been undertaken to discover the properties of derivational economy. Most of these studies are, however, closely tied to global economy, in which economy conditions apply at the interface levels or representations, which is empirically and conceptually undesirable (a point we return to later).

The goal of this thesis is to strictly and uniformly localize derivational economy in the minimalist program (Chomsky 1993, 1994, 1995) so that derivational economy applies at each point of a derivation. Furthermore, the thesis investigates the significance and consequences of local economy, giving natural and unified explanations of cyclic derivations in overt syntax, Procrastinate effects, some Wh-asymmetries, and some Wh-adjunct symmetries under local economy considerations.

The thesis is organized as follows. Chapter 1 gives a concise introduction to the minimalist program (Chomsky 1993, 1994, 1995) for the nonspecialist. It describes some basic concepts and machinery of the minimalist model.
The first section (section 1.1) gives a brief bird's-eye sketch of the general model of minimalism, and the subsequent sections then describe the basic modules of the model in more detail. Section 1.2 introduces bare phrase structures and computational operations (i.e. Select, Merge, Move, and Delete/Erase), and explains how the computational system accesses lexical items and constructs phrase structures by means of the computational operations without reference to X'-formats. Section 1.3 explains the relationship between derivations and bare output conditions in two respects: (i) how the computational system maps phrase structures to PF and LF, and (ii) which interface properties motivate the computations. More specifically, we discuss types of features, feature interpretability at the interface levels, Spell-out and Move-F/Attract-F. Section 1.4 describes economy principles on derivations, i.e. the Last Resort Condition, the Minimal Link Condition, and the Procrastinate Principle, and explains how they apply for optimal derivations.

Chapter 2 sets up a theoretical basis for local economy on derivations. First of all, section 2.2 distinguishes global economy from local economy, as defined in (1) and (2), respectively:

(1) Global Economy
    Derivational economy should apply at the interface levels so that it selects a derivation (among convergent derivations) that takes the most economical operations.

(2) Local Economy
    Derivational economy should apply at each point of a derivation so that it selects the most economical operation to affect the target at that point.

Then this section discusses the motivation for global economy, and its empirical and conceptual problems in comparison with local economy, as listed in the table below:

(i)   Global: It is a kind of representational condition.
      Local:  It is a strictly derivational condition.

(ii)  Global: It allows the computational system to generate an explosive or exponential number of derivations at the interface levels.
      Local:  It allows the computational system to generate only a set of optimal derivations at the interface levels.

(iii) Global: Some derivations which were optimal during the time of derivation may become nonoptimal at the interface levels.
      Local:  Derivations which are optimal during the time of derivation are always optimal at the interface levels.

(iv)  Global: It makes economy conditions heterogeneous in terms of unviolability and locality.
      Local:  It makes economy conditions homogeneous in terms of unviolability and locality.

Localizing derivational economy means that the cost of the computational operations should be measured at each point of a derivation. Section 2.3 explores how to measure the cost of the computational operations in a local way. It claims that derivational economy should adopt (4) and (6) for measuring the cost of an operation, rather than (3) and (5), which motivate global economy.

(3) The more operations a derivation takes, the more costly it is.
(4) The more superfluous operations a derivation takes, the more costly it is.
(5) Merge is costfree, and Move is costly.
(6) Merge and Move are both equal in terms of cost.

As a consequence of (4), "no operation" can no longer block necessary (or last resort) operations. Rather, a last resort operation blocks "no operation", in combination with the Earliness Principle (which we return to later). In addition, proposal (4) naturally leads us to proposal (6). Since Merge and Move are equal in cost, they are both costly if they apply in a superfluous way; otherwise they are both considered costfree. This proposal implies that Merge and Move cannot compete with each other under economy considerations, since their functions are fundamentally different.
The operation Merge applies to combine partial phrase structures into one larger phrase structure so that a derivation can converge at PF and LF; otherwise the derivation would crash, since partial phrase structures which are not related in terms of dominance and c-command cannot be interpreted, i.e. they can be neither linearized by the Linear Correspondence Axiom (Kayne 1994) at PF nor semantically interpreted by composition at LF. The operation Move, on the other hand, functions to satisfy morphological properties at the interface levels by providing a local checking relation in a phrase structure; otherwise a derivation would crash, since some morphological formal features of the derivation could not be interpreted at the interface levels.

In this section we propose a timing principle, the Earliness Principle, as defined in (7), and claim that Earliness should replace the Procrastinate Principle, which is a global condition in nature. We consider how Earliness applies locally along with Attract-F.

(7) Satisfy bare output conditions as early as possible.

More specifically, we formulate Attract-F so that it reflects Earliness:

(8) K attracts F early only if a sublabel of K is an uninterpretable feature at the interface level that Attract-F affects.

For a derivation to be optimal at the interface levels, it should satisfy all three types of derivational economy conditions at each point of the derivation: the Earliness Principle as in (8), and the Last Resort Condition and the Minimal Link Condition, which Chomsky (1995) defines as follows:

(9) The Last Resort Condition
    K attracts F if F can enter into a checking relation with a sublabel of K.

(10) The Minimal Link Condition
    (i) K attracts F if F is the closest feature to K.
    (ii) X is closer to K than Y if K c-commands X and X c-commands Y.
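The closeness relation in (10) can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions that are not part of the thesis: phrase structures are plain binary trees, c-command is computed from the mother node, and the names `Node`, `c_commands`, and `closest`, the toy clause, and its feature labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    label: str
    features: frozenset = frozenset()
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids):
        """Attach daughters and record their mother."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def dominates(a, b):
    """True if a properly dominates b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a, b):
    """a c-commands b: neither dominates the other, and a's mother dominates b."""
    if a is b or a.parent is None:
        return False
    if dominates(a, b) or dominates(b, a):
        return False
    return dominates(a.parent, b)


def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)


def closest(k, feature, root):
    """Return the node bearing `feature` that is closest to K in the sense of (10)."""
    candidates = [n for n in all_nodes(root)
                  if feature in n.features and c_commands(k, n)]
    for x in candidates:
        # x is closest if no other candidate c-commands it
        if not any(c_commands(y, x) for y in candidates if y is not x):
            return x
    return None


# Hypothetical toy structure: [CP C [TP who [VP saw what]]]
comp = Node("C")
who = Node("who", frozenset({"D", "Op"}))
what = Node("what", frozenset({"D", "Op"}))
vp = Node("VP").add(Node("saw"), what)
tp = Node("TP").add(who, vp)
cp = Node("CP").add(comp, tp)

print(closest(comp, "Op", cp).label)  # who
```

In this toy structure C c-commands both Wh-words and "who" c-commands "what", so only "who" counts as the closest Op-bearing element for C.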
At each point of a derivation the computational system now selects the most economical operation in a strictly local way, generating only a set of optimal derivations regardless of whether they converge at an interface level.

The subsequent chapters (chapters 3 and 4) attempt to offer unified analyses of the cyclicity of overt derivations, Procrastinate effects, Wh-asymmetries, and Wh-adjunct symmetries under local derivational economy.

In chapter 3 we apply Earliness to some phenomena of cyclic derivation in overt syntax and to Procrastinate effects, demonstrating that the cyclicity of the computations and Procrastinate are reducible to one timing principle, Earliness. Section 3.2.1 gives a brief sketch of previous efforts to derive the cyclicity of overt computations from economy principles. Those analyses are carried out under global economy considerations and under the assumption that Procrastinate exists. In section 3.2.2 we derive the cyclic computations in overt syntax from Earliness. Section 3.4 demonstrates how Procrastinate effects are derived from Earliness in a local way. If we can eliminate the Procrastinate Principle, which is global in nature, then we can uniformly localize all derivational economy conditions.

In order to eliminate Procrastinate, section 3.3 discusses the motivations for Procrastinate and its problems. First of all, Procrastinate involves two stipulations in comparison with other universal principles. One is that Procrastinate is violable for convergence, while no other universal principle, including other economy principles such as the Last Resort Condition and the Minimal Link Condition, can be violated for any reason. Its violability is not consistent with the general assumption that all universal principles should be observed for convergence, and that if a derivation violates any principle it should yield some deviance.
The other stipulation is that only Procrastinate is global in nature, while the other economy principles can be localized. Its global character is undesirable (as described in section 2.2). In addition, Procrastinate has the following problems: (i) as a timing principle it cannot explain the timing of the computations in overt syntax; (ii) its conceptual motivation is based upon a characteristic of the sensory-motor system rather than on linguistic properties. In section 3.4 Earliness derives Procrastinate effects such as English verb movement and object shift and French object shift without reference to Procrastinate at all.

Chapter 4 attempts to give a unified analysis of some asymmetries of Wh-movement and some symmetries of Wh-adjunct movement under the Minimal Link Condition and multiple feature attraction in a local way. We make the following proposal relating to multiple feature attraction:

(11) K attracts F where the number of features F and the types of F are parametrized.

Parametrizing the features F of Attract-F fits minimalism well, since in minimalism only lexical items and their morphological properties may vary idiosyncratically from language to language, and all universal principles must be invariant. In this sense parameters (or options) for a language must be specified in terms of formal features. As a consequence we reduce the Wh-asymmetries and Wh-adjunct symmetries to the Minimal Link Condition and multiple feature attraction without reference to the non-formal features of categories such as referential/nonreferential θ-roles, etc.

In section 4.2 we introduce some asymmetries of Wh-movement such as argument-adjunct asymmetries, argument-quasi-argument asymmetries, argument extraction asymmetries, and superiority effects, and discuss their previous (pre-minimalist and minimalist) analyses and problems.
The previous analyses could not treat the Wh-asymmetries and Wh-adjunct symmetries in a unified way, and needed to refer to semantic information such as thematic roles, referentiality/nonreferentiality, etc., which it is undesirable to refer to during the derivation in the minimalist model.

Section 4.3 gives a unified analysis of those asymmetries under the Minimal Link Condition and Attract-F, which are independently motivated in language. In section 4.3.1, first of all, we elaborate the feature specification of Wh-words and a Comp. In this section we classify Wh-words into three types of categories, Wh-DP operators, Wh-adverbial operators, and Wh-pronominal variables, and their differences are specified in terms of formal features:

(12) a. Wh-DP operators: {D, OpQ}
     b. Wh-adverbial operators: {Adv, OpQ}
     c. Wh-pronominals: {DMD}

Regarding the formal features of a [+Wh] Comp, we propose the following:

(13) A Comp attracts F where F is either an operator feature Op or a pair of features <D, Op>.

Section 4.3.2 considers how Attract-F and the Minimal Link Condition interact with each other in minimalism. Specifically, we demonstrate that the Minimal Link Condition determines optimal derivations relative to the features to be attracted.

Section 4.3.3 demonstrates how Attract-F and the Minimal Link Condition provide a unified analysis of the Wh-asymmetries. Under our analysis the Wh-asymmetries are due to asymmetries in the availability of the D and Op features of a Wh-word under multiple feature attraction and the Minimal Link Condition. That is, a [+Wh] Comp attracts F where F is Op or <D, Op>. A feature Op can attract any category with an Op feature (i.e. Wh-adjuncts, quasi-arguments, Wh-NPs), while <D, Op> can attract only a category with both a D feature and an Op feature (i.e. Wh-NPs).
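The matching asymmetry just described can be sketched as a subset test over feature sets. This is only an illustrative sketch, not the thesis's formalism: it assumes that the pair in (13) consists of D and Op (as the surrounding text indicates when it refers to "both a D feature and an Op feature"), it writes the operator feature simply as "Op", and the names `WH_DP`, `WH_ADV`, and `can_attract` are hypothetical.

```python
# Feature bundles following (12a) and (12b); "Op" stands in for the operator feature.
WH_DP = frozenset({"D", "Op"})     # Wh-DP operators
WH_ADV = frozenset({"Adv", "Op"})  # Wh-adverbial operators

# The two options for what a [+Wh] Comp attracts, following (13).
ATTRACT_OP = frozenset({"Op"})         # a single Op feature
ATTRACT_D_OP = frozenset({"D", "Op"})  # the pair of D and Op (assumed reading)


def can_attract(attractor, category):
    """A category is attractable if it bears every feature the attractor asks for."""
    return attractor <= category


for name, category in [("Wh-DP", WH_DP), ("Wh-adverbial", WH_ADV)]:
    print(name,
          "Op:", can_attract(ATTRACT_OP, category),
          "D+Op:", can_attract(ATTRACT_D_OP, category))
```

On this sketch, Op alone matches both kinds of Wh-operator, while the D-and-Op pair matches only the Wh-DP bundle, which mirrors the availability asymmetry stated in the text.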
Under the Minimal Link Condition, however, an Op feature cannot attract another Op feature across an intervening Op or <D, Op>, while <D, Op> cannot attract another <D, Op> across an intervening <D, Op> but can attract it across Op. The former is the typical case where Wh-adjuncts cannot move across an intervening operator, and the latter the case where Wh-NPs cannot move across an intervening Wh-NP.

Section 4.4 extends this analysis to Wh-adjunct symmetries, as shown in argument-adjunct asymmetries, pseudo-opacity, and inner islands, where Wh-adjuncts cannot move across any other intervening operator.

To conclude, local economy is empirically and conceptually superior to global economy. Conceptually, we can reduce computational complexity and keep the derivational economy conditions homogeneous under local economy. We can also derive the cyclicity of overt derivations and Procrastinate effects from one timing principle, the Earliness Principle. Under the Minimal Link Condition and multiple feature attraction, on the other hand, we can uniformly treat some Wh-asymmetries and Wh-adjunct symmetries.

1. Introduction to a Minimalist Program for Linguistic Theory

This chapter briefly reviews the framework of the minimalist program for linguistic theory (Chomsky 1993, 1994, 1995). Although the main chapters of this dissertation discuss in more detail, where necessary, the concepts of the minimalist program relating to each chapter's topic, the introductory review here provides some theoretical background for understanding the basic concepts and machinery of minimalism.

In section 1.1 the minimalist model of grammar is described in the general sense of conceptual necessity. The subsequent sections describe the components of grammar in more detail. Section 1.2 introduces bare phrase structures and computational operations (i.e.
Select, Merge, Move, and Delete/Erase), and explains how the computational system selects lexical items and constructs (or derives) phrase structures by means of the computational operations without reference to X'-formats. Section 1.3 describes the linguistic representational levels, bare output conditions, and the relationships between bare output conditions and derivations in two respects: (i) how a phrase structure is mapped to the interface levels, PF and LF, and (ii) which interface properties motivate the computations. More specifically, it discusses types of features, feature interpretability at the interface levels, Spell-out, and Move-F/Attract-F. Section 1.4 introduces some economy conditions on derivations such as the Last Resort Condition, the Minimal Link Condition, and the Procrastinate Principle.

1.1 The Minimalist Model

Chomsky (1993, 1994, 1995) proposes the minimalist program as a principles-and-parameters model in which particular languages are assumed to be determined by a finite set of universal principles and parameters. Universal principles are invariant and common to all human language faculties, and parameters (or options) are "restricted to functional elements and general properties of the lexicon" (Chomsky 1994, p. 4), and are determined by very limited linguistic experience only. The minimalist program has been designed to accommodate only conceptually necessary or minimally required concepts for a theory of grammar.

What elements are conceptually necessary and minimally required for linguistic theory, then? First of all, one of the minimal theoretical requirements is a large repository which stores the lexical items with their idiosyncratic properties, including phonological, morphological, (sub)categorial, and semantic specifications. For example, in English the word "tree" means a tree, not a car; it is pronounced as [tri:], not [ka:]; the verb "buy" obligatorily requires an object, and the verb "arrive" does not; etc.
Such arbitrary properties of lexical items cannot be computed at all, and must be specified in some storage, which we may call a lexicon. Any theory of grammar must thus have a lexicon.

The second requirement for linguistic theory is a computational system. Since the lexicon itself is a storage device with some morphological processes, the theory requires a computational system to construct larger units such as phrases and clauses. The lexicon and the computational system belong to the generative or computational procedure of the language faculty. The computational system selects an array of lexical items from the lexicon and generates structural derivations.

The derivations which the computational system generates affect sound and meaning. For example, the sentence "John kissed Mary." does not mean "Mary kissed John."; the sound pattern (e.g. intonation) of "green house" is different from that of "greenhouse". For this reason Chomsky (1993, 1994, 1995) proposes that the output of the computational system should be interpreted at two interface levels, PF and LF, for sound and meaning, respectively. In addition, only the information relating to sound is interpreted at PF, and only the information relating to meaning is interpreted at LF, but not vice versa. In this sense the computational system takes on two responsibilities: first, it should keep the history of a derivation for the interface levels; second, it should generate only elements which are interpretable at the interface levels. That is, the computation works to satisfy the output conditions.

The above discussion implies that the theory of grammar requires at least two representational levels, PF and LF. PF and LF are assumed to be further fed to two external systems, an articulatory-perceptual system and a conceptual-intentional system, respectively.
In addition, Chomsky (1993, 1994, 1995) argues that only PF and LF are minimally required for linguistic theory, and that D-structure and S-structure, which were assumed in traditional generative grammar, can be eliminated if the conditions on D-structure and S-structure can be reduced to conditions on PF, LF, and derivations. So the minimalist model no longer takes D-structure and S-structure for granted.

Further, the computational system should not derive phrase structures arbitrarily simply to satisfy the output conditions. Rather, it follows some conditions on derivations in the computational process. For example, sentence (1) is grammatical and sentence (2) is not, although both are derived from the same lexical choices and presumably satisfy Full Interpretability (FI) at the interfaces with respect to Case theory, θ-theory, the Uniform Chain Condition, the Extended Projection Principle, etc. The ungrammaticality of (2) is presumed to be due to the violation of some condition on derivations. Thus, in order to be grammatical, a derivation must satisfy conditions on derivations and output conditions at the same time.

(1) It seems that John_i is believed t_i.
(2) *John_i seems that it is believed t_i.

Universal grammar (UG) takes the following computational procedure to map a phrase structure to PF and LF. First of all, an array of lexical items is chosen from the lexicon. Then the computational system selects lexical items from the array freely at any point of the derivation before it branches to PF and LF, and constructs phrase structures, satisfying some derivational conditions. At any point in the course of the derivation, the computational system may switch the structures to the PF branch; this is what we may call the Spell-Out operation. The computation then maps the structures into the Morphology component, and further into PF. The computation which maps the phrase structures to the PF representation after Spell-Out may be called the PF computation.
On the other hand, independent of the PF computation, the computational system continues to modify the phrase structures and map them into LF. This may be called the LF computation. Chomsky (1993, 1994) claims that the PF and LF computations cannot further access the array of lexical items or the lexicon.¹ The computation before Spell-Out may be called the overt syntactic computation, and the LF computation may be called the covert computation, since the syntactic structures modified by the LF computation are not reflected in pronunciation. Note that the overt computation and the LF (or covert) computation belong to a single uniform computational system, and hence there is no difference between them except for whether the results of the computation are perceptible or not.

¹ Chomsky (1995) claims that phonetically null lexical items may be accessed and merged to the root of a phrase structure even after Spell-out.

In sum, the minimalist model consists of a lexicon, a computational system, two levels of linguistic representation (PF and LF), and some principles on derivations and representations, which can be diagrammed as in (3).

(3)
                 Lexicon
                    |
                Numeration
                    |
             overt derivation
                    |
                Spell-out
                /       \
        Morphology     covert derivation
            |               |
            PF              LF

    Computational operations: Select, Merge, Move
    Derivational economy: the Last Resort Condition, the Minimal Link Condition, the Procrastinate Principle
    Bare output conditions at PF: Linear Correspondence Axiom, etc.
    Bare output conditions at LF: Case Theory, θ-Theory, Binding Theory, etc.

Thus Chomsky (1994) proposes that a language should be specified in terms of "... the nature of the computational procedure; ... the properties of bare output conditions and the functional component of the lexicon; and ... principles and concepts." (p.5) In the subsequent sections let us take a look at some of these properties in a little more detail.
1.2 The Lexicon and the Computational System

In this section we will consider the mechanisms and properties of four computational operations, Select, Merge, Move, and Delete/Erase, which are all assumed to be conceptually necessary for the language faculty. We will also discuss bare phrase structures (Chomsky 1994, 1995), which derive X'-theory from other principles and therefore eliminate it from the grammar.

1.2.1 The Lexicon and the Computations: Select

The computational system generates a linguistic expression <P, L>, where P refers to an expression for PF and L refers to an expression for LF. P and L are assumed to be constructed from the same lexical choices, since, for example, the sound of the sentence John kissed Mary does not carry the meaning that a dog chased a cat. The lexical choices are assumed to be made at two levels. At one level, lexical items are selected from the lexicon into an array. This is done all at once before the computations proceed to construct a phrase structure. At the other level, in the course of deriving a phrase structure, the computational system selects lexical items from the array rather than from the lexicon. In this section let us consider the computational operations that retrieve lexical items from the lexicon into an array, and from the array into a derivation, respectively. These operations are conceptually necessary to interface the lexicon and the computational system.

Retrieving lexical items from the lexicon forms a set of pairs <LI, i> in an array, where LI is a lexical item and i is the number of times that LI has been retrieved from the lexicon. The array is called a numeration of lexical items. For example, for the sentence John saw Mary the numeration n is as follows:

(4) n = {<C, 1>, <T, 1>, <John, 1>, <saw, 1>, <Mary, 1>}
(where C and T are functional categories for Comp and Tense, respectively.)

It is important to note that the numeration n must be finished all at once before it is mapped to PF and LF. This condition can be defined in (5).
(5) Inclusiveness Condition: (=Chomsky (1995) p.228)
Any structure formed by the computation is constructed of elements already present in the lexical items selected for n.

After the numeration is completed, the computational system starts to build a phrase structure, selecting lexical items from n and introducing them into a derivation. This operation can be called Select. The computational system selects a lexical item LI (i.e. it accesses <LI, i> in the numeration and reduces i by 1), and performs permissible computations for the derivation. For example, suppose that a numeration n is completed as in (4), and that the computational system derives the partial phrase structure [VP John [V' saw Mary]]. First, Select accesses <saw, 1> and <Mary, 1>, reduces each index by 1, and a computation constructs [V' saw Mary]. After this process, the numeration n looks like (6):

(6) n = {<C, 1>, <T, 1>, <John, 1>, <saw, 0>, <Mary, 0>}

Next, Select accesses <John, 1> and reduces i by 1, and a further computation constructs [VP John [V' saw Mary]]. Then the numeration n looks like (7).

(7) n = {<C, 1>, <T, 1>, <John, 0>, <saw, 0>, <Mary, 0>}

If i is zero in <LI, i>, then Select can no longer access that LI. Furthermore, unless all the indices in n are exhausted and so become 0, a derivation cannot be completed. Note that the Select operation is assumed (by Chomsky (1995)) to be costless in the sense of the economy conditions which we will discuss later.

1.2.2 Merge

When the computational system retrieves lexical items from a numeration n by Select, it concatenates or merges them into a larger unit. This operation is called Merge. It is conceptually necessary in order to build a unit larger than a word. Merge can be defined as in (8).

(8) Merge: (= Chomsky (1995) p. 243)
a. take only two syntactic objects, x and y;
b. form a larger syntactic object, z = {w, {x, y}};
c. eliminate x and y.

(8.a) defines Merge to be a binary operation.²
It follows from this that a non-branching projection X -> X' is no longer a valid operation, although it was permissible under X'-theory. (8.b) indicates that Merge creates a new, larger category (by projecting one of the two merging categories). (8.c) means that it deletes the two merging categories from the set of partial phrase structures after creating the new category. So Merge is understood as an operation that reduces the number of phrase markers (or syntactic objects).³ In fact, Merge iterates until a single syntactic object is left.⁴ (We will see an example of this in detail later.)

² Kayne (1994), Collins (1995) and Watanabe (1995) argue that the binary property of Merge can be deduced rather than stipulated in its definition.

³ Bobaljik (1995) takes a different position on Merge. According to him, Merge does not eliminate the merging categories, but simply creates a new category. Thus all partial phrase structures remain accessible to Merge for further computations, even after they have been merged and contained in a larger category. See Bobaljik (1995) for details.

⁴ This fact can be deduced from Kayne's (1994) Linear Correspondence Axiom (LCA). See Kayne (1994) for details.

The syntactic objects can be defined as in (9).

(9) (cf. (5), Chomsky 1995 p.243)
a. lexical items
b. z = {w, {x, y}}, where x, y, and z are objects and w is the label of z.

First of all, let us look at the form z = {w, {x, y}}. z is a set which is constituted of x and y, and is understood as a phrase marker. x and y are called terms. w is the label of z, representing the type of z. w is determined by projecting either x or y, exclusively and asymmetrically. For any structure K the terms can be defined as in (10).

(10) (=(10), Chomsky 1995 p.247)
a. K is a term of K.
b. If L is a term of K, then the members of the members of L are terms of K.

Let us take some examples of Merge. Let x = [V saw], y = [N Mary], and let x be projected. Then z = {w, {x, y}} can be computed as in (11), and informally diagrammed as in (12).⁵

(11) a. Merge [V saw] and [N Mary]. -> (8.a)
b. Create V' = {V, {[V saw], [N Mary]}}. -> (8.b)
c. Eliminate [V saw] and [N Mary]. -> (8.c)

(12)        V' (= V-type)
           /  \
    [V saw]    [N Mary]

As defined in (9), V' = {V, {[V saw], [N Mary]}}, [V saw], and [N Mary] are all syntactic objects, and V' has the label V (i.e. it is a V-type syntactic object). Also, as defined in (10), V', [V saw], and [N Mary] are the terms of V'. In (11) V' is a simple object, since its terms [V saw] and [N Mary] are terminal strings or lexical items. However, if x = [N John] and y = V' of (11), and y is projected, then the form z = {w, {x, y}} can be a complex form as in (13).⁶

⁵ Following Longobardi (1994), proper nouns should be treated as DPs, e.g. [DP Mary]. But we will here assume proper nouns to be NPs just for simplicity, since the difference is irrelevant to our discussion.

(13) a. Merge [N John] and V' in (11). -> (8.a)
b. Create VP = {V, {[N John], V'}}. -> (8.b)
c. Eliminate [N John] and V'. -> (8.c)

(14)        VP (= V-type)
           /  \
   [N John]    V'
              /  \
       [V saw]    [N Mary]

In (13) V' is a term of VP, and so all the terms of V' are also terms of VP, as defined in (10).

⁶ Although Chomsky (1995) assumes a VP shell for transitives, as in (i), just for simplicity we will ignore it for the moment.

(i)      VP
        /  \
      NP    V'
      |    /  \
    John  V    VP

Turning back to (8), we have to make (8.c) clear. For this we have to assume that there is a set of partial structures for a derivation which is accessible to the computational system. This set is different from the numeration. Select accesses the lexical items in the numeration, and those lexical items are entered into the set of partial phrase structures for the derivation. The computational system manipulates the objects in this set of partial structures. In the case of (11), for example, the computational system accesses <[V saw], 1> and <[N Mary], 1> in the numeration, reduces their indices by 1, and puts [V saw] and [N Mary] into the set of partial structures for the derivation.
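The bookkeeping that Select performs on the numeration can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names make_numeration, select and exhausted are hypothetical, lexical items are reduced to plain strings, and the set of partial structures is kept as a Python list.

```python
from collections import Counter


def make_numeration(lexical_items):
    """Build a numeration: each lexical item LI is paired with its index i."""
    return Counter(lexical_items)


def select(numeration, workspace, item):
    """Select: access <LI, i>, reduce i by 1, and enter LI into the workspace."""
    if numeration[item] == 0:
        # An item whose index is zero can no longer be accessed.
        raise ValueError(f"{item!r} cannot be selected: its index is 0")
    numeration[item] -= 1
    workspace.append(item)


def exhausted(numeration):
    """A derivation can only be completed once every index has reached 0."""
    return all(i == 0 for i in numeration.values())


# Numeration for "John saw Mary"; C and T are functional categories.
n = make_numeration(["C", "T", "John", "saw", "Mary"])
S = []  # set of partial phrase structures (a list, for simplicity)

select(n, S, "saw")
select(n, S, "Mary")
print(dict(n))       # indices after the two Select steps
print(S)             # partial structures now available to Merge
print(exhausted(n))  # C, T and John are still unselected
```

A further call to select(n, S, "saw") would raise an error, mirroring the statement that an item with index zero is no longer accessible.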
At this point the set S of partial phrase structures is:

(15) S = {[V saw], [N Mary]}.

If only the suboperations (8.a) and (8.b) of Merge apply to the set S, then S changes into S' as in the following:

(16) S' = {[V saw], [N Mary], V'}.

If (8.c) applies to S', then:

(17) S'' = {V'},

since it eliminates [V saw] and [N Mary]. As a consequence of (8.c), Merge should apply only at the root. If all the indices in the numeration become zero, and the set of partial phrase structures contains only one object, then the derivation can be a potentially legitimate object for the interface levels. In other words, all lexical items in the numeration must be contained in one phrase structure in order to be interpreted at PF and LF. Chomsky (1995) claims that, like Select, Merge should also be costless in terms of economy of derivations.

1.2.3 The Status of X'-theory

Chomsky (1970, 1986), Jackendoff (1977), and others developed the X'-schema, recognizing the endocentricity of syntactic categories (N, V, A, P, I, and C), the inherent properties shared by a head and its maximal phrase, and the structural parallelism across syntactic categories. As a consequence it became possible to eliminate the redundancy between lexical properties and phrase structure rules, and to eliminate language-specific construction rules in favor of a headedness parameter of universal grammar, and to develop some properties of local domains and relations in syntax.

In the minimalist framework, however, Chomsky (1994, 1995) reconsiders X'-theory on the assumption that even the X'-format is derivable from other properties and so is eliminable from the grammar. Chomsky (1994) argues that categorial projections should be understood as "relational properties of categories, not inherent to them" (Chomsky (1994) p.9). So whether a category is a maximal, minimal or intermediate projection should be determined in the structure where it occurs. Given a phrase marker, maximal and minimal projections are defined in (18).
(18) (=Chomsky (1994) p.10)
a. A category that is not any further projected is a maximal projection XP.
b. A category that is not a projection at all is a minimal projection X°.
c. Any other projection is an intermediate projection X' (which is invisible for the computations and the interface levels).

If a lexical item [N John] is selected from the lexicon, for example, in traditional generative grammar it should always be projected as in (19) by a nonbranching operation in order to satisfy X'-theory.

(19) a. [N John]
b. [N' [N John]]
c. [NP [N' [N John]]]

However, this is no longer true in the minimalist program. Computations such as Select and Merge do not perform a nonbranching projection at all. As we will see later, Move and Delete/Erase do not yield a nonbranching projection, either. Nonbranching projection is simply not defined in the minimalist program.

Now let us take (11) and (13) into consideration again. In (11) the term [N Mary] is understood as a maximal and a minimal projection at the same time, since it is not further projected and is not a projection at all, as defined in (18). The term [V saw] is a minimal projection, since it is not a projection at all; but it is not a maximal projection, since it is projected to V'. The projected category V' is not a minimal projection, since it is a projected category, and it is a maximal projection, since it is projected but not further projected. V' thus has the status of a maximal projection in the minimalist program, so (11) can be expressed without confusion as in (20).

(20) VP = {V, {[V saw], [N Mary]}}.

In the case of (13), the term [N John] is minimal and maximal, like [N Mary] in (11). However, the term V' in (13) is not a maximal projection this time, since it is further projected to VP. It is not minimal, either, since it is a projected category. So V' is understood as an intermediate category, which is not visible to the computational system for further access.
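The way Merge in (8), the terms in (10), and the relational definitions in (18) interact on examples (11) and (13) can be sketched as follows. This is a minimal sketch under assumptions, not a claim about any existing implementation: the class SO and the functions merge, terms and status are hypothetical names, and a syntactic object simply records which of its two parts projected.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # objects are compared by identity, not by content
class SO:
    """A syntactic object: a lexical item, or z = {w, {x, y}} with label w."""
    label: str                                 # w, the type of the object
    word: Optional[str] = None                 # set only for lexical items
    parts: Optional[Tuple["SO", "SO"]] = None  # (x, y) for merged objects
    projector: Optional["SO"] = None           # the part that projected


def merge(workspace: List[SO], x: SO, y: SO, projecting: SO) -> SO:
    """Merge (8): take x and y, form z = {w, {x, y}}, eliminate x and y."""
    if projecting is not x and projecting is not y:
        raise ValueError("the label must come from x or y")
    z = SO(label=projecting.label, parts=(x, y), projector=projecting)
    workspace.remove(x)  # (8.c): x and y leave the set of partial structures
    workspace.remove(y)
    workspace.append(z)
    return z


def terms(k: SO) -> List[SO]:
    """Terms (10): K itself plus, recursively, the parts of its terms."""
    found = [k]
    for part in k.parts or ():
        found.extend(terms(part))
    return found


def status(node: SO, root: SO) -> str:
    """Projection status (18), read off the structure in which node occurs."""
    projects_further = any(t.projector is node for t in terms(root))
    is_maximal = not projects_further
    is_minimal = node.parts is None
    if is_maximal and is_minimal:
        return "minimal and maximal"
    if is_maximal:
        return "maximal"
    if is_minimal:
        return "minimal"
    return "intermediate"


saw, mary, john = SO("V", "saw"), SO("N", "Mary"), SO("N", "John")
S = [saw, mary, john]

v1 = merge(S, saw, mary, projecting=saw)   # (11): saw projects
print(status(v1, v1), "|", status(mary, v1), "|", status(saw, v1))

vp = merge(S, john, v1, projecting=v1)     # (13): the verbal object projects
print(status(v1, vp), "|", status(vp, vp), "|", status(john, vp))
print(len(S), len(terms(vp)))              # one root object with five terms
```

The same object v1 is reported as maximal after the first Merge and as intermediate after the second, which is the sense in which projection status is relational rather than inherent.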
As we have seen above, the status of a category is interpreted differently at different stages of the computation, depending upon its categorial relation with other terms in the structure.

1.2.4 Move

The operation Move is also assumed to be conceptually necessary, in order to rearrange the order of phrases. Move can be defined as in (21).

(21) (=Chomsky (1995) p.250)
Suppose the category Z with terms x and y. Then:
a. take x;
b. target y;
c. raise x;
d. form a category l = {w, {x, y}};
e. replace y in Z with l;
f. form a chain (x, t_x).

Note that the operations in (21.a-f) are internal suboperations of Move. Move itself is a single operation: its suboperations cannot be interrupted, and the intermediate derivations that they may generate are not accessible to other computations. Note also that for Move the projection (e.g. w for l in (21.d)) is predictable (i.e. the target must always project), while it may be fixed in language L for Merge. (See Chomsky (1995) chapter 4 for details.)

Although it is not yet clear which conditions the Merge operation should satisfy, the Move operation is required to satisfy some principles of UG. First of all, unlike the other operations, it is subject to economy conditions such as the Last Resort Condition, the Minimal Link Condition, and the Procrastinate Principle (which will be discussed in section 1.4). Second, it should satisfy some conditions on chain formation, as in (22).

(22) a. c-command: the head of a chain must c-command its trace.
b. uniformity condition: (=(17)) a chain must be uniform with regard to phrase structure status, where the phrase structure status of an element is its relational property of being maximal, minimal or neither.

The conditions on chain formation imply two important things for Move: (i) Move must raise (and cannot lower) a syntactic object; (ii) Move must project the target syntactic object (i.e. it can never project the raising syntactic object).
On the other hand, Move leaves a trace. The trace is understood as an identical copy of the head of the chain. The copy theory of Move accounts for reconstruction effects at LF. (See Chomsky (1993) for the consequences of the copy theory for Move.)

Now let us take object raising as an example of Move. In the structure of (23), the object [N Mary] raises to target AgrP_O at LF, undergoing the internal operations of Move in (21), as shown in (24), and forming the structure of (25), where irrelevant elements and operations are ignored.⁷

(23) a.       TP
             /  \
   [N John]_i    T'
                /  \
               T    AgrP_O
                   /  \
              Agr_O    VP
                      /  \
                   t_i    V'
                         /  \
                  [V saw]    [N Mary]

b. TP = {T, {John_i, T'}}
T' = {T, {T, AgrP_O}}
AgrP_O = {Agr_O, {Agr_O, VP}}
VP = {V, {t_i, V'}}
V' = {V, {[V saw], [N Mary]}}

(24) a. Take [N Mary];
b. Target AgrP_O;
c. Raise [N Mary];
d. Form AgrP1_O = {Agr_O, {[N Mary], AgrP_O}};
e. Replace AgrP_O in T' = {T, {T, AgrP_O}} with AgrP1_O;
f. Form a chain [AgrP1_O Mary_j [V' [V saw] [N Mary]_j]]

⁷ Chomsky (1995) no longer takes AgrS and AgrO for granted as independent functional categories. See Chomsky (1995), section 4.10, for details.

(25) a.    TP
          /  \
   John_i     T'
             /  \
            T    AgrP1_O
                /  \
          Mary_j    AgrP_O
                   /  \
              Agr_O    VP
                      /  \
                   t_i    V'
                         /  \
                  [V saw]    t_j

b. TP = {T, {John_i, T'}}
T' = {T, {T, AgrP1_O}}
AgrP1_O = {Agr_O, {Mary_j, AgrP_O}}
AgrP_O = {Agr_O, {Agr_O, VP}}
VP = {V, {t_i, V'}}
V' = {V, {[V saw], [N Mary]}}

As we have mentioned before, Move is costly and hence subject to economy principles. The question arises of why UG makes use of Move if it is costly. We will discuss Chomsky's answer to this question in section 1.3. In section 1.4 we will discuss how to minimize the cost of Move, once it is required to take place.

1.2.5 Delete/Erase

Although some lexical items exist during the derivation, they seem to be invisible at the interface levels. Consider the following sentences:

(26) a. It seems that John likes Mary.
b. John likes Mary.
c. Who does John like?

If we represent (26.a) for semantic interpretation, it would be seem(like(John,Mary)).
The expletive it in (26.a) does not affect semantic interpretation, although it exists in syntax. The Agr feature of the verb likes (i.e. 3rd person singular) also seems to be invisible to semantic interpretation, if the choice between like(John,Mary) and likes(John,Mary) does not matter for the semantic representation of (26.b). If it is correct that a trace is a copy of the moved element, the trace of who in (26.c) also seems to be deleted at PF, although it is visible during the derivation and at LF. In this sense Delete/Erase is conceptually necessary in language.

The Delete and Erase operations make invisible the elements that are uninterpretable at the interface levels, in order to satisfy the output conditions. Delete leaves the structure unaffected but marks some elements as invisible at the interfaces. Although the deleted elements are invisible to the interface levels, they are still accessible to the computational system, and further computations can manipulate them. The operation Erase, on the other hand, marks the elements as completely invisible both to the interface levels and to the computational system, so that the computation cannot access them any further.

Delete/Erase has some empirical and theoretical consequences in combination with the copy theory of traces. Consider the following sentence:

(27) (=(41) Chomsky 1995 p.206)
John wondered [[which picture of himself] [Bill took t]]

Following Chomsky (1993, 1995), sentence (27) is ambiguous in two respects: one is that the reflexive himself can take either John or Bill as its antecedent; the other is that the phrase take ... picture can be interpreted either idiomatically ("photograph") or literally ("take it away"). If himself takes Bill as its antecedent, both the idiomatic and the literal interpretations are permitted; if himself takes John as its antecedent, however, only the literal interpretation is permitted, and the idiomatic interpretation is disallowed.
To explain the correlation between reflexive binding and idiom interpretation, Chomsky (1993, 1995) argues that (27) has two LF representations as in (28):

(28) a. John wondered [[which x, x a picture of himself] [Bill took x]]
b. John wondered [[which x] [Bill took [x picture of himself]]]

In the LF representation (28.a), John, not Bill, can be the antecedent of the reflexive himself by Condition A of the binding theory, and in the representation (28.b), Bill, not John, can be the antecedent of himself by the same principle. In addition, Chomsky assumes that an idiom should be present as a unit at LF to undergo idiom interpretation. In the configuration of (28.b) the phrase take a picture can be interpreted either literally or idiomatically, since it is present as a unit at LF, but in (28.a) we have only the literal interpretation of take.

To derive the LF representations in (28), Chomsky claims that the movement of which picture of himself leaves a copy of itself as a trace, as shown in (29). After Spell-out the computational system deletes part of either the higher or the lower copy of the chain, generating (28) at LF.

(29) John wondered [[which picture of himself] [Bill took [which picture of himself]]]

In addition, the Delete/Erase operation plays an important role in feature checking in the minimalist framework. We will discuss this in the next section.

1.3 Bare Output Conditions and the Computations

Given an array of lexical items, the computational system starts to construct a derivation. At some point of the derivation Spell-out splits this derivation into a pair of linguistic expressions <P, L>. P consists of the PF objects, and L consists of the LF objects. The objects of a derivation should be legitimate objects at the relevant interface level. That is, the PF objects should be interpretable at PF, and the LF objects should be interpretable at LF.
If P contains only legitimate objects which are interpretable at PF, P is said to converge at PF; otherwise, it crashes at PF. If L contains only legitimate objects which are interpretable at LF, L is said to converge at LF; otherwise, it crashes at LF. A derivation should thus converge at both interface levels, PF and LF; otherwise it crashes. In this section let us consider what "interpretable" means, which elements are interpretable at which interface level, and how interpretability at the interface levels affects the computations.

1.3.1 Features and Their Interpretability

A lexical item is supposed to be F, a set of features. Selecting an LI means that the F of that LI is selected. Among the features F of LI, Chomsky (1995) distinguishes three types: phonological features, semantic features, and formal features. The phonological features are interpretable only at PF, and the formal and semantic features are interpretable only at LF, and not vice versa. The set of features interpretable at PF is represented as PF(LI), and the set of those interpretable at LF as LF(LI). Of LF(LI), the formal features FF(LI), but not the semantic features, are also accessible to the computational system, and play a crucial role in the minimalist program. FF(LI) contains categorial features such as N, A, V, P, D, etc., Case features such as Nominative, Accusative, etc., tense features such as Present and Past, agreement features such as number, gender and person, and presumably other features for binders, controllers, and operators.

Chomsky (1995) argues that some of FF(LI) are interpretable at LF and others are not, although all FF(LI) are accessible to the computational system. He descriptively classifies FF(LI) as [+ Interpretable] and [- Interpretable], as in (30).

(30) a. [+ Interpretable]:
(i) all categorial features: N, A, V, P, D, etc.
(ii) agreement features of nominals (D and N): number, gender, and person.
b. [- Interpretable]:
(i) sublabels of the target⁸: strong features, affixal features
(ii) all non-nominal agreement features
(iii) all Case features.

⁸ The features associated with the label are called sublabels. Formally speaking:
(i) (=(30) Chomsky (1995) p.268)
A sublabel of a category K is a feature of H(K)^0max, where H(K)^0max is the zero-level projection of the head H(K) of K.

In the subsequent sections we will see how interpretability affects the computations.

1.3.2 Spell-out, and PF and LF Branching

In order to converge at the interface levels, i.e. to satisfy the output conditions for FI, a derivation should contain only features which are interpretable at the interface levels. If it contains some uninterpretable features, they should be eliminated by some computation. Otherwise it would crash. For example, phonological features must be eliminated at some point of mapping a numeration to LF, since they are not interpretable at LF; likewise, formal and semantic features must be eliminated at some point of mapping a numeration to PF, since they are not interpretable at PF; otherwise the derivations would crash.

For this Chomsky (1995) assumes that there is an operation, Spell-Out. At some point of a derivation, Spell-out applies to the structure S already formed, and strips the phonological features away from S, leaving the others behind, which the computational system continues to map to LF. Further, he assumes that Spell-Out maps S to the Morphology component, which maps it to PF, eliminating the non-phonological features, i.e. formal and semantic features, with one exception: strong features cannot be eliminated by the PF computation at all. Hence a strong feature must be eliminated before Spell-out.

Regarding the LF mapping, after Spell-out the derivation contains only formal features FF(LI) and semantic features, PF(LI) having been eliminated. The computational system continues to map this derivation to LF.
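The division of labor between Spell-out and the two interface levels can be illustrated with a short sketch. It rests on several assumptions that are not part of the text: the names LexicalItem, spell_out, converges_at_pf and converges_at_lf are hypothetical, features are plain strings, the marks "*" (uninterpretable) and "**" (strong) follow the notation used in (34), and the phonological and semantic feature values are mere placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class LexicalItem:
    name: str
    phonological: FrozenSet[str]
    semantic: FrozenSet[str]
    formal: FrozenSet[str]  # FF(LI); "*" = uninterpretable, "**" = strong


def spell_out(structure: Iterable[LexicalItem]) -> Tuple[Set[str], Set[str]]:
    """Split a structure into what is sent to PF and what continues to LF."""
    to_pf: Set[str] = set()
    to_lf: Set[str] = set()
    for li in structure:
        to_pf |= li.phonological
        # Strong features cannot be eliminated on the PF side, so any that
        # survive until Spell-out are carried along to PF.
        to_pf |= {f for f in li.formal if f.startswith("**")}
        to_lf |= li.formal | li.semantic
    return to_pf, to_lf


def converges_at_pf(to_pf: Set[str]) -> bool:
    """PF converges only if no strong feature has reached it."""
    return not any(f.startswith("**") for f in to_pf)


def converges_at_lf(to_lf: Set[str]) -> bool:
    """LF converges only if no uninterpretable feature remains."""
    return not any(f.startswith("*") for f in to_lf)


# Placeholder items: T before and after its D and Case features are checked.
t_unchecked = LexicalItem("T", frozenset(), frozenset({"PAST-TIME"}),
                          frozenset({"**D", "*Nom", "Past"}))
t_checked = LexicalItem("T", frozenset(), frozenset({"PAST-TIME"}),
                        frozenset({"Past"}))
john = LexicalItem("John", frozenset({"/john/"}), frozenset({"JOHN"}),
                   frozenset({"D", "Agr"}))

pf, lf = spell_out([t_unchecked, john])
print(converges_at_pf(pf), converges_at_lf(lf))  # strong D still present

pf, lf = spell_out([t_checked, john])
print(converges_at_pf(pf), converges_at_lf(lf))  # nothing uninterpretable left
```

The sketch collapses all features of a structure into two flat sets, which is enough to show why a strong feature has to be eliminated before Spell-out but ignores structure entirely.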
Even so, the derivation may still contain some FF(LI) which are uninterpretable at LF. If they are not eliminated by the LF computation, the derivation will crash at LF. The elimination of the uninterpretable features of FF(LI) is closely related to feature checking. In the next section we will discuss feature checking and Move-F.

1.3.3 Feature Checking and Move-F

Languages require some formal features of one category to agree with those of other categories. For example, the agreement features of the subject should match those of the verb of the predicate; the Case feature of the object should match that of the verb. This feature matching mechanism may be called feature checking.

For feature checking to be successful, it should satisfy two conditions. First, feature checking should happen in some local relation between a checker and a checkee. For example, in (31) the Agr feature of the embedded subject John cannot be checked by that of the matrix verb believes.

(31) *They believes that John kissed Mary.

Chomsky (1995) assumes that a feature checker should be a head or an adjunct to a head, and that a checkee should be in the spec of a checker, adjoined to the maximal projection of a checker, or adjoined to the head of a checker. This can be exemplified in (32).

(32)       XP
          /  \
        YP    XP
             /  \
           WP    X'
                /  \
               X    ZP
              / \
             H   X

In (32) a head H is adjoined to a head X; ZP is the complement of the two-segment head X; WP is the specifier of the head X; and YP is adjoined to the maximal projection XP. In this configuration, the head X is a checker; YP and WP are checkees of X; H can be a checkee of X but can also be a checker of YP and WP. But the category ZP cannot be in a checking relation with X or H at all.

The second condition for feature checking is that a formal feature should exist in common in the FF(LI) of a checker and of its checkee. In addition, when checking takes place, all the features common to the checker and the checkee should be checked. Suppose that X has a Case feature in (32).
In order for X to be in a checking relation with WP, WP should also have a Case feature. If WP and X have further features in common, these should also be checked when their Case features are checked.

Chomsky (1995) assumes that feature checking is a Delete/Erase operation. That is, if a feature is checked and is uninterpretable, it is deleted, and further erased if possible, for convergence at the relevant interface levels. Let us consider (33) in terms of the checking theory.

(33) John kissed Mary.

First, let us suppose that (34.a) has been derived by the computations. The head T has a strong D feature which is uninterpretable at PF and at LF, and so it presumably must be eliminated at this point. But T does not have a checkee yet. Move raises John, targeting T, and derives (34.b). At this point John is in the checking domain of T, and hence feature checking is possible if John and T have common features. In this case FF(T) has a strong D feature and a Case feature, and FF(John) also has a D feature and a Case feature. Now FF(John) and FF(T) can be in a checking relation. Furthermore, the uninterpretable features (i.e. the D and Case features of FF(T) and the Case feature of FF(John)) are deleted once they are checked. However, the D feature of FF(John) remains undeleted because it is an interpretable feature. The result can be represented as in (34.c). Now this derivation can converge at PF, and hence is spelled out. But it cannot converge at LF yet, since it contains some uninterpretable features, i.e. the Case, Agr and Past features of FF(kissed) and the Case feature of FF(Mary). To eliminate those features, at LF Move first raises kissed, targeting T, and derives (34.d). Now FF(kissed) can be a checker of FF(John), and at the same time a checkee of FF(T). The Past feature of FF(kissed) is in a checking relation with that of FF(T), and is deleted. The Agr feature of FF(kissed) is in a checking relation with that of FF(John), and is deleted.
The derivation can be represented as in (34.e). After that, FF(Mary) moves to T at LF, deriving (34.f). After the Case feature of FF(kissed) is checked with FF(Mary), the final derivation, (34.g), converges at LF.

(34) (The symbol ** indicates a strong feature, and * a non-strong uninterpretable feature.)

  a. [TP T [VP [N John] [V' [V kissed] [N Mary]]]]
     FF(T) = {**D, *Nom, Past}
     FF(John) = {D, *Nom, Agr}
     FF(kissed) = {V, *Past, *Acc, *Agr}
     FF(Mary) = {D, *Acc, Agr}

  b. [TP1 [N John_i] [TP T [VP t_i [V' [V kissed] [N Mary]]]]]
     FF(T) = {**D, *Nom, Past}
     FF(John) = {D, *Nom, Agr}
     FF(kissed) = {V, *Past, *Acc, *Agr}
     FF(Mary) = {D, *Acc, Agr}

  c. [TP1 [N John_i] [TP T [VP t_i [V' [V kissed] [N Mary]]]]]
     FF(T) = {Past}
     FF(John) = {D, Agr}
     FF(kissed) = {V, *Past, *Acc, *Agr}
     FF(Mary) = {D, *Acc, Agr}

  d. [TP1 [N John_i] [TP [T [V kissed_j] T] [VP t_i [V' t_j [N Mary]]]]]
     FF(T) = {Past}
     FF(John) = {D, Agr}
     FF(kissed) = {V, *Past, *Acc, *Agr}
     FF(Mary) = {D, *Acc, Agr}

  e. [TP1 [N John_i] [TP [T [V kissed_j] T] [VP t_i [V' t_j [N Mary]]]]]
     FF(T) = {Past}
     FF(John) = {D, Agr}
     FF(kissed) = {V, *Acc}
     FF(Mary) = {D, *Acc, Agr}

  f. [TP1 [N John_i] [TP [T [N Mary_k] [T [V kissed_j] T]] [VP t_i [V' t_j t_k]]]]
     FF(T) = {Past}
     FF(John) = {D, Agr}
     FF(kissed) = {V, *Acc}
     FF(Mary) = {D, *Acc, Agr}

  g. [TP1 [N John_i] [TP [T [N Mary_k] [T [V kissed_j] T]] [VP t_i [V' t_j t_k]]]]
     FF(T) = {Past}
     FF(John) = {D, Agr}
     FF(kissed) = {V}
     FF(Mary) = {D, Agr}

Since Move is assumed to be driven only by feature checking, Chomsky (1995) proposes that the minimal operation of Move should move only the feature F to be checked. Move should raise FF(LI) to its target, if possible, rather than the LI itself. This operation may be called Move-F; it replaces Move-α, which raises the whole LI itself.
Move-F can be defined as follows:

(35) (=(28), Chomsky 1995, p.265)
     Move-F carries along FF(F), where F is a feature of a lexical item LI, and FF(F) indicates all formal features of LI.

Chomsky (1995) argues that if Move-F raises the formal features of LI overtly, PF convergence requires F to carry along the whole LI. If Move-F raises F covertly, only FF(F) is raised to a target, leaving its LI behind. Whether FF(F) carries the whole LI is determined presumably by morphological properties, output conditions, and economy principles. Further, covert feature raising adjoins FF(F) to the head of the target, although overt FF(F) raising should target an XP or X of the checker, depending upon the status of the category to be checked. For example, to derive (36.a) for LF, the computational system takes the derivational steps shown in (36.b).

(36) a. John kissed Mary.
     b. (i)   [VP [V kissed] [NP Mary]]
        (ii)  [VP [V_i kissed] [VP t_i [NP Mary]]]
        (iii) [VP [NP John] [V' [V_i kissed] [VP t_i [NP Mary]]]]
        (iv)  [TP LI(John)_j [T' T [VP t_j [V' [V_i kissed] [VP t_i [NP Mary]]]]]]
        (v)   Spell-out
        (vi)  [TP LI(John)_j [T' [T FF(Mary) [T FF(kissed) T]] [VP [NP t_j] [V' [V_i kissed] [VP t_i [NP Mary]]]]]]

In the above derivation John is overtly raised to the spec of TP, since the English tense, T, has a strong D feature which is uninterpretable at PF and LF. This overt Move-F carries everything for John, i.e. LI(John). After that, the derivation is spelled out and the covert computations continue to map it to LF. At LF, Move-F raises V, i.e. FF(kissed), to T by adjunction, in order for the tense feature of V to be checked by T, and for the Agr feature of V to be checked against that of John. Then Move-F raises the object, i.e. FF(Mary), to T by adjunction, for the Case and Agr features of the object to be checked by those of V.

As we have mentioned in section 1.2, Move is costly, and thus is subject to economy conditions. In the next section let us consider the relationship between Move and economy conditions.
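The checking steps of (34) can be approximated with plain sets. The following Python sketch is only an illustration under simplifying assumptions: the names Item, check and converges_at_lf are hypothetical, feature strength (the ** marking) and structural locality are not modeled, and a checking relation is simply stipulated by calling check on a pair of items. It shows that deleting every checked uninterpretable feature reproduces the feature sets listed in (34.c), (34.e) and (34.g).

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A lexical item with its formal features FF(LI) (hypothetical model)."""
    name: str
    features: set          # all formal features currently present
    uninterpretable: set   # the subset marked * or ** in (34)


def check(checker, checkee):
    """Check all features the two items share; delete the uninterpretable ones."""
    common = checker.features & checkee.features
    for item in (checker, checkee):
        deleted = common & item.uninterpretable
        item.features -= deleted
        item.uninterpretable -= deleted
    return common


def converges_at_lf(items):
    """A derivation converges at LF only if no uninterpretable feature is left."""
    return all(not item.uninterpretable for item in items)


# Feature sets of (34.a); the strength of D on T is ignored in this sketch.
T = Item("T", {"D", "Nom", "Past"}, {"D", "Nom"})
John = Item("John", {"D", "Nom", "Agr"}, {"Nom"})
kissed = Item("kissed", {"V", "Past", "Acc", "Agr"}, {"Past", "Acc", "Agr"})
Mary = Item("Mary", {"D", "Acc", "Agr"}, {"Acc"})
items = [T, John, kissed, Mary]

check(T, John)                    # (34.b) to (34.c): D and Case are checked
after_overt_step = converges_at_lf(items)   # False: kissed and Mary still carry * features

check(T, kissed)                  # (34.d): Past of kissed is checked by T
check(kissed, John)               # (34.e): Agr of kissed is checked by John
check(kissed, Mary)               # (34.f) to (34.g): Acc is checked
after_covert_steps = converges_at_lf(items)
```

The interpretable features (Past on T, D and Agr on the nominals, V on the verb) survive checking because they are never in the uninterpretable subset, which is the distinction the text draws for the D feature of FF(John).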
1.4 The Economy Principles

In the minimalist program a derivation must satisfy bare output conditions for convergence. Satisfying output conditions is necessary, but it is not sufficient for a derivation to be evaluated as syntactically well-formed. It must also be optimal. For a derivation to be optimal, according to Chomsky (1993, 1994, 1995), it must satisfy some economy conditions: the Last Resort Condition, the Minimal Link Condition, and the Procrastinate Principle. In this section let us consider these three economy conditions.

1.4.1 Last Resort

It has long been recognized that the operation Move is a last resort operation: it takes place only when it is forced by some necessity, i.e. to satisfy some conditions. In sentence (37), for example, John moves to the spec of TP; this must take place as a last resort to satisfy the Extended Projection Principle, the Case theory, and presumably other conditions; (37) would otherwise be ungrammatical.

(37) [TP John_i [VP t_i [V' saw Mary]]]

Under economy considerations, it is natural that a costly computational operation must be driven by some necessity, i.e. to satisfy bare output conditions for FI; otherwise a derivation would fail to converge. In this sense the last resort condition has been understood as an economy condition: the fewer costly operations a derivation takes, the more economical it is (Chomsky (1993, 1994, 1995), Chomsky and Lasnik (1993)).

In the minimalist program, satisfying bare output conditions by movement means eliminating uninterpretable morphological features in checking. So Chomsky (1995) defines the last resort condition for movement in terms of Move-F, as in (38).

(38) (=(51), Chomsky 1995, p.280)
     Move-F raises F to target K only if F enters into a checking relation with a sublabel of K.

Now consider (37) again under the definition in (38). Targeting T, Move-F raises the D feature of FF(John), which carries along FF(John).
It observes (38), since the D feature of FF(John) that Move-F raises is in a checking relation with the strong D feature of FF(T).9 Now suppose that Move-F raises FF(John) to target C as in (39). However, no feature of FF(John) is in a checking relation with FF(C). So this raising is superfluous and violates (38).

(39) *[CP John_i [TP t'_i [VP t_i [V' saw Mary]]]]

9 When checked, the Case feature, a "free rider" of FF(John), is also checked by the Case feature of FF(T).

Last Resort as defined in (38) can also permit (40) accidentally. Move-F raises the D feature of FF(John) in the spec of the embedded TP to target the matrix T, whose FF(T) contains a strong D feature. It observes (38), since the D feature of FF(John) is in a checking relation with the strong D feature of FF(T). But (40) is ungrammatical, not because it violates an economy condition but because it still contains uninterpretable features. That is, although the D feature of FF(John) is in a checking relation with the strong D feature of FF(T) in the matrix sentence, the Case feature of the matrix FF(T) cannot be in a checking relation with the Case feature of FF(John), since the Case feature of FF(John) is no longer available in the spec of the matrix TP: it has been checked and deleted by the Case feature of FF(T) in the embedded sentence.

(40) *[TP John_i seems [TP t'_i [VP t_i [V' saw Mary]]]]

Thus Last Resort in (38) and feature interpretability together can successfully block superfluous movement.

1.4.2 Minimal Link Condition

While Last Resort determines whether movement is necessary or not, the Minimal Link Condition (MLC) determines which category should move if more than one category can satisfy Last Resort at the same time. For example, (41.a) and (41.b) both satisfy FI and Last Resort. However, (41.b) seems to violate some other condition on derivations, while (41.a) satisfies it.

(41) a. Who did you tell t that John met who?
     b. *Who did you tell who that John met t?
Chomsky (1995) attributes the ungrammaticality of (41.b) to a violation of the MLC. The MLC is defined as follows:

(42) (cf. (110), Chomsky 1995, p.311)
     Move-F raises F of x to target K only if there is no y, y closer to K than x, such that y raises to K.

(43) (Chomsky 1995, p.358)
     y is closer to the target K than x if y c-commands x.

To target the matrix CP in (41), two WH-phrases, the complement of tell and that of met, are the competing candidates for Move, since both can satisfy Last Resort. But the former is closer to the CP than the latter. Thus (41.b) violates the MLC.

Under economy considerations, the MLC is also an economy condition in terms of the shortest movement: the shortest movement makes the shortest chain links.

1.4.3 Procrastinate Principle

It is well known that French main verbs are overtly raised to T, and that English ones are not (Emonds (1978), Pollock (1989), Chomsky (1991)).

(44) Jean embrasse souvent Marie.
     John kisses   often   Mary
     "John often kisses Mary."

(45) John often kisses Mary.

Following Chomsky (1993), even English main verbs must be raised to T; otherwise the tense feature and Agr feature of FF(V) would cause (45) to crash at LF, since they are uninterpretable at LF. But the verb cannot move overtly, as shown in (46), although such movement observes the Last Resort Condition and the Minimal Link Condition.

(46) *John kisses often Mary.

Chomsky claims that (46) violates an economy condition, the Procrastinate Principle, which is defined in (47).

(47) Minimize overt Move-F.

Under economy considerations, this principle assumes that overt operations cost more than covert operations.

2. Localizing Derivational Economy

2.1 Introduction

In the framework of the minimalist program, all syntactic operations must uniformly satisfy economy conditions. Under economy considerations syntactic derivations must be optimal. In order to be optimal, a derivation must observe three types of derivational economy conditions (Chomsky 1993, 1994, 1995):

(1) a. Minimize computational operations.
    b. Minimize chain links.
    c. Minimize overt operations.

Condition (1.a) is the greed/last resort property of movement; (1.b) adopts the characteristics of Chomsky's (1973) superiority effects and Rizzi's (1990) relativized minimality effects, what may be called the Minimal Link Condition or Shortest Move; and (1.c) is the timing principle of movement, the Procrastinate Principle. The first two conditions concern whether to move or not and which element to move, and the third condition concerns when to move.

In this chapter we will consider in full where the derivational economy conditions in (1) should apply for an optimal derivation. Most of the recent studies of economy principles assume that derivational economy should apply at the interface levels. This may be called global economy. In this chapter we instead propose local economy, under which derivational economy should apply locally at each point of a derivation.

First of all, section 2.2 discusses the motivation for global economy and its problems in comparison with local economy. Section 2.3 eliminates some assumptions, such as (2) and (4), which motivate global economy, and instead makes the proposals in (3) and (5) for local economy.

(2) The more operations a derivation takes, the more costly it is.
(3) The more superfluous operations a derivation takes, the more costly it is.
(4) Merge is costfree, and Move is costly.
(5) Merge and Move are both equal in cost.

Furthermore, we propose the Earliness Principle as a local economy condition on derivations, as stated in (6), and attempt to replace with it the Procrastinate Principle, which is a global economy condition in nature.

(6) The Earliness Principle
    Satisfy bare output conditions as early as possible.
2.2 Global Economy: the Motivations and Problems

2.2.1 A Distinction between Global and Local Economy

Chomsky (1995, pp.220-221) proposes that economy conditions must hold only of convergent derivations. In other words, the computational system generates three relevant sets of derivations at an interface level: D, DC, and DA. The set D is the set of all the possible derivations that the computational system can generate, regardless of whether or not they converge at that interface. The set DC is the set of convergent derivations, i.e. those derivations in D which satisfy the interface conditions for Full Interpretation. So DC is a subset of D. The set DA is the set of admissible derivations, i.e. those convergent derivations in DC which satisfy the economy conditions. Thus DA is a subset of DC. This indicates that the economy conditions apply at the interface levels to select optimal derivations. This may be called a global economy condition.

We distinguish global economy from local economy as follows:10

(7) Global Economy
    Derivational economy should apply at the interface levels or representations so that it selects a derivation (among convergent derivations) that takes the most economical operations.

(8) Local Economy
    Derivational economy should apply at each point of a derivation so that it selects the most economical operation to affect the target at that point.

10 For definitions of local economy, see also Collins (1995) and Ura (1995).

While local economy evaluates the optimality of derivations locally during the course of a derivation, global economy applies at the interface levels and selects a set of optimal derivations, examining the derivational history of convergent derivations.

In general, the following assumptions for measuring the optimal operations motivate global derivational economy.

(9) a. The fewer operations a derivation takes, the more economical it is.
    b. Merge is more economical than Move.
    c. Covert operations are more economical than overt operations.11

11 The minimalist program also assumes (i) for measuring an optimal derivation, but this assumption can apply locally without any further modification.
   (i) The shorter movement a derivation takes, the more economical it is.

In the subsequent sections we will consider the assumptions in (9) in detail.

2.2.2 The Last Resort Condition and Global Economy

The operation Move has long been assumed to be a last resort operation in language. It should be driven only by some (morphological) necessity. For example, in (10) John moves to the spec of TP; this must take place to check the strong D feature of T and the Case features of T and John; otherwise (10) would crash.

(10) [TP John_i [VP t_i [V' saw Mary]]]

On the other hand, in (11) it is unnecessary for John to move to CP, since (11) can converge without that movement. In this sense raising John in (11) violates the last resort condition on movement.

(11) *[CP John_i [TP t'_i [VP t_i [V' saw Mary]]]]

Chomsky (1993, 1994, 1995) derives this last resort condition from the assumption that Move is a costly operation, and that the computational system must minimize the Move operation as much as possible. Under economy considerations, this can be stated as in (9.a), repeated in (12):

(12) The fewer operations a derivation takes, the more economical it is.

If we compare (10) with (11) in terms of the movement of John under assumption (12), the former takes only one movement operation, while the latter takes two movement operations. Hence (10) blocks (11). As Chomsky (1995) mentions:

    a derivation in which an operation applies is less economical than one that differs only in that the operation does not apply. The most economical derivation, then, applies no operations at all to a collection of lexical choices and thus is sure to crash. If nonconvergent derivations can block others, this derivation will block all others... (pp.220-221)

Under assumption (12), however, economy should apply at the interface levels; otherwise nonconvergent derivations would always be optimal. Consider (13).

(13) a. *[CP [TP seems [TP to be likely [TP to [VP John win]]]]]
     b. [CP [TP John_i seems [TP t''_i to be likely [TP t'_i to [VP t_i win]]]]]

If we compare (13.a) with (13.b) in terms of NP movement under assumption (12), the derivation of (13.a) is optimal, since (13.a) takes no movement at all but (13.b) takes three applications of Move. If (12) applies during the course of a derivation, or applies to all the possible derivations at the interface levels, a nonconvergent derivation would thus block other derivations, and UG would never generate a convergent derivation at all. If (12) applies only to convergent derivations at the interface levels, however, (13.b) cannot be compared with (13.a), and hence becomes optimal, since (13.a) is not a convergent derivation: roughly speaking, the Case features of John and Tense, and the strong D feature of Tense, are uninterpretable at the interfaces for FI.

Hence, under assumption (12), derivational economy should apply only to convergent derivations at the interface levels; otherwise nonconvergent derivations would always become optimal and block convergent ones during the derivation.

2.2.3 The Minimal Link Condition and Global Economy

The assumption (9.b), repeated in (14), makes the Minimal Link Condition (MLC) apply globally at the interface levels, although the MLC itself is applicable as a local economy condition. This is motivated by the assumption that Merge is costfree and Move is costly.

(14) Merge is more economical than Move.

Consider the following superraising case:

(15) *John seems that it is likely t to win.

Suppose that the computational system has constructed (16) for (15).
At this point the computational system has two choices: it can take a Merge operation, concatenating it to the TP as in (17.a), or it can take a Move operation, raising John to the TP as in (17.b).

(16) [T' is likely John to win]

(17) a. [TP it [T' is likely John to win]]
     b. [TP John [T' is likely t to win]]

If Merge is costfree and Move is costly, and derivational economy applies locally at the point given in (16), then derivational economy will pick (17.a) rather than (17.b) for an optimal derivation, since (17.a) takes no costly Move but (17.b) takes one costly Move. If we compare Merge with Move locally in terms of cost, we can never get the derivation (18), since it is derived from (17.b), which is blocked by (17.a).

(18) It seems that John_i is likely t_i to win.

If derivational economy applies only to convergent derivations at the interface levels, on the other hand, assumption (14) will not allow (17.a) to block (17.b) at the point of (16). The computational system will then generate further derivations from both (17.a) and (17.b), as shown in (19). Now (19.c) will be optimal at the interface levels among the derivations in (19). That is, (19.a) crashes because the Case feature of the matrix T cannot enter into a checking relation with the Case feature of FF(it), which has already been deleted by checking against FF(T) in the embedded sentence; on the other hand, (19.b) and (19.c) are equal in terms of the number of operations, since both take one Move operation and one Merge operation, but (19.c) takes a shorter movement than (19.b) under the MLC.

(19) a. *It_i seems that t_i is likely John to win.
     b. *John_i seems that it is likely t_i to win.
     c. It seems that John_i is likely t_i to win.

Although the MLC itself can be formulated in a local way, we cannot avoid the global application of the MLC under (14).
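The contrast between the two modes of application can be made concrete with a small Python sketch. It is an illustration only: the names Derivation, global_economy and local_choice are hypothetical, the cost values 0 and 1 merely encode assumption (14), and link_length is an assumed stand-in for movement length (counted here as the number of clause boundaries crossed), not a measure proposed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    label: str
    converges: bool    # does it satisfy FI at the interface levels?
    moves: int         # number of Move operations
    link_length: int   # assumed measure: clause boundaries crossed by Move


# The three continuations in (19), with illustrative (assumed) link lengths.
candidates = [
    Derivation("19a", converges=False, moves=1, link_length=1),
    Derivation("19b", converges=True, moves=1, link_length=2),
    Derivation("19c", converges=True, moves=1, link_length=1),
]


def global_economy(derivations):
    """Discard crashing derivations first, then keep the cheapest survivors."""
    convergent = [d for d in derivations if d.converges]
    best = min((d.moves, d.link_length) for d in convergent)
    return [d for d in convergent if (d.moves, d.link_length) == best]


def local_choice(options):
    """Pick the cheapest operation at a single point of the derivation."""
    return min(options, key=lambda option: option[1])[0]


# Global application: (19.c) wins, because (19.a) is filtered out before costs
# are compared and (19.b) has the longer link.
optimal = [d.label for d in global_economy(candidates)]

# Local application at point (16) under assumption (14): Merge costs 0 and
# Move costs 1, so Merge of "it" is chosen and (17.b), hence (18), is blocked.
chosen = local_choice([("Merge it (17a)", 0), ("Move John (17b)", 1)])
```

Under these assumptions the global procedure returns only (19.c), while the local procedure commits to (17.a) at the point of (16), which is exactly the conflict that forces global application as long as (14) is maintained.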
2.2.4 The Procrastinate Principle and Global Economy

The assumption in (9.c) is a global and also stipulative concept, to which we return in section 3.3 of chapter 3. We repeat (9.c) in (20).

(20) Covert operations are more economical than overt operations.

Let us consider (21) and (22) under assumption (20). If (21.a) competes with (22.a) during the derivation, the former is more economical than the latter and wrongly blocks it, since the former takes no overt movement but the latter takes one overt movement.

(21) a. [TP T [VP often [VP John left]]]
     b. Spell-out
     c. [TP John_i [T' T [VP often [VP t_i left]]]]

(22) a. [TP John_i [T' T [VP often [VP t_i left]]]]
     b. Spell-out

If we apply (20) at the interface levels, the derivation (22) will become optimal, since (21) crashes and cannot be compared with (22).

In the next section we will discuss the problems with global economy in detail, and motivate the adequacy of local economy for universal grammar.

2.2.5 Some Problems with Global Economy

First of all, although derivational economy is a condition on derivations, it cannot block nonoptimal derivations during the derivation, as shown in (13). It must wait until the computational system generates a set of all possible derivations, and the interface conditions select a set of convergent derivations from that set. After that, the economy conditions will select some optimal derivation from the set of convergent derivations. In this sense it is hard to regard global economy conditions as derivational conditions; rather, they function like conditions on representations.

Second, global economy, in which economy holds only of convergent derivations, cannot apply to all economy conditions in a consistent way. For example, it is problematic for the Minimal Link Condition (MLC). Consider (23).

(23) a. *[TP John_i seems that [TP it is likely [TP t_i to win]]].
     b. *[CP What_i did John wonder [CP who_j [TP t_j bought t_i]]].
Both (23.a) and (23.b) are cases that violate the MLC. To explain the ungrammaticality of (23), Chomsky (1993, 1994, 1995) proposes Shortest Move, or the Minimal Link Condition, as an economy condition, as in (1.b), which is repeated in (24) for convenience. Following the MLC in (24), a shorter movement is more economical than a longer movement, and hence blocks it.

(24) A derivation must minimize chain links.

The ungrammaticality of (23.a) can be explained with the global MLC as follows. The computational system generates a set of all possible derivations at PF and LF. From this set we would presumably get a set of convergent derivations as in (25).

(25) DC = { (i)  It seems that John_i is likely t_i to win.
            (ii) John_i seems that it is likely t_i to win. }

Then the global MLC would select (i) from DC in (25), since the movement of John in (i) is shorter than the movement of John in (ii).12

12 It would be more desirable to compare (25.ii) with (i) below, since they are derived from the same partial derivation (ii).
   (i)  *It_i seems that t_i is likely John to win.
   (ii) [TP seems that it is likely John to win]
   But (i) cannot be compared with (25.ii) under global economy, because it cannot converge at the interface levels. If we compared the movement of it in (i) with the movement of John in (25.ii), the former would be optimal, because the movement of it is shorter than the movement of John.

Now take the ungrammaticality of (23.b). The computational system generates a set of all possible derivations for it, and selects a set DC of convergent derivations from this set. But (23.b) is the only convergent derivation in this case. So it must be the optimal derivation, because there is no shorter movement than the movement of what among the convergent derivations. Although there is a derivation which takes a shorter movement than (23.b), it cannot block (23.b) if it crashes:

(26) [CP Who_j did John wonder [CP t'_j [TP t_j saw what]]]

The case of (23.b) strongly implies that an economy condition like the MLC must be a local condition on derivations and should not be violated even for convergence.13 If this is correct, then the global characteristic can apply to some economy conditions like Procrastinate, and cannot apply to other economy conditions like the MLC. Hence the economy conditions become heterogeneous in the grammar. This heterogeneity seems to be arbitrary. It would be conceptually simpler to have only local economy conditions.

13 Chomsky (1995) also takes this line of research, in which the MLC must be a local and inviolable condition.

Third, for global economy the computational system generates a set of all the possible derivations explosively (or exponentially) and redundantly, regardless of whether the derivations are optimal or not. This set should also include the derivational history of each derivation so that economy conditions can examine the history to select the optimal derivation. If economy is a real condition on derivations, it would be better for economy to constrain the computational system to generate only a set of optimal derivations, regardless of whether they are convergent or nonconvergent at the interface levels. Let us take an example.

(27) Who left?

Suppose the computational system derives (27). First, it generates a set D of all possible derivations for the interface levels, as in (28). (We here ignore the possibilities of V-to-C raising.)
(28) D = {
  (i)    [VP who left] -> [TP [VP who left]] -> [CP [TP [VP who left]]] (at PF and at LF),
  (ii)   [VP who left] -> [TP who_i [VP t_i left]] -> [CP [TP who_i [VP t_i left]]] (at PF and at LF),
  (iii)  [VP who left] -> [TP who_i [VP t_i left]] -> [CP who_i [TP t'_i [VP t_i left]]] (at PF and at LF),
  (iv)   [VP who left] -> [TP [VP who left]] -> [CP who_i [TP t'_i [VP t_i left]]] (at PF and at LF),
  (v)    [VP who left] -> [TP [VP who left]] -> [CP who_i [TP [VP t_i left]]] (at PF and at LF),
  (vi)   [VP who left] -> [TP left_i [VP who t_i]] -> [CP [TP left_i [VP who t_i]]] (at PF and at LF),
  (vii)  [VP who left] -> [TP left_i [VP who t_i]] -> [TP who_j [TP left_i [VP t_j t_i]]] -> [CP [TP who_j [TP left_i [VP t_j t_i]]]] (at PF and at LF),
  (viii) [VP who left] -> [TP left_i [VP who t_i]] -> [TP who_j [TP left_i [VP t_j t_i]]] -> [CP who_j [TP t'_j [TP left_i [VP t_j t_i]]]] (at PF and at LF),
  (ix)   [VP who left] -> [TP left_i [VP who t_i]] -> [CP who_j [TP t'_j [TP left_i [VP t_j t_i]]]] (at PF and at LF),
  (x)    [VP who left] -> [TP left_i [VP who t_i]] -> [CP who_j [TP left_i [VP t_j t_i]]] (at PF and at LF),
  (xi)   [VP who left] -> [TP [VP who left]] -> [CP [TP [VP who left]]] (at PF)
         -> [VP who left] -> [TP left_i [VP who t_i]] -> [CP [TP left_i [VP who t_i]]] (at LF),
  (xii)  [VP who left] -> [TP who_i [VP t_i left]] -> [CP [TP who_i [VP t_i left]]] (at PF)
         -> [VP who left] -> [TP who_i left_j [VP t_i t_j]] -> [CP [TP who_i left_j [VP t_i t_j]]] (at LF),
  (xiii) [VP who left] -> [TP who_i [VP t_i left]] -> [CP who_i [TP t'_i [VP t_i left]]] (at PF)
         -> [VP who left] -> [TP who_i left_j [VP t_i t_j]] -> [CP who_i [TP t'_i left_j [VP t_i t_j]]] (at LF),
  (xiv)  [VP who left] -> [TP [VP who left]] -> [CP who_i [TP t'_i [VP t_i left]]] (at PF)
         -> [VP who left] -> [TP left_j [VP who t_j]] -> [CP who_i [TP t'_i left_j [VP t_i t_j]]] (at LF),
  (xv)   [VP who left] -> [TP [VP who left]] -> [CP who_i [TP [VP t_i left]]] (at PF)
         -> [VP who left] -> [TP left_j [VP who t_j]] -> [CP who_i [TP left_j [VP t_i t_j]]] (at LF).
}

Then the interface conditions select a set DC of convergent derivations from (28): PF selects convergent derivations as in (29), and LF does as in (30).
(29) DC = {(iii), (iv), (viii), (ix), (xiii), (xiv)}

(30) DC = {(viii), (ix), (xiii), (xiv)}

Now we get a set DC of derivations which converge both at PF and at LF, as in (31).

(31) DC = {(viii), (ix), (xiii), (xiv)}

Then the global economy conditions will select a set DA of optimal derivations from (31). Procrastinate selects DA as in (32) from (31).

(32) DA = {(xiii), (xiv)}14

14 Earliness will further select (xiii) as the optimal derivation in comparison with (xiv). See section 2.3.3 and chapter 3.

As we have seen above, like a condition on representations, global economy must allow the computational system to generate an exponential number of derivations in order to get a set of optimal derivations.

Fourth, as we pointed out in the previous section, some derivations which may be optimal during the derivation would become nonoptimal at the interface levels if they cannot converge, although this does not occur with local economy.

If we can apply some derivational economy conditions in a local way during the derivation, only optimal derivations will reach the interface levels. For example, suppose (33.a) is derived, and is more economical than (33.b).

(33) a. [TP T [VP who left]]
     b. [TP left_i+T [VP who t_i]]

Then (33.a) will block all derivations in which (33.b) is involved (i.e. (vi)-(x) in (28)). Next, (34.a) is supposed to be further constructed and optimal. Then it will block all derivations in which (34.b) is involved (i.e. (i), (iv), (v), (xi), (xiv), (xv) in (28)).

(34) a. [TP who_i T [VP t_i left]]
     b. [CP C [TP T [VP who left]]]

Next (35.a) is constructed and is optimal at this point. Then it will block all derivations containing (35.b) (i.e. (ii), (xii) in (28)).

(35) a. [CP who_i [TP t'_i T [VP t_i left]]]
     b. [CP C [TP who_i T [VP t_i left]]]

Next, suppose that (36.a) is constructed and optimal. It will block (36.b).

(36) a. [CP who_i [TP t'_i left_j+T [VP t_i t_j]]]
     b. [CP who_i [TP t'_i T [VP t_i left]]]

Now the derivation (36.a) is optimal and gets to the interface levels for FI.
Bare output conditions can interpret it; hence it converges. Thus it is more relevant and desirable if economy is a local condition on derivations, determines the optimality of derivations during the derivation, and generates only a set of optimal derivations at the interfaces. In the next section let us consider local derivational economy in detail.

2.3 Localizing Derivational Economy

As we have mentioned in section 2.2, local economy can be stated as follows:

(37) Local Economy
     Derivational economy should apply at each point of a derivation so that it selects the most economical operation to affect the target at that point.

That is, local economy evaluates the optimality of derivations at each point of a derivation, and selects the most economical operation at a given point. Thus local economy generates only a set of optimal derivations at the interface levels, rather than the three sets of derivations which global economy requires at the interface levels, i.e. a set of all possible derivations, a set of convergent derivations, and a set of admissible derivations.

To pursue local derivational economy, in this section we reconsider the concepts for measuring the cost of operations in (9), and replace the concepts that motivate global economy with alternative local measurements of the optimality of derivations. (9) is repeated in (38) for convenience.

(38) a. The fewer operations a derivation takes, the more economical it is.
     b. Merge is more economical than Move.
     c. Covert operations are more economical than overt operations.

2.3.1 Localizing the Last Resort Condition

As shown in section 2.2.2, the assumption (9.a), repeated in (39), requires the economy condition to apply at the interface levels; otherwise "no operation" would block even last resort operations.

(39) The fewer operations a derivation takes, the more economical it is.
But we should not understand the concept of last resort in such a way that operations are always costly, and therefore that "no operation" is always the most economical. To measure the most economical operation, we should consider the necessity of the operation. In other words, "no operation" should not block a last resort operation under economy considerations. If our assumption is correct, we eliminate (39) and replace it with (40).15

(40) The more superfluous operations a derivation takes, the more costly it is.

15 Chomsky attempts to derive (40) from (39), applying economy globally at the interface levels.

The distinction between "no operation" and "no superfluous operation" for measuring the cost of an operation has a desirable consequence for local economy. Consider (41):

(41) a. *[TP Seems [TP John to leave]].
     b. [TP John_i seems [TP t_i to leave]].

If we apply (39) to (41.a) and (41.b) during the course of the derivation, then (41.a) will block (41.b): (41.a) takes no movement at all, while (41.b) takes one movement. We will get an undesirable result. If we apply (39) to (41) at the interface levels, then it will correctly select (41.b), since (41.a) cannot converge at the interfaces. So (39) forces economy to apply only to a set of convergent derivations at the interface levels.

If we take (40) for derivational economy, then the prediction will be different. If we compare (41.a) with (41.b) in terms of (40) during the derivation, then (41.a) and (41.b) are both optimal, since neither of them takes any superfluous movement. That is, the movement in (41.b) is not superfluous but necessary. So (41.a) and (41.b) cannot block each other in terms of the number of computational operations. More strongly, we argue in section 2.3.3 that Earliness will select (41.b) as the optimal derivation in comparison with (41.a), although (40) considers (41.a) and (41.b) to be equally optimal. (See section 2.3.3 for more detail.)
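The difference between the two cost measures can be stated in a few lines of Python. This is a minimal sketch under assumed encodings: a derivation is represented as a list of operations, and the hypothetical flag necessary records whether an operation is forced by feature checking, i.e. whether it counts as a last resort operation. The function names cost_39 and cost_40 are likewise introduced only for illustration.

```python
def cost_39(operations):
    """Assumption (39): every operation counts toward the cost."""
    return len(operations)


def cost_40(operations):
    """Assumption (40): only superfluous operations count toward the cost."""
    return sum(1 for op in operations if not op["necessary"])


# (41.a): no movement at all.
derivation_41a = []

# (41.b): one movement of John, forced by feature checking (assumed flag).
derivation_41b = [{"name": "Move John to spec of matrix TP", "necessary": True}]

# Applied locally, (39) prefers (41.a) and so wrongly blocks (41.b).
prefers_41a_under_39 = cost_39(derivation_41a) < cost_39(derivation_41b)

# Applied locally, (40) rates both derivations equally, so neither blocks the other.
tie_under_40 = cost_40(derivation_41a) == cost_40(derivation_41b)
```

On this encoding the tie under (40) leaves the choice between (41.a) and (41.b) to some other principle, which is the role the text assigns to Earliness.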
For local economy, we adopt Chomsky's (1995) formulation of the Last Resort Condition on movement, as in (42).

(42) (cf. (51), Chomsky 1995, p. 280)
K attracts F only if F enters into a checking relation with a sublabel of K.

This formulation of last resort only tells us whether movement is legitimate, i.e. whether movement can take place or not. It will be strengthened by the Earliness Principle in section 2.3.3, so that a last resort operation must be triggered as early as possible.

2.3.2 Localizing the Minimal Link Condition

Assumption (40) naturally leads us to a way of eliminating one more concept of global economy, namely (9.b). We repeat (40) and (9.b) in (43) and (44), respectively:

(43) The more superfluous operations a derivation takes, the more costly it is.
(44) Merge is more economical than Move.

Chomsky (1995) assumes that Merge is a cost-free operation and Move is a costly operation.[16] From assumption (43), however, we can conclude that Move is not necessarily a costly operation. That is, if Move is taken by necessity, it can be regarded as cost-free. The same holds for Merge: if Merge is taken by necessity, it incurs no cost at all; otherwise it too is a costly operation. Thus all syntactic operations, including Merge, Move, and perhaps Delete, are assumed to be last resort operations, in that they "must be driven by some condition on representations" (Chomsky 1995, p. 28) in order to satisfy FI at the interface levels; otherwise the derivation would crash. All computations are then considered to be equally costly.[17] So the choice between Merge and Move at a given point must depend entirely on considerations other than economy.[18]

[16] I cannot see any specific motivation for such a distinction between Merge and Move in terms of cost. It seems to be simply a stipulation.
If this assumption is correct, we can replace assumption (44) with (45):

(45) Merge and Move are equal in cost.

Returning to (15)-(17), repeated in (46)-(48): at the point of (47), Merge and Move are equal in cost, since neither Move nor Merge is superfluous. So (48.a) and (48.b) are both available to the computational system for further computation.

(46) *John seems that it is likely t to win.
(47) [T' is likely John to win]
(48) a. [TP it [T' is likely John to win]]
b. [TP John [T' is likely t to win]]

[17] We may reach the same conclusion from Watanabe's (1995) Avoid Redefinition. That is, Merge and cyclic Move do not undergo redefinition, and so are equally economical.

[18] Following Collins (1995), Merge is driven by the fact that both of the phrases have the property that they must be integrated into the clause.

Now Chomsky's (1995) Minimal Link Condition, as formulated in (49), can apply locally during the course of a derivation.

(49) (cf. (110), Chomsky 1995, p. 311)
K attracts F of x only if there is no y, y closer to K than x, such that y raises to K.
(50) (Chomsky 1995, p. 358)
y is closer to the target K than x if y c-commands x.

2.3.3 Earliness as a Local Economy Condition

Among the economy principles, only Procrastinate, defined in (9.c) and repeated in (51), can hardly be maintained under local economy. Procrastinate seems to be global in nature.

(51) Covert operations are more economical than overt operations.

To localize all derivational economy conditions uniformly, we attempt to eliminate Procrastinate from derivational economy and propose instead an alternative timing principle, the Earliness Principle, which is independently motivated by cyclic computation (discussed in chapter 3).
Leaving the motivations of Procrastinate, its problems, and the reduction of Procrastinate to Earliness for chapter 3, in this section we elaborate the Earliness Principle as a local economy condition that strengthens the Last Resort Condition.

Following Chomsky (1993, 1994, 1995), the computational system constructs a linguistic expression (π, λ) in an optimal way, where π is a PF object and λ is an LF object. The concept of optimality can be considered from various points of view.[19] The intuitive idea here is that economy is related to how early a derivation can satisfy bare output conditions. We call this consideration of economy the Earliness Principle. If computation is a process of satisfying bare output conditions at the interface levels, we can propose the Earliness Principle as both a timing principle and an economy condition, as follows:

(52) The Earliness Principle
A derivation must satisfy bare output conditions as early as possible.

[19] See Chomsky (1993, 1994, 1995), Collins (1994, 1995), Fukui (1993), Kitahara (1994), Oka (1993, 1995), Ura (1995), and Watanabe (1995) for different views of economy conditions.

From the Earliness Principle (52) we can also derive Pesetsky's (1989) idea of an earliness principle, according to which movement must take place as early in the derivation as possible. The operation Move is motivated by the need to eliminate uninterpretable features in order to satisfy bare output conditions. In other words, for a derivation to satisfy bare output conditions early, all uninterpretable morphological features in the derivation must be checked as early as possible. So (52) subsumes Pesetsky's (1989) Earliness, which can be restated in our terms as (53).[20]

(53) Uninterpretable morphological features must be checked as early as possible.

[20] We may derive some condition on Merge from our Earliness Principle. Chomsky (1995) claims that Merge should apply only to the root of a phrase structure. For example, the computational system constructs the partial phrase structure (i) not as in (ii) but as in (iii):
(i) [TP T [VP John [V' met Mary]]]
(ii) a. [V' met Mary]
b. [TP T [V' met Mary]]
c. [TP T [VP John [V' met Mary]]] -> *Merge
(iii) a. [V' met Mary]
b. [VP John [V' met Mary]]
c. [TP T [VP John [V' met Mary]]]
If Merge is greedy in some sense (Collins 1995), (iii) satisfies the Greed Principle earlier than (ii). I will leave this question for further research.

Formally, we formulate Earliness in terms of Attract-F, as in (54):

(54) K attracts F early only if a sublabel of K is an uninterpretable feature at the interface level that Attract-F affects.

Let us consider how Earliness can help a last resort operation to block "no operation", as discussed in section 2.3.1. Consider the embedded clause in (55):

(55) A professor knows [CP whether C [TP [DP a student]_i should [VP t_i read this book]]]

Suppose that the D feature of English tense T is strong. To satisfy the PF output condition, it must be eliminated by raising a student, targeting TP, for feature checking. The computational system may take either (56) or (57) to derive (55). Both are convergent derivations.

(56) a. [T' should [VP [DP a student] [V' read this book]]]
b. [TP [DP a student]_i [T' should [VP t_i [V' read this book]]]]
c. [C' C [TP [DP a student]_i [T' should [VP t_i [V' read this book]]]]]
d. [CP whether [C' C [TP [DP a student]_i [T' should [VP t_i [V' read this book]]]]]]
(57) a. [T' should [VP [DP a student] [V' read this book]]]
b. [C' C [T' should [VP [DP a student] [V' read this book]]]]
c. [CP whether [C' C [T' should [VP [DP a student] [V' read this book]]]]]
d. [CP whether [C' C [TP [DP a student]_i [T' should [VP t_i [V' read this book]]]]]]

Suppose that the computational system has constructed (56.a) (= (57.a)). At this point the computational system has two choices: the strong D feature of FF(T) can attract a student, as in (56.b), or it can merge C and TP into CP, as in (57.b).
Derivational economy, in the form of Earliness, picks (56.b) rather than (57.b) as the optimal continuation, since the strong D feature of FF(T) is uninterpretable at PF and LF, this application of Attract-F affects PF and LF at the same time, and (56.b) is the earliest point at which Attract-F can target TP. Hence a last resort operation is optimal in comparison with no movement under Earliness. As shown above, unlike Procrastinate, Earliness selects an optimal derivation in the course of the derivation, rather than selecting one from a set of derivations at the interface levels. That is, derivation (56) blocks (57) at the point of (56.b).

2.4 Summary

In this chapter we have distinguished local economy from global economy, as in (58)-(59), and explored three types of local derivational economy conditions: the Last Resort Condition, the Minimal Link Condition, and the Earliness Principle, as defined in (60)-(62), respectively.

(58) Global Economy
Derivational economy should apply at the interface levels or representations so that it selects a derivation (among convergent derivations) that takes the most economical operations.

(59) Local Economy
Derivational economy should apply at each point of a derivation so that it selects the most economical operation to affect the target at that point.

(60) The Last Resort Condition (= (51), Chomsky 1995, p. 280)
K attracts F only if F can enter into a checking relation with a sublabel of K.

(61) The Minimal Link Condition (cf. (110), Chomsky 1995, p. 311)
K attracts F of x only if there is no y, y closer to K than x, such that K attracts y.

(62) The Earliness Principle
K attracts F early only if a sublabel of K is an uninterpretable feature at the interface that Attract-F affects.

Now at each point of a derivation the computational system takes the most economical operation that satisfies all three derivational economy conditions given in (60)-(62), constructing a linguistic expression for PF and LF.
As a result, it generates only a set of optimal derivations at the interface levels. Local economy has the following advantages in comparison with global economy:

i.   Global economy: It is a kind of representational condition.
     Local economy:  It is a strictly derivational condition.
ii.  Global economy: It allows the computational system to generate an explosive or exponential number of derivations at the interface levels.
     Local economy:  It allows the computational system to generate only a set of optimal derivations at the interface levels.
iii. Global economy: Some derivations which were optimal during the time of derivation may become nonoptimal at the interface levels.
     Local economy:  Derivations which are optimal during the time of derivation are always optimal at the interface levels.
iv.  Global economy: It makes economy conditions heterogeneous in terms of unviolability and locality.
     Local economy:  It makes economy conditions homogeneous in terms of unviolability and locality.

In addition, local economy offers unified analyses of various phenomena of natural language. In the following chapters we investigate cyclic computation, Procrastinate effects, some Wh-asymmetries, and Wh-adjunct symmetries under local economy.

3. Deriving Strict Cycle and Procrastinate Effects

3.1 Introduction

In this chapter we derive two seemingly opposite principles from Earliness: the Strict Cycle Condition (SCC), a principle for overt computations, and the Procrastinate Principle, a principle for LF computations.

Linguists have long observed that computational operations, especially overt operations, apply cyclically. Sentence (1) is a typical example[21] showing that overt movement must be cyclic.

(1) *[CP Who_i was [TP [DP a picture of t_i]_j sold t_j]]

[21] Another typical case to which the SCC applies is that of Wh-island violations. Consider (i).
(i) *[CP How_i did [TP1 John wonder [CP what_j [TP2 Bill bought t_j t_i]]]]
Sentence (i) violates Subjacency when it is derived as in (ii), since how crosses two bounding nodes, i.e. TP1 and TP2. If the computational system were to derive (i) as in (iii), it would escape the Subjacency violation.
(ii) a. [TP2 Bill bought what how]
b. [CP what_j [TP2 Bill bought t_j how]]
c. [TP1 John did wonder [CP what_j [TP2 Bill bought t_j how]]]
d. [CP how_i did [TP1 John wonder [CP what_j [TP2 Bill bought t_j t_i]]]]
(iii) a. [TP2 Bill bought what how]
b. [CP how_i [TP2 Bill bought what t_i]]
c. [TP1 John did wonder [CP how_i [TP2 Bill bought what t_i]]]
d. [CP how_i did [TP1 John wonder [CP t'_i [TP2 Bill bought what t_i]]]]
e. [CP how_i did [TP1 John wonder [CP what_j [TP2 Bill bought t_j t_i]]]]
If the SCC applies to (iii) at S-structure, it prohibits the derivation, as expected. However, Wh-island phenomena are more puzzling than (1), so we put them aside until chapter 4.

Sentence (1) is ungrammatical, since it violates Chomsky's (1973) Subject Condition. Roughly speaking, the Subject Condition states that nothing can be extracted out of a DP in [DP, TP]. (1) is assumed to be derived as in (2).

(2) a. [CP e was [TP e [VP sold [DP a picture of who]]]]
b. [CP e was [TP [DP a picture of who]_j [VP sold t_j]]]
c. [CP who_i was [TP [DP a picture of t_i]_j [VP sold t_j]]]

However, if the computational system constructs derivation (3) for (1) rather than (2), it can escape the Subject Condition.

(3) a. [CP e was [TP e [VP sold [DP a picture of who]]]]
b. [CP who_i was [TP e [VP sold [DP a picture of t_i]]]]
c. [CP who_i was [TP [DP a picture of t_i]_j [VP sold t_j]]]

We cannot rule out step (3.b) with Subjacency, since it is possible to extract a category such as who out of [DP a picture of who] when the DP is the complement of a verb, as in (4).

(4) [CP Who_i did [TP John sell [DP a picture of t_i]]]

Traditionally, the Strict Cycle Condition (SCC), which was assumed to constrain derivations at S-structure, forces the computational system to build (2) and prohibits (3). However, the SCC is untenable in the minimalist program, in which D-structure and S-structure are eliminated in favor of PF and LF.
Alongside the SCC, which reflects earliness in the timing of overt derivations, we also have the Procrastinate Principle, which prefers covert operations. The SCC reflects Earliness in itself, but Procrastinate is opposite in spirit to Earliness. This chapter explains both the SCC and Procrastinate in terms of Earliness, in a local way.

In section 3.2.1 we review some previous efforts to derive the SCC in the minimalist program, and in section 3.2.2 we explain the SCC effects with Earliness. Before deriving Procrastinate effects, in section 3.3 we discuss the motivations for the Procrastinate Principle and its problems: (i) as a timing principle it cannot explain the timing of overt derivations at all; (ii) its conceptual motivation is based on characteristics of the sensory-motor system rather than on linguistic properties; (iii) its violability is inconsistent with the unviolability of other economy conditions such as the Last Resort Condition and the MLC; and (iv) its global character is also an undesirable property (as described in section 2.2). In section 3.4 we derive Procrastinate effects from Earliness: in section 3.4.1 we discuss Kitahara's (1994, 1995) analysis of Procrastinate and its problems, and in section 3.4.2 we derive Procrastinate effects from Earliness.

3.2 Deriving Strict Cycle

3.2.1 Previous Analyses and their Problems

3.2.1.1 Extension Condition

Chomsky (1993) proposes two syntactic operations, Nonbranching Projection and Generalized Transformation. They can be defined as in (5) and (6), respectively.

(5) Nonbranching Projection (NBP) (= (18), Chomsky (1993), p. 21)
a. X -> X'
b. X' -> XP

(6) Generalized Transformation (GT) (Chomsky (1993), p. 22)
a. Target a category x;
b. Add an empty category e to x;
c. Form a new category z;
d. Take a category y, and substitute y for e;
e. Form the chain (y, t_y) if y is contained in the targeted category x.
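The five steps of GT in (6) can be read as a small procedure. The following is a minimal sketch under stated assumptions, not Chomsky's formalization: phrase markers are encoded as nested tuples `(label, child, ...)`, the names `generalized_transformation`, `contains`, and `replace_first` are hypothetical helpers, the `left` parameter (whether e is added to the left or right of the target) is an added convenience, and the trace is inserted before y is substituted for e simply to keep the code short.

```python
EMPTY = "e"   # the empty category added in step (6b)
TRACE = "t"   # the trace left by a moved category


def contains(tree, sub):
    """True if `sub` occurs as `tree` itself or as one of its subconstituents."""
    if tree == sub:
        return True
    if isinstance(tree, tuple):
        return any(contains(child, sub) for child in tree[1:])
    return False


def replace_first(tree, old, new):
    """Replace the first occurrence of `old` in `tree`; return (tree, replaced?)."""
    if tree == old:
        return new, True
    if isinstance(tree, tuple):
        children, done = [], False
        for child in tree[1:]:
            if not done:
                child, done = replace_first(child, old, new)
            children.append(child)
        return (tree[0], *children), done
    return tree, False


def generalized_transformation(x, y, label, left=True):
    """Apply (6a)-(6e) to target `x` and category `y`; return (z, chain)."""
    chain = None
    if contains(x, y):                        # (6e): y comes from inside x (Move)
        x, _ = replace_first(x, y, TRACE)
        chain = (y, TRACE)
    z = (label, EMPTY, x) if left else (label, x, EMPTY)    # (6b)-(6c)
    z = tuple(y if part == EMPTY else part for part in z)   # (6d)
    return z, chain


# Building a verb phrase and then raising its subject.
v_bar, _ = generalized_transformation(("V", "chased"), ("NP", "cats"), "V'", left=False)
vp, _ = generalized_transformation(v_bar, ("NP", "dogs"), "VP")
tp, chain = generalized_transformation(("T'", "T", vp), ("NP", "dogs"), "TP")
print(tp)
print(chain)
```

The last call illustrates the sense in which Move is a subcase of GT: the only difference from the earlier calls is that y is found inside the target, so a chain is formed.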
Chomsky (1993) assumes that every category must project to a maximal projection even if it has no specifier or complement, i.e. even if there is no branching. This requirement necessitates the NBP operation, as exemplified in (7) and (8).

(7) [NP [N' [N cats]]]
(8) a. [N cats]
b. [N' [N cats]]
c. [NP [N' [N cats]]]

In contrast to NBP, a category that has a specifier or a complement projects to an intermediate or maximal category by branching. Such a branching projection requires Generalized Transformation (GT), as exemplified in (9) and (10).

(9) [VP [NP dogs] [V' [V chased] [NP cats]]]
(10) a-1. [V chased]
a-2. [V chased] e
a-3. [V' [V chased] e]
a-4 and b-1. [V' [V chased] [NP cats]]
b-2. e [V' [V chased] [NP cats]]
b-3. [VP e [V' [V chased] [NP cats]]]
b-4. [VP [NP dogs] [V' [V chased] [NP cats]]]

Following Chomsky (1993), Move is a subcase of GT, as in (11) and (12).

(11) [TP [NP dogs]_i [T' T [VP t_i [V' [V chased] [NP cats]]]]]
(12) a. [T' T [VP [NP dogs] [V' [V chased] [NP cats]]]]
b. e [T' T [VP [NP dogs] [V' [V chased] [NP cats]]]]
c. [TP e [T' T [VP [NP dogs] [V' [V chased] [NP cats]]]]]
d. [TP [NP dogs] [T' T [VP [NP dogs] [V' [V chased] [NP cats]]]]]
e. [TP [NP dogs]_i [T' T [VP t_i [V' [V chased] [NP cats]]]]]

Chomsky (1993) proposes that X'-theory constrains the syntactic operations NBP and GT: both must satisfy the X'-format in (13).

(13) [Tree diagram: the X'-format, with nodes XP, WP, ZP, and X']

In addition, GT must satisfy the Extension Condition (EC), according to which "substitution operations always extend their target" (p. 23). We may paraphrase the Extension Condition as in (14).

(14) Extension Condition
A branching operation (i.e. GT) should form a branching node which dominates (or contains) all the phrase markers in the phrase structure which it targets.

In other words, the branching node which GT creates should be the topmost node in that phrase structure. Let us consider (15). Assume that the computational system has built the phrase structure (15).
(15) [Tree diagram: a phrase structure whose topmost node is X', containing Y']

If at the point of (15) the computational system targets X', takes a maximal projection WP, and constructs (16), it satisfies the EC, since the new branching node, XP, dominates the whole phrase structure; that is, XP dominates WP, X', X, YP, Y', and ZP.

(16) [Tree diagram: XP immediately dominating WP and X']

If at (15) the computational system instead targets Y', takes a maximal projection WP, and builds (17), it violates the EC, since the new branching node, YP, does not dominate the whole phrase structure; YP does not dominate X' and X.

(17) [Tree diagram: WP combined with Y' inside X']

Returning to the derivation of (1), the EC correctly blocks derivation (3), which is rewritten in (18).

(18) a. [CP Who_i was [TP e [VP sold [DP a picture of t_i]]]]?
b. [CP Who_i was [TP [DP a picture of t_i]_j [VP sold t_j]]]? -> *EC

Instead, the EC permits (2), rewritten in (19), which violates the Subject Condition (SC).

(19) a. [T' was [VP sold [DP a picture of who]]]
b. [TP [DP a picture of who]_j was [VP sold t_j]]?
c. [CP who_i was [TP [DP a picture of t_i]_j sold t_j]]? -> *SC

Although the EC can predict the ungrammaticality of (1), it has problems which can hardly be accommodated in minimalism. First, Chomsky (1993) stipulates that the EC should not apply to adjunction operations such as head movement, although they may be branching operations. For example, in French a verb overtly raises and adjoins to T, as in (20). This operation extends the head T, but it cannot extend T', which is the highest category dominating all other categories.

(20) [Tree diagram: overt adjunction of the verb to T under T']

Second, Chomsky stipulates that the EC must apply only to overt operations; it should not apply to LF operations. For example, in English an object is assumed to raise to the specifier of AgrO for accusative Case, as in (21). This covert movement violates the EC, since it does not extend the highest category CP.[22]

(21) a. [CP [TP John_i [AgrOP [AgrO' [VP t_i [V' saw Mary]]]]]]
b. [CP [TP John_i [AgrOP Mary_j [AgrO' [VP t_i [V' saw t_j]]]]]]

[22] In Chomsky's (1995) theory of grammar, covert movement is supposed to be an operation like head movement. Thus the stipulation for covert movement would be the same as that for overt head movement, discussed just above.

To pursue minimalism, stipulated conditions should be removed from the theory. Furthermore, the stipulations attached to the EC cannot be maintained in minimalism: because DS and SS are not available, it is unclear how a stipulation that distinguishes overt from covert movement can be stated. Thus the EC should be replaced with another principle, or its stipulative assumptions should be removed.

3.2.1.2 Target-α

Kitahara (1994, 1995) argues that Target-α can replace Chomsky's Extension Condition by incorporating it into the economy principle. Kitahara (1994) unifies the two operations NBP and GT into a generalized targeting operation, Target-α, stated in (22).

(22) (= (26), Kitahara 1995)
Target-α: Target a category α, and
a. Build a new phrase structure δ immediately dominating α.
b. Substitute a category β for a newly created empty e external to α.

In addition to Target-α, he proposes the following economy principle:

(23) If a derivation D1 takes fewer targeting operations than a derivation D2, then D1 blocks D2.

Consider (24). To derive (24), the computational system can apply Target operations in two different ways, as in (25) and (26).

(24) [VP [NP dogs] [V' [V chased] [NP cats]]]
(25) a. [V chased]
b-1. [V chased] e
b-2. [V' [V chased] e]
b-3 and c-1. [V' [V chased] [NP cats]]
c-2. e [V' [V chased] [NP cats]]
c-3. [VP e [V' [V chased] [NP cats]]]
c-4. [VP [NP dogs] [V' [V chased] [NP cats]]]
(26) a. [V chased]
b and c-1. [V' [V chased]]
c-2. e [V' [V chased]]
c-3. [VP e [V' [V chased]]]
c-4 and d-1. [VP [NP dogs] [V' [V chased]]]
d-2. [VP [NP dogs] [V' [V chased] e]]
d-3. [VP [NP dogs] [V' [V chased] [NP cats]]]

If we compare (25) with (26) in terms of the number of Target operations, (25) takes three targeting operations and (26) takes four. So (25) blocks (26), by the economy principle (23).

Now let us consider (1), repeated in (27), under the Target-α analysis. Again the computational system can construct (1) as either (28) or (29).

(27) *[CP Who_i was [TP [DP a picture of t_i]_j sold t_j]]?
(28) a. [T' was sold [DP a picture of who]]
b. [TP [T' was sold [DP a picture of who]]]
c. [C' [TP [T' was sold [DP a picture of who]]]]
d. [CP who_i [C' [TP [T' was sold [DP a picture of t_i]]]]]
e. [CP who_i [C' [TP [DP a picture of t_i]_j [T' was sold t_j]]]]
(29) a. [T' was sold [DP a picture of who]]
b. [TP [DP a picture of who]_j [T' was sold t_j]]
c. [C' [TP [DP a picture of who]_j [T' was sold t_j]]]
d. [CP who_i [C' [TP [DP a picture of t_i]_j [T' was sold t_j]]]]

If we compare (28) with (29), derivation (29) takes one fewer Target operation than (28), and thus (29) blocks (28).

One advantage of Target-α is that it selects one or more optimal derivations even if they are nonconvergent. For example, (29) cannot converge: who in a picture of who should raise to CP and check the strong Wh-feature of C, but it cannot be extracted out of the subject, since this would violate the Subject Condition. In spite of this, Target-α considers (29) to be optimal in comparison with (28).

Target-α has some problems, however. One is that the economy principle (23) must still apply globally. Suppose that the computational system has constructed (28.b) and (29.b). It cannot block (28.b) at this point, because the two derivations have taken the same number of Target operations so far. The computational system then constructs (28.d) and (29.d), but the two still have the same number of Target operations. The economy principle finally selects (29) only at the point of (28.e).
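The point that (23) cannot decide between (28) and (29) until the very end can be checked with a small sketch. It assumes, as the lettering of the examples suggests, that each lettered step of (28) and (29) counts as one targeting operation; the function names are hypothetical.

```python
# Each lettered step is assumed to cost one targeting operation.
d28 = ["28.a", "28.b", "28.c", "28.d", "28.e"]
d29 = ["29.a", "29.b", "29.c", "29.d"]


def operations_so_far(derivation, k):
    """Number of targeting operations a derivation has used after k steps."""
    return min(k, len(derivation))


def first_decidable_step(d1, d2):
    """First step at which the running operation counts differ, if any."""
    for k in range(1, max(len(d1), len(d2)) + 1):
        if operations_so_far(d1, k) != operations_so_far(d2, k):
            return k
    return None


def blocks(d1, d2):
    """Principle (23): d1 blocks d2 if it takes fewer targeting operations overall."""
    return len(d1) < len(d2)


print(first_decidable_step(d28, d29))  # 5, i.e. only at step (28.e)
print(blocks(d29, d28))                # True
```

At every earlier step the counts are tied, so a purely local comparison of operation counts has nothing to choose between; the decision depends on completed derivations.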
So Target-α has the same problems as a global condition, as discussed in chapter 2.

Another problem is that Target-α cannot be accommodated in Chomsky's (1994) model of the minimalist program. Target-α can be maintained in a model in which a nonbranching projection operation is available, but Chomsky (1994) eliminates nonbranching operations from UG. Let us consider (1) with Target-α in the model of Chomsky's (1994) Bare Phrase Structure.

(30) a. [T' was sold [DP a picture of who]]
b. [C' C [T' was sold [DP a picture of who]]]
c. [CP who_i [C' C [TP was sold [DP a picture of t_i]]]]
d. [CP who_i [C' C [TP [DP a picture of t_i]_j [T' was sold t_j]]]]
(31) a. [T' was sold [DP a picture of who]]
b. [TP [DP a picture of who]_j [T' was sold t_j]]
c. [C' C [TP [DP a picture of who]_j [T' was sold t_j]]]
d. [CP who_i [C' C [TP [DP a picture of t_i]_j [T' was sold t_j]]]]

As we see, derivations (30) and (31) have the same number of Target operations. Thus Target-α fails to make (31) block (30) in the Bare Phrase Structure model.

To fix this problem, Kitahara (1996) proposes three syntactic operations for phrase structure construction, Merge, Move, and Replace, and decomposes Chomsky's Move operation into them. They can be defined as follows:

(32) Merge: concatenate two elements.
(33) Move: target and raise.
(34) Replace: replace.

If we consider (1) again, two possible derivations are available:

(35) a. TP = {T(was), [sold [DP a picture of who]]} -> Merge
b. CP = {C, TP} -> Merge
c. CP1 = {who_i, CP} -> Merge and Move
d. TP1 = {[DP a picture of t_i]_j, TP} -> Move and Merge
e. CP = {C, TP1} -> Replace
(36) a. TP = {T(was), [sold [DP a picture of who]]} -> Merge
b. TP1 = {[DP a picture of who]_j, TP} -> Move and Merge
c. CP = {C, TP1} -> Merge
d. CP1 = {who_i, CP} -> Move and Merge

(35) takes four Merge operations, two Move operations, and one Replace operation, for a total of seven operations.
(36) takes four Merge operations and two Move operations, for a total of six operations. Comparing the two, (35) takes one more operation than (36). Thus (36) blocks (35), by the economy principle in (23). In this case, too, Kitahara must assume that the economy principle is a global condition, because a derivation with no movement now always takes fewer operations than a derivation with movement. This means that nonconvergent derivations would always block convergent derivations, as discussed in chapter 2.

In addition, Kitahara (1996) splits the single operation Move into independent operations. However, Chomsky explicitly defines Move in terms of raise, merge, and replace, all of which are internal, noninterruptible suboperations of Move and should not be visible to conditions. (See p. 32 in chapter 1 for details.) Kitahara claims that Raise, Merge, and Replace are independent operations. Move may then be interruptible, and so it is not clear that the suboperations can be counted as Kitahara suggests.

3.2.1.3 Crossing the Number of Nodes

Chomsky (1994) and Collins (1994) take the number of nodes that movement crosses to be costly under economy considerations, as follows.

(37) The fewer nodes movement crosses, the more economical it is.

Consider (38).

(38) a. [C' was [VP sold [DP a picture of who]]]
b. [C' was [TP [DP a picture of who] sold]]

Chomsky (1994) argues that extraction out of (38.b) is more economical than extraction out of (38.a), since the Wh-phrase who moves across more nodes in (38.a) than in (38.b). If we simply count the maximal categories that who crosses, it crosses DP, VP, and TP in (38.a), but only DP and TP in (38.b).

This view of economy has at least two problems. First, it cannot explain why (39.a) does not block (39.b), although the Wh-phrase in (39.a) crosses fewer nodes than the Wh-phrase in (39.b).

(39) a. Which person did John tell to buy which picture?
b. Which picture did John tell which person to buy?
Second, the economy principle is global again.

3.2.1.4 Feature Strength and Cyclicity

Giving up the attempt to explain cyclicity in terms of economy, Chomsky (1995) suggests an alternative. On his account, a strong feature has two properties: first, it triggers an overt operation, and second, it induces cyclicity. As for the first property, the pre-Spell-Out property, a strong feature is assumed to cause a crash at PF and therefore must be removed before Spell-Out. As for the second property, cyclicity in overt derivations, a derivation is assumed to be unable to tolerate strength: a strong feature cannot be passed over by one operation and checked later by another. That is, a derivation D is canceled if D contains a strong feature. Of the derivations for (1), only derivation (2), repeated in (41), is legitimate, because (3), repeated in (40), contains a strong feature within it.

(40) a. [CP Who_i was [TP e [VP sold [DP a picture of t_i]]]]?
b. [CP Who_i was [TP [DP a picture of t_i]_j [VP sold t_j]]]?
(41) a. [T' was [VP sold [DP a picture of who]]]
b. [TP [DP a picture of who]_j was [VP sold t_j]]?
c. [CP who_i was [TP [DP a picture of t_i]_j sold t_j]]? -> *SC

Again, this statement is purely a stipulation. We cannot answer the question of why it is only a strong feature that cannot be contained within legitimate derivations.

3.2.1.5 Chain Interleaving

According to Chomsky (1993) and Collins (1994), the formation of a chain is one single operation. For example, who raises to the matrix CP and forms a chain in (42).

(42) a. [CP who did [TP Bill think [CP t' that [John sold a picture of t]]]]
b. [chain diagram]

The question is then whether the chain formation of (42.b) should be considered two instances of movement or one. Chomsky (1993) argues that chain formation should be one instance of movement.
If the chain (42.b) were formed by two movement operations, economy considerations would block it in favor of another derivation with one movement, which is less costly, as shown in (43).

(43) a. [CP who did [TP Bill think [CP that [John sold a picture of t]]]]
b. [chain diagram]

Following Chomsky (1993), (42) and (43) are equally costly if we assume that chain formation is one movement. Collins (1994) formalizes this characteristic as follows:

(44) A chain must not be interleaved.
(45) Two chains, X and Y, are interleaved if during a derivation, part of X is formed, then part of Y is formed, then part of X is formed, and so on. (p. 47)

Following Collins (1994), (44) is derivable from the assumption that chain formation is one single operation. We can then apply (44) to explain (1). Consider the steps that construct (1).

(46) a. [VP was sold [DP a picture of who]]
b. [VP who_i [VP was sold [DP a picture of t_i]]]
c. [TP [DP a picture of t_i]_j [VP who_i [VP was sold t_j]]]
d. [CP who_i [TP [DP a picture of t_i]_j [VP t'_i [VP was sold t_j]]]]

Collins (1994) argues that derivation (46) violates (44). But we still have an alternative way to derive (1), as follows:

(47) a. [TP [VP was sold [DP a picture of who]]]
b. [CP who_i [TP was sold [DP a picture of t_i]]]
c. [CP who_i [TP [DP a picture of t_i]_j was sold t_j]]

This derivation observes (44), forming each chain as one instance of movement. To filter out (47), we still need the equivalent of the SCC, which, as we have seen, cannot easily be derived.

3.2.2 Earliness and the Strict Cycle

In this section we argue that Earliness, as a local economy condition, predicts that overt derivations are cyclic without any further assumptions or stipulations.

Consider derivation (48). The computational system constructs the structure up to (48). At this point of the derivation the computational system can take either (49.a) or (49.b).
If it takes (49.b), it violates Earliness, because (49.a) is the earliest point at which the DP can check the strong D feature of Tense, even though these features could also be checked later, as in (50), if it takes (49.b). Thus (49.a) is optimal in comparison with (49.b) at this point of the derivation, and it blocks (49.b) and its subsequent derivations.

(48) [T' was sold [DP a picture of who]]
(49) a. [TP [DP a picture of who]_j was sold t_j]
b. *[C' C [T' was sold [DP a picture of who]]]
(50) *[CP Who_i was [TP [DP a picture of t_i]_j sold t_j]]?
(51) [C' C [TP [DP a picture of who]_j was sold t_j]]

After that, the computational system builds the phrase structure (51) from (49.a). At this point the strong +wh feature of C should attract some +wh feature to satisfy bare output conditions. However, the computational system cannot apply Move any more, since who is inside a Subject Condition island and so is not accessible (or visible) to the computational system.[23] If the derivation reaches the interface levels, it crashes, because the strong feature is not interpretable at PF or at LF.

[23] We assume that Attract-F can only attract a feature within some (minimal) domain, and that the domains excluded by the Condition on Extraction Domain (Huang 1982), including those of the Subject Condition and the Adjunct Condition, fall outside this domain.

3.3 Procrastinate: the Background and Problems

3.3.1 Motivations

A well-known difference between English and French is the word order of finite verbs relative to the negative morpheme and adverbs. As exemplified in (52) and (53), French finite main verbs precede the negative pas "not" and the adverb souvent "often", while English ones do not precede the negative not and the adverb often.

(52) a. Jean embrasse souvent Marie.
John kisses often Mary
"John often kisses Mary."
b. Jean (n') aime pas Marie.
John likes not Mary
"John does not like Mary."
(53) a. John often kisses Mary.
b. John does not like Mary.
More precisely, French inflected main verbs obligatorily precede negation and adverbs, and English ones may not precede them. The following examples show that English main verbs cannot come before negation and adverbs, and that French ones cannot come after them.

(54) a. *Jean (ne) pas aime Marie.
     b. *Jean souvent embrasse Marie.
(55) a. *John likes not Mary.
     b. *John kisses often Mary.

According to Emonds (1978), Pollock (1989), and Chomsky (1991), in both French and English main verbs are generated in the V position of VP, as in (56.a) and (57.a), respectively. In overt derivations, however, English finite main verbs remain in the V position of VP, as in (57.b), while French ones raise from V to T and so occupy the T position of TP, as in (56.b).24 Since English finite main verbs bear overt affixes such as the third person singular, the question arises how affixation of finite main verbs is possible if they remain in situ in V, without raising to T, in the overt derivation. The Emonds-Pollock-Chomsky analyses suppose that T is overtly lowered to the verb, leaving the trace of T unbound, as in (57.b). The amalgamated V-T then raises to T at LF, as in (57.c), to remedy the unbound trace, which would otherwise violate some condition such as the ECP in that framework.

(56) a. [TP Jean T [VP souvent [VP embrasse Marie]]].
     b. [TP Jean embrasse_i+T [VP souvent [VP t_i Marie]]]. (overt raising)
(57) a. [TP John T [VP often [VP kisses Mary]]].
     b. [TP John t_i [VP often [VP kisses-T_i Mary]]]. (overt lowering)
     c. [TP John [kisses-T]_i [VP often [VP t_i Mary]]]. (covert raising)

24 Their analysis is based on the assumption that a negative like not and adverbs like often are positioned between T and VP invariantly across languages.

If we compare the length of the derivation of (56) with that of (57), the former takes one step less than the latter.
Hence Chomsky (1991) concludes that raising is less costly than lowering, since lowering presumably always requires subsequent raising to remedy an unbound trace.

With the introduction of checking theory in Chomsky (1993) and Chomsky and Lasnik (1993), lowering processes become unnecessary. As discussed just above, T-lowering was required to explain the visibility of verbal affixes (at S-structure). In checking theory it is assumed that the affixation of a syntactic category, e.g. kiss plus the present third person singular features, takes place before the item is drawn from the lexicon and inserted into a phrase structure, rather than in the syntax. Syntactic processes simply check the features of a category against a corresponding functional category and determine whether the inflection is correct. Thus the base structures of (52.a) and (53.a) can be represented as follows:

(58) [TP Jean T-(3rd sg pres) [VP souvent [VP embrasse-(3rd sg pres) Marie]]].
(59) [TP John T-(3rd sg pres) [VP often [VP kisses-(3rd sg pres) Mary]]].

The inflectional features of the verb must then be checked against the matching features of T by raising the verb to T. The only difference between French and English is the timing of this movement: French main verbs move overtly, before Spell-out, and English ones move covertly, after Spell-out. So the further derivations of (58) and (59) can be represented as follows:

(60) a. [TP Jean embrasse_i+T-(3rd sg pres) [VP souvent [VP t_i Marie]]].
     b. Spell-out
(61) a. Spell-out
     b. [TP John kisses_i+T-(3rd sg pres) [VP often [VP t_i Mary]]].

Chomsky (1993, 1994, 1995) argues that whether movement takes place overtly or covertly depends on the strength of the formal features of functional categories. That is, if a feature is strong it must be checked and eliminated before Spell-out for convergence at PF and LF, since it is uninterpretable at PF and LF; if it remains after Spell-out, the derivation crashes at PF, even if the feature is eliminated at LF.
Weak features can be eliminated after Spell-out because they are invisible at PF, but they must be eliminated at LF for convergence. Furthermore, the strength of formal features is assumed to be parametrized from language to language.

Returning to the case of French and English verb movement, French T is assumed to have a strong V feature, and English T is not. In French the V feature of T is strong and must be eliminated before PF by raising V to T for feature checking, as in (60); otherwise the derivation would crash at PF. Thus the verb movement must occur overtly. In English the verb does not have to move overtly, since English T does not have a strong feature. But the verb is supposed to have an uninterpretable feature such as a tense feature, which must be eliminated before LF, and so the English main verb moves covertly, as in (61).

One more question arises for this analysis. Suppose that kisses raises to T before Spell-out, as in (55). The derivation converges at PF and LF too, because no feature remains that is uninterpretable at PF or at LF. Why can English main verbs not raise before Spell-out at all, as exemplified in (55), even though such a derivation can converge at PF and LF? Chomsky (1993, 1994, 1995) claims that this is due to the Procrastinate Principle.

(62) The Procrastinate Principle
     Minimize overt operations.

He assumes that covert movement is cheaper than overt movement, since LF operations are "operating mechanically beyond any directly observable effects." (p.30) Because covert operations are less costly than overt operations, the computational system tries to minimize overt operations and maximize covert operations. That is, the computational system prefers covert operations to overt operations unless only overt operations make a derivation converge.
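The selection that Procrastinate performs can be sketched in a few lines of Python. This is only an illustrative sketch, not part of Chomsky's formulation: the class `Derivation`, the function `procrastinate`, and the operation counts (which count verb raising alone) are assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str
    overt_ops: int   # hypothetical count of operations applied before Spell-out
    converges: bool  # True if no uninterpretable feature survives at PF or LF


def procrastinate(candidates):
    """Among convergent derivations, keep those with the fewest overt operations."""
    convergent = [d for d in candidates if d.converges]
    if not convergent:
        return []
    fewest = min(d.overt_ops for d in convergent)
    return [d for d in convergent if d.overt_ops == fewest]


# English: the V feature of T is weak, so both timings converge.
english = [
    Derivation("overt V-raising", overt_ops=1, converges=True),
    Derivation("covert V-raising", overt_ops=0, converges=True),
]

# French: the V feature of T is strong, so delaying the movement crashes at PF.
french = [
    Derivation("overt V-raising", overt_ops=1, converges=True),
    Derivation("covert V-raising", overt_ops=0, converges=False),
]

english_choice = [d.name for d in procrastinate(english)]
french_choice = [d.name for d in procrastinate(french)]
```

Under these assumptions the filter picks covert raising for English and overt raising for French. Note that the comparison ranges over whole derivations, which is the global character of Procrastinate.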
In English, main verbs must not move to T overtly, even though overt verb movement would prevent the derivation from crashing at PF and at LF, since covert verb movement can also prevent it from crashing. The question then arises why overt movement such as verb movement in French is permitted to violate the Procrastinate Principle. Chomsky claims that economy conditions apply only to convergent derivations, and select an optimal derivation among them. So the Procrastinate Principle can be violated by the requirement for convergence.

To sum up, covert operations are less costly than overt ones, since the former operate mechanically and are not directly observable. In addition, the Procrastinate Principle is violable for convergence. Therefore the Procrastinate Principle applies only to convergent derivations, and among them selects an optimal derivation which takes the least number of overt operations.

3.3.2 Some Problems

First of all, Procrastinate is too loose to explain all the cases of the timing of movement. It simply considers one point of a derivation, the point of Spell-out, to be the critical time for movement, so that, among convergent derivations, it selects the derivation which takes the least number of overt operations. This comes from the assumption that Procrastinate prefers covert operations to overt operations. As we have shown in section 3.2, however, there is some evidence that the computational system determines whether it should apply movement at each point of a derivation, rather than at the point of Spell-out. Consider (1) again, repeated in (63). (63.a) is assumed to be represented as (63.b) at LF.

(63) a. *Who was a picture of sold?
     b. *[CP Who_j was [TP [DP a picture of t_j]_i sold t_i?]]

(63) is ungrammatical, since it violates Chomsky's (1973) Subject Condition: roughly speaking, nothing can be extracted out of a DP in [DP, TP]. (63) is assumed to be derived as in (64).

(64) a. [TP T [VP sold [DP a picture of who]]]?
     b.
[TP [DP a picture of who]_i [T' T [VP sold t_i]]]?
     c. [CP who_j was [TP [DP a picture of t_j]_i [VP sold t_i]]]?

However, if the computational system constructs the derivation in (65) for (63) rather than (64), it can escape the Subject Condition.

(65) a. [CP e was [TP e [VP sold [DP a picture of who]]]]?
     b. [CP who_i was [TP e [VP sold [DP a picture of t_i]]]]?
     c. [CP who_i was [TP [DP a picture of t_i]_j [VP sold t_j]]]?

We cannot exclude (65) with Subjacency, since it is possible to extract a category out of [DP a picture of] if it is a complement of a verb, as in (66).

(66) [CP Who_i did [TP John sell [NP a picture of t_i]]]?

To explain the ungrammaticality of (63), then, what matters is when who and a picture of who move, rather than whether they move overtly or covertly. For (63), the computational system must first construct (64.b), raising the DP to TP, and then (64.c), raising who to CP, rather than derive (65.a)-(65.c) in that order. That is, the computational system is forced by some principle to move [DP a picture of who] to TP without any delay right after TP is constructed.

Procrastinate simply allows the computational system to move the Wh-phrase and the DP overtly, since in English the D feature of T and the Wh feature of C are both strong, but it cannot force the computational system to take the derivation in (64) rather than (65). UG requires some additional principle in order to filter out (63).

To motivate Procrastinate as an economy condition in UG, Chomsky argues:

     LF operations are a kind of "wired-in" reflex, operating mechanically beyond any directly observable effects. They are less costly than overt operations. The system tries to reach PF "as fast as possible," minimizing overt syntax. (pp.30-31)

Yet this intuitive argument is obscure. First, it is not clear why operations "mechanically beyond any directly observable effect" (p.30) are less costly than directly observable operations.
All syntactic operations, overt or covert, are observable by UG. So the argument that overt operations are observable and covert operations are unobservable can only relate to the sensory-motor system, which is not supposed to affect language under Chomsky's (1995) "... speculation that the essential character of C_HL is independent of the sensory-motor interface." (p.335) Hence economy conditions should be stated in terms of properties of language itself.

As Brody (1995) also points out, Procrastinate may be an unnatural economy condition for UG. Procrastinate implies that the default case of economy is to make the LF form maximally different from the PF form, which is not a natural expectation. One would expect the LF form to be maximally similar to the PF form as the default case, so that it can be recovered with minimum effort.

The violability of the Procrastinate Principle is also inconsistent with the inviolability of other universal principles. In generative grammar, including the minimalist program, it has generally been assumed that all universal principles must be observed and that some deviance results if any universal principle is violated. The Procrastinate Principle is the only exception to this assumption. It can be violated if that is necessary for convergence, yielding no deviance. Indeed, it must be violated for convergence.

In addition, and undesirably, the Procrastinate Principle has the characteristics of a global condition, although other economy conditions can be formulated locally, as we have discussed in chapter 2.

In the next section we will derive Procrastinate from a different timing principle, the Earliness Principle, which is independently motivated by overt cyclic derivations.

3.4 Deriving Procrastinate Effects

3.4.1 PF Deletion Analysis

Kitahara (1994, 1995) proposes that Procrastinate is derivable from Target-α.
Under his analysis, whether an operation is obligatorily covert (Procrastinate) or optionally covert (optional movement) depends on the number of targeting operations at the interface levels. In other words, if movement obligatorily takes place covertly, like object shift in English, this is because covert movement takes fewer targeting operations than overt movement: covert movement is more economical than overt movement. If movement optionally takes place covertly, this is because covert movement takes the same number of targeting operations as overt movement: overt and covert movement are equally economical.

Let us take (67), a case of English object shift, which obligatorily takes place covertly.

(67) John kisses Mary.

According to Kitahara, the computational system will take the steps in (68).

(68) a. [AgrP-o Agr_o [VP John kisses Mary]]
     b. [AgrP-o e [Agr'-o Agr_o [VP John kisses Mary]]]
     c. [TP John_i [AgrP-o e [Agr'-o Agr_o [VP t_i kisses Mary]]]]
     d. Spell-out
     e. [TP John_i [AgrP-o e [Agr'-o kisses_j+Agr_o [VP t_i t_j Mary]]]]
     f. [TP John_i [AgrP-o Mary_k [Agr'-o kisses_j [VP t_i t_j t_k]]]]

If we compare (68) with (69), which is the derivation for overt object shift, the derivation in (68) takes one more targeting operation than (69). In other words, overt computations take fewer operations than covert computations under the Target-α analysis. This would wrongly predict that there is no Procrastinate effect in natural language.

(69) a. [AgrP-o Agr_o [VP John kisses Mary]]
     b. [AgrP-o kisses_j+Agr_o [VP John t_j Mary]]
     c. [AgrP-o Mary_k [Agr'-o kisses_j+Agr_o [VP John t_j t_k]]]
     d. [TP John_i [AgrP-o Mary_k [Agr'-o kisses_j+Agr_o [VP t_i t_j t_k]]]]
     e. Spell-out

To solve this problem, Kitahara extends the targeting operation to a Delete operation at PF. He argues that under the copy theory of movement a trace of movement is exactly the same copy as the moved category.
A trace is, apparently, phonetically null, although it is exactly the same as the moved category. Arguably, this indicates that the copy left by movement, i.e. a trace, deletes in the PF component. (Cf. Affect-α in Chomsky and Lasnik (1993), Lasnik and Saito (1984, 1992).) He subsumes PF Delete under a targeting operation as in the following:

(70) (=(76) Kitahara 1994 p.41)
     Target-α (targeting a category α)
     a. Build a new category γ by merging α and an empty ∅
     b. Substitute a category β for ∅ sister to 0
     c. Delete 0

Now overt computations induce Delete operations at PF, whereas Delete operations are not necessary for covert computations. Under Target-α as defined in (70), let us reconsider (68) and (69), which are now derived as in (71) and (72), respectively.

(71) a. [AgrP-o Agr_o [VP John kisses Mary]]
     b. [AgrP-o e [Agr'-o Agr_o [VP John kisses Mary]]]
     c. [TP John_i [AgrP-o e [Agr'-o Agr_o [VP t_i kisses Mary]]]]
     d. Spell-out
     e. [TP John_i [AgrP-o e [Agr'-o kisses_j+Agr_o [VP t_i t_j Mary]]]]
     f. [TP John_i [AgrP-o Mary_k [Agr'-o kisses_j [VP t_i t_j t_k]]]]
     g. Delete t_i at PF

(72) a. [AgrP-o Agr_o [VP John kisses Mary]]
     b. [AgrP-o kisses_j+Agr_o [VP John t_j Mary]]
     c. [AgrP-o Mary_k [Agr'-o kisses_j+Agr_o [VP John t_j t_k]]]
     d. [TP John_i [AgrP-o Mary_k [Agr'-o kisses_j+Agr_o [VP t_i t_j t_k]]]]
     e. Spell-out
     f. Delete t_i at PF
     g. Delete t_j at PF
     h. Delete t_k at PF

Now we compare (71) with (72) in terms of the number of targeting operations. The derivation (71) now takes one fewer targeting operation than (72). So English object shift is obligatorily covert, which is a Procrastinate effect.

Let us consider (73) instead of (72) for English object shift.

(73) a. [AgrP-o Agr_o [VP John kisses Mary]]
     b. [AgrP-o Mary_k [Agr'-o Agr_o [VP John kisses t_k]]]
     c. [TP John_i [AgrP-o Mary_k [Agr'-o Agr_o [VP t_i kisses t_k]]]]
     d. Spell-out
     e. [TP John_i [AgrP-o Mary_k [Agr'-o kisses_j+Agr_o [VP t_i t_j t_k]]]]
     f. Delete t_i at PF
     g. Delete t_k at PF
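The operation counts for the derivations listed above can be tallied mechanically. The following Python sketch is an illustration only: the step descriptions are paraphrases introduced here, and it assumes that every step after the initial structure except Spell-out counts as one targeting operation, including the step that projects the empty specifier e and each PF Delete.

```python
SPELL_OUT = "Spell-out"

# Steps b onward of each derivation; step a (the initial structure) is not an operation.
derivations = {
    "(71)": [
        "project empty Spec (e) of AgrP-o",
        "raise John to Spec TP",
        SPELL_OUT,
        "raise kisses to Agr_o",
        "raise Mary to Spec AgrP-o",
        "Delete t_i at PF",
    ],
    "(72)": [
        "raise kisses to Agr_o",
        "raise Mary to Spec AgrP-o",
        "raise John to Spec TP",
        SPELL_OUT,
        "Delete t_i at PF",
        "Delete t_j at PF",
        "Delete t_k at PF",
    ],
    "(73)": [
        "raise Mary to Spec AgrP-o",
        "raise John to Spec TP",
        SPELL_OUT,
        "raise kisses to Agr_o",
        "Delete t_i at PF",
        "Delete t_k at PF",
    ],
}


def targeting_count(steps):
    """Count every step except Spell-out as one targeting operation."""
    return sum(1 for step in steps if step != SPELL_OUT)


counts = {label: targeting_count(steps) for label, steps in derivations.items()}
```

On this way of counting, `counts` assigns 5 operations to (71), 6 to (72), and 5 to (73), so (71) is cheaper than (72) by one operation while (73) ties with (71).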
If we compare (73) with (71) in terms of the number of targeting operations, they are equally economical, since they take the same number of targeting operations. Kitahara rejects the derivation in (73) for object shift, adopting Holmberg's (1986) generalization that object shift requires a verb to raise to Agr_o before it. In other words, for the object Mary to be raised to AgrP-o, as in (71.f) and (72.c), the verb kisses must be raised to Agr_o prior to object raising, as in (71.e) and (72.b). In this sense the derivation in (73) is not legitimate: it violates Holmberg's generalization, since the object shift in (73.b) is not preceded by verb raising.

Thus object shift requires that both the verb and the object move. That is, one object shift involves two movements: a verb movement and an object movement. If it takes place overtly, two Delete operations are induced. Overt object shift is thus more costly than covert object shift.

If we consider French object shift, however, we can see that the Target-α analysis fails. In French, verb raising is known to be obligatorily overt, as in Icelandic and in contrast to English main verb raising. But French object shift is obligatorily covert, as in (74).

(74) a. [TP Jean_i embrasse_j+T [VP souvent [VP t_i t_j Marie]]]
     b. *[TP Jean_i embrasse_j+T Marie_k [VP souvent [VP t_i t_j t_k]]]

Suppose that the computational system constructs (74.a) and (74.b) as in (75) and (76), respectively.

(75) a. [AgrP-o Agr_o [VP souvent [VP Jean embrasse Marie]]]
     b. [AgrP-o embrasse_j+Agr_o [VP souvent [VP Jean t_j Marie]]]
     c. [AgrP-o e [Agr'-o embrasse_j+Agr_o [VP souvent [VP Jean t_j Marie]]]]
     d. [TP Jean_i [AgrP-o e [Agr'-o embrasse_j+Agr_o [VP souvent [VP t_i t_j Marie]]]]]
     e. Spell-out
     f. [TP Jean_i [AgrP-o Marie_k [Agr'-o embrasse_j+Agr_o [VP souvent [VP t_i t_j t_k]]]]]
     g. Delete t_i
     h. Delete t_j

(76) a. [AgrP-o Agr_o [VP souvent [VP Jean embrasse Marie]]]
     b. [AgrP-o embrasse_j+Agr_o [VP souvent [VP Jean t_j Marie]]]
     c. [TP Jean_i [AgrP-o embrasse_j+Agr_o [VP souvent [VP t_i t_j Marie]]]]
     d. [TP Jean_i [AgrP-o Marie_k [Agr'-o embrasse_j+Agr_o [VP souvent [VP t_i t_j t_k]]]]]
     e. Spell-out
     f. Delete t_i
     g. Delete t_j
     h. Delete t_k

If we compare the derivations (75) and (76) in terms of the number of targeting operations, we can see that both take the same number of targeting operations. In spite of the equal number of targeting operations, French object shift must be obligatorily covert.

In addition, as we have discussed in chapter 2 (section 2.2.5) and chapter 3 (section 3.2.1.2), Target-α assumes global economy on derivations, which is undesirable. Nor can Target-α be maintained in Chomsky's (1994, 1995) bare phrase structure.

In the next section we will derive Procrastinate from Earliness, from which overt cyclic derivations are also derived.

3.4.2 Earliness and Procrastinate Effects

In this section we will argue that Procrastinate effects should be derived from Earliness. According to our Earliness Principle, formulated in (77), a derivation must satisfy bare output conditions as early as possible.

(77) K attracts F early only if a sublabel of K is an uninterpretable feature at the interface level that Attract-F affects.25

25 We assume that there are three types of uninterpretable formal features. The first type is a feature, such as a strong feature, which is uninterpretable at PF and at LF. In this case, Attract-F should affect both PF and LF. In other words, the movement affects sound and meaning. The second type is a feature, such as a Case feature, which is uninterpretable only at LF. In this case, Attract-F should affect only LF. This is the case of covert movement, which affects meaning, not sound. The third type is a feature which is uninterpretable only at PF. In this case, Attract-F should affect only PF. Scrambling which does not affect meaning may fall under this case.

Consider covert verb movement and object shift in
English as in (78):

(78) [TP John_i T [VP often [VP t_i kisses Mary]]]

Suppose that the computational system has constructed (79). At this point the computational system has two potential choices: the strong D feature of FF(T) attracts FF(John), as in (80), or C is merged with TP, constructing CP, as in (81):

(79) [TP T [VP often [VP John kisses Mary]]]
(80) [TP John_i [T' T [VP often [VP t_i kisses Mary]]]]
(81) [CP C [TP T [VP often [VP John kisses Mary]]]]

The strong D feature of FF(T) is uninterpretable at PF and LF, and Attract-F at this point also affects PF and LF. Hence (79) is the earliest point for Attract-F, and the computational system takes (80); otherwise it would violate Earliness, as in (81).

At the point of (80) the computational system again has two choices: it merges C and TP to form CP, as in (82), or the V feature of FF(T) attracts the verb kisses, as in (83), because it is an uninterpretable feature.

(82) [CP C [TP John_i [T' T [VP often [VP t_i kisses Mary]]]]]
(83) [TP John_i [T' kisses_j+T [VP often [VP t_i t_j Mary]]]]

But at this point the computational system takes (82) rather than (83): Attract-F here would affect PF and LF, but the V feature of FF(T) is uninterpretable only at LF; hence (80) is not the earliest point to attract the verb. This is the Procrastinate effect of English verb movement.

The above result leads us to covert object shift in English: in English, object shift cannot take place overtly, since main verbs raise covertly, and hence no feature attracts the object overtly. We thus derive Holmberg's (1986) generalization from Attract-F and Earliness. For example, after the computational system takes (82) for further computation, it has two choices: Spell-out, or raising the object Mary. At this point the computational system takes Spell-out, since there is no feature of T which can attract the object. After that, the computational system raises the verb kisses, targeting T, as in (84), since the V feature of FF(T) is uninterpretable only at LF, and Attract-F affects only LF now.
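The choices made so far can be summarized in a small Python sketch. It is an illustration under stated assumptions, not the formulation in (77) itself: it reads (77) as requiring that the interface levels at which the attracting feature is uninterpretable coincide with the levels that Attract-F affects at the current point, and it assumes that Attract-F affects PF and LF before Spell-out and only LF afterwards. The names `affected_levels` and `attracts_now` are introduced here; PF-only features (the third type in footnote 25) are not modelled.

```python
PF, LF = "PF", "LF"


def affected_levels(before_spell_out):
    """Interface levels that Attract-F affects at the current point (assumed)."""
    return frozenset({PF, LF}) if before_spell_out else frozenset({LF})


def attracts_now(uninterpretable_at, before_spell_out):
    """True if this is the earliest point at which the feature may attract."""
    return frozenset(uninterpretable_at) == affected_levels(before_spell_out)


# Strong D feature of T: uninterpretable at PF and LF, so it attracts overtly.
strong_d_overt = attracts_now({PF, LF}, before_spell_out=True)

# V feature of English T: uninterpretable only at LF, so it waits for Spell-out.
weak_v_overt = attracts_now({LF}, before_spell_out=True)
weak_v_covert = attracts_now({LF}, before_spell_out=False)
```

On this reading the decision is made locally, at each point of the derivation, without comparing complete derivations.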
(84) [CP C [TP John_i kisses_j+T [VP often [VP t_i t_j Mary]]]]

After that, the Case feature of FF(kisses) attracts FF(Mary), as in (85), since the Case feature of FF(kisses) is uninterpretable at LF, and Attract-F affects only LF.

(85) [CP C [TP John_i FF(Mary)_k+kisses_j+T [VP often [VP t_i t_j t_k]]]]

Under the Earliness analysis, we can derive the Procrastinate effects of verb movement and object shift in English without any reference to the Procrastinate Principle.

Now consider covert object shift in French, repeated in (86):

(86) a. [TP Jean_i embrasse_j+T [VP souvent [VP t_i t_j Marie]]]
     b. *[TP Jean_i embrasse_j+T Marie_k [VP souvent [VP t_i t_j t_k]]]

Suppose that the computational system has constructed (87). At this point the strong D feature of FF(T) attracts the subject Jean, as in (88), in the same manner as overt subject raising in English in (80).

(87) [TP T [VP souvent [VP Jean embrasse Marie]]]
(88) [TP Jean_i T [VP souvent [VP t_i embrasse Marie]]]

In addition, French T has a strong V feature (Chomsky 1993), and it must attract the verb embrasse at this point, since (88) is the earliest point at which a feature that is uninterpretable at PF and LF can affect PF and LF. It thus generates (89):

(89) [TP Jean_i embrasse_j+T [VP souvent [VP t_i t_j Marie]]]

At this point, however, the Case feature of FF(embrasse) cannot attract FF(Marie): (89) is not the earliest point to attract FF(Marie), since the Case feature of FF(embrasse) is uninterpretable only at LF, and Attract-F at this point affects both PF and LF. After Spell-out, the computational system raises the object, targeting [T embrasse+T].

As we have seen so far, Earliness can derive Procrastinate effects locally, in the course of the derivation, without reference to the Procrastinate Principle.

4.
A Unified Analysis of Wh-Asymmetries and Wh-Adjunct Symmetries

4.1 Introduction

This chapter attempts to derive some Wh-asymmetries and Wh-adjunct symmetries from a general economy principle on derivations, the Shortest Move or Minimal Link Condition.

In section 4.2 we first review some Wh-asymmetries (e.g. argument-adjunct, argument-extraction, and argument-quasi-argument asymmetries, and superiority effects), their descriptive generalizations, and their pre-minimalist and minimalist analyses.

In section 4.3 we then develop some theoretical hypotheses for our analysis. In section 4.3.1 we investigate some properties of Wh-words and classify them in terms of feature specifications. More specifically, we classify Wh-words into three types: Wh-DP operators, Wh-adverbial operators, and Wh-NP variables, and specify their features as {D, Op_Q}, {Adv, Op_Q}, and {N, Pro}26, respectively. In addition, the operator types must undergo movement for LF legitimacy, and the variable type must be bound for LF legitimacy.

26 I will use Pro as a feature in order to indicate a property of a variable which requires an operator to bind it.

In section 4.3.2 we propose multiple feature attraction, in which multiple features parametrically attract F, and discuss how it works in the minimalist model. For Wh-questions, a Comp attracts only one Wh-word with an operator feature (Op) or a pair of features. In section 4.3.3 we investigate Wh-asymmetries under the Minimal Link Condition and Attract-F, which are independently necessary in the minimalist model. We extend our analysis to some Wh-adjunct symmetries (argument-adjunct, pseudo-opacity, inner island condition) in section 4.4.

4.2 Some Types of Wh-asymmetries

4.2.1 Wh-asymmetries and Pre-minimalist Analyses

Linguists have long noted several Wh-asymmetries in natural language. The first type of Wh-asymmetry is the superiority effect, as exemplified in (1) and (2).

(1) a. John wonders [CP who_i [TP t_i bought what]]
    b.
*John wonders [CP what_j [TP who bought t_j]]
(2) a. [CP who_i did [TP you tell t_i [TP PRO to read what]]]
    b. *[CP what_j did [TP you tell who [TP PRO to read t_j]]]

Chomsky (1973) explains the contrast in (1)-(2) in terms of a condition on transformational rules that prevents a rule from applying to an element Y if there is another element Z which is superior to Y and to which the rule can apply. The Superiority Condition is formulated in (3).

(3) Superiority Condition (=(73) Chomsky (1973) p.246)
    No rule can involve X, Y in the structure
    ...X...[α...Z...-WYV...]...
    where the rule applies ambiguously to Z and Y and Z is superior27 to Y

27 For simplicity let superiority be a c-command relation as follows:
(i) A category X is superior to a category Y if X c-commands Y.

The Superiority Condition prevents Wh-movement from applying to what in (1) and (2), since movement can equally apply to who and what at the current cycle, and who is superior to what.

The formulation of the Superiority Condition has some empirical problems in explaining the contrast in (4) and the other Wh-asymmetries below, although its basic properties seem to be correct.

(4) a. *I wonder what who bought t
    b. Who wonders what who bought t

The sentences in (4) are typical examples which violate the Superiority Condition. Sentence (4.b) is grammatical, however, if who in the matrix clause and who in the embedded clause receive a pair-list reading (Lasnik and Saito 1992).

The second type of Wh-asymmetry is the argument-adjunct asymmetry. Huang (1982) observes that extraction of a Wh-adjunct from a Wh-island yields worse deviance than extraction of an argument, as exemplified in (5):

(5) a. ?[CP what_i did John wonder [CP whether to fix t_i]]
    b. *[CP How_i did John wonder [CP whether to fix the car t_i]]

Argument-adjunct Wh-asymmetries can also be observed in multiple Wh-question constructions:

(6) a. [CP how_i did Fred fix what t_i]
    b.
*[CP what_i did Fred fix t_i how]

Huang (1982) attributes the contrasts in (5)-(6) to the ECP. The ECP can be formulated as follows (Chomsky 1981):28

(7) A nonpronominal empty category must be properly governed.
(8) α properly governs β iff α governs β and
    (i) α is a lexical head (lexical government), or
    (ii) α is coindexed with β (antecedent government).
(9) α governs β iff for all x, x a maximal projection, x dominates α iff x dominates β. (Aoun & Sportiche 1981)

28 For the ECP to work, we need other auxiliary hypotheses such as the Comp-indexing algorithm, no application of Subjacency/CED to LF movement, and so on. We ignore technical details to focus our discussion.

Following Huang's ECP account of the contrast in (5), in (5.a) the trace t_i is a sister to the verb fix, which hence lexically governs it. It thus satisfies the ECP. In (5.b) the trace t_i is an adjunct outside the governing domain of the verb (lexical government), and hence must be antecedent-governed to observe the ECP. The trace t_i in (5.b) cannot be locally bound by how, since there is another Wh-word in the embedded Comp. Hence it is neither antecedent-governed nor lexically governed, and so violates the ECP.

In sentence (6.a) the trace is locally bound by how from the Comp, and satisfies the ECP. If how in sentence (6.b) undergoes LF movement, however, the trace of how cannot be locally bound by how, because the Comp already bears the index of what. So (6.b) violates the ECP. The ECP account of argument-adjunct asymmetries correctly derives the fact that adjuncts must move to a local Comp position prior to other Wh-phrases.

Although the ECP subsumes the Superiority Condition, it fails to explain the contrast in (2), repeated just below:

(10) a. [CP who_i did [TP you tell t_i [TP PRO to read what]]]
     b. *[CP what_j did [TP you tell who [TP PRO to read t_j]]]

In (2) each trace is lexically governed by the verb, and hence satisfies the ECP, but (2.b) is ungrammatical.
It also has problems explaining argument-quasi-argument and argument-extraction asymmetries (see below). Another problem is that we can hardly keep the ECP in the minimalist program, since it assumes (i) that some derivational principles like Subjacency and the CED apply differently to SS movement and LF movement, whereas in the minimalist model principles are assumed to apply in the same way, because there is no distinction between SS and LF; and (ii) that scope-bearing elements undergo LF movement which is not motivated by morphological properties and hence is disallowed in the minimalist program.

The third type of Wh-asymmetry is the argument-quasi-argument asymmetry, as in (11). Rizzi (1990) claims that the verb weigh in (11.a) assigns which box a referential θ role, and that the verb weigh in (11.b) assigns how much a nonreferential θ role, although both of them are understood as arguments of the verb weigh.

(11) a. [CP Which box_i did Bill wonder whether John weighed t_i]
     b. *[CP how much_i did Bill wonder whether John weighed t_i]

Rizzi (1990) accounts for the contrast in (11) in terms of referentiality and relativized minimality.

(12) (=(28) Rizzi 1990 p.86)
     A referential index must be licensed by a referential θ role.
(13) (=(29) Rizzi 1990 p.87)
     X binds Y iff
     (i) X c-commands Y, and
     (ii) X and Y have the same referential index.

Rizzi (1990) claims that a referential index is legitimate only if it is associated with a referential θ role, and that A'-dependencies must be expressed through binding relations, which are also associated with referential θ roles. If no index is legitimate and no binding is available, the A'-dependency must resort to antecedent-government, which is subject to relativized minimality.

(14) (=(40) Rizzi 1990 p.92)
     X antecedent-governs Y iff
     (i) X and Y are nondistinct
     (ii) X c-commands Y
     (iii) no barrier intervenes
     (iv) Relativized Minimality is respected.
(15) (=(15) Rizzi 1990 p.7)
     Relativized Minimality
     X α-governs Y only if there is no Z such that
     (i) Z is a typical potential α-governor for Y,
     (ii) Z c-commands Y and does not c-command X.
(16) (=(16) Rizzi 1990 p.7)
     Z is a typical potential head governor for Y = Z is a head m-commanding Y.
(17) (=(17) Rizzi 1990 p.7)
     a. Z is a typical potential antecedent governor for Y, Y in an A-chain = Z is an A specifier c-commanding Y.
     b. Z is a typical potential antecedent governor for Y, Y in an A'-chain = Z is an A' specifier c-commanding Y.
     c. Z is a typical potential antecedent governor for Y, Y in an X0-chain = Z is a head c-commanding Y.

For sentence (11), Rizzi claims that in (11.a) the verb weigh is an agentive verb which assigns a referential θ role to its object, while in (11.b) the verb weigh is a stative verb which assigns a nonreferential θ role to its complement. In (11.a) the trace t_i can be connected to which box through binding, since the index i is licensed by the referential θ role that the verb weigh assigns to its object. So the A'-dependency is legitimate in (11.a). In (11.b), on the other hand, no index is legitimate under (12), since a nonreferential θ role is assigned by the stative verb weigh. For the A'-dependency between how much and its trace, a chain of antecedent-government relations is the only option. But the operator how much fails to antecedent-govern its trace, since there is a closer intervening potential A'-governor for it, i.e. an operator in the Spec of the embedded Comp. Thus the A'-dependency is illegitimate in (11.b).

Following Rizzi (1990), we can generalize that only elements (or arguments) assigned a referential θ role can be extracted from a Wh-island; all other elements (adjuncts and quasi-arguments), which are assigned a nonreferential θ role or no θ role, cannot be extracted from a Wh-island.
The relativized minimality analysis covers argument-adjunct asymmetries, argument-quasi-argument asymmetries and some adjunct symmetries in a unified way. Although the basic spirit of Relativized Minimality seems to be correct, the formulation cannot explain superiority effects like (2), or the contrasts in (4) and in (18), since an argument assigned a referential θ-role is not subject to Relativized Minimality and should therefore be extractable across a potential antecedent governor. It is also hard to maintain the formulation in the minimalist model, since it refers to the distinction between referential and nonreferential θ-roles, which are not assumed to be formal features in the minimalist program.

Finally, there are other Wh-asymmetries which we may call argument-extraction asymmetries, exemplified in (18).

(18) a. ?[CP What_i did John wonder [CP whether to fix t_i?]]
     b. ?[CP What_i did John wonder [CP how to fix t_i?]]
     c. *[CP What_i did John wonder [CP who fixed t_i?]]

As we have discussed before, arguments can usually be extracted from a Wh-island. In the sentences in (18) an argument what is extracted from a Wh-island, but it yields severe deviance only in (18.c). Descriptively, arguments cannot be extracted through a Comp filled with an argument. In (18.c) the embedded Comp is filled with an argument who, through which what is extracted. By contrast, the embedded Comp is filled with a nonargument in (18.a) and (18.b). In the next section we will investigate Wh-asymmetries under economy considerations in the minimalist model.

4.2.2 Some Minimalist Analyses

Kitahara (1994) attempts to reduce four types of Wh-asymmetries (i.e. superiority effects, argument-adjunct, argument-quasi-argument, and argument-extraction asymmetries) to Chomsky's (1993, 1994, 1995) general economy principle, the Shortest Move (or the MLC). The Shortest Movement Requirement (SMR) is defined as in (19)-(20).
(19) (=(13) Kitahara 1994 p.61)
     Shortest Movement Requirement (SMR)
     Minimize the length of each feature-checking movement.

(20) (=(14) Kitahara 1994 p.61)
     Shortest Feature-Checking Movement
     Let X and Y be two nodes in a tree. Let Z be the closest c-commander of X, bearing α-feature. The movement of X to Y is the shortest α-checking movement of X iff Y and Z are in the same minimal domain.

(21) (=(15) Kitahara 1994 p.61)
     Closest C-commander Bearing α-Feature
     X is the closest c-commander of Y bearing α-feature iff
     (i) X bears α-feature, and
     (ii) X c-commands Y, and
     (iii) no category bearing α-feature intervenes between X and Y.

(22) (=(16) Kitahara 1994 p.61)
     Z intervenes between X and Y iff X c-commands Z and Z c-commands Y.

(23) (=(17) Kitahara 1994 p.62)
     C-command
     X c-commands Y iff
     (i) neither X nor Y dominates the other, and
     (ii) a category immediately dominating X dominates Y.

(24) (=(18) Kitahara 1994 p.62)
     Domain
     the domain of CH (α_1,...,α_n) = the set of categories contained in Max(α_1), each member of which does not contain any α_i.

(25) (=(19) Kitahara 1994 p.62)
     Minimal Domain
     the minimal domain of CH (α_1,...,α_n) = the smallest subset K of the domain of CH (α_1,...,α_n) such that for any γ a member of the domain of CH (α_1,...,α_n), some β member of K dominates γ.

To illustrate how the SMR works, suppose the following configuration:

(26) (=(22) Kitahara 1994 p.63)
     [tree diagram with nodes Y, Z, X]

The movement of X to Y satisfies the SMR only if (i) X β-checks with Y, where α is not equal to β, or (ii) X α-checks with Y, where (a) Z is the closest c-commander of X bearing α-feature, and (b) Y and Z are in the same minimal domain.

Now consider the superiority effect (1), repeated in (27), and its derivation in (28).

(27) a. John wonders [CP who_i [TP t_i bought what]]
     b. *John wonders [CP what_j [TP who bought t_j]]

(28) a. [CP C [TP who bought what]]
     b. [CP who_i [C' C [TP t_i bought what]]]
     c. John wonders [CP who_i [C' C [TP t_i bought what]]]

Suppose the computations have constructed (28.a).
The strong +Wh of the Comp attracts who; if it attracted what, it would violate the SMR, which is the case of (27.b). After that the computations map (28.b) to (28.c). The SMR also correctly explains the superiority effect in (2) in exactly the same manner as in (1), which was the problem with the ECP account. But the SMR leaves the contrast in (4) unexplained, which is a critical counterexample to the superiority account. We repeat it in (29).

(29) a. *I wonder what who bought t
     b. Who wonders what who bought t

At the point of constructing the embedded CP in (29), the +Wh feature of the Comp must attract who, not what, observing the SMR, since who is closer than what to the Comp in the minimal domain for +Wh checking.

Kitahara (1994) accounts for argument-adjunct asymmetries in terms of the Chain Formation Requirement (CFR) in addition to the SMR. The CFR is defined as below:

(30) (=(18) Kitahara 1994 p.120)
     Chain-Formation Requirement (CFR)
     An application of Target α yields more than one chain only if that application is violation-free.

Consider (5), repeated in (31). Suppose the computations construct (32) and (33) for (31.a) and (31.b), respectively.

(31) a. ?[CP What_i did John wonder [CP whether to fix t_i]]
     b. *[CP How_i did John wonder [CP whether to fix the car t_i]]

(32) a. [C' did [TP John wonder [CP whether PRO to fix what]]]
     b. [CP what_i [C' did [TP John wonder [CP whether PRO to fix t_i]]]]
     c. [CP what_i [C' did [TP John wonder [CP whether PRO to [AgrP-o t'_i fix t_i]]]]]

(33) a. [C' Did [TP John wonder [CP whether PRO to fix the car how]]]
     b. [CP how_i [C' did [TP John wonder [CP whether PRO to fix the car t_i]]]]

Consider (31.b) first. Suppose that the computations have constructed (33.a). At this point the strong +Wh feature of the matrix Comp attracts how, which violates the SMR, since whether in the embedded Comp is closer to the matrix Comp.
Kitahara claims that the violation of the SMR yields no chain formation of how and its trace in (31.b), and that its LF representation in (34) is not legitimate, because the violation of the SMR prevents how and its trace from being in a chain under the CFR, so that the operator how undergoes vacuous binding.

(34) [CP how_i [C' did [TP John wonder [CP whether PRO to fix the car t_i]]]]

Thus the sentence (31.b) violates the SMR and LF legitimacy, which yields severe deviance.

Consider (31.a) now. Suppose that (32.a) is derived. At this point the strong +Wh feature of the Comp attracts what, violating the SMR and thereby yielding no chain of what and its trace. In this case, however, what needs to check its Case at LF, and so its trace (or copy) undergoes LF movement as shown in (32.c). This single violation-free application of movement can yield two chains: an operator-variable chain (what_i, t'_i) and an argument chain (t'_i, t_i). So the LF representation in (35) satisfies LF legitimacy.

(35) [CP what_i [C' did [TP John wonder [CP whether PRO to [AgrP-o t'_i fix t_i]]]]]

Thus (31.b) violates the SMR and LF legitimacy, but (31.a) violates only the SMR, which results in the contrast in (31).

However, this analysis cannot explain the contrast in (18), repeated in (36):

(36) a. ?[CP What_i did John wonder [CP whether to fix t_i?]]
     b. ?[CP What_i did John wonder [CP how to fix t_i?]]
     c. *[CP What_i did John wonder [CP who fixed t_i?]]

Consider (36.c). Suppose the computations have constructed (37.a). At this point the embedded Comp attracts who, observing the SMR, in mapping (37.a) to (37.b). The computations further construct (37.c). At this point the matrix Comp attracts what, violating the SMR. At LF, however, there is a single violation-free movement which raises the trace of what, yielding two chains, the operator-variable chain and the argument chain. The LF representation is shown in (37.e).

(37) a. [CP C [TP who fixed what]]
     b. [CP who_i [C' C [TP t_i fixed what]]]
     c. [C' did [TP John wonder [CP who_i [C' C [TP t_i fixed what]]]]]
     d. [CP what_j [C' did [TP John wonder [CP who_i [C' C [TP t_i fixed t_j]]]]]]
     e. [CP what_j [C' did [TP John wonder [CP who_i [C' C [TP t_i [AgrP-o t'_j fixed t_j]]]]]]]

Thus (36.a-b) and (36.c) alike violate only the SMR, yet (36.c) is severely deviant while (36.a-b) are only marginally deviant. In the next section we offer an alternative unified analysis of Wh-asymmetries under multiple feature attraction and the Minimal Link Condition.

4.3 A Unified Analysis of Wh-asymmetries

4.3.1 Feature Specifications of Wh-words and [+Wh] Comps

For a Wh-interrogative the minimalist program assumes that a strong feature Op-Q of a Comp attracts a feature Op-Q of a Wh-phrase, as shown in (38).

(38) a. [C' C(did){Op-Q} [TP John buy what{Op-Q}]]
     b. [CP what{Op-Q}-i [C' C(did){Op-Q} [TP John buy t_i]]]
     c. Spell-out

If Spell-out applies before (38.b), the derivation crashes at PF; if (38.b) never happens, it crashes at PF and LF: a strong feature is not interpretable at PF and LF, and the operator what{Op-Q} also undergoes vacuous binding, which is illegitimate at LF.

Some condition on vacuous binding is necessary for LF legitimacy for an independent reason.[29] Consider (39). In (39.a) the Comp does not have any Op-Q feature, and a Wh-phrase what with Op-Q stays in situ. Sentence (39.b) has a Comp with Op-Q, and what merges to the Comp rather than moving to it. To explain the ungrammaticality of (39), we presumably need (40) for LF legitimacy.

(39) a. *John bought what.
     b. *What did John fix a car.

(40) LF Legitimacy
     An operator must nonvacuously bind a variable, and a variable must be bound by an operator.

[29] See also Lasnik and Saito (1992).

In (39) what{Op-Q} violates LF Legitimacy: in (39.a) it is not in a Comp and it binds nothing; in (39.b) what binds nothing, although it occurs in a Comp. One important thing to note is that LF Legitimacy does not motivate movement at all; rather, morphologically driven movement results in LF Legitimacy.

On the other hand, English does not allow a Comp to be doubly filled.
In English multiple questions only one Wh-phrase moves to the Comp with Op-Q, as shown in (41). This indicates that in English a Comp parametrically has only one strong Op-Q feature.

(41) a. Who did Bill persuade to buy what?
     b. *What who did Bill persuade to buy?

In (41.a) who moves to the Comp and binds its trace, which satisfies LF Legitimacy; but the Wh-in-situ what does not observe LF Legitimacy, since it occurs in situ, not in a Comp, and binds nothing. To explain the grammaticality of (41.a), we cannot say, as the ECP account does, that the Comp has one more weak Op-Q feature which attracts what at LF. This approach cannot explain the contrast in (42).

(42) a. *Who did John leave after he met t?
     b. Who left after he met who?

The adjunct clause is known to be an extraction island. The ungrammaticality of (42.a) is due to extraction of who from the extraction island. If a weak Op-Q feature of the Comp can attract who from the adjunct clause at LF in (42.b), how can we justify the LF extractability without any distinction between S-structure and LF?

For Wh-in-situ, Tsai (1994) argues that Chinese Wh-phrases have a variable, but do not have an operator Op-Q, as shown in (43).

(43)      N
         /  \
       Wh    ind.x

The variable ind.x undergoes some type of binding for Wh-dependencies rather than movement. Under this assumption we can explain the contrast in (42). In (42.b) the Wh-in-situ who has only a variable, without Op-Q. Then who does not undergo movement for its Wh-dependency; Wh-in-situ instead uses binding (or linking).

If this approach is correct, then English has two types of Wh-words: one has only an operator Op-Q; the other has only a pronominal variable (cf. Chierchia (1991), Hornstein (1995)). Their differences can be represented in terms of feature specification:

(44) a. Wh-operator: {Op, Q}.
     b. Pronominal Wh-word: {D, x}.
The operator Wh-word forms a Wh-dependency by movement, and a variable Wh-word forms a Wh-dependency by linking (in Higginbotham's (1983) and Hornstein's (1995) terms).

Furthermore, Reinhart (1993), Tsai (1994) and others distinguish Wh-NPs from Wh-adverbials. They claim that Wh-adverbials do not have an indefinite variable, and Wh-NPs do. In other words, Wh-adverbials have only an operator form. This assumption indicates that Wh-NPs can form Wh-dependencies either by movement or by binding, since they can have an operator or a variable feature, while Wh-adverbials can form Wh-dependencies only by movement, leaving a trace behind as a variable.[30]

It is also important to note here that Wh-NPs are categorially NPs, forming a DP with a D feature, while Wh-adverbials are categorially adverbials, lacking a D feature. We represent these as the following structures:

(45) a. Wh-NP: [DP D [NP Wh]]
     b. Wh-Adverbial: [Adv Wh]

To sum up, we minimally represent Wh-words in terms of feature specification as in the following:

(46) a. Wh-NP: {D, Op, Q}
     b. Wh-in-situ: {D, x}
     c. Wh-adverbials: {Adv, Op, Q}

[30] See Tsai (1994) for more consequences.

Now consider some features of a Comp. It is known that a Comp has an Op-Q feature for Wh-questions. In addition, the Comp has a D feature, as shown below:

(47) a. That John left was pleasing.
     b. Whether John went was important.

In English a tense has a strong D feature. It attracts a D feature overtly for PF convergence. This is the case of VP-internal subject raising to the Spec of TP. In (47) the CP clause occupies the Spec of TP, and should have a D feature to check the strong D feature of the tense. In many other cases, too, we can see that CPs have a distribution very similar to that of DPs.

4.3.2 Multiple Feature Attraction

Under the minimalist assumptions, the operation Move must be driven by the requirement that some morphological feature F must be checked.
If a target attracts a feature F, Attract-F/Move-F automatically carries FF(F). If Move-F is triggered by PF, it pied-pipes its full category (for PF convergence) (Chomsky (1995)). If we look carefully into Attract-F, however, we need to determine which F attracts which F. Consider several cases here. Suppose that we have two features F1 and F2, one functional category X, and two lexical categories Y and Z in the lexicon. Then we have several possible choices of feature selection. Consider Case 1 in (48), where the symbol * means that the feature is uninterpretable:

(48) Case 1: Let FF(X)={*F1} and FF(Y)={F1}.

For a derivation to converge, FF(X)={*F1} must attract FF(Y)={F1}. Then FF(X) and FF(Y) can be in a checking relation, since both have the same feature F1. Consider Case 2 in (49).

(49) Case 2: Let FF(X)={*F1,*F2} and FF(Y)={F1,F2}.

In this case, too, FF(X)={*F1,*F2} must attract FF(Y) for convergence. Attract-F is successful and the derivation converges, since FF(X)={*F1,*F2} can enter into a checking relation with FF(Y)={F1,F2}. Consider Case 3 in (50):

(50) Case 3: Let FF(X)={*F1,*F2} and FF(Y)={F1} and FF(Z)={F2}.

For convergence, *F1 and *F2 of FF(X) each attract their corresponding feature. In this case FF(X) must attract a feature F twice: first, the *F1 of FF(X) attracts FF(Y)={F1}, and second, the *F2 of FF(X) attracts FF(Z)={F2}; the order of the two applications of Attract-F does not matter here.

For Case 3 we can think about another possibility for Attract-F. In contrast to the previous assumption that *F1 and *F2 of FF(X) each independently attract FF(Y) and FF(Z), suppose that a pair ⟨*F1,*F2⟩ of FF(X) triggers Attract-F. If we take this option for Case 3, the derivation crashes, since neither FF(Y) nor FF(Z) has the pair and Attract-F fails. If FF(Z) were to have {F1,F2}, then the derivation would converge, since FF(X)=⟨*F1,*F2⟩ could attract it.
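The difference between the two attraction modes discussed for Cases 1-3 can be made concrete with a small sketch. This is only an illustration under simplifying assumptions: feature bundles are modeled as plain sets, uninterpretability (the * mark) and locality are ignored, and the function names `attract_independently` and `attract_as_pair` are hypothetical labels, not terms from the literature.

```python
from typing import Dict, FrozenSet

Features = FrozenSet[str]


def attract_independently(target: Features, items: Dict[str, Features]) -> bool:
    """Each feature of the target attracts on its own.

    The derivation converges if every target feature finds some item
    carrying a matching feature.
    """
    return all(any(f in ff for ff in items.values()) for f in target)


def attract_as_pair(target: Features, items: Dict[str, Features]) -> bool:
    """The target's features attract together as one tuple.

    The derivation converges only if a single item carries all of them.
    """
    return any(target <= ff for ff in items.values())


F1F2 = frozenset({"F1", "F2"})

case1 = (frozenset({"F1"}), {"Y": frozenset({"F1"})})
case2 = (F1F2, {"Y": F1F2})
case3 = (F1F2, {"Y": frozenset({"F1"}), "Z": frozenset({"F2"})})
case3_variant = (F1F2, {"Y": frozenset({"F1"}), "Z": F1F2})

for name, (target, items) in [("Case 1", case1), ("Case 2", case2),
                              ("Case 3", case3), ("Case 3 variant", case3_variant)]:
    print(name, attract_independently(target, items), attract_as_pair(target, items))
# Case 1 True True
# Case 2 True True
# Case 3 True False
# Case 3 variant True True
```

Only Case 3 separates the two modes: independent attraction converges with two applications of Attract-F, whereas pair attraction fails unless one item carries both features.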
In the minimalist program, the universal principles are assumed to be invariant and common to all human language faculties, and parameters (or options) are assumed to be "restricted to functional elements and general properties of the lexicon" (Chomsky 1994 p.4). If we parametrize the features that attract in this way, we can explain some cross-linguistic variation in movement. We propose the following parametrization for Attract-F:

(51) Attract F, where the number of F and the type of F can be parametrized from language to language.

This approach fits the minimalist assumption about parametrization. More specifically, English has the following parameter for a Comp:

(52) F of a Comp attracts F, where F is Op-Q or ⟨D, Op-Q⟩.

In the subsequent sections we will discuss the consequences, investigating Wh-asymmetries.

Finally, consider Case 4 in (53) for Attract-F and the MLC:

(53) Case 4: Let FF(X)={*F1,*F2} and FF(Y)={F1,F2} and FF(Z)={F1,F2}.

FF(X) attracts either FF(Y) or FF(Z), but not both, since both FF(Y) and FF(Z) have {F1,F2} and can be in a checking relation with {*F1,*F2} of FF(X). In this case, however, the order is relevant. If FF(Y) is closer to FF(X) than FF(Z), Attract-F attracts FF(Y) but cannot attract FF(Z) because of the Minimal Link Condition; if FF(Y) is equidistant with FF(Z), it attracts either of them; otherwise it attracts FF(Z). Following Chomsky (1995), we can define the Minimal Link Condition as follows:

(54) (=(110) Chomsky 1995 p.311)
     Minimal Link Condition
     K attracts α only if there is no β, β closer to K than α, such that K attracts β.

(55) (Chomsky 1995 p.358)
     β is closer to the target K than α if β c-commands α.

In the next section let us consider how the Minimal Link Condition and multiple feature attraction can explain Wh-asymmetries.

4.3.3 Analysis

4.3.3.1 Some Basic Assumptions

In previous sections we have elaborated some distinctions among Wh-words in terms of feature specifications.
Wh-NPs minimally have {D, Op-Q}, Wh-adverbials {Adv, Op-Q}, and Wh-in-situ {D, x}. We have also seen that a Comp has {D, Op-Q} for Wh-questions. We also assume that a Comp has only one Op-Q in English, to disallow a doubly-filled Comp. Furthermore, the Comp parametrically attracts Wh-phrases with {Op-Q} or ⟨D, Op-Q⟩. Consider the following three configurations.

(56) [CP C{D, Op-Q} [TP ... WhP{D, Op-Q}]]
     a. The Op-Q of FF(C) attracts FF(WhP).
     b. A pair of features of FF(C) attracts FF(WhP).

(57) [CP C{D, Op-Q} [TP ... WhP{Adv, Op-Q}]]
     a. The Op-Q of FF(C) attracts FF(WhP).
     b. A pair of features of FF(C) cannot attract FF(WhP).

(58) [CP C{D, Op-Q} [TP ... WhP{D, x}]]
     a. The Op-Q of FF(C) cannot attract FF(WhP).
     b. A pair of features of FF(C) cannot attract FF(WhP).

In (56) the Comp can attract the Wh-NP with Op-Q or ⟨D, Op-Q⟩, since the Wh-NP also has those features, which can be in a checking relation with the features of the Comp. This configuration represents a sentence like (59.a). On the other hand, in (57) the Op-Q can attract FF(WhP), while a pair ⟨D, Op-Q⟩ of the Comp cannot attract the Wh-adverbial, since the Wh-adverbial has no D feature in its formal features {Adv, Op-Q}. Hence the movement of Wh-adverbials is always triggered by the Op-Q of the Comp for Wh-questions. This configuration represents (59.b). Wh-in-situ constructions can be represented as in (58). In (58) FF(Comp) cannot attract FF(WhP) with either Op-Q or ⟨D, Op-Q⟩, since FF(WhP) does not have any Op feature.

(59) a. What did Mary eat?
     b. How did Mary eat pizza?

We also accept Chomsky's (1993) assumption that there is no QR-like LF movement. Hence Wh-in-situ does not undergo QR-like LF movement to the Spec of CP simply to take scope.[31] Under these hypotheses we will attempt to give a unified analysis of Wh-asymmetries in the next section.

4.3.3.2 Argument-Adjunct Asymmetries

Let us start with argument-adjunct asymmetries, as shown in (60).

(60) a. What did John wonder how to fix?
     b. *How did John wonder what to fix?
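The configurations in (56)-(58), combined with the Minimal Link Condition in (54)-(55), can be sketched as a search for the closest matching goal. The sketch below is a hedged illustration only: candidates are given as a list ordered from closest to farthest from the Comp (a hypothetical stand-in for c-command), the names `attract` and `can_attract` are invented for the example, the variable feature of Wh-in-situ is written `x`, and the Last Resort Condition and LF Legitimacy are not modeled.

```python
from typing import FrozenSet, List, Optional, Tuple

Features = FrozenSet[str]

# Feature specifications of Wh-words assumed in the text.
WH_NP = frozenset({"D", "Op-Q"})
WH_ADV = frozenset({"Adv", "Op-Q"})
WH_IN_SITU = frozenset({"D", "x"})

# The two attractors available to an English Comp: Op-Q alone, or the pair (D, Op-Q).
OPQ = frozenset({"Op-Q"})
D_OPQ = frozenset({"D", "Op-Q"})


def attract(attractor: Features,
            candidates: List[Tuple[str, Features]]) -> Optional[str]:
    """Return the closest candidate carrying every attracted feature.

    `candidates` must be ordered closest-first, so returning the first
    match enforces the Minimal Link Condition.
    """
    for label, ff in candidates:
        if attractor <= ff:
            return label
    return None


def can_attract(attractor: Features, goal: str,
                candidates: List[Tuple[str, Features]]) -> bool:
    """True if `goal` is what the attractor actually reaches under the MLC."""
    return attract(attractor, candidates) == goal


# (58): a Wh-in-situ has no Op feature, so neither attractor reaches it.
print(attract(OPQ, [("what", WH_IN_SITU)]), attract(D_OPQ, [("what", WH_IN_SITU)]))
# None None

# Matrix Comp of (60.a): 'how' in the embedded Spec of CP is closer than 'what'.
matrix_60a = [("how", WH_ADV), ("what", WH_NP)]
print(can_attract(OPQ, "what", matrix_60a))    # False: 'how' intervenes with Op-Q
print(can_attract(D_OPQ, "what", matrix_60a))  # True: 'how' lacks D

# Matrix Comp of (60.b): 'what' in the embedded Spec of CP is closer than 'how'.
matrix_60b = [("what", WH_NP), ("how", WH_ADV)]
print(can_attract(OPQ, "how", matrix_60b))     # False: 'what' intervenes with Op-Q
print(can_attract(D_OPQ, "how", matrix_60b))   # False: 'how' lacks D
```

On these assumptions the argument in (60.a) has one MLC-respecting route out of the Wh-island (pair attraction), while the adjunct in (60.b) has none.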
Consider the derivation of (60.a) first, which can be described as in (61):

(61) a. [CP C{Op-Q} [TP PRO to [VP [VP fix what{D, Op-Q}] how{Op-Q}]]]
     b. [CP how{Op-Q}-i [C' C{Op-Q} [TP PRO to [VP [VP fix what{D, Op-Q}] t_i]]]]
     c. [CP C(did){D, Op-Q} John wonder [CP how{Op-Q}-i [C' C{Op-Q} [TP PRO to [VP [VP fix what{D, Op-Q}] t_i]]]]]
     d. [CP what{D, Op-Q}-j [C' C(did){D, Op-Q} John wonder [CP how{Op-Q}-i [C' C{Op-Q} [TP PRO to [VP [VP fix t_j] t_i]]]]]]

[31] This does not mean that there is no covert Wh-movement. Presumably, Wh-in-situ can undergo LF movement if Wh-in-situ has an Op-Q feature, and a Comp also has a non-strong uninterpretable Op-Q feature.

Suppose that the computational system has constructed (61.a) for (60.a). At this point the computational system has two choices for further derivation: one is that FF(C) attracts a Wh-phrase with Op-Q; the other is that it attracts a Wh-phrase with ⟨D, Op-Q⟩. Putting aside the second choice for a moment, let us focus on the first choice here. The Op-Q of the Comp can potentially attract FF(how) or FF(what), which maps (61.a) to (61.b) or (62), respectively:

(62) *[CP what{D, Op-Q}-i [C' C{Op-Q} [TP PRO to [VP [VP fix t_i] how{Op-Q}]]]] -> *the MLC

If we compare (61.b) and (62) at this point under economy considerations, the former is more economical than the latter, since FF(how) is structurally closer to the attractor than FF(what). The derivation (62) violates the MLC. So the MLC picks (61.b) as the optimal derivation at this point.

After that the computational system is supposed to construct (61.c). At this point the matrix Comp can potentially attract a Wh-phrase in three ways: (i) the Op-Q attracts FF(how) in the embedded Comp, which maps (61.c) to (63); (ii) the Op-Q attracts FF(what), which maps (61.c) to (64); and (iii) the features ⟨D, Op-Q⟩ attract FF(what), which maps (61.c) to (61.d).
(63) *[CP how{Op-Q}-i [C' C(did){Op-Q} John wonder [CP t'_i [C' C{Op-Q} [TP PRO to [VP [VP fix what{D, Op-Q}] t_i]]]]]] -> *the Last Resort Condition

(64) *[CP what{D, Op-Q}-j [C' C(did){Op-Q} John wonder [CP how{Op-Q}-i [C' C{Op-Q} [TP PRO to [VP [VP fix t_j] t_i]]]]]] -> *the MLC

If we compare (61.d), (63) and (64) under economy considerations, the derivation (61.d) is the most economical: (61.d) observes all three derivational economy conditions, while (63) violates the Last Resort Condition, and (64) violates the MLC. That is, at the point of (61.c), the Op-Q cannot attract FF(how) in the embedded Comp, since the chain (how_i, t_i) has already satisfied bare output conditions; attracting it would violate the Last Resort Condition, as in (63). If the Op-Q attracts FF(what), it violates the MLC, as in (64), since there is an intervening category with Op-Q which is closer to the matrix Comp than FF(what). If ⟨D, Op-Q⟩ of the Comp attracts FF(what), Attract-F observes the MLC, since the intervening how does not have the features ⟨D, Op-Q⟩: FF(how) has the features {Adv, Op-Q}. Furthermore, by this attraction FF(what) can also be legitimate at LF, escaping vacuous binding. Hence the computational system maps (61.c) to (61.d) as the optimal derivation.

Now consider (60.b). Suppose the computations have constructed (65.a). At this point, the Comp attracts a Wh-phrase with either Op-Q or ⟨D, Op-Q⟩. So far we have considered the first choice, above, for (60.a). If the Comp attracts what with a pair ⟨D, Op-Q⟩, it observes the MLC, since the intervening how does not have this pair. Then Attract-F successfully maps (65.a) to (65.b). After that the computations further construct (65.c). At this point the Comp cannot use ⟨D, Op-Q⟩ to attract how, since how does not have that pair. So it must attract how with Op-Q. But this violates the MLC, since the intervening what in the spec of the embedded CP has an Op-Q feature.
If the matrix Comp attracts FF(what) in the embedded CP, then it violates the Last Resort Condition and LF Legitimacy, since what has already formed a Wh-dependency, and how vacuously binds itself in situ.

(65) a. [CP C{D, Op-Q} [TP PRO to fix what{D, Op-Q} how{Op-Q}]]
     b. [CP what{D, Op-Q}-i [C' C{D, Op-Q} [TP PRO to fix t_i how{Op-Q}]]]
     c. [CP C(did){Op-Q} John wonder [CP what{D, Op-Q}-i [C' C{D, Op-Q} [TP PRO to fix t_i how{Op-Q}]]]]
     d. *[CP how{Op-Q}-j [C' C(did){Op-Q} John wonder [CP what{D, Op-Q}-i [C' C{D, Op-Q} [TP PRO to fix t_i t_j]]]]]

Thus we successfully explain argument-adjunct asymmetries under Attract-F and the MLC. Extraction of an argument out of a Wh-island does not violate derivational economy, but extraction of an adjunct out of a Wh-island violates the MLC.

4.3.3.3 Argument Extraction Asymmetries

Now consider (66), which exhibits argument extraction asymmetries.

(66) a. What did John wonder how to fix?
     b. *What did John wonder who bought?

Suppose that the computations have constructed (67.a) for (66.b). At this point, the Comp attracts who with Op-Q. If it instead attracted what, it would violate the MLC, because who intervenes between the attractor Comp and what, and the feature of who is the same as the feature to be attracted. Attract-F thus maps (67.a) to (67.b). After that the computations further map (67.b) to (67.c). At this point the matrix Comp attempts to attract what with either Op-Q or ⟨D, Op-Q⟩, but either choice violates the MLC, since the intervening who in the embedded Comp has those features.

(67) a. [CP C{D, Op-Q} [TP who{D, Op-Q} bought what{D, Op-Q}]]
     b. [CP who{D, Op-Q}-i [C' C{D, Op-Q} [TP t_i bought what{D, Op-Q}]]]
     c. [CP C(did){D, Op-Q} John wonder [CP who{D, Op-Q}-i [C' C{D, Op-Q} [TP t_i bought what{D, Op-Q}]]]]
     d. *[CP what{D, Op-Q}-j [C' C(did){D, Op-Q} John wonder [CP who{D, Op-Q}-i [C' C{D, Op-Q} [TP t_i bought t_j]]]]]

If an argument is extracted out of a Wh-island across another Wh-argument, it also violates the MLC.

4.3.3.4 Argument-Quasi-Argument Asymmetries

Sentence (68) is an example of argument-quasi-argument asymmetries.

(68) a. Which box did Bill wonder whether John weighed t_i
     b. *How much did Bill wonder whether John weighed t_i

Quasi-arguments behave like adjuncts in some cases, although they receive a θ-role (and presumably Case, too) as arguments. First of all, they behave like Wh-adjuncts in extraction out of a Wh-island, as shown in (68). Arguments can be extracted out of a Wh-island, yielding mild deviance, while extraction of adjuncts yields severe deviance. If quasi-arguments are extracted out of a Wh-island as in (68.b), the result is severely deviant, like ordinary adjunct extraction. Second, sentence (68.a) can be passivized, raising which box to the Spec of TP, as in (69.a), but (68.b) cannot, as shown in (69.b):

(69) a. Which box was weighed by John?
     b. *How much was weighed by John?

In passivization DP-movement takes place for two reasons: one is that the Case feature of the complement NP and T should be checked for convergence; the other is that the strong D feature of T should be checked by a DP. Ordinary DP arguments have Case and D features, and so can be raised to the Spec of TP in passivization. If this is correct, quasi-arguments presumably lack a D feature (although they might have an N feature). If a quasi-argument were to have a D feature, it would be raised to the Spec of TP in passivization and check the strong D feature.

The assumption that a quasi-argument how much has the feature specification {N, Op-Q} also gives us some explanation of the fact that, like Wh-adjuncts, quasi-arguments cannot be extracted out of a Wh-island. For (68.a) the computational system constructs the derivation as in (70).

(70) a. [CP Did+C{D, Op-Q} [TP Bill wonder [CP whether{Op-Q} John weighed which box]]]
     b. [CP [Which box]_i [C' Did+C{D, Op-Q} [TP Bill wonder [CP whether{Op-Q} John weighed t_i]]]]

Suppose that the computational system has constructed (70.a).
At this point a feature Op-Q of the matrix C cannot attract which box, because there is an intervening category with Op-Q; attracting it would violate the MLC. But the features ⟨D, Op-Q⟩ can attract it without violating the MLC at all, because the intervening category whether is assumed to lack a D feature. The computational system thus generates (70.b), which converges. For (68.b), on the other hand, the computational system constructs the derivation given in (71):

(71) a. [CP Did+C{Op-Q} [TP Bill wonder [CP whether{Op-Q} John weighed how much]]]
     b. *[CP [How much]_i [C' Did+C{Op-Q} [TP Bill wonder [CP whether{Op-Q} John weighed t_i]]]]

Suppose that the computational system has constructed (71.a). At this point a feature Op-Q of the matrix C fails to attract how much, because there is an intervening category with Op-Q which is closer to the matrix C. Features ⟨D, Op-Q⟩ [...]

[...] in English) should attract a Wh-word. Now let us assume that the Op-Q of a Comp can also parametrically be underspecified as Op, and that the Op attracts an operator. This does not mean that a Comp does not have a Q feature. It is just underspecified. This underspecification is also completely compatible with our previous analyses of Wh-asymmetries. It can still attract a Wh-word, since a Wh-word has two features Op and Q, and an FF(Wh-word) is attracted if an Op feature is attracted.

In section 4.3.3.2 we have already considered the asymmetry in (94). Now let us consider (96). Suppose that a pair of features of the Comp attracts a Wh-word. Then it can possibly attract the whole NP carrying that pair, as shown in (96.a). This observes the MLC, since there is no intervening category with the same pair between the NP and the Comp. If the Op of the Comp attracts combien in (96.b), it cannot be successful, violating the MLC, since there is an intervening category Neg with an Op feature between the NP and the Comp. The pair of features of the Comp cannot attract combien alone, since it has no D feature.

We can give the same account for the contrast in (97).
If the pair of features of the Comp attracts a Wh-word, it observes the MLC: the intervening Neg operator does not have any D feature. If the Op feature of the Comp attracts a Wh-word, it violates the MLC because of the intervening Neg operator.

5. Conclusion and Further Research

In the minimalist program economy considerations have played a very important role in optimizing a language system: reducing the components of language only to virtually conceptually necessary modules, and deriving various principles from a very general property of economy.

This thesis has attempted to localize derivational economy uniformly and strictly, and to elucidate its significance and consequences, investigating the cyclicity in overt derivations, Procrastinate effects, Wh-asymmetries, and adjunct symmetries in the minimalist program. By localizing derivational economy, we achieve the following desirable results:

• Derivational economy becomes strictly derivational.
• Computational complexity is significantly reduced by generating only a set of optimal derivations.
• The optimality of a derivation is consistent in the course of a derivation and at the interface levels.
• Derivational economy becomes homogeneous in terms of locality.

We also propose that Procrastinate should be eliminated and replaced with Earliness. With Earliness we have the following advantages:

• All the derivational economy conditions become localized uniformly.
• The Last Resort Condition is strengthened so that it can block "no operation".
• The cyclicity of overt computation and Procrastinate effects are derived from one principle, Earliness.

We also hypothesize that multiple features of a target can attract F, and that multiple feature attraction can presumably be parametrized.
Under the local Minimal Link Condition multiple feature attraction offers a unified analysis of Wh-asymmetries such as argument-adjunct, argument-extraction, argument-quasi-argument, superiority effects, and adjunct symmetries such as argument-adjunct, pseudo-opacity, and inner island conditions. We also have some more areas to which our analysis can potentially apply and extend. One area is LF cyclicity. Recently it has been reported by Bures (1993), Jonas and Bobaljik (1993), Tsai (1994), and others that LF computation is cyclic. If LF cyclicity analysis is correct, it can also be derived from Earliness straightforwardly. Another area is some argument-adjunct asymmetries in 186 parasitic gap constructions (Cinque 1990). Parasitic gaps are permissible only if they are referential NPs. If we reduce referentiality/nonreferentiality to some properties of a D feature, as presented in this thesis for the analysis of Wh-asymmetries and adjunct symmetries, we may derive the argument-adjunct asymmetries in parasitic gap constructions from multiple feature attraction and the Minimal Link Condition. Another area is some Wh-asymmetries in scope ambiguity and extraction (Cinque 1990). If a Wh-NP is extracted out of a Wh-island, it takes only wide scope over quantifiers within the Wh-island, while it exhibits scope ambiguity if it is extracted from a that-clause. If attraction is assumed to generate a different LF representation from Op attraction, we may be able to explain relationship between some scope asymmetries and extractability under multiple feature attraction. We will leave all these areas for further research. 187 REFERENCES Aoun, J. and D. Sportiche (1981) "On the formal theory of government." The Linguistic Review 2, 211-236. Bobaljik (1995) "In terms of merger: Single output syntax and the strict cycle." Pepers on minimalist sypsax: MIT working pepers in Lingoistics 27, 41-64. Cambridge, Mass.: MIT Press. Brody, M. 
(1995) Lexico-Logical Form: A Radically Minimalist Theory. Cambridge, Mass.: MIT Press.
Bures, A. (1993) "There is an Argument for an LF Cycle Here." CLS 28, 14-35.
Chierchia, G. (1991) "Functional WH and weak crossover," in D. Bates (ed.) Proceedings of WCCFL 10, 75-90.
Chomsky, N. (1970) "Remarks on Nominalization," R. Jacobs and P. Rosenbaum, eds., Readings in English Transformational Grammar, Waltham, Mass.: Ginn.
Chomsky, N. (1973) "Conditions on Transformations," reprinted in Chomsky 1977, Essays on Form and Interpretation, North-Holland, New York.
Chomsky, N. (1977) Essays on Form and Interpretation, North-Holland, Amsterdam.
Chomsky, N. (1986) Knowledge of Language. New York: Praeger.
Chomsky, N. (1991) "Some Notes on Economy of Derivation and Representation." R. Freidin, ed., Principles and Parameters in Comparative Grammar, Cambridge, Mass.: MIT Press.
Chomsky, N. (1993) "A Minimalist Program for Linguistic Theory." K. Hale and S. J. Keyser, eds., The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, 1-52. Cambridge, Mass.: MIT Press.
Chomsky, N. (1994) "Bare Phrase Structure." MIT Occasional Papers in Linguistics 5. Cambridge, Mass.: MIT Press.
Chomsky, N. (1995) The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, N. and H. Lasnik (1993) "The theory of principles and parameters." J. Jacobs, A. von Stechow, W. Sternefeld, and T. Vennemann, eds., Syntax: An International Handbook of Contemporary Research. Berlin: de Gruyter.
Cinque, G. (1990) Types of A’-dependencies. Cambridge, Mass.: MIT Press.
Collins, C. (1994) "Economy of Derivation and the Generalized Binding Condition." Linguistic Inquiry 25, 45-61.
Collins, C. (1995) "Toward a theory of optimal derivation." Papers on Minimalist Syntax: MIT Working Papers in Linguistics 27, 65-104. Cambridge, Mass.: MIT Press.
Emonds, J. (1978) "The Verbal Complex of V’-V in French." Linguistic Inquiry 2, 151-175.
Epstein, S. D. (1991) "Derivational Constraints on A’-Chain Formation."
Linguistic Inquiry 23, 235-259.
Fukui, N. (1993) "Parameters and optionality." Linguistic Inquiry 24, 399-420.
Higginbotham, J. (1983) "Logical form, binding and nominals." Linguistic Inquiry 14, 395-420.
Holmberg, A. (1986) Word order and syntactic features in the Scandinavian languages and English. Doctoral dissertation, University of Stockholm, Stockholm.
Hornstein, N. (1995) Logical Form: From GB to Minimalism. Cambridge, Mass.: Blackwell.
Huang, J. (1982) Logical Relations in Chinese and the Theory of Grammar. Doctoral dissertation, MIT, Cambridge, Mass.
Jackendoff, R. (1977) X’ syntax: A study of phrase structure. Cambridge, Mass.: MIT Press.
Jonas, D. and J. Bobaljik (1993) "Specs for subjects: The role of TP in Icelandic." Papers on Case & Agreement I: MIT Working Papers in Linguistics 18, 59-98.
Kayne, R. (1994) The antisymmetry of syntax. Cambridge, Mass.: MIT Press.
Kitahara, H. (1994) Target α: A Unified Theory of Movement and Structure-Building. Doctoral dissertation, Harvard University, Cambridge, Mass.
Kitahara, H. (1995) "Target α: Deducing strict cyclicity from derivational economy." Linguistic Inquiry 25, 47-78.
Kitahara, H. (1996) "Minimal Syntactic Procedure: Deriving the Timing of Movement." Paper presented at Michigan State University, Dept. of Linguistics, East Lansing, Michigan.
Larson, R. (1988) "On the double object construction." Linguistic Inquiry 19, 335-391.
Larson, R. (1990) "Double object revisited: Reply to Jackendoff." Linguistic Inquiry 21, 589-632.
Lasnik, H. (1992) "Case and expletives." Linguistic Inquiry 23, 381-405.
Lasnik, H. (1993) "Lectures on Minimalist Syntax." UCWPL Occasional Papers 1. Storrs.
Lasnik, H. (1995) "Case and expletives revisited: On greed and other human failings." Linguistic Inquiry 26, 615-635.
Lasnik, H. and M. Saito (1984) "On the nature of proper government." Linguistic Inquiry 15, 235-289.
Lasnik, H. and M. Saito (1992) Move α. Cambridge, Mass.: MIT Press.
Lee, Daehee (1995) "The Revised Greed Principle and Phrase Structure Constructions: the Earliness Principle," presented at the Michigan State University Linguistic Colloquium, October 1995.
Lee, Daehee (1996) "The Timing Principle On Syntactic Derivations," presented at the Michigan Linguistic Society Annual Conference, October 1996.
Longobardi, G. (1994) "Reference and proper names: A theory of N-movement in syntax and Logical Form." Linguistic Inquiry 25, 609-665.
Obenauer, H. (1984) "On the Identification of Empty Categories." The Linguistic Review 4, 153-202.
Oka, T. (1993) "Shallowness." Papers on Case & Agreement II: MIT Working Papers in Linguistics 19, 255-320. Cambridge, Mass.: MIT Press.
Oka, T. (1995) "Fewest steps and island sensitivity." Papers on Minimalist Syntax: MIT Working Papers in Linguistics 27, 189-208. Cambridge, Mass.: MIT Press.
Pesetsky, D. (1989) "Language Particular Processes and the Earliness Principle." Ms., MIT, Cambridge.
Reinhart, T. (1993) "Wh-in-situ in the framework of the minimalist program." Lecture given at the Utrecht Linguistics Colloquium.
Rizzi, L. (1990) Relativized Minimality. Cambridge, Mass.: MIT Press.
Ross, J. R. (1983) Inner Islands. Manuscript, MIT.
Sauerland, U. (1995) "Early features." Papers on Minimalist Syntax: MIT Working Papers in Linguistics 27, 223-242. Cambridge, Mass.: MIT Press.
Tsai, W. D. (1994) On Economizing the Theory of A-bar Dependencies. Doctoral dissertation, MIT, Cambridge, Mass.
Ura, H. (1995) "Towards a theory of ’Strictly derivational’ economy condition." Papers on Minimalist Syntax: MIT Working Papers in Linguistics 27, 243-268. Cambridge, Mass.: MIT Press.
Watanabe, A. (1995) "Conceptual Basis of Cyclicity." Papers on Minimalist Syntax: MIT Working Papers in Linguistics 27, 269-291. Cambridge, Mass.: MIT Press.