This is to certify that the thesis entitled WORD FREQUENCY STUDY AND MORPHOLOGICAL ANALYSIS OF TWO BANTU LANGUAGES presented by Chege John Githiora has been accepted towards fulfillment of the requirements for the Master of Arts degree in Linguistics, Department of Linguistics, Germanic, Slavic, Asian and African Languages.
Major professor
Date: 2 April 1993
MSU is an Affirmative Action/Equal Opportunity Institution

WORD FREQUENCY STUDY AND MORPHOLOGICAL ANALYSIS OF TWO BANTU LANGUAGES
By
Chege John Githiora

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
MASTER OF ARTS
Department of Linguistics, Germanic, Slavic, Asian and African Languages
1993

ABSTRACT
This thesis describes the theory and method of producing basic 3000-word frequency lists of the written vocabulary of two important African Bantu languages: Kiswahili and Gĩkũyũ. Toward this practical end, I discuss their morphology, with special reference to nominal and verb constructions. The first section presents the linguistic background of the two languages and related work done so far on them and on other languages in the area of lexical studies. The second section explains in detail the procedure and method used in producing the word lists. The third discusses some general aspects of Bantu morphology, including two approaches to general morphological theory. It also covers in detail the word structure of the two languages, with an emphasis on morpheme order and co-occurrence.
Using a structural approach to Bantu morphology, I elaborate a template of Bantu morpheme order. Finally, I discuss the problems and limitations of the thesis and offer some concluding remarks.

ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my academic advisors, who steered me through two years of study here at MSU. Special thanks go to Dr. Grover Hudson for his mentoring and his belief in my abilities from the very beginning. I thank Dr. Dennis R. Preston for clearly perceiving my ideas and ideals, nurturing my intuitions and providing inspiration, and Dr. David Dwyer for his great patience, his open heart, his willingness to listen and his helpful advice. I would also like to thank Dr. Carolyn Harford for opening to me her knowledge and expertise in Bantu linguistics. I also wish to thank my friends and colleagues at MSU for their encouragement and moral support during my studies. Finally, I thank the faculty and staff of the Department of Linguistics and Languages for their friendliness and cooperation, both of which made writing this thesis a bearable experience.

TABLE OF CONTENTS
LIST OF TABLES
1. Introduction
1.1. Word Studies and Bantu Languages
2. Objectives
2.1. Areas of Application of Results
3. Kiswahili and Gĩkũyũ: The Languages and Speakers
4. Method and Procedure
4.1. Zipf's Law
4.2. The Corpus
4.2.1. Selection of Texts
4.2.2. Background to Corpus Compilation
4.2.3. Sources of Corpus
4.3. Tagging
5. Bantu Morphology
5.1. Nominal Constructions
5.1.1. Modifiers
5.2. Unit of Lexical Analysis (ULA)
5.2.1. Independent Nominals as ULAs
5.2.2. Modifiers as ULAs
5.3. Verb Constructions
5.4. The Tagging Code
6. Bantu Word Structure and Morphological Theory
6.1. The Template Approach
6.2. The Generativist Approach
6.3. An Eclectic Approach
6.4. A General Template of Bantu Verb Structure
6.4.1. Inventory and Description of Morphemes
6.4.2. Order of Pre-Stem Morphemes
6.5. A Bilingual Morpheme Template
7. Expected Results
8. Discussion and Conclusions
9. References
10. Appendix A
Appendix B

LIST OF TABLES
Table I. Inventory and Description of Morphemes

1. Introduction
The aim of this thesis is to describe the theory and method of producing basic 3000-word frequency lists of the written vocabulary of two important African Bantu languages: Kiswahili and Gĩkũyũ. It also discusses their morphology, with special reference to the concept 'word'. The thesis is divided into four major parts. The first gives the linguistic background of the two languages and related work done so far in the area of lexical studies.
The second part explains in detail the procedure and method used in producing the word lists. The third discusses some general aspects of Bantu morphology, including two approaches to general morphological theory. It also covers in detail the word structure of the two languages, with an emphasis on morpheme order and co-occurrence. A final part discusses the problems and limitations of the thesis and presents some concluding remarks.

The thesis describes the study of at least 40,000 'words' of each language. 'Word' at this point refers to the graphic word, or 'a sequence of contiguous alphanumeric characters between two spaces or punctuation' (Kucera 1967:9). Agglutinating languages such as the Bantu languages dealt with in this study have a rich and complex morphology. Because of this, much more needs to be said before such a definition can serve as the basis for any type of analysis.

(1) mtu anayekuandikia barua
'person who is writing you a letter'

The above phrase is made up of three graphic words. The first and last are single lexical entries. The middle graphic word is a verb construction that can be analyzed into a number of bound morphemes (four prefixes, a stem, an extension and a final vowel) as follows:

(2) a-na-ye-ku-andik-i-a
SUBJ-TS(present)-REL-OBJ-STEM-APPLICATIVE-FV

Note: The subject (SUBJ) and relative marker (REL) are in the 3rd person singular form, while the object marker (OBJ) ku- is 2nd person singular ('you'). TS(present) marks present tense, and FV stands for final vowel.

It is necessary to analyze such a construction in order to arrive at its base form and to isolate its morphemes, derived forms and any inflectional variants. Each category can then be dealt with accordingly. A crucial part of the present thesis concerns what constitutes a 'word' and how morphemes that play important grammatical roles are to be treated. One fundamental theoretical goal of this research is to determine what constitutes a Unit of Lexical Analysis (ULA) for these Bantu languages.
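The segmentation in (2) can be illustrated with a small slot-filling parser. The sketch below is an illustration only, not the procedure used in this study. Its `TEMPLATE` is a hypothetical, deliberately tiny inventory of Kiswahili prefixes, stems and extensions. The parser fills the slots in the order of the gloss in (2) and backtracks when a choice fails to consume the whole word.

```python
from typing import List, Optional, Tuple

# Hypothetical slot template for the Kiswahili verb in (2). Each slot is
# (label, candidate forms, required?). The forms are a tiny illustrative
# sample, not a full inventory of Kiswahili morphemes.
TEMPLATE: List[Tuple[str, List[str], bool]] = [
    ("SUBJ", ["ni", "u", "a", "tu", "m", "wa"], True),
    ("TS", ["na", "li", "ta", "me"], True),
    ("REL", ["ye", "o"], False),
    ("OBJ", ["ni", "ku", "m", "tu", "wa"], False),
    ("STEM", ["andik", "som", "pik"], True),
    ("APPLICATIVE", ["i", "e"], False),
    ("FV", ["a"], True),
]

Segmentation = List[Tuple[str, str]]


def segment(word: str, slot: int = 0) -> Optional[Segmentation]:
    """Split `word` into (morpheme, label) pairs, filling slots left to right.

    Returns the first analysis that consumes the whole word, or None.
    """
    if slot == len(TEMPLATE):
        # All slots visited: succeed only if nothing is left over.
        return [] if word == "" else None
    label, forms, required = TEMPLATE[slot]
    # Try longer forms first; if a choice leads to a dead end, the loop
    # moves on to the next candidate (backtracking).
    for form in sorted(forms, key=len, reverse=True):
        if word.startswith(form):
            rest = segment(word[len(form):], slot + 1)
            if rest is not None:
                return [(form, label)] + rest
    if not required:
        # Optional slot (REL, OBJ, extension): try leaving it empty.
        return segment(word, slot + 1)
    return None


print(segment("anayekuandikia"))
print(segment("anaandika"))  # no REL, OBJ or extension
```

The parser returns the first analysis it finds. A realistic template would therefore need a far fuller morpheme inventory and some way of ranking competing analyses of homophonous sequences.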
After a fuller discussion, I provide a description of the ULA as a methodological concept in section 5.2.

1.1. Word Studies and Bantu Languages
Studies of the lexicon have tended to be sidelined in the general study of linguistics. Work in the general field of lexicology has concentrated on dictionary writing and compilation. The need for scientific analyses of lexical inventories and statistical studies of particular languages has long been recognized. Actual work in that direction, however, was held back by the logistics of managing large bodies of data that needed constant manipulation. Frequency or basic word studies were expensive and extremely time-consuming to produce in the past, but with wider use of and access to computer technology, these studies have become highly feasible.

The 1960s mark the beginning of lexico-statistical studies proper. Before then, studies of various Indo-European languages were done on a smaller scale. They were well planned and well intended, but their scope and results reflected the limitations of their technological era. For instance, Allwood and Wilhelmsen's 1947 study of Basic Swedish word frequency involved a panel of 'wise men' who sat to decide the frequency of each word of their language. Another example is Dabb's (1966) study of Bengali, done at Texas A&M University, in which a team of researchers counted words in newspaper texts by hand. Even for a language like English, with its long history of scholarship, a comprehensive study of the lexicon was not accomplished until 1967, with the publication of the Brown Study (Kucera & Francis 1967, 1982).

These technological advances and their impact on lexical studies are not well reflected in work on African languages. Hardly any substantial work on them has been done and published. Even for Kiswahili, a relatively well-studied language of Africa, there are no published frequency or basic word lists, much less for Gĩkũyũ.
In addition, past lexicographic works (mainly dictionaries) do not integrate important underlying characteristics of Bantu word morphology. A good example is their treatment of derived verb forms as subentries of the base form despite significant changes in meaning; see, for instance, Johnson (1939) and Hamisi et al. (1981).

2. Objectives
Frequency lists of the vocabulary of languages are essential bases for many types of research in both theoretical and applied linguistics; some specific examples are given in section 2.1. Technology greatly facilitates the production of such lists, but the work is not an entirely mechanical enterprise, nor does it entail a mere description of words and parts of language. It requires a comprehensive application of linguistic theory, from morphology to semantics, as well as a thorough knowledge of the structure of the individual languages.

This thesis attempts to develop a linguistically sound methodology for producing basic and frequency word lists of two Bantu languages. In doing so, it elaborates on aspects of linguistic theory relevant to specific areas of the project. The work will provide a solid basis upon which the actual lists of the two languages will be produced. By the same token, it is anticipated that this method will be credible enough to inform future word studies of Bantu and other similarly structured languages. Finally, I expect that the study will produce results that may add to the body of knowledge of linguistic theory.

Studying linguistic theory at the same time as producing the actual lists is significant. Useful deductions and inductions can be made from the data at hand, and theoretical questions may target specific problems encountered during the analysis. While this is a thesis on theory and method, at certain stages throughout I shall point out what I have achieved so far in actual practice.

2.1. Areas of Application of Results
Apart from insights into the theory of language, the results of this work have several more immediate applications. The final product will be of interest to linguists, teachers of Kiswahili and Gĩkũyũ, psychologists, computer scientists and others. The study will be of special relevance in the following areas.

2.1.1. Preparation of teaching materials. A teaching grammar or textbook for a language course requires knowledge of the relative basicness of vocabulary. Important words can then be taught early and less important words later, and what is taught can be expected to be naturally reinforced in typically encountered reading materials. A list of basic words will serve to guide writers of textbooks and other language learning materials.

2.1.2. Dictionaries for the languages. A basic learner's dictionary of roughly 3000 words presupposes knowledge of what the most important (most frequent) words are. A sensitive native speaker of a language has good but imperfect intuitions about the basic vocabulary known by other native speakers. Such intuitions are unreliable for second language users, however, whose exposure to the language is small compared with that of a native speaker. I am currently working on a Kiswahili-Spanish-Gĩkũyũ dictionary whose quality and authority will be enhanced by incorporating the results of this study.

2.1.3. Preparation of all sorts of print materials addressed to non-native speakers of the languages. In multilingual parts of the world such as East Africa, many people have varying but functional knowledge of several languages. Although Heine (1981:21) reports that 'the average Kenyan is proficient in at least 1.1 languages', many of those living in the central and urban areas of the country speak at least three. A native speaker of Gĩkũyũ in north central Kenya, for example, frequently has cause to read and speak Kiswahili.
Print materials are readily available in Kiswahili and are much used throughout East Africa; they are becoming increasingly available in languages like Gĩkũyũ. More and more, these writings must be used by people who know the language of the materials only as a second or third language. Such important writings include instructions for census reports, public-use questionnaires, voting, and the use of medicines, among many other uses. These materials are written by both native and non-native speakers. Their effectiveness can be improved if their vocabulary can be statistically evaluated for its likelihood of being understood by non-native speakers.

2.1.4. Innumerable types of essential linguistic research are based on vocabulary lists. For example, psycholinguistic investigations of the connotations and prototypical uses of words provide important knowledge. Such knowledge may be used as a basis for translating between non-kindred languages such as Kiswahili and Spanish. In such investigations it might be of interest to compare the frequency or other statistical characteristics of words of approximately equal meaning across these languages.

One hypothesis I have proposed previously¹ is that Kiswahili has received substantial influence from Arabic and from Indo-European languages such as English and Portuguese. As a result, certain aspects of its present structure reflect a significant shift away from structurally related (Bantu) languages such as Gĩkũyũ, toward the structure of Indo-European ones. For example, prepositions are widely used where the language does not require them. Componential analysis treats word meaning as the sum of its constituent parts (e.g. 'woman' = +HUMAN, +FEMALE, +ADULT). From this perspective, word meanings seem able to 'decompose' into their constituent parts, which subsequently acquire specialized meanings.
For example, at an earlier stage in the history of the language ndugu had the generic meaning 'fellow sibling, male or female', along with other extended meanings including 'cousin, kinsman' and 'tribesman'. To denote 'sister' or 'brother', the word would be qualified by kike 'female' or kiume 'male' respectively. In present-day Standard Kiswahili, however, a new lexical item, dada, is used in place of the former ndugu wa kike 'sister'. Within the realm of kinship, ndugu now refers exclusively to 'brother' (ndugu wa kiume). Such a change affecting a basic kinship term may be attributable to analogical change resulting from contact with languages that have such a marked gender distinction. The change may have been initiated by second language speakers of Kiswahili. Gĩkũyũ has not undergone this particular change. There are other examples of mutual influence (at the lexical, syntactic and other levels) among the languages in contact in East Africa. These have been the subject of a few investigations in language contact and change (e.g. Scotton 1992). The results of this frequency study will enable us to compare the two languages statistically, to see whether or not the relative basicness of words in the two languages supports such hypotheses.

¹ 'Definition and Equivalence in a Bilingual Dictionary of Non-Kindred Languages: Kiswahili-Gĩkũyũ-Spanish', a paper I presented at the 23rd Annual Conference on African Linguistics (ACAL), MSU, March 1992.

3. Kiswahili and Gĩkũyũ: Speakers and Orthography
The languages of this project are important members of the Bantu stock. Within the Eastern Bantu subfamily, Kiswahili belongs to the Northeast Coast subgroup, while Gĩkũyũ belongs to the Thagicũ subgroup (Spear, Guthrie and others). Kiswahili is the most widely spoken African language. It is the national language of Tanzania and Kenya, and the lingua franca of all of East Africa and of much of Central Africa, including Rwanda, Burundi and Zaire.
With at least 5 million native speakers within Kenya, Gĩkũyũ is the second most widely spoken language of Kenya after Kiswahili. It is often the lingua franca along fuzzy linguistic borders in the central region, and in urban areas. Gĩkũyũ is also important to the project as a representative Bantu language. Of the many Bantu languages found in East and Central Africa, many of the more important ones, such as Gĩkũyũ, have a considerable body of print materials. All are written largely with the basic letters of the Roman alphabet, as is English. This is important because it allows us to use technology designed for languages such as English without great modification (e.g. for scanning, storage and manipulation of data in electronic format). Once the project has been accomplished for Kiswahili and Gĩkũyũ, it should be readily possible to accomplish it for other Bantu languages that are typologically close but mutually unintelligible.

4. Method and Procedure
The very first stage of the project shows concretely how far technology makes this type of work possible. To subject a large corpus to analysis, it is necessary to create a database that can be manipulated in many ways on computer. Without the appropriate means, entering such an extensive body of language into a computer by keyboard is an uphill task, even with a sizable team of typists. This otherwise insurmountable job was solved with state-of-the-art technology. To build a database for this project, I used optical scanners (Kurzweil scanners) available in MSU computer laboratories to convert printed texts into electronic ASCII texts. The scanners take an image of a page of text and 'read it to disk'. I obtained approximately 200 words with each exposure of the scanner, scanning about 200 pages for a total text length of at least 40,000 graphic words for each language.
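Once a text is available in electronic form, the basic count is straightforward. The sketch below is a minimal illustration, not the program used in this study. It extracts graphic words in Kucera's sense (runs of letters and digits bounded by spaces or punctuation), lowercases them and ranks them by frequency. Two choices are assumptions made for this sketch rather than part of the method described here:

- an apostrophe between letters is kept inside the word, so that Kiswahili ng'ombe 'cow' is not split in two;
- the text is first normalized to Unicode NFC, so that the Gĩkũyũ letters ĩ and ũ count as single letters.

The helper `zipf_check` reports r times p_r for the top ranks, a product that Zipf's law (section 4.1) predicts should stay roughly constant.

```python
import re
import unicodedata
from collections import Counter
from typing import List, Tuple

# Graphic word after Kucera (1967): a run of letters or digits bounded by
# spaces or punctuation. Assumption for this sketch: an apostrophe between
# letters stays inside the word, so Kiswahili ng'ombe is one word.
WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def graphic_words(text: str) -> List[str]:
    # NFC turns "i" + combining tilde into the single letter "ĩ" (and "u" +
    # tilde into "ũ"), so Gĩkũyũ words are not split at the diacritic.
    text = unicodedata.normalize("NFC", text)
    return [w.lower() for w in WORD.findall(text)]


def frequency_list(text: str, top: int = 3000) -> List[Tuple[str, int]]:
    """The `top` most frequent graphic words with their counts."""
    return Counter(graphic_words(text)).most_common(top)


def zipf_check(text: str, top: int = 20) -> List[Tuple[int, str, float]]:
    """(rank, word, r * p_r) for the top ranks; roughly constant under Zipf's law."""
    words = graphic_words(text)
    n = len(words)
    ranked = Counter(words).most_common(top)
    return [(r, w, r * c / n) for r, (w, c) in enumerate(ranked, start=1)]


sample = "Watu wangapi walikuja? Watu wawili walikuja."
print(frequency_list(sample, top=3))
```

These are counts of untagged graphic words. The lexical counts the project aims at would be made on tagged, segmented forms. The pattern also ignores typographic apostrophes (’), which would need to be mapped to ' beforehand.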
I then edited these texts in preparation for treatment by a text-analyzing program. For Gĩkũyũ, there was the additional problem that the scanner could not recognize, and therefore misread, the graphemes <ũ> and <ĩ>. Such editing is a time-consuming aspect of the project and must be done by someone with sufficient linguistic sophistication and knowledge of the languages being studied.

Morphemic homophones occur frequently, so manual editing is necessary where the simple word processing commands that I use are not effective. Some WordPerfect 5.1 functions are very useful, and sufficient where generalizations can be made. The program cannot, however, tell apart the different uses of the sequence ki:

- an adverbial affix, as in kijerumani 'German';
- a (continuous) tense marker, as in akienda 's/he going';
- a diminutive prefix, as in kijiti 'small stick';
- a grammatically insignificant sequence, as in hakimu 'judge'.

A crucial methodological aspect of editing these texts is the tagging of each part of speech, which has to be done before the actual frequency counts. Because it forms a crucial component of this thesis, I discuss it in greater detail in section 4.3.

4.1. Zipf's Law
A significant part of the project whose methodology I am describing involves statistical operations. The final goal of the project is to determine the 3000 most frequent words of the two languages. The first step was to determine how such a list of words may be obtained using statistically credible methods. The number of tokens (i.e., untagged graphic words) to be studied, 40,000, is somewhat more than is strictly needed to derive a vocabulary of 3000 different words according to a ratio known as Zipf's law (Zipf 1945). This formula gives the probability of a word of a given frequency rank.
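The relation between sample size and vocabulary size can be estimated directly from the formula. The sketch below rests on two assumptions. First, it takes Zipf's law in the commonly quoted form p_r = 0.1/r; the constant 0.1 is the value usually given for English and is only an assumption for Kiswahili and Gĩkũyũ. Second, it treats the tokens of a sample as independent draws, which ignores the clustering of words within texts. The sketch computes three quantities:

- how many ranks the law allows before the probabilities sum to 1;
- the expected number of different words in a sample of n tokens;
- the expected number of words occurring only once or twice.

```python
import math

# Assumption: Zipf's law in the form p_r = A / r with A = 0.1, the constant
# usually quoted for English. Its value for Kiswahili or Gĩkũyũ is untested.
A = 0.1


def vocabulary_size(a: float = A) -> int:
    """Largest R such that a/1 + a/2 + ... + a/R does not exceed 1."""
    total, r = 0.0, 0
    while total + a / (r + 1) <= 1.0:
        r += 1
        total += a / r
    return r


def expected_types(n_tokens: int, a: float = A) -> float:
    """Expected number of different words in a sample of n_tokens.

    Tokens are treated as independent draws, so word r is missing from the
    sample with probability (1 - p_r) ** n_tokens.
    """
    size = vocabulary_size(a)
    return sum(1.0 - (1.0 - a / r) ** n_tokens for r in range(1, size + 1))


def expected_rare(n_tokens: int, max_count: int = 2, a: float = A) -> float:
    """Expected number of words occurring between 1 and max_count times."""
    size = vocabulary_size(a)
    coeffs = [math.comb(n_tokens, k) for k in range(max_count + 1)]
    total = 0.0
    for r in range(1, size + 1):
        p = a / r
        for k in range(1, max_count + 1):
            # Binomial probability that word r occurs exactly k times.
            total += coeffs[k] * p ** k * (1.0 - p) ** (n_tokens - k)
    return total


n = 40_000
print(vocabulary_size(), round(expected_types(n)), round(expected_rare(n)))
```

One way to test how well the assumed constant fits these languages would be to compare these figures for n = 40,000 with the type counts actually observed in the corpus.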
Miller (1981:107) states:

"Although some people have larger vocabularies than others have, a few words are used so frequently that everyone knows them. The 50 most frequent words in speech make up 60% of what we say ("I" ranks first); the 50 most frequent in writing make up 45% of what we write ("the" ranks first)... if word frequencies are tabulated and the words ranked from the least frequent, a simple formula describes the relation... known as Zipf's Law. If $p_r$ is the probability of the $r$th most frequent word, then

$$p_r = \frac{0.1}{r}$$

Zipf showed that the formula gives a good approximation of word probabilities in many languages."

Zipf's law is considered a universal that is expected to have rough validity for all languages. In a 40,000-word Bantu language text, therefore, one expects to find approximately 3000 different words, the least frequent several hundred of which occur only once or twice in the sample. The present study is based on this principle.

4.2. The Corpus
The corpus consists of machine-readable texts assembled in electronic format. It forms a database of these two languages for possible use in word studies of many types. The corpus seeks to provide a representative sample of the modern written language that is computer-accessible for all manner of analysis and open to addition or modification. Its composition reflects these aims as well as the limitations involved in undertaking the project. Text samples range from newspaper articles to poetry, fiction and sociolinguistic studies, among other genres. Once it has been edited and stored in easily retrievable electronic format, this database should in the long run become an important resource for scholars of these two languages.

A crucial aspect of the project is selecting the texts so as to ensure a valid representation of each language. For Kiswahili, of course, there are print materials of all sorts, as for many major world languages.
For this language I have relied heavily on newspapers, because they provide writing over a range of topics and styles, including the editorial page, letters to the editor, news and sports. The sample obtained may be expected to be a valid cross-section of the modern written language. Spoken, colloquial language may differ somewhat in its basic vocabulary, but for the purposes mentioned above the written, more formal language is more appropriate. I have, however, included transcriptions of Kiswahili conversations from a discourse analysis project I carried out in November 1991. These provide at least 3000 tokens of the spoken Kenyan dialect of the language. Including these conversations may seem inconsistent with the aim of studying the written language. They are better considered a pilot study, which might give useful insights for a future comparison of spoken and written language.

Corpus gathering for Gĩkũyũ was more problematic because no newspapers or magazines were available for inclusion in the study. I had to rely on a limited number of authors who have done most of the writing in this language, notably Ngũgĩ wa Thiong'o and Gakaara wa Wanjau. The representativeness of this sample is more restricted than I would prefer. At the same time, there are not many authors or publications in the language, though certainly many more than are available to me here at MSU. It may be argued that the sample is valid in that it represents what limited written work exists. In any case, I hope to add more diverse printed material to this corpus in the future as it becomes available.

Aside from independent sources and helpful tips from my academic advisors, the selection of the corpus is modelled on three similar but larger studies. Two are on English: the Brown Corpus (1967) and the LOB (Lancaster-Oslo/Bergen, 1989) Corpus.
(See the references for full citations.) The former is the most famous and comprehensive word study of the English language. The latter, more recent study was modelled on the Brown study, with a few innovations, and analyzed British English. The third study is on Mexican Spanish: a project carried out at El Colegio de México by Dr. Fernando Lara et al., whose results were published as a book in 1989. It was a comprehensive study, very similar to the Brown study in its methodology, and it provided the basis for a subsequent Dictionary of Basic Mexican Spanish.

All these studies fit within one paradigm of lexicology: they rest on common principles and share a closely resembling methodology of compilation and analysis. They also deal with two of the most widely spoken world languages of the Indo-European stock. I have stayed within the framework of this tradition to the extent that it applies to this study of vastly different languages. Typological and structural differences between the languages of the cited works and those of this study have called for divergent and creative approaches. I have stayed on the lookout for possible universals, but I have had to rely on my knowledge of the structure of these languages to solve some methodological problems.

4.2.1. Selection of Texts
With few exceptions, the corpus encompasses the genres used in the studies cited above. The only areas I have had trouble representing are science and technology. The following genres are represented in the corpus; full citations of the sources of text follow in section 4.2.3.
(a) literature (novels, short stories, poetry, drama);
(b) journalism (news, opinion, editorials, politics, sports, international news);
(c) political discourse (speeches by Kenyatta and Nyerere);
(d) religion (the Gospels, missionary works);
(e) cultural studies (essays);
(f) popular literature (detective novels, short stories);
(g) transcribed conversations.

With the two exceptions noted above, these are the genres included in all the major studies I have examined. As far as possible, I have selected the texts without regard to the prominence of their authors; the representativeness of the writing was the determining factor. I have also tried to control the length of each sample to avoid overrepresenting a particular author or style. I have yet to determine, however, the exact number of words used from each text or author. As stated in section 4.2, the serious limitations I encountered were a lack of available scientific material and a limited pool of Gĩkũyũ authors to choose from.

4.2.2. Background to Corpus Compilation
I began by assembling all publications available in personal and MSU libraries. From these I selected a few pages of each book or pamphlet and photocopied them. Whenever possible (e.g., with short stories or news items), I selected whole passages, but in many cases continuity ended where I skipped pages of a particular book. Text coherence (i.e. a whole continuous story or article) was not considered necessary. In the final analysis our interest is in words, and the context that may eventually be required to eliminate ambiguity is not expected to go beyond the sentence level. I made the cleanest photocopies possible, with the best print clarity. Newspaper texts often had to be enlarged for the optical scanners, frequently with some loss of effective print clarity. The copies were then submitted to the scanning specialist for scanning and storage on computer disk.
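Section 4.2.1 states the aim of controlling the length of each sample so that no author or style is overrepresented. It also notes that the exact number of words taken from each text is yet to be determined. Once the texts are on disk, that check can be done mechanically. The sketch below assumes the samples are held as a mapping from a source label to its cleaned text. The 10% ceiling in `over_represented` and the `trim` helper are hypothetical choices for illustration, not figures from this study.

```python
import re
from typing import Dict, List

# Rough graphic-word pattern used as a token counter (assumption: an
# apostrophe between letters, as in ng'ombe, does not split a word).
WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def token_counts(samples: Dict[str, str]) -> Dict[str, int]:
    """Graphic-word count for each source (an author, book or newspaper)."""
    return {source: len(WORD.findall(text)) for source, text in samples.items()}


def shares(samples: Dict[str, str]) -> Dict[str, float]:
    """Each source's fraction of all tokens in the corpus."""
    counts = token_counts(samples)
    total = sum(counts.values())
    return {source: c / total for source, c in counts.items()} if total else {}


def over_represented(samples: Dict[str, str], max_share: float = 0.10) -> List[str]:
    """Sources above a hypothetical ceiling on their share of the corpus."""
    return sorted(s for s, share in shares(samples).items() if share > max_share)


def trim(text: str, max_tokens: int) -> str:
    """Cut a sample just after its max_tokens-th graphic word."""
    if max_tokens <= 0:
        return ""
    matches = list(WORD.finditer(text))
    if len(matches) <= max_tokens:
        return text
    return text[: matches[max_tokens - 1].end()]
```

`trim` cuts at a word boundary in the original text, so the retained portion of a sample keeps its punctuation and spacing for later tagging.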
The texts were stored in separate files according to their typeface. The scanner required a 'training' session for every typeface it encountered. The sources varied widely (books from different publishers, newspapers, magazines, the special graphemes of Gĩkũyũ, etc.), so many training sessions were needed. This led to scanning difficulties, higher costs and considerable loss of print clarity. With the scanned texts back in hand, I went through each file. I removed stray marks that the scanner had misread or misrepresented, rejoined words and sentences that had been erroneously separated, and so on. It should be noted, however, that this cleaning did not tamper with, modify or rearrange the original texts.

4.2.3. Sources of Corpus
The general sources of the corpus, and the difficulties and limitations involved in gathering it, have been discussed above. The corpus at hand includes texts from the following sources.

Kiswahili

ALKUIN, P., O.S.B., Kabla ya Kuolewa, Benedictine Publications, Peramiho, Tanzania, 1982, 7th ed. [Religion, social commentary, customs and culture]. Pages: 5, 6, 7, 12, 13-21 and 4 unnumbered pages.

MBAABU, Ireri, Kabla ya Kuolewa, Longman, Kenya, 1991. [Academic/sociolinguistic/language history]. Pages: 7, 95, 26, 27, 42, 43, 72, 73.

KEZILAHABI, Euphrase, Rosa Mistika, Kenya Literature Bureau, 1971. [Novel (hadithi)]. Pages: 62, 63, 69, 76, 77, 86, 87.

MWENEGOHA, H.A.K., Mwenye Uhuru, Transafrica Press. [Biography of Nyerere]. Pages: 10, 11.

TAIFA LEO [Newspaper, Kenya, July 31, 1992: opinion column by Walter Mbotela on the standardization of Kiswahili; second letter in the same day's opinion section, by Kimani Ruo, "Tafsiri zinavyokanganya" (The hazards of translation)]. Many other texts from various sections of this paper and of KENYA LEO (below) have also been included.
KENYA LEO [Newspaper, Kenya, August 3, 1992: 6 letters to the editor].

AMEIR ISSA HAJI et al., Misingi ya Nadharia ya Fasihi, Taasisi ya Kiswahili na Lugha za Kigeni, Wizara ya Elimu, Zanzibar, Tanzania, 1981. [Literary theory]. Pages: 4-11, 17-21, 30-35, 56-62.

NASSIR, Med, Malenga wa Mvita: Diwani ya Ustadh Bhalo, Oxford University Press, 1971. [Collection of modern short stories]. Pages: 6, 7, 11, 12, 13, 20, 21.

MUNGIA, Justin D., Hadithi za Mfalme Sinsin, Tanzania Publishing House Ltd, Nairobi, 1971. [Tales of old]. Pages: 6, 7, 8, 9, 14.

ROBERT, Shaaban, Utenzi wa Vita vya Uhuru 1939 hata 1945, Oxford University Press, Nairobi, 1967. [Epic poem (mashairi) on World War II]. Pages: 1-15.

POETRY, assorted (4 pages).

Gĩkũyũ

GAKAARA wa Wanjau, Wa Nduuta Hingo ya Paawa, Gakaara Press, Nyeri, Kenya, June 1984. [Popular literature]. Pages: 2-12.

GAKAARA wa Wanjau, Wa Nduuta Kũrega na Mũtĩ Waake, No. 32, Gakaara Press, Nyeri, Kenya, July 1982. [Popular literature]. Pages: 6-9.

GAKAARA wa Wanjau, Thooma Gĩgĩkũyũ Kĩega, Ibuku ria Keerĩ "A", revised edition, Gakaara Press, Nyeri, Kenya, January 1988. [Language and culture]. Pages: 32, 33.

GAKAARA wa Wanjau, Wa Nduuta Akĩhererwo nĩ Mũka, No. 23, Gakaara Press, Nyeri, Kenya, February 1982.

GAKAARA wa Wanjau, Mwandiki wa Mau Mau Ithaamĩrioini, Heinemann Educational Books, 1983. [Biography, historical novel, diary]. Pages: x, xii, 36, 38, 80, 81, 112, 113, 117, 142, 143, 146, 147, 152, 153.

NJOROGE, Lizzie N., Nitwendeete Rũthiomi Rwiitũ, East African Publishing House, 1979; Kenya Institute of Education. [Folk stories]. Pages: 4, 8, 16, 22.

TKK Gĩkũyũ, Book 3A, Longman Kenya Ltd, 1981. [Primary school reader]. Pages: 1, 2, 3, 4, 8, 10.

THIONG'O, Ngugi, Matigari ma Njirũũngi, Heinemann Kenya Ltd, 1986. [Novel]. Pages: 30-37, 52-55, 78, 79, 86, 87, 90, 91.

THIONG'O, Ngugi, Njaamba Nene na Mbaathi ĩ Mathagu, Heinemann Educational Books, 1982. [Children's book]. Pages: 4, 6, 8, 15, 16, 18, 20, 22-25.

THIONG'O, Ngugi, Ngaahika Ndeenda, Heinemann, 1986. [Play/drama]. 5 unnumbered pages.

4.3. Tagging
Methodologically, the principle of 'tagging' is crucial in this project.
A 'tag' is a string of capital letters and/or symbols indicating the grammatical category to which a graphic morpheme is assigned (Kucera, 1967). The full list of tags used in this project is laid out below in Table I. The tagging process therefore refers to the assignment of a specific grammatical designator to each morpheme, based on the taxonomy of the language's grammatical or functional categories. Spelling variants and derivatives are noted. I have designed a tagging code for all morphemes of the two languages occurring in the corpus, which will be used for the entire database. The tags will be the chief methodological tool in the counts, especially those of grammatical morphemes, and in the other kinds of analysis arising from the project.

The tagging procedure is based on a morphosyntactic study of the two languages, and here the fuzziness of the boundary between morphology and syntax sometimes becomes apparent. In particular, the role that a given morpheme (say, a noun class marker) plays in the language poses a challenge: is it part of the noun or not? How should such markers be tagged? Are they ULAs or not? Should they be treated the same as lexical stems? I attempt to answer these questions in the following sections.

5. Bantu Morphology

To better justify the tagging procedure and the categorization of tokens (graphic words), it is necessary to sketch the morphology of the two languages to the extent that it concerns this project. Although the two languages are distinct, they share many features, which I draw upon in order to construct a bilingual template that might be elaborated into an algorithm for the automated analysis of Bantu grammatical constructions. Both Kiswahili and Gĩkũyũ have a morphological structure typical of most Bantu languages.
Two aspects are especially relevant here: (i) nominal constructions, with special attention to the noun class system and word formation, and (ii) verbal constructions, with particular attention to derivation and inflection.

5.1. Nominal Constructions

This category refers primarily to nominal derivation, compounding and, in general, the system of concords affecting modifiers (adjectives, numerals and interrogative nominals such as -pi, -ngapi). The noun classification system plays an important role in the grammar. It loosely reflects semantic classes, and it functions much like Indo-European grammatical gender, for instance in determining agreement in class and number (Jensen 1990). A Spanish example may illustrate this point:

(3) a. ¿cuántas personas vinieron? (Spanish)
    b. watu wangapi walikuja? (Kiswahili)
    c. nĩ andũ aigana mookire? (Gĩkũyũ)
    'how many persons came?'

Morphemes in bold are those attached to a stem to indicate gender (fem.) and number (plural) in Spanish (3a); in (3b) and (3c) they indicate class (1/2) and number (plural) for Kiswahili and Gĩkũyũ respectively.

Noun class markers function as indicators of class, and a noun may be shifted to a different class, yielding a modified meaning. For example, nouns may be shifted from their original class to another to form diminutives, augmentatives or collectives. This is done by changing the class prefix. Note that, as in the case of verb extensions in derivation, the resulting meaning is not always predictable, as exemplified below.

(4) Kiswahili
    a. mtoto (cl. 1/2) > katoto (cl. 13/12)    'child' > 'little child'
    b. mlima (cl. 3/4) > kilima (cl. 7/8)      'mountain' > 'small hill'
    c. simba (cl. 9/10) > masimba (cl. 5/6)    'lion' > 'pride of lions'
(5) Gĩkũyũ
    a. mwana (cl. 1/2) > kaana (cl. 13/12)     'child' > 'little child'
    b. mũndũ (cl. 1/2) > kĩmũndũ (cl. 7/8)     'person' > 'gigantic person'

The above examples illustrate two fundamentals of Bantu morphosyntax.
A change of prefix (in bold) brings about a change in nominal class and meaning.

Compounding creates new words by combining two or more others, and various types of compounds are found throughout the lexicon. Mwana 'child', used as a relational term 'denoting the practitioner of a profession related to N' (Mchombo & Bresnan 1993:11), is quite productive, as the following Kiswahili examples show:

(6) wa-/mwanasiasa < mwana + siasa     'politician' < 'member' + 'politics'
    wa-/mwanachama < mwana + chama     'party member' < 'member' + 'party'

Other compounds are created by juxtaposing a verbal base and a direct object or bare NP, together with the appropriate noun class prefix (in bold):

(7) 0-/kingamwili < kinga + mwili      'antibody' < 'guard' + 'body'
    vi-/kionambali < -ona + mbali      'binoculars' < 'see' + 'far'
    mfanyabiashara < -fanya + biashara 'business person' < 'do' + 'business'

Nouns can also be derived from verbs (deverbative nouns) by adding a nominal suffix and the appropriate noun class prefix. Several nominal suffixes denote effects such as agentive, instrumental and state, as in (8a) and (8b) below for Kiswahili and Gĩkũyũ respectively. Note that these correlations are not absolute:

(8) a. Kiswahili
    (i) instrumentals ('doer of action; result of action'):
        mi-/msemo (cl. 3/4) 'saying' < -sema 'say'
        ma-/neno (cl. 5/6) 'word' < -nena 'speak'
    (ii) agentives:
        wa-/mwuaji (cl. 1/2) 'killer' < -ua 'kill'
        wa-/mwandishi (cl. 1/2) 'writer' < -andika 'write'
    (iii) states:
        utulivu (cl. 14) 'calmness' < -tulia 'be calm'
        uharibifu (cl. 14) 'destruction' < -haribu 'destroy'
(8) b. Gĩkũyũ
        mwako (cl. 3/4) 'building' < -aka 'build'
        ci-/kĩugo (cl. 7/8) 'word' < -uga 'say, speak'
        0-/mũragani (cl. 1/2) 'killer' < -ũraga 'kill'
        wanangi (cl. 11/12) 'destruction' < -ananga 'destroy'

Note: in bold are singular noun class prefixes; on the left are the corresponding plural prefixes (standard notation: 0 represents a zero prefix). (8b) gives analogous Gĩkũyũ examples. Class numbers are indicated.

5.1.1.
Modifiers

In addition, any phrase that modifies, or is predicated of, a noun phrase agrees with the head noun in class, person and number: adjectives and determiners agree with the head noun, and verbs agree with their subject. This can be exemplified using the nouns of (4) and (5) as heads in the following sentences:

Kiswahili
    mtoto wake wa tatu a-ta-kuja kesho
    child his/her of three he/she-FUT-come tomorrow
    'his/her third child will come tomorrow'

    ka-toto ka-dogo ka-ta-kuja kesho
    DIM-child DIM-small DIM-FUT-come tomorrow
    'a small child will come tomorrow'

Gĩkũyũ
    mw-ana waake nĩ-a-go-oka rũciũ
    child his/her FOC-he/she-FUT-come tomorrow
    'his/her child will come tomorrow'

    ka-ana ka-niini nĩ-ga-go-oka haaha rũciũ
    DIM-child DIM-small FOC-DIM-FUT-come here tomorrow
    'a little child will come here tomorrow'

The examples show how a change in noun class prefix (in bold) results in a change of meaning, and how the syntax of Kiswahili reflects these changes. The same behavior can be seen in the Gĩkũyũ examples. They show not only the morphosyntactic processes mentioned above but also the structural closeness of the two languages, and they provide an important basis for the argument I use to justify my methodology.

5.2. Unit of Lexical Analysis (ULA)

"Not only are there considerable difficulties pinning down any universally applicable notion of 'word', it appears that even when we restrict ourselves to morphological criteria within a single language we find that the term itself covers a multitude of sins, which need to be carefully distinguished." (Spencer, 1991:45)

In light of the above sketch of Bantu nominal constructions, it is pertinent to ask: which part of the construction should be the basis of a word study, given that 'word' is so elusive a concept in Bantu? In reply, I propose to call the unit I am interested in for counting purposes a Unit of Lexical Analysis (ULA).
The ULA is an unbound minimal unit: in the case of verbs, essentially the stem without its prefixes. It therefore includes verb roots and stems (i.e., verb roots plus extension(s)), and modifiers in their base form. It does not include noun class markers or most pre-stem morphemes, such as tense and subject markers. These will be tagged and counted separately, not as ULAs but as morphosyntactic units. This distinction does not diminish their importance; it is rather a methodological necessity for the purposes of this word study, since the counting of ULAs is of primary importance in the project, although the prefixes may also be counted at some point.

5.2.1. Independent Nominals as ULAs

To justify the above definition, let me return to the previous examples and identify the independent nominals found there:

(9) mtoto, neno, mwuaji, mwandishi, kionambali, mwana, kaana, kĩugo, mũragani

The nouns in (9) are independent nominals which, although they carry (variable) class prefixes, may be considered lexical entries in their own right. Without those prefixes they cannot be readily identified, nor can they stand on their own, nor can their meaning be immediately discerned. That is, stems and affixes are bound to each other: in any given text, written or spoken, the stems are never divorced from their prefixes. For the purposes of frequency or other counts, these are ULAs. The prefixes alone are of no significance taken out of context, or where grammar is not the object of study. Independent nouns such as mwana, mtoto, msemo, wanangi, utulivu, kionambali, mwanachama and so on are therefore ULAs. Singular and plural forms, and forms resulting from a shift of class of the same lexical item, are also ULAs, independently of their status as 'variants' or members of a lexical paradigm, for example:

(10) Gĩkũyũ: mwana (sg. 'child'), ciana (pl.), kaana (sg. dim.), twana (pl. dim.)
     Kiswahili: mtoto (sg. 'child'), watoto (pl.), katoto (sg. dim.), tutoto (pl. dim.)

Compounds and all other derived nominals will also be treated as ULAs, not as members of the paradigms of their original components, for example:

(11) kionambali; mwanachama; mũragani; mũrĩraikĩhia

All graphic words that are independent nouns (as in (9) to (11)) remain untagged; only modifiers and prefixes bear a tag, as described before. This procedure will enable me to generate the lists of verb stems and nouns in alphabetical order without the difficulties that would arise if they bore a tag (which is placed word-initially). It also reduces the number of tags in the corpus. The verb and nominal ULAs will thus be identified by default.

5.2.2. Modifiers as ULAs

Adjectives and interrogatives, however, receive different treatment. In the lexicon of the language, these are not inherently bound to any one or two particular prefixes. The shape of the prefix they take is unstable: it changes according to the noun they modify, unlike independent nominals, whose variation is restricted to two forms, singular and plural. Hence the ULA for this category of lexemes will be the base form, such as:

(12) -ake, -ngapi, -refu, -dogo, etc.

These will then be tagged AS and IS, for adjectival stem and interrogative stem respectively, and the detached prefixes will also carry a tag (D, E, Q, etc.) indicating their original syntactic function, i.e. as adjectival, noun class or interrogative prefixes, and so on (see the Tagging Code in 5.4). This will permit the frequency of either the prefix or the stem to be determined. As mentioned above, these pre-stem morphemes are counted for the purposes of grammatical analysis, in contrast to the counting of lexical stems (ULAs) for the purposes of word study.
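As a rough illustration of this split, the Python sketch below detaches a concord prefix from a modifier stem, tags the two parts (D or Q for the prefix, AS or IS for the stem, using the tags named in this section) and counts prefixes and stems separately, so that grammatical and lexical frequencies stay apart. The prefix and stem inventories, the helper names and the use of Q for an interrogative concord are assumptions made for illustration only; the project's actual tagging format is not reproduced here.

```python
from collections import Counter

# Hypothetical concord prefixes: two-letter forms are tried before
# one-letter ones, so "wa" is tested before "w".
CONCORD_PREFIXES = ["wa", "ka", "ki", "vi", "ma", "m", "w"]

# Hypothetical stem inventories; a real run would use the full lexicon.
ADJECTIVE_STEMS = {"dogo", "refu", "kubwa", "ake"}
INTERROGATIVE_STEMS = {"ngapi"}


def tag_modifier(word):
    """Split a modifier into a tagged prefix and a tagged stem.

    Returns [(prefix, tag), (stem, tag)], or None if no split is found.
    """
    for prefix in CONCORD_PREFIXES:
        if not word.startswith(prefix):
            continue
        stem = word[len(prefix):]
        if stem in ADJECTIVE_STEMS:
            return [(prefix + "-", "D"), ("-" + stem, "AS")]
        if stem in INTERROGATIVE_STEMS:
            return [(prefix + "-", "Q"), ("-" + stem, "IS")]
    return None


def count_parts(words):
    """Count prefixes (grammatical) and stems (lexical) in separate tallies."""
    prefixes, stems = Counter(), Counter()
    for word in words:
        parts = tag_modifier(word)
        if parts is None:
            continue  # not a recognized modifier; left to other rules
        prefix_part, stem_part = parts
        prefixes[prefix_part] += 1
        stems[stem_part] += 1
    return prefixes, stems


tokens = ["wadogo", "kadogo", "kidogo", "warefu", "wangapi", "vingapi"]
prefixes, stems = count_parts(tokens)
print(stems.most_common(1))  # [(('-dogo', 'AS'), 3)]
```

Counting the stem -dogo once per token, whatever its prefix, is what makes the stem (and not the inflected graphic word) the unit of the frequency list.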
It may be of interest, for instance, to find out the distribution and frequency of certain grammatical pre-stem markers, or the statistical relations between morphemes and the stems they are attached to.

5.3. Verb Constructions

Similar principles of categorization will be followed in extracting ULAs from verb constructions. As discussed in section 5.2, this is basically an effort to isolate stems. I return to example (1), repeated here as (13), to clarify my proposal.

(13) mtu anayekuandikia barua
     person he/she-PRES-REL(who)-OB(you)-write-APPL-FV letter
     'the person who writes a letter to you'

Following the above criteria, the first step would be to isolate the initial and final independent nouns, mtu ('person') and barua ('letter'), as ULAs. The m- in mtu is a prefix that is an integral part of the whole noun, so it remains attached. The middle graphic word would then be analyzed into the following morphological constituents:

(14) a-   na-  ye-  ku-  andik-  i-    a
     SUB  TS   REL  OB   STEM    APPL  FV

The first four morphemes mark important grammatical information, viz. subject, (present) tense, subject relative (3rd pers. sing.) and direct object. A verb root with an extension follows. These pre-stem morphemes are bound to the verb stem in the sense that they cannot stand on their own. Only the fifth and final sequence, -andikia ('to write to, for'), has the characteristics of a ULA, and it is the only part of this construction that will be isolated for the frequency counts. This stem has an applicative extension -i-, a morpheme in its own right. In the present analysis, the extension is considered an integral part of one of the members of a set of lexical forms (i.e., -andikia) having the same root and belonging to the same major word class (verb).
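The reduction of (14) to its ULA, and the further split of that stem into root, extension and final vowel, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the category labels follow (14), while the root and extension inventories (`ROOTS`, `EXTENSIONS`) and the helper names are hypothetical and cover only the forms discussed in this section. The extension split uses simple backtracking, so a stem such as -endeana is read as end + e + an + a, matching the segmentation given in (15a).

```python
# The analysis of (14): each morpheme paired with its category label.
ANALYSIS_14 = [
    ("a", "SUB"), ("na", "TS"), ("ye", "REL"), ("ku", "OB"),
    ("andik", "STEM"), ("i", "APPL"), ("a", "FV"),
]

# Pre-stem categories are tagged and counted separately, not as ULAs.
PRE_STEM = {"FOC", "NEG", "SUB", "TS", "REL", "OB"}

# Hypothetical inventories covering only the forms in this section;
# extensions are listed longest first so the search order is fixed.
ROOTS = ("andik", "end")
EXTENSIONS = ("ele", "esh", "an", "ik", "e", "i", "z")
FINAL_VOWELS = ("a", "e")


def extract_ula(analysis):
    """Drop pre-stem morphemes; keep root, extensions and final vowel."""
    return "-" + "".join(m for m, cat in analysis if cat not in PRE_STEM)


def split_extensions(rest):
    """Segment `rest` into known extensions by backtracking (None if impossible)."""
    if rest == "":
        return []
    for ext in EXTENSIONS:
        if rest.startswith(ext):
            tail = split_extensions(rest[len(ext):])
            if tail is not None:
                return [ext] + tail
    return None


def split_stem(stem):
    """Return (root, [extensions], final_vowel) for a stem such as '-andikia'."""
    body = stem.lstrip("-")
    if not body or body[-1] not in FINAL_VOWELS:
        return None
    body, fv = body[:-1], body[-1]
    for root in ROOTS:
        if body.startswith(root):
            exts = split_extensions(body[len(root):])
            if exts is not None:
                return root, exts, fv
    return None


print(extract_ula(ANALYSIS_14))   # -andikia
print(split_stem("-andikia"))     # ('andik', ['i'], 'a')
print(split_stem("-endeleza"))    # ('end', ['ele', 'z'], 'a')
```

The split is only descriptive: as argued in the text, -andikia stays a single ULA for counting, and the decomposition merely shows which paradigm it belongs to.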
A full inventory of such related forms of a verb root with its different extensions is termed a paradigm. In many Bantu language dictionaries (e.g. Johnson 1939 and Hamisi 1989 for Kiswahili, and Benson 1960 for Gĩkũyũ), entries are grouped in this way, ignoring the meaning changes that result from verbal extensions. Such a procedure fails to make the crucial distinction between morphological and semantic relationships, and it detracts from the usefulness of such Bantu dictionaries to non-experts, an issue I take up later in this paper. The following examples illustrate the problem.

(15) a. Kiswahili:
        -end-a          'to go'
        -end-e-a        'go for, against, toward'
        -end-esh-a      'cause to go; drive; have diarrhea'
        -end-ele-a      'continue; develop'
        -end-ele-z-a    'cause to develop'
        -end-e-an-a     'go toward, against, for each other'
     b. Gĩkũyũ:
        -in-a           'sing'
        -in-ir-a        'sing for'
        -in-ithi-a      'make sing'
        -in-ain-a       'swing to and fro'
        -in-a-in-ithi-a 'cause to shake'
        -in-ir-ir-a     'boo; pester'

On the left side of the paradigm is the verb root (-end- for (15a) and -in- for (15b)), with affixed extensions which add a specific meaning to the base form, which in both languages ends in a final vowel (FV) -a. The shape of these extensions is constant (e.g. -e-, -esh-, -ir-, -ithi-, etc.), but morphophonemic processes such as assimilation, vowel harmony and coalescence, as well as the effects of syllable structure constraints and others, may intervene depending on context. Furthermore, extensions may be built upon other extensions, as in the last two examples of (15a), where a causative and a reciprocal extension respectively are added onto forms that already carry the applicative. While the derivational process involved in creating a paradigm is regular, semantic changes may occur, making it impossible in many cases to treat the meanings of the derivations within a paradigm as mechanical outputs, as illustrated in (15a,b). However, I will not discuss the nature and effects of these semantic shifts here.
This is because the semantic idiosyncrasies of individual forms do not affect their grammatical category: all the forms remain verbs, and only their meanings are affected by the accompanying change in argument structure. For instance, -andika 'write' requires two arguments, a subject and a direct object (writer and letter), whereas -andikia 'write to, for' requires a subject, a direct object and an indirect object (writer, letter and recipient of the letter). Derivation in these two languages does not affect functional categories, so these changes of meaning are not relevant to the counting of individual forms of each category. My primary concern in this project is to obtain lexemes (ULAs) that are potential dictionary entries. The study and count of morphemes belonging to the grammar of the language, such as pre-stem prefixes, is secondary, though important for other purposes. I shall therefore strip verb constructions of their pre-stem prefixes but leave extensions intact. I illustrate this point with the following Kiswahili passage:

(16) ...kwa siku nyingi alitaka kumwandikia rafiki yake barua ya mapenzi. Lakini kila alipoanza kuiandika, alishindwa; haikuandikika. Aliendelea kujaribu bila kuweza hata kwenda kwake nyumbani. Siku moja, akiendesha gari lake, alimwona njiani, akamwendea na kumwomba waendelee na urafiki wao...

'...for many days s/he wanted to write a love letter to his/her friend. But every time s/he started to write it, s/he could not; it could not be written (it was impossible to write it). S/he continued to try, unable even to go to her/his house. One day, while driving his/her car, s/he saw her/him in the street, went to (approached) her/him and begged her/him to continue with their friendship...'

There are many verb and nominal constructions in the above passage.
However, I shall identify only those based on the two verb roots discussed before: -andik- 'write' and -end- 'go'. Nominal forms are ignored here. The relevant constructions are highlighted both in the Kiswahili text and in the English gloss, and they are isolated in (17) below. Note that this is also what the sorting program does when asked to sort graphic words:

(17) a. kuandika, kumwandikia, haikuandikika
     b. kwenda, aliendelea, akiendesha, akamwendea, waendelee

Stripped of the pre-stem morphemes, with the suffixes left intact, the above lists yield the following verb stems:

(18) a. -andika, -andikia, -andikika
     b. -enda, -endelea, -endesha, -endea, -endelee

This is as far as the analysis goes. I consider all forms in (18a,b) to be ULAs, which will be subjected to frequency and possibly other types of counts. According to the discussion in 5.3, these are entries or ULAs in their own right. The complete list of different forms of the same verb in (a) or (b) represents a paradigm for each of the two verb roots, -andik- and -end- respectively. The various additions to, or changes of, meaning brought about by suffixation are not relevant for present purposes and hence do not affect the methodology. Their idiosyncrasies will be dealt with in the dictionary, where the forms will be defined with special attention to such meaning changes. Since it is the individual members of these paradigms that will be counted, the main task for the present is how to arrive at these forms.

5.4. The Tagging Code

Following the taxonomy described in the above sections, the parsing exercise will be completed through tagging. Below are the tagging codes, with one tag for every pre-stem morpheme found in Kiswahili and Gĩkũyũ graphic words (in the sense established in 1.1). For each language there is a total of 27 tags (see Table I). These tags will be used consistently throughout the project to identify every information-bearing grammatical affix.
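Before turning to the tag inventory itself, the reduction of (17) to (18) described above can be sketched in Python. The sketch assumes, hypothetically, that the verb roots are already known from a lexicon (`KNOWN_ROOTS`); everything before the leftmost known root is treated as pre-stem material (the part that would receive tags), and everything from the root onward is kept, untagged, as the ULA. Stems are then grouped by root into paradigms and counted. A plain substring search like this is only a first approximation: a real analyzer would have to guard against roots that happen to occur inside unrelated words.

```python
from collections import Counter, defaultdict

# Hypothetical list of verb roots, assumed to come from a lexicon.
KNOWN_ROOTS = ("andik", "end")


def verb_stem(word):
    """Return (root, stem), the stem running from the leftmost known root to the end.

    The material before the root is the pre-stem part that would be tagged.
    Returns None when no known root occurs in the word.
    """
    best = None
    for root in KNOWN_ROOTS:
        i = word.find(root)
        if i != -1 and (best is None or i < best[0]):
            best = (i, root)
    if best is None:
        return None
    i, root = best
    return root, "-" + word[i:]


def paradigms(words):
    """Group extracted stems by root and count each stem (ULA) token."""
    table = defaultdict(Counter)
    for word in words:
        result = verb_stem(word)
        if result is not None:
            root, stem = result
            table[root][stem] += 1
    return table


# The graphic words isolated in (17a) and (17b).
passage_verbs = ["kuandika", "kumwandikia", "haikuandikika",
                 "kwenda", "aliendelea", "akiendesha", "akamwendea", "waendelee"]

for root, stem_counts in paradigms(passage_verbs).items():
    print(root, sorted(stem_counts))
```

Applied to (17), the extracted stems reproduce the lists in (18a) and (18b).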
What is left untagged will be the lexical units (ULAs), the principal items to be counted. A full description of each category is provided in Table II, in section 6.3.2; only examples are given here.

Kiswahili
S = subject marker prefixes (e.g. u-, a-, tu-, ki-, i-, etc.)
T1 = past tense marker (-li-)
T2 = present tense marker (-na-)
T3 = completive/present perfect (-me-; -ja-)
T4 = conditional present (-nge-)
T4B = conditional past (-ngali-)
T5 = future tense marker (-ta-, -taka-)
T6 = habitual tense (hu-)
T7 = consecutive (-ka-)
T8 = present continuous (-ki-)
O = object markers (e.g. -zi-, -ya-, -i-, etc.)
P = noun class markers (e.g. ma-, wa-, etc., with phonological variants, e.g. mw-, w-, etc.)
L = locatives (po-, mo-, ko-, -ni)
R = subject relative markers (e.g. -cho-, -zo-, -o-, etc.)
A = adverbial prefix (ki-, e.g. Kiafrika)
I = infinitival prefix (ku-)
N = pre-initial negative (ha-, si-, including si, the negative of ni)
Q = interrogative (-pi, wepi, nini, etc.)
J = noun ending (-ji, e.g. mjuaji)
D = adjectival and possessive concords (similar to P)
C = conditional (-ki-)
PP = preposition (e.g. ch-a)
CP = copula (ni; its negative si is tagged under N)
R! = reflexive (-ji-)
AS = adjectival stem (e.g. -ake, -kubwa, etc.)
NS = numeral stem (e.g. -tatu, -tano, etc.)
IS = interrogative stem (-pi, -ngapi)

Gĩkũyũ
S = subject markers (prefixes, e.g. ũ-, a-, tũ-, gĩ-, ma-, etc.)
T1 = past tense markers (various: remote, recent, etc.)
T3 = completive
T4 = conditional present (-ngĩ-)
T4B = conditional past (-ngĩa-)
T5 = future tense marker
T7 = consecutive (-ga-/-ka-)
T8 = present continuous (kĩ-)
O = object markers (various, e.g. -o-, -ci-, -ya-, -ma-, etc., including reflexive ĩ)
P = noun class markers (see adjectival concords D; usually of the same form, e.g. ma-, wa-, etc., with phonological variants, e.g. w-, mw-, etc.)
L = locatives (-ho, -ni)
R = relative markers
A = adverbial prefix (kĩ-, e.g. Gĩkamba, Kĩjeremaani)
I = infinitival prefix (kũ-)
N = pre-initial negative (ha-, ti-, etc.)
Q = interrogative (nũũ, rĩ, kũ-, including the -kĩ suffix)
D = adjectival and possessive concords (similar to P)
C = conditional (-ngĩ-)
PP = preposition (e.g. ci-a)
CP = copula (nĩ; its negative counterpart ti is categorized under N)
NS = numeral stem (e.g. -thatũ, -thano, etc.)
IS = interrogative stem (-ĩ, -igana)

6. Bantu Word Structure and Morphological Theory

It is clear that there is a specific ordering of morphemes in the structures we have looked at so far, and this is not accidental. The linear ordering of pre-stem morphemes, or prefixes, in verb morphology has long been acknowledged in Bantu language studies and has been the subject of much theoretical discussion. In many ways Kiswahili is rather straightforward, and its morpheme order has been well studied; this is less so for Gĩkũyũ, a point I return to later. There are two major approaches to the study of Bantu morphological structure, which I briefly discuss below.

6.1. The Template Approach

The first, known as the "slot-and-filler" or "template" approach (Lyons 1968, Gleason 1961), stems from the structural school of linguistics. Word formation is a process of stringing morphemes together "like beads on a string" (Lyons 1968:56). According to this approach, Bantu 'word' structures consist of slots, each of which may be filled by one of a finite list of morphemes. The graphic words we have seen are then flat structures, or templates, composed of constituent morphemes that are subject to certain collocational restrictions and a hierarchy. The template approach has been applied to many languages, especially agglutinative ones. Given the set of cooccurrence restrictions that apply, the (linear) ordering of these morphemes can be predicted.

Fundamentally, the "flat structure" referred to in this approach is equivalent to the "Dokean word", where 'word' refers to a string of morphemes whose boundaries are determined by stress placement (Myers 1987).
Word boundaries in a language like Kiswahili (and many other Bantu languages), which assigns stress to the penultimate syllable, correspond to the limits of the graphic word discussed above; in both languages under study, the graphic word can be defined by similar phonological criteria. Rubanza (1988), in an MSU Ph.D. dissertation, arrived at similar conclusions about the phonological word after an exhaustive study of Haya verb morphology. He cited phonological, syntactic, semantic and pragmatic factors that determine both the ordering and the predictability of morphemes. Haya is a Bantu language, and in his work Rubanza drew many parallels to both Gĩkũyũ and Kiswahili morphology. The claim of predictability is motivated by the fact that "it is hard in Bantu languages (impossible in Haya) to have all (23 for Haya) morphemes in one verbal construction". He then elaborated a Morphemic Formula for the Haya Verb (MHV), a complex template summarizing all the restrictions that apply within the Dokean word.

6.2. The Generativist Approach

The second approach, termed "configurational", is that of Myers's (1987) Ph.D. dissertation, which adopts the principles of the generativist school of grammar. It is configurational because it involves a binary branching structure ordered in two levels: word and stem. Briefly put, Myers uses a version of Chomsky's (1970) X-bar theory of natural language constituent structure that is head-driven and binary branching. His position rejects the template approach discussed in the previous section. The Dokean word on which the template approach is based is also rejected, on the grounds that it fails to capture certain generalizations about Bantu word structure. In this study, for example, it would be quite misleading to use the Dokean word par excellence as a unit of lexical analysis, because the morphological boundaries between lexemes and other constituents are not easily defined phonologically.
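To make the contrast between the two approaches concrete, the Python sketch below encodes the verb of (14) twice: once as a flat sequence of slots, as the template approach would, and once as a nested, strictly binary-branching structure. The particular bracketing is a hypothetical one chosen for illustration and is not claimed to be Myers's own analysis. Both encodings yield the same linear string; they differ only in whether internal constituency is represented.

```python
# Template view: an ordered list of (slot, morpheme) pairs for (14).
FLAT = [("SUB", "a"), ("TS", "na"), ("REL", "ye"), ("OB", "ku"),
        ("STEM", "andik"), ("APPL", "i"), ("FV", "a")]

# Configurational view: nested 2-tuples, i.e. strictly binary branching.
# This bracketing is a hypothetical illustration, not Myers's analysis.
BINARY = ("a", ("na", ("ye", ("ku", (("andik", "i"), "a")))))


def leaves(tree):
    """Left-to-right terminal morphemes of a binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def is_binary(tree):
    """True if every internal node has exactly two daughters."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def depth(tree):
    """Levels of embedding (0 for a single morpheme)."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree)


flat_string = "".join(morpheme for _, morpheme in FLAT)
tree_string = "".join(leaves(BINARY))
print(flat_string == tree_string, flat_string)  # True anayekuandikia
print(is_binary(BINARY), depth(BINARY))         # True 6
```

For the purpose of isolating and counting stems, only the linear order matters, which is why the flat encoding is sufficient here even if the nested one is theoretically richer.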
Myers's central thesis is that "Bantu morphology is configurational and binary branching, that the rules governing it are context-free and strictly local...the Bantu 'word' (i.e. 'Dokean word') is not in fact a morphological or syntactic constituent at all, but rather a derived phonological domain, i.e. a phonological word" (Myers 1987:12). All the evidence for the Dokean word in Shona, such as its being the domain for stress, epenthesis and Meeussen's rule of tone lowering, is phonological.

6.3. An Eclectic Approach

The two approaches outlined above are divergent, reflecting fundamental philosophical and practical differences. No doubt the configurational stance is of much theoretical interest for generativist grammars of Bantu languages, since it seeks to explain the deeper structure of Bantu word formation. The main critique of the template approach is, predictably, its descriptive nature and relatively limited explanatory power. Both schools of thought have useful points, though the template approach is more attractive as far as the aims of this project are concerned. The so-called Dokean word clearly corresponds to the graphic word I have described, although this in itself is not of major significance. For present purposes it is more convenient to regard structures as linearly ordered: whatever the internal constituency of the verb structure (binary or not), the ordered character of the pre-stem morphemes (morpheme sequencing) is beyond question. On the surface they are readily identifiable and isolable, a crucial first step toward my goal. In other words, description rather than explanation is the immediate goal here. Furthermore, the idea of a morphemic formula based on a study of the cooccurrence restrictions of the affixes seems to hold great potential.
Ideally, a template might be designed for each of the languages, or perhaps a single one that captures enough generalizations to apply to both languages (and to other Bantu languages). From such a template, a mathematical algorithm might be worked out that would allow every graphic word to be analyzed automatically.

For such an algorithm to succeed completely, it would have to take very many factors into consideration: phonological (including tonology, phonetic variation, vocalic transformations, etc.), morphological, syntactic, semantic, pragmatic, and so on. On closer examination, the feasibility of doing so is limited by the complexity that such a template entails. But in spite of the complexity of Rubanza's MHV, and the number of variables that come into play, especially for Gĩkũyũ, there are prospects for the success of such a solution if all [...]. Also, the potential uses of such a template remain highly attractive.
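A toy version of such a template-driven analyzer can be sketched in Python with a regular expression in which each pre-stem slot is an optional named group. Everything here is a hypothetical simplification: the slot inventories are partial, only the roots -andik- and -end- are recognized, morphophonemics is ignored (surface mw- is listed directly as an object marker rather than derived from mu-), and no cooccurrence restrictions are enforced. It nonetheless recovers the slot analyses of (14) and (23).

```python
import re

# Hypothetical, partial Kiswahili slot inventories (longer forms listed first).
SLOTS = [
    ("NEG", ["ha", "si"]),
    ("SUB", ["wa", "tu", "ni", "ki", "vi", "zi", "li", "ya", "a", "u", "m", "i"]),
    ("TNS", ["taka", "na", "li", "ta", "me", "ka", "ki"]),
    ("REL", ["cho", "vyo", "ye", "zo", "po", "ko", "mo", "o"]),
    ("OBJ", ["mw", "ni", "ku", "tu", "wa", "ki", "vi", "zi", "m", "i"]),
]
ROOTS = ["andik", "end"]

# Each pre-stem slot is an optional named group, in template order;
# the stem must begin with one of the known roots.
PATTERN = re.compile(
    "".join(f"(?P<{name}>{'|'.join(forms)})?" for name, forms in SLOTS)
    + f"(?P<STEM>(?:{'|'.join(ROOTS)})[a-z]*)"
)


def analyze(word):
    """Return a dict mapping each filled slot to its morpheme, or None."""
    match = PATTERN.fullmatch(word)
    if match is None:
        return None
    return {slot: form for slot, form in match.groupdict().items() if form is not None}


for word in ["anayekuandikia", "tutakayemwandikia", "hatutamwandikia"]:
    print(word, analyze(word))
```

The limits described in the text show up immediately: with no infinitive slot modeled, kuandika would be analyzed with ku- as an object marker, which is exactly the kind of ambiguity a fuller template would have to resolve.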
Basing myself on Bantu morphological characteristics and on actual data found in the project database, I attempt in the following section to construct such a template.

6.4. A General Template of Bantu Verb Structures

With only a change of tense, the constituent morphemes of the verb structure in (2) can be assigned slots in a template as follows.

(19)

The above is a highly generalized template showing the order of markers occurring in a given Bantu verb construction. Two concrete examples are provided in the template: the first row for Kiswahili and the second for an analogous Gĩkũyũ construction. Slots marked (0) (e.g. Gk.(4) and Sw.(1)) are unfilled because they cannot occur within the exemplified constructions. Markers are coded as FOC = focus, SUB = subject, TS = tense, REL = subject relative, OBJ = direct object, RT = verb root, EXT = extension, FV = final vowel and PF = post-final morpheme.

As a second step, I describe each category, its individual members and the cooccurrence restrictions that apply between them. Two morphemes cooccur if they can occupy their respective slots within the same construction. While there are significant differences between the order and appearance of morphemes in each language, many of the common features observed might facilitate the construction of a single, bilingual template. Two basic assumptions apply. First, no two markers, whether of the same or of different categories, can occupy the same slot at the same time. Second, a slot may remain unfilled.

In view of the goals of this study delimited earlier, I limit the analysis to a certain level; I do not delve into morphophonemics, for instance. For the sake of consistency, I deal separately with each of two broad categories: pre-stem and post-stem morphemes. For lack of space, it is not possible to include all the relevant information and examples and to present them graphically in linear order as in (19).
I have used numbers to encode the morpheme categories and to show their sequencing.

6.4.1. Inventory and Description of Morphemes

Table I

(1) Focus (FOC) (prefix; ka. only): nĩ-

(2) Subject markers (SUB) (prefixes)
    Kiswahili: [illegible]
    Gĩkũyũ: [illegible]

(3) Tense markers (TNS) (infixes)
    Kiswahili: general present [illegible]; progressive -ki-; past -li-; future -ta-; perfective -me-, -mesha-; consecutive -ka-; habitual hu-; conditional -nge-, -ngali-
    Gĩkũyũ: see (3b)

(3b) Gĩkũyũ tense markers (TNS)
    Past: remote a+...+ire/ega; near re+...+ire/ege; immediate kũ+...+ĩte/ege
    Present: e+...+e/ĩte
    Future: immediate kũ/gũ+...+e; near rĩ+...+e/ege; remote ha+...+e/ega
    Consecutive: -ke-, -ra-, -e-, gi-
    Bo+...(+age) tense
    Ne...e tense (habitual)
    Progressive: -kĩ-
    Conditional: future -ngĩ-; past -ngia
    Habitual: -aga
    Intentional: -ege
    Repetitive: -ega

(4) Relative markers (REL) (infixes)
    Kiswahili: [illegible]
    Gĩkũyũ: An independent relative pronoun -ria is used rather than an infix within the verb construction as in Kiswahili; a marker of the same class and form as in (2) is prefixed to it.

(5) Object markers (OBJ) (infixes)
    Kiswahili: [illegible]
    Gĩkũyũ: -n-, -ndĩ-, -kũ-, -i-, [remainder illegible]

(6) Verb root (RT), e.g. sz. -andik-; ka. -andik-

(7) Extensions (EXT) (capital letters represent archiphonemic vowels)
    stative: Kisw. -Ik-; Gĩk. -Uk-, -era
    causative: Kisw. -Ish-/-s-; Gĩk. -Ia, -Ithia
    passive: Kisw. -Iw-; Gĩk. -Iw-
    applicative: Kisw. -I-; Gĩk. -Ira
    reversive: Kisw. -U-; Gĩk. -Ura, -Uka
    reciprocal: Kisw. -an-; Gĩk. -ana
    positional: Kisw. [illegible]
    repetitive: Kisw. -eg-; Gĩk. -ange
    potential: Kisw. (none); Gĩk. -Ika

(8) Final vowel (FV) (suffixes)
    Kiswahili: indicative -a; subjunctive -e; imperative -a
    Gĩkũyũ: indicative -a, -aga; subjunctive -e, -ege; imperative -a
    (The e's of the subjunctive are treated as FVs.)

(9) Post-final (PF)
    Kiswahili: locatives -po, -mo, -ko
    Gĩkũyũ: -ho, -ri, -kuo, -i

Table I summarizes the most common components of the Kiswahili (sz.) and Gĩkũyũ (ka.) verb construction. While I have striven to be thorough, the layout does not describe every internal characteristic of the verb. Many morphophonemic details, which may affect surface forms but are not directly relevant to the problem at hand, have been left out, as have several other language- and morpheme-specific characteristics. Tone, for instance, plays an important role in the morphosyntax of Gĩkũyũ, but a comprehensive treatment is beyond the scope of this study. What matters is that the outline contains enough information to delimit the main cooccurrence and other restrictions that apply, the next step in our search for an adequate template.

Other peculiarities of Gĩkũyũ relate to the tenses in cell (3b), which are more complex, some being marked by two discontiguous morphemes, e.g. a...ire for the remote past. Some tenses have no English equivalent and hence no standard grammatical label, e.g. the 'na...a tense'. Dahl's law, which voices prefix-initial stops when the stem consonant is voiceless, is a highly productive rule of Gĩkũyũ and should be assumed to apply in all appropriate contexts.

6.4.2. Order of Pre-stem Morphemes

To maintain coherence, I analyze two types of complex verb construction individually, a negative one and an affirmative one, and then draw generalizations. As before, I deal with pre-stem morphemes first.
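Dahl's law, mentioned at the end of 6.4.1, can be stated procedurally. The following is a minimal sketch, not a full statement of the rule: it assumes that the relevant voiceless stem consonants are k, t and c, and it models only the voicing of prefix-initial k to g (the kũ-/gũ- alternation listed for the immediate future in (3b)). The helper names and the stems used (tema, rĩa) are hypothetical illustrations, and vowel coalescence and other morphophonemics are ignored.

```python
from typing import Optional

# Assumptions for illustration only (not a full account of Dahl's law):
VOWELS = set("aeiouĩũ")
VOICELESS = {"k", "t", "c"}  # assumed voiceless stem consonants
VOICING = {"k": "g"}         # only the kũ- -> gũ- alternation is modelled


def first_consonant(stem: str) -> Optional[str]:
    """Return the first non-vowel letter of the stem, or None."""
    for ch in stem:
        if ch not in VOWELS:
            return ch
    return None


def apply_dahl(prefix: str, stem: str) -> str:
    """Voice a prefix-initial stop when the stem consonant is voiceless."""
    if first_consonant(stem) in VOICELESS and prefix[:1] in VOICING:
        prefix = VOICING[prefix[0]] + prefix[1:]
    return prefix + stem


print(apply_dahl("kũ", "tema"))  # gũtema (voiceless t triggers voicing)
print(apply_dahl("kũ", "rĩa"))   # kũrĩa  (r is not voiceless: no change)
```

Keeping the consonant set and the voicing map as separate tables makes the assumptions easy to revise without touching the rule itself.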
(20) nĩtũgakĩmũandikĩrai 'so then, we shall write to him/her'
     nĩ  tũ  ga  kĩ   mũ  andik  ĩr  a  i
     1   2   3   CON  5   6      7   8  9

Cell 4 is unrepresented because, as mentioned earlier, Gĩkũyũ does not use an internal relative marker (see (19b)). Cell 9 is a discourse marker (variant: ri) affixed to the verb construction after the final vowel; its function is similar to 'the way a comma is used in English' (Barlow, 1951:13). Between cells 3 and 5 is the Gĩkũyũ connective particle kĩ (CON). Now let us look at the negative counterpart of (20).

(21) tũtigakĩmũandikĩrai 'so, we will not write to him/her'
     tũ  ti   ga  kĩ   mũ  andik  ĩr  a  i
     2   NEG  3   CON  5   6      7   8  9

Observation: FOC (1) and NEG do not occur within the same verb construction. There is an allomorphic NEG marker -ta- which is used in relative clauses.

(22) ũria tũ-ta-ga-kĩ-andik-ĩr-a-i
     REL (2)-NEG-(3)-CON-(6)-(7)-(8)-(9)
     'the one, then, whom we shall not write to'

The same restriction applies: -ta- simply substitutes for the main-clause NEG -ti-. Note also that when the independent relative ũria is used, the object marker of cell 5 cannot occur. We can add to example (22) another marker, the reflexive ĩ, which simply occupies cell 5 without affecting the template any further. Examples (21) and (22) represent the maximum number of pre-stem markers possible in a single Gĩkũyũ verb construction.

Observations: CON follows (3) TNS and immediately precedes (5) OBJ; that is, it fills the slot of (4) REL, which Gĩkũyũ leaves empty. The reflexive marker replaces (5) OBJ, so the two cannot cooccur.

Consider the Kiswahili examples in (23).

(23) tu-taka-ye-mu-andik-i-a
     2-3-4-5-6-7-8
     'he/she whom we will write to'

     ha-tu-ta-mu-andik-i-a
     NEG-2-3-5-6-7-8
     'we will not write to him/her'

Observations: NEG does not cooccur with (4) REL; the independent relative amba- is used in such cases. Note that -taka- is a variant future tense form (cf. -ta-) required in relative constructions.
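The cooccurrence observations on (20)-(23) amount to pairs of markers that exclude each other. The sketch below encodes them and searches for the largest licit set of pre-stem categories. The inventories are assumptions for illustration, not the author's own lists: for Gĩkũyũ, FOC, SUB, NEG, TNS, CON, OBJ and the reflexive (REFL), matching the seven-morpheme count given for Gĩkũyũ; for Kiswahili, NEG, SUB, TNS, REL, OBJ and REFL. The FOC/NEG and OBJ/REFL exclusions for Gĩkũyũ and the NEG/REL exclusion for Kiswahili come from the text; the Kiswahili OBJ/REFL exclusion is an added assumption.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Assumed pre-stem inventories (illustrative, not the author's own lists).
INVENTORY: Dict[str, List[str]] = {
    "gikuyu": ["FOC", "SUB", "NEG", "TNS", "CON", "OBJ", "REFL"],
    "kiswahili": ["NEG", "SUB", "TNS", "REL", "OBJ", "REFL"],
}

# Pairs of categories that cannot cooccur in one verb construction.
# Kiswahili OBJ/REFL is an added assumption; the rest follow the text.
EXCLUSIVE: Dict[str, List[Tuple[str, str]]] = {
    "gikuyu": [("FOC", "NEG"), ("OBJ", "REFL")],
    "kiswahili": [("NEG", "REL"), ("OBJ", "REFL")],
}


def is_licit(lang: str, cats: Iterable[str]) -> bool:
    """True if no excluded pair is present among the given categories."""
    present = set(cats)
    return not any(a in present and b in present for a, b in EXCLUSIVE[lang])


def max_prestem(lang: str) -> Tuple[int, Tuple[str, ...]]:
    """Largest licit subset of the inventory, searching from the largest size down."""
    inventory = INVENTORY[lang]
    for size in range(len(inventory), 0, -1):
        for combo in combinations(inventory, size):
            if is_licit(lang, combo):
                return size, combo
    return 0, ()


# (21) tũ-ti-ga-kĩ-mũ-...: SUB NEG TNS CON OBJ is licit.
print(is_licit("gikuyu", ["SUB", "NEG", "TNS", "CON", "OBJ"]))  # True
# A focus marker together with negation is ruled out.
print(is_licit("gikuyu", ["FOC", "SUB", "NEG", "TNS"]))  # False
for lang in INVENTORY:
    size, combo = max_prestem(lang)
    print(lang, size, combo)
```

Under these assumptions the search returns 5 for Gĩkũyũ and 4 for Kiswahili, the same maxima stated in the text; a different inventory or set of restrictions would change the result. Brute-force search is adequate here because the inventories are tiny.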
The examples and observations in (18)-(23) provide sufficient information for some generalizations about the pre-stem morphemes of both languages. In Gĩkũyũ a maximum of five pre-stem markers can occur in a single affirmative or negative construction; in Kiswahili, four. Following Barlow (1951) for Gĩkũyũ, I consider there to be a total of seven pre-stem morphemes in the grammar of Gĩkũyũ and six in Kiswahili. The discrepancy between the total and the number that can actually occur in a single surface construction is accounted for by the cooccurrence restrictions summarized in the following template. Markers appear in actual surface forms in the order in which they appear in the template.

6.5. A Bilingual Morpheme Template

[Table II: bilingual template of pre-stem morphemes]

Following the numbering code established in Table I, pre-stem morphemes in non-relative, affirmative verb clauses appear underlyingly as in Table II, subject to the restrictions below. Parentheses around a marker indicate that it may or may not appear in the surface structure. In slots (4a) and (5), each of which has two possible markers, the two markers cannot occur at the same time.

1. Morphemes in bold occur only in Gĩkũyũ.
2. Cell 4 (REL) never occurs in Gĩkũyũ; its slot may or may not be filled by CON.
3. FOC (1) and NEG do not cooccur.
4. It is assumed, following Schadeberg 1990, that the Kiswahili preinitial negative marker ha- has fused with the subject concords. Forms such as hu- (