2

2.1

The English article system poses a unique challenge to learners because of cross-linguistic differences of the article system (Section 2.1.1) and the difficulty in teaching and learning the use of the intricate English article system (e.g., Dulay, Burt, & Krashen, 1982; Master, 1994; Section 2.1.2). Morpheme order studies have quantitatively informed us of the relative difficulty of article acquisition in comparison with other morphemes (Section 1.1.3). In what follows, I discuss each of these aspects.

2.3.2

The notion of definiteness is central to the use of English articles. A common scheme for classifying NPs by definiteness, subsequently revised by Huebner (1983, 1985), is still widely used in corpus-based learner language research (e.g., Butler, 2002; Crosthwaite, 2016; Diez-Bedmar & Papp, 2008; Diez-Bedmar, 2015; Leroux & Kendall, 2018).
This scheme categorizes NPs into four semantic contexts based on the presence or absence of Hearer Knowledge (HK±) and Specific Referent (SR±). Figure 2.1 shows the graphical representation of this categorization scheme, which distinguishes the following four types:

1. [-SR, +HK] generics
2. [+SR, +HK] referential definites
3. [+SR, -HK] referential indefinites
4. [-SR, -HK] nonreferentials

The first two categories are both [+HK], meaning that the entity is assumed known to the hearer. The former (Category 1) is a known entity without a specific referent (e.g., A cat likes mice), and the latter (Category 2) is a known entity with a specific referent (e.g., Pass me the pen). The other two categories are both [-HK], meaning that the entity is assumed unknown to the hearer. The former (Category 3) is an unknown entity with a specific referent, such as a first mention (e.g., I saw a strange man), and the latter (Category 4) is an unknown entity without a specific referent (e.g., He used to be a lawyer). All these examples were adopted from Butler (2002, pp. 478-479); for a more complete set of examples, see Butler (2002).

However, even after idiomatic expressions and conventional uses were added as a fifth category by Thomas (1989), this 2 × 2 (+ 1) categorization scheme is not adequate, as it does not capture the full variability of the notion of definiteness. That is to say, differences within each of the five types of definiteness should also be taken into account; for example, in Figure 2.1, within the single category 2 [+SR, +HK], four types of examples are given. Lumping all these four types within category 2 would be problematic if learners have varying degrees of problems among these four types.

This issue seems to be addressed in a more fine-grained coding scheme for communicative functions of definiteness (CFD; Bhatia, Simons et al., 2014; subsequently modified in Bhatia, Lin et al., 2014a). This coding scheme takes a hierarchical structure, as shown in Figure 3.1 (Section 3.1.3).

CFD categorizes various types of definiteness based on this tree structure (examples of each category are given in Appendix A). The highest-order distinction categorizes all noun phrases (NPs) into the following three intermediate nodes: Nonanaphora, Anaphora, and Miscellaneous. Nonanaphora refers to entities that are discourse-new, and it further ramifies into Unique, Nonunique, and Generic. Unique refers to uniquely identifiable entities such as Barack Obama, whereas Nonunique refers to unidentifiable entities. Generic refers to the entire kind rather than an individual case. Anaphora refers to entities that are previously mentioned or evoked in the discourse, and it further ramifies into Basic and Extended Anaphora. The former refers to entities that have been mentioned in the discourse, whereas the latter refers to entities that have not been directly mentioned but are evoked by indirect allusion. Miscellaneous refers to all other kinds of entities that do not fit into either Nonanaphora or Anaphora, such as a part of an idiomatic expression (e.g., in fact).

Even though, to my knowledge, this coding scheme has never been used in SLA research, it was deemed more informative and useful than the traditional semantic wheel because CFD overcomes the abovementioned problem that the semantic wheel lumps together different kinds of definiteness into one category.
For example, in the semantic wheel, NPs that are assumed known to the hearer for quite different reasons are all treated as the same kind of hearer knowledge, such as Nonunique_Physical_Copresence and Extended_Anaphora. The former is when the referred entity is known to the hearer because it is physically present at the moment of speech, and the latter is when the referred entity is known to the hearer because it has been indirectly evoked in the previous discourse.

In addition to the variables central to the use of English articles that have been discussed in this section thus far, other factors, such as syntactic modification, are also reported to affect the use of English articles (e.g., Lee, 1999). I will now turn to those factors.

3

3.1

3.1.1 The Two Corpora Used in the Study

3.1.2

Target tokens (i.e., articles) were extracted from the two corpora mentioned above (EFCAMDAT 1 and LOCNESS 2) in the following ways. For learner language, relevant essay topics, L1s, and proficiency levels were selected from EFCAMDAT and downloaded as an XML file. Initially, five L1s (English, Japanese, Chinese, Russian, and Korean) were to be included in the analyses; however, to ensure a sufficient number of occurrences in each L1 group for the subsequent statistical analyses, Russian and Korean were excluded from data annotation.

1 https://corpus.mml.cam.ac.uk/efcamdat2/public_html/
2 https://uclouvain.be/en/research-institutes/ilc/cecl/locness.html

For essay topics, EFCAMDAT has a total of 126 essay topics across 16 proficiency levels. As essay prompts are reported to affect the accuracy of certain article forms (Crosthwaite, 2016), care was taken to ensure the comparability between the two corpora. Because the essays written by English native speakers in LOCNESS are argumentative essays, personal topics were excluded from EFCAMDAT. More specifically, once the list of 126 essay prompts across 16 levels and 6 proficiency groups (A1-C2) was extracted, each of them was examined carefully based on (a) word count requirement, (b) writing format (e.g., email, letter, list, etc.), and (c) nature of the prompt. For (a), because essays in LOCNESS are 500 words or more, essays with lower word counts were removed; however, because the longest word count requirement in EFCAMDAT was 150-180 words, an arbitrary cutoff of at least 100 words was adopted. For (b), writing formats that affect the discourse-level variables were removed; for example, prompts that elicited bullet points were excluded. For (c), topics that are either non-argumentative or personal were excluded. A letter to a friend, an email to a teacher, a formal apology, and an apartment lease are among the examples of the topics excluded from this study. For the final list of topics, see Appendix A.

As a result, levels A1-B1 and C2 were excluded because these levels did not have enough NPs for each L1 group once the exclusion criteria (a)-(c) were applied. For example, A1 and A2 did not meet the criteria because every single essay prompt in these levels was too short (either 20-40 or 50- words).

For NP extraction, due to the difference in file format, different steps were required for EFCAMDAT and LOCNESS. For EFCAMDAT, the most straightforward way would have been to extract all determiners and the NPs they precede; however, this approach cannot extract NPs with zero articles. Hence, this study took a backward approach: all NPs and the preceding determiners were extracted, and irrelevant ones were identified and subsequently removed. This was done with a Python script.
Concretely, nouns that are (a) preceded by quantifiers (e.g., some people, any reason), (b) preceded by demonstratives (e.g., this man, these people), (c) preceded by possessives (e.g., his car), or (d) functioning as a noun modifier (e.g., credit card, bank account) were all removed. Nouns that are irrelevant to the choice of determiners (e.g., something, anything) were also removed. These processes removed approximately a third of the NPs. After the extraction and removal processes, all the NPs were exported into an Excel sheet for the subsequent annotation. Each column in the Excel sheet corresponded to one of the variables described in the following section.

For LOCNESS, due to its file format (i.e., .txt), TagAnt (Anthony, 2016) was employed for automatic part-of-speech (POS) tagging. After the POS-tagged text file was generated, all NPs tagged as either NN, NNS, NNP, or NNPS were extracted and exported into an Excel sheet for further annotation. Because the automatic removal processes described above were not applicable to the native speaker data due to its file format, irrelevant NPs were manually removed one by one. The exclusion criteria were the same as the ones used for the learner data. The descriptive statistics for the extracted tokens are presented in Table 3.2, with a breakdown of how many definite (DA), indefinite (IA), and zero articles (ZA) were used in each of the first language groups.

Table 3.2. Descriptive Statistics for the Annotated Tokens of Articles
L1         # Essays   Word Count (per essay)   # Tokens   DA (%)      ZA (%)       IA (%)
English    15         4992 (332.80)            833        245 (29%)   479 (58%)    109 (13%)
Chinese    25         4822 (192.88)            795        221 (28%)   480 (60%)    94 (12%)
Japanese   24         4603 (191.79)            833        290 (35%)   471 (57%)    72 (9%)
Total      64         14417 (225.27)           2461       756 (31%)   1430 (58%)   275 (11%)

3.1.3

Table 3.3. Overview of the Variables Used in Annotation
Types of variables   Variables       Number of Levels
Semantics            NOUN COUNT      2
                     NOUN ANIMACY    12
                     NOUN TYPE       2
                     DEFINITENESS    8
                     VERB TYPE       5
Morphological        FORM            3
                     NUMBER          2
Syntactic            MODIFICATION    4
                     CASE            4
Data                 L1              3
                     PROFICIENCY     4
                     ID              32

3 I was the sole annotator of this annotation process, and I acknowledge that the untested interrater reliability is one of the limitations of this study.

In Table 3.3, the column Types of variables indicates the broader category to which each variable belongs; for example, variables labeled as Semantics pertain to the meaning of the target NP. The column Number of Levels indicates how many levels each of the variables has. In what follows, I present a detailed description of each of the 12 variables in the order given in Table 3.3. The first variable, NOUN COUNT, is presented in Table 3.4.

Table 3.4. The NOUN COUNT Variable and its Levels
Types of variables   Variable Name   Levels
Semantics            NOUN COUNT      countable, uncountable

The variable NOUN COUNT was annotated in the following way. First, all NPs tagged as plural nouns were annotated as countable, because only countable nouns can be pluralized. Because NPs tagged as singular can be either singular countable nouns or mass nouns, this distinction was manually annotated. In this manual annotation process, whenever the countability was unclear, an online English dictionary 4 was used as a reference. Because the same noun can be countable or uncountable depending on the meaning it conveys in a particular context, the closest meaning was identified in the dictionary, and the corresponding countability was annotated in the data. The examples below show that the NP life in (1) is countable because it refers to a particular course of life, whereas the one in (2) is uncountable because it means general human existence.
(1) Today, having knowledge of how the computer operates is considered a necessary component of leading a successful life (ICLE-US-MICH-002.1).
(2) Cars, telephones, and nuclear energy are just three examples of inventions and discoveries that have had profound effects on modern day life (ICLE-US-MICH-0035.1).

4 Longman Dictionary of Contemporary English Online (https://www.ldoceonline.com/) was used.

The variable NOUN ANIMACY is presented in Table 3.5.

Table 3.5. The NOUN ANIMACY Variable and its Levels
Types of variables   Variable Name   Levels
Semantics            NOUN ANIMACY    non-human, human, natnl/group/socrole, other abstract, dynamic, ling, eff/state, mental/emotional, natural entity, place/time, social-conv

This variable NOUN ANIMACY was adopted from Deshors (2016). The original variable had 23 levels; however, it was eventually conflated into 12 levels. For examples of each of these levels, see Appendix A. The conflation process is presented in Table 3.6.

Table 3.6. The Conflation Process of the Variable NOUN ANIMACY

As shown in Table 3.6, the original variable with 23 levels was conflated into 12 levels (Deshors, 2016, p. 143). Examples of each of the NOUN ANIMACY types are presented in Appendix A. For more details on the statistical and conceptual validity of this conflation, see Deshors (2016).

The variable NOUN TYPE is presented in Table 3.7.

Table 3.7. The NOUN TYPE Variable and its Levels
Types of variables   Variable Name   Levels
Semantics            NOUN TYPE       common, proper

This variable NOUN TYPE was annotated automatically, based on the part-of-speech tags provided by EFCAMDAT. Because proper nouns are tagged as either NNP (singular) or NNPS (plural), and common nouns as NN (singular) or NNS (plural), the first two were automatically annotated as proper nouns, and the other two as common nouns. For example, the NP America in (3) was annotated as proper, and humanness as common.

(3) In America, this growing individualistic society, one no longer sees the realitive humanness between people (ICLE-US-MICH-0005.1).

The variable DEFINITENESS is presented in Table 3.8.

Table 3.8. The DEFINITENESS Variable and its Levels
Types of variables   Variable Name   Levels
Semantics            DEFINITENESS    Unique Hearer Old (uniq_hear_old), Unique Hearer New (uniq_hear_new), Non-Unique Hearer Old (nonuni_hear_old), Non-Unique Hearer New (nonuni_hear_new), Non-Unique Non-Specific (nonuni_nonspe), Generic (generic), Basic Anaphora (bas_anaph), Extended Anaphora (ext_anaph), Miscellaneous (misc)

This variable DEFINITENESS originally had 24 levels, but it was conflated into 9 levels for the ease of annotation. Examples for each of the 9 levels would require more than a single sentence, as this discourse-level variable is suprasententially defined; for a simpler list of examples for each of the DEFINITENESS levels, see Appendix A. Table 3.9 summarizes the conflation process of the variable DEFINITENESS.
Table 3.9. The Conflation Process of the Variable DEFINITENESS
Original Levels                      Conflated Levels
Unique_Physical_Copresence           Unique Hearer Old (uniq_hear_old)
Unique_Larger_Situation
Unique_Predicative_Identity
Unique_Hearer_New                    Unique Hearer New (uniq_hear_new)
NonUnique_Physical_Copresence        Non-Unique Hearer Old (nonuni_hear_old)
NonUnique_Larger_Situation
NonUnique_Predicative_Identity
NonUnique_Hearer_New_Spec            Non-Unique Hearer New (nonuni_hear_new)
NonUnique_NonSpec                    Non-Unique Non-Specific (nonuni_nonspe)
Generic_Kind_Level                   Generic (generic)
Generic_Individual_Level
Same_Head                            Basic Anaphora (bas_anaph)
Different_Head
Bridging_Nominal                     Extended Anaphora (ext_anaph)
Bridging_Event
Bridging_Restrictive_Modifier
Bridging_Subtype_Instance
Bridging_Other_Context
Pleonastic                           Miscellaneous (misc)
Quantified
Predicative_Equative_Role
Part_Of_Noncompositional_MWE
Measure_Nonreferential
Other_Nonreferential

Originally, the variable DEFINITENESS had 24 levels, and they were conflated into nine levels as shown in Table 3.9, for the ease and accuracy of annotation. The conflation was mainly based on the original hierarchical structure proposed in Bhatia, Simons et al. (2014), which is shown in Figure 3.1; the nine retained levels are underlined and boldfaced in the figure.

In addition to the hierarchical structure shown in Figure 3.1, I also took into account the distinctions of Hearer Knowledge [HK±] and Specific Referent [SR±] during the conflation process. For example, because the [HK±] and [SR±] distinctions were present within Nonanaphora, this distinction was retained in the conflation process. Consequently, from the node Nonanaphora in Figure 3.1, the following six levels were retained: Unique_Hearer_Old, Unique_Hearer_New, Nonunique_Hearer_Old, Nonunique_Hearer_New, Nonunique_Nonspecific, and Generic. For Anaphora, because it is important to distinguish anaphoric NPs whose referents have actually been mentioned from those that have only been evoked by entities mentioned before, Basic_Anaphora and Extended_Anaphora were retained. Lastly, the other types that fall under Miscellaneous were conflated into one level, Miscellaneous, because they are not explicable by anaphoricity, [±HK], or [±SR].

The variable VERB TYPE is presented in Table 3.10.

Table 3.10. The VERB TYPE Variable and its Levels
Types of variables   Variable Name   Levels
Semantics            VERB TYPE       stative, activity, achievement, accomplishment, n/a

This variable VERB TYPE is based on the taxonomy developed by Vendler (1957). For the nouns that were either the subject (nominative case) or the object (accusative case) of a verb, the lexical aspect of that verb was annotated. For nouns that did not receive any syntactic case, VERB TYPE was annotated as n/a.

The variable MODIFICATION is presented in Table 3.11.

Table 3.11. The MODIFICATION Variable and its Levels
Types of variables   Variable Name   Levels
Syntactic            MODIFICATION    Pre-modification with adjective (premod_a), Pre-modification with noun (premod_n), Post-modification with prepositional phrase (postmod_p), Post-modification with relative clause (postmod_rc), Post-modification with infinitival clause (postmod_ic), Post-modification with complement clause (postmod_cc)

Originally, these six levels constituted a single variable MODIFICATION. However, because noun phrases can have multiple modifications (e.g., a big house in the city), it was not ideal to annotate this variable as a six-level multinomial (i.e., single-label) variable.
Because MODIFICATION is in fact a multi-label variable, it was separated into six variables, each of which was then treated as a two-level binary variable. The underlined NP in example (4) was annotated as premod_a and premod_n, (5) as postmod_p and postmod_ic, (6) as postmod_cc, and (7) as postmod_rc.

(4) As individuals we are constantly surrounded by racist and discriminative media language (ICLE-US-MICH-0004.1).
(5) This sudden burst of useful compounds not only improved the chances of a patient's survival in a hospital but also caused a great need for medical chemists to study and classify each new drug as it was discovered (ICLE-US-MICH-0015.1).
(6) We Chinese have a saying that men at their birth are naturally good (EFCAMDAT-writing-id-556256).
(7) An invention of the 20th century which I feel has significantly changed people's lives is the introduction of Bank-cash machines or Automatic teller machines (ICLE-US-MICH-0044.1).

The variable FORM is presented in Table 3.12.

Table 3.12. The FORM Variable and its Levels
Types of variables   Variable Name   Levels
Morphological        FORM            DA, IA, ZA

The variable FORM is the choice of article made in each case: DA stands for the definite article, IA for the indefinite article, and ZA for the zero article. For example, the NP illustration in (8) was annotated as IA, work as DA, and computer as ZA.

(8) A vivid illustration of this can be found by examining the work. Recently, an auto-pasts [sic] company put all of their inventory on computer (ICLE-US-MICH-0002.1).

The variable NUMBER is presented in Table 3.13.

Table 3.13. The NUMBER Variable and its Levels
Types of variables   Variable Name   Levels
Syntactic            NUMBER          singular, plural

This variable NUMBER was annotated automatically based on the POS tags: NN and NNP were annotated as singular, and NNS and NNPS as plural. For example, in (9), the NP saying was annotated as singular, and generations as plural.

(9) Money is the root of all evil is an ancient saying -- but its truth applies to all generations (ICLE-US-IND-0015.1).

The variable CASE is presented in Table 3.14.

Table 3.14. The CASE Variable and its Levels
Types of variables   Variable Name   Levels
Syntactic            CASE            Accusative with preposition (acc_p), Accusative with verb (acc_v), Nominative (nom), neither

The variable L1 is presented in Table 3.15.

Table 3.15. The L1 Variable and its Levels
Types of variables   Variable Name   Levels
Data                 L1              English, Japanese, Chinese

As has already been presented as descriptive statistics in Section 3.1.2, 833 occurrences of articles from LOCNESS were annotated as English, 833 from EFCAMDAT as Japanese, and the remaining 795 from EFCAMDAT as Chinese.

The variable PROFICIENCY is presented in Table 3.16.

Table 3.16. The PROFICIENCY Variable and its Levels
Types of variables   Variable Name   Levels
Data                 PROFICIENCY     B2, C1

The variable PROFICIENCY was only applicable to the NNS data from EFCAMDAT. Based on the conversion chart between the proficiency level measures in EFCAMDAT and those of the CEFR, the retained essays fell into the B2 and C1 levels.

3.2

3.2.1

As has been briefly mentioned earlier in this paper, MuPDAR (Gries & Deshors, 2014) is a regression-based methodological protocol for investigating the (non-)nativelikeness of the use of a certain linguistic structure. Conceptually, it predicts what an NS would do in the given linguistic context that an NNS is in, and this linguistic context is operationalized through a set of relevant linguistic features.
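For concreteness, each article token in such an analysis can be thought of as one row of categorical features. The minimal R sketch below uses the variable names from the annotation scheme in Section 3.1.3; the object name and all values are invented for illustration only, and only two of the six binary MODIFICATION indicators are shown.

# One annotated article token represented as a row of categorical features.
# Column names follow Section 3.1.3; values are invented for illustration.
one_token <- data.frame(
  FORM         = "DA",           # article actually used (DA / IA / ZA)
  NOUN_COUNT   = "countable",
  NOUN_ANIMACY = "human",
  NOUN_TYPE    = "common",
  DEFINITENESS = "bas_anaph",
  VERB_TYPE    = "stative",
  NUMBER       = "singular",
  premod_a     = "yes",          # two of the six binary MODIFICATION indicators
  postmod_p    = "no",
  CASE         = "nom",
  L1           = "Japanese",
  PROFICIENCY  = "B2",
  ID           = "learner_017"   # hypothetical learner identifier
)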
Methodologically, MuPDAR consists of roughly four steps: (1) train a logistic regression model (R1) on the NS data; (2) if the fit of R1 is good, apply R1 to the NNS data to make predictions and obtain the probability distribution of the target linguistic form (i.e., what an NS would do, and at what probability, in the given situation that an NNS is in); (3) calculate the deviation of the NNS productions based on the difference between the prediction made in (2) and the actual NNS data (i.e., what an NNS actually did); and (4) create a regression model (R2) to predict the deviation of the NNS calculated in (3).

Gries and Deshors (2014), in their analysis of NNS usage of the modals may and can, explain that (3) can be done in two different ways. The first approach is to calculate the deviation categorically; that is to say, whenever the predicted NS choice and the actual NNS choice do not match, the case is marked as deviant ("foreign"), whereas it is marked as nativelike ("native") when they match. The second approach is to calculate the deviation quantitatively. In this approach, a vector Dev (as in deviation) is created, and a numeric value is attached to each case of the target linguistic form. Whenever the actual NNS choice and the predicted NS choice match, the numeric value is set to 0 (no deviation). Whenever the choices do not match, the numeric value is set to p - 0.5, where p stands for the predicted probability of the NS choice made by R1 (for the complete explanation of the original MuPDAR approach, see Gries & Deshors, 2014). The second, quantitative approach is more commonly used because of the level of granularity it allows for (e.g., Lester, 2019).

When applying MuPDAR to a multinomial classification, a crucial difference between binomial and multinomial classification has to be noted. In binomial classification, one deviation vector suffices because the probability of one class automatically determines the probability of the other class; for example, when the probability of may is 40%, then the probability of can is automatically 60%. However, this does not hold true in a multinomial classification like the one in the present study, because the probability of one class does not determine the probability of each of the remaining classes. For example, when the probability of DA is 40%, it only tells us that the sum of the probabilities of IA and ZA is 60%; it does not tell us what the probabilities of IA and ZA are, respectively. Therefore, a modification has to be made to accommodate the number of classes of the response variable. Two possible alternative approaches and their pros and cons were considered.

3.2.1.1

The first approach is similar to the categorical approach to MuPDAR explained above. It consists of four (almost) identical steps: (1) train a multinomial logistic regression model (R1) on the NS data; (2) if the fit of R1 is good, apply R1 to the NNS data to make predictions and obtain the probability distribution of the article choice (i.e., what an NS would do, and at what probability, in the given situation that an NNS is in); (3) create a vector that categorically represents whether or not the actual NNS choice matches the NS prediction made in (2); and (4) create a binary logistic regression model (R2) to predict the deviation of the NNS calculated in (3). The biggest advantage of this approach is that the final step (4) can be taken in exactly the same way as in the original MuPDAR, because of the categorical nature of the deviation vector.
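A minimal sketch of this first approach in R is given below. It assumes data frames ns_data and nns_data that contain the article choice (FORM) and the annotation variables; all object names are hypothetical, and only a few predictors are spelled out.

library(nnet)

# Step (1): multinomial model trained on the NS data (further predictors omitted)
r1 <- multinom(FORM ~ DEFINITENESS + NOUN_COUNT + NUMBER + NOUN_TYPE, data = ns_data)

# Step (2): predicted NS choice for every NNS token
ns_choice <- predict(r1, newdata = nns_data, type = "class")

# Step (3): categorical deviation vector ("native" = match, "foreign" = mismatch)
nns_data$DEVIATION <- factor(ifelse(as.character(ns_choice) == as.character(nns_data$FORM),
                                    "native", "foreign"))

# Step (4): binary logistic regression predicting the (mis)matches
r2 <- glm(DEVIATION ~ DEFINITENESS + NOUN_COUNT + NUMBER + NOUN_TYPE,
          data = nns_data, family = binomial)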
On the other hand, this approach has (at least) two shortcomings: it cannot quantify the degree of deviation, and it cannot distinguish between different kinds of deviation. The former is inherent to the categorical approach, whereas the latter stems from the nature of multinomial classification. That is to say, a dichotomous categorization of deviation (i.e., match vs. mismatch) will not tell us what an NNS chose and what an NS would choose in the same situation, because it only tells us whether the responses were the same or not. For example, an NNS choosing the zero article when an NS would choose the definite article and an NNS choosing the definite article when an NS would choose the indefinite article are two very different scenarios (with probably two very different reasons for the deviation).

3.2.1.2

Approach 2 addresses the first shortcoming, namely the inability to quantify the deviation. In Approach 2, steps (1) and (2) follow Approach 1, and steps (3) and (4) are different: (1) train a multinomial logistic regression model (R1) on the NS data; (2) if the fit of R1 is good, apply R1 to the NNS data to make predictions and obtain the probability distribution of the article choice (i.e., what an NS would do, and at what probability, in the given situation that an NNS is in); (3) calculate a deviation score for each token based on the difference between the actual NNS choice and the NS prediction made in (2); and (4) create a multiple regression model (R2) to predict the deviation score calculated in (3).

In step (3), instead of calculating the deviation dichotomously, a numeric value is assigned to each instance of article use. If the NS and NNS choices are in alignment, a numeric value of 0 is assigned (i.e., no deviation). If the NS and NNS choices diverge, the deviation is quantified in terms of how small the probability of the NS making the same choice as the NNS was. For example, suppose that in a given linguistic context X, the NNS chose the indefinite article, whereas the NS probability distribution (as predicted by R1) was (DA, IA, ZA) = (0.7, 0.1, 0.2). In this case, the probability of an NS making the same article choice as the NNS (i.e., the indefinite article) is 0.1. However, this value is counterintuitive because it has to be interpreted as "the smaller the value, the larger the deviation." To make this value more intuitively interpretable, the deviation is operationalized as 0.5 - p. Originally, the deviation was defined as p - 0.5 (Gries & Deshors, 2014); however, Lester (2019) flipped the equation to make the numeric value more intuitively interpretable. In this example, the deviation is 0.5 - 0.1 = 0.4. The reasoning behind this operationalization is that, when the NS and NNS choices do not match, the theoretical minimum of the predicted probability of an NS making the same choice as the NNS is 0 (i.e., maximum deviation), whereas the theoretical maximum is just below 0.5 (i.e., minimum deviation), because the choices must have matched if the predicted probability had exceeded 0.5, regardless of the probability distribution of the other two articles. Therefore, by subtracting this probability from 0.5, the deviation value falls within the range 0 <= dev <= 0.5. Given the capacity to quantify learner deviation, Approach 2 was adopted. In what follows, I give a more detailed description of each step of Approach 2, with a particular focus on the software and packages used.

3.2.1.3 Software and Packages

The statistical software RStudio was used to run each of the four steps in Approach 2. Summarized below are the specifics of Approach 2 and the packages used in each step.
(1) A multinomial logistic regression model was built using the function multinom() in the R package nnet, with the choice of article as a three-level categorical response variable and all other variables as categorical predictor variables. Following Lester (2019), cross-validation was conducted to ensure the generalizability of the classification accuracy. A commonly used five-fold cross-validation was employed in this study; in other words, 20% of the NS data were labeled as the test set and the other 80% as the training set, and this data splitting took place five times, with a unique 20% assigned to the test set in each round. Due to the lack of an available package, I implemented the five-fold cross-validation code myself.

(2) The predictions of R1 on the NNS data were obtained through the predict() function in the R package nnet.

(3) For the calculation of the deviation score, I wrote code that followed the calculation method described above.

(4) The mixed-effects multiple regression model R2 was built with the lmer() function in the R package lme4.

3.2.2

The whole idea of MuPDAR, and its power, rests on the assumption that R1 (the regression model trained on the NS data) will make a nativelike judgment when applied to the NNS data. This is precisely what enables us to compare what an NNS actually did in a given linguistic context with what an NS would do in the exact same linguistic context (as represented by a vector of variables). This assumption is reasonable as long as R1 fits the NS data well (as measured by goodness of fit and classification accuracy); however, to my knowledge, this assumption has never been tested empirically, perhaps due to the lack of data that would enable the empirical validation of such an assumption. That is to say, this validation necessitates data in which an NS choice and an NNS choice of English articles can be directly compared in the exact same linguistic context, and one type of data that meets this requirement is an essay written by an NNS and corrected by an NS.

In EFCAMDAT, one of the corpora used in the present study, learner essays are provided with professional feedback on grammatical and lexical errors by language teachers. According to recruiting information by Education First, 5 the teachers are all English native speakers with a minimum of 40 hours of training in TEFL. Therefore, the validation of the assumption is possible by comparing the prediction made by R1 with the actual error correction (or non-correction) made by an NS.

Of the 1628 tokens of articles in the 49 learner essays extracted from EFCAMDAT, 34 tokens had error corrections, across 13 essays. However, because EFCAMDAT only claims that error corrections have been provided to "a substantial portion of scripts" (Huang et al., 2017, p. 7, emphasis added), not to all of them, it is dangerous to assume that the other 35 essays, which had no error correction on articles, were reviewed by NSs and judged to be completely error-free. Therefore, the only way to distinguish articles that were judged to be correct by an NS from the ones that were simply not reviewed by an NS is to restrict the scope of this analysis to the essays that have at least one error correction. It is reasonable to assume that all the articles with no error correction in an essay that has at least one error correction elsewhere within it were judged by an NS to be correct; the comparison then amounts to applying R1 to the same tokens before and after error correction and comparing the two classification accuracies, as sketched below.
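A minimal sketch of this comparison in R, assuming the fitted NS model r1 from the earlier sketch and hypothetical data frames nns_original and nns_corrected that hold the same tokens and predictors, with FORM containing the article before and after correction respectively:

# Classification accuracy of R1 on the original and on the error-corrected tokens
acc_original  <- mean(as.character(predict(r1, newdata = nns_original)) ==
                      as.character(nns_original$FORM))
acc_corrected <- mean(as.character(predict(r1, newdata = nns_corrected)) ==
                      as.character(nns_corrected$FORM))

# One way to compare the two accuracy rates is a two-sample test of proportions
n <- nrow(nns_original)
prop.test(x = c(round(acc_corrected * n), round(acc_original * n)), n = c(n, n))

prop.test() is only one of several ways to carry out such a comparison of proportions.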
Consequently, 13 essays with at least one error correction were deemed appropriate for the assumption validation. These 13 essays included 34 error-corrected tokens and 428 error-free tokens of articles, and the following two versions of the data were created out of these 462 tokens:

1. The original NNS production of the 462 tokens (428 error-free tokens + 34 tokens before error correction)
2. The error-corrected NNS production of the 462 tokens (428 error-free tokens + 34 tokens after error correction)

5 http://www.englishtown.com/teachonline/

If the assumption of MuPDAR is correct, then the predicted NS choice (by R1) should align more closely with the error-corrected NNS production than with the original NNS production. Following this procedure, R1 was applied to both versions 1 and 2 of the 462 tokens, and the classification accuracy for version 2 (73%) was higher than for version 1 (70%), although the difference was not statistically significant (z = 0.87, p = .38). Given that the model R1 predicts the Japanese data and the Chinese data at a good accuracy of 70% and 71%, respectively, this is not cogent evidence that R1 makes a nativelike judgment on the NNS data. This will be further discussed in the limitations section.

4

4.1

4.1.1

Overall, the model fit of the multinomial logistic regression model R1 was excellent 6 (C = .96, well beyond the threshold of C = .8 proposed by Gries & Deshors, 2014, and above the C = .93 reported in Lester, 2019; McFadden's pseudo-R-squared = .68). The overall classification accuracy, which is the number of correct predictions divided by the total number of predictions, was 89%. This means that the model trained on the NS data was able to predict with 89% accuracy which one of the three article options (i.e., ZA, IA, DA) a native speaker would use in a given linguistic context represented by a vector of variables. The five-fold cross-validation showed a mean accuracy of 84% (SD = 2%). Given this small decrease in accuracy, the model is reasonably generalizable to different datasets.

6 The C statistic was defined as the area under the receiver operating characteristic (ROC) curve.

Initially, learner ID was to be included as a random effect in R1; however, because, to my knowledge, no R package allows the inclusion of random effects in a multinomial logistic regression model, it was first included as a fixed effect. The classification accuracy was higher with learner ID (90%), but the generalizability decreased substantially, as can be seen in the mean accuracy score of the five-fold cross-validation (77%). Therefore, for the first regression model R1, a decision was made not to include the variable learner ID. Also, at this stage of MuPDAR, AIC was not considered, because the purpose of R1 is not to construct a parsimonious model; rather, it solely aims at making the most accurate prediction on the NNS data that approximates what an NS would do.

Because the fit of R1 was shown to be good, the NS model R1 was applied to the NNS data. As has already been explained in the methodology section, an NS prediction based on R1 was made for each of the cases in the NNS data, and Tables 4.1 and 4.2 show the confusion matrices for the actual article choices by Chinese and Japanese learners of English and the NS predictions by R1, respectively.
Table 4.1. Confusion Matrix of the R1 Predictions on the NNS Data (L1 = Chinese)
                                       NS choice predicted by R1
Actual NNS choice (L1 = Chinese)       DA     IA     ZA     Total
DA                                     153    32     36     221
IA                                     23     64     7      94
ZA                                     76     54     350    480
Total                                  252    150    393    795

Table 4.2. Confusion Matrix of the R1 Predictions on the NNS Data (L1 = Japanese)
                                       NS choice predicted by R1
Actual NNS choice (L1 = Japanese)      DA     IA     ZA     Total
DA                                     213    29     48     290
IA                                     12     56     4      72
ZA                                     91     66     314    471
Total                                  316    151    366    833

In Tables 4.1 and 4.2, each row corresponds to the actual NNS choices, whereas each column corresponds to the predicted NS choices. The boldfaced, underlined figures on the diagonal indicate the match between the two, namely the number of occurrences of articles for which the actual NNS choice and the predicted NS choice were the same. Overall, the proportion of Chinese learners' choices that matched the predicted NS choice (71%) was not significantly higher than the proportion of Japanese learners' choices that matched the predicted NS choice (70%; z = 0.59, p = .56).

4.1.2

Approach 2 was adopted in the calculation of the deviation score and the construction of the final regression model (R2) because it allows for a more fine-grained quantitative analysis of the deviation. Following the procedure described in the methodology section, the deviation score was calculated for each token in the NNS data. A generalized linear mixed-effects model was then built with the function glmer() in the R package lme4. All the independent variables included in R1 were entered as predictors, and the variable FORM was also included in R2, as we would like to see how the three types of articles uniformly or differentially affect the learner deviation and how they interact with the other predictors. Also, all of them were allowed to interact with the variable L1, as their effects on the deviation are expected to differ based on the learner's L1. Consequently, the model included main effects, two-way interactions with FORM (FORM : everything), two-way interactions with L1 (L1 : everything), and three-way interactions with both FORM and L1 (FORM : L1 : everything). To avoid overparameterization, AIC and BIC scores were calculated for the models with (1) only main effects, (2) main effects and two-way FORM interactions, (3) main effects and two-way L1 interactions, and (4) main effects, two-way FORM interactions, two-way L1 interactions, and three-way FORM : L1 interactions. This model selection process is summarized in Table 4.3.

Table 4.3. Model Selection of R2

Note. In Table 4.3, AIC and BIC represent numerical measures of model fit and model parsimony, which penalize the inclusion of additional terms. AIC is more useful for detecting type II errors (false negatives), whereas BIC is more sensitive to type I errors (false positives). Based on the table, Model 2 has the smallest AIC and BIC, and Model 4 has a slightly higher conditional R2 (R2C). The contrast between the models with smaller AIC and BIC (Model 2 and Model 4) and the ones with larger AIC and BIC (Model 1 and Model 3) is most likely due to the inclusion of the interaction terms with the variable FORM. The slightly higher R2C of Model 4 is not surprising, given that Model 4 is Model 2 plus the three-way interactions (FORM : L1 : everything). The AIC and BIC for Model 4 are higher than those of Model 2; however, given the conceptual importance of investigating how speakers of different L1s are differently influenced by the other variables, Model 4 was selected as the initial model for R2. The model was highly significant (F(192, 1435) = 35.46, p < .001) without any sign of multicollinearity (all VIFs < 2). 7
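A minimal sketch of how the deviation score and the four candidate models could be constructed in R is given below, reusing the fitted NS model r1 and the NNS data frame nns_data from the earlier sketch; only a few predictors are spelled out, and lmer() is used as in Section 3.2.1.3. The models are fit with maximum likelihood (REML = FALSE) so that their AIC and BIC values are comparable across different fixed-effects structures.

library(nnet)
library(lme4)

# Deviation score: 0 when the predicted NS choice matches the actual NNS choice,
# otherwise 0.5 minus the predicted probability of the NS making the NNS's choice
probs   <- predict(r1, newdata = nns_data, type = "probs")
p_same  <- probs[cbind(seq_len(nrow(probs)),
                       match(as.character(nns_data$FORM), colnames(probs)))]
matched <- as.character(predict(r1, newdata = nns_data)) == as.character(nns_data$FORM)
nns_data$dev <- ifelse(matched, 0, 0.5 - p_same)

# Four candidate models compared on AIC and BIC (remaining predictors omitted)
m1 <- lmer(dev ~ FORM + L1 + DEFINITENESS + NOUN_COUNT + (1 | ID),
           data = nns_data, REML = FALSE)
m2 <- lmer(dev ~ FORM * (L1 + DEFINITENESS + NOUN_COUNT) + (1 | ID),
           data = nns_data, REML = FALSE)
m3 <- lmer(dev ~ L1 * (FORM + DEFINITENESS + NOUN_COUNT) + (1 | ID),
           data = nns_data, REML = FALSE)
m4 <- lmer(dev ~ FORM * L1 * (DEFINITENESS + NOUN_COUNT) + (1 | ID),
           data = nns_data, REML = FALSE)
AIC(m1, m2, m3, m4)
BIC(m1, m2, m3, m4)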
Because the inclusion of all of these categorical variables involves the generation of dummy variables, a reference level had to be set for each variable. A reference level is interpreted as the baseline level to which all other levels are compared. The reference levels of the 12 variables are presented in Table 4.4.

7 Although an interaction term and its constituent variables are often highly correlated, this multicollinearity does not pose problems for the present analysis.

Table 4.4. Reference Level for Each of the Categorical Independent Variables

In Table 4.4, for the variables for which linguistic markedness could be established, the least marked level was set as the reference level (e.g., no prenominal adjectival modification is less marked than a modified NP). For the variables for which linguistic markedness was difficult to define, the most frequent level was set as the reference level, following Gries and Deshors (2014). In addition to these fixed-effect independent variables, the variable ID was also included in this model as a random effect. The significant predictors of the regression model are summarized in Table 4.5.

Table 4.5. Significant Predictors of the Deviation Score

In Table 4.5, only the statistically significant effects with no significant higher-order interaction effects are included. For example, a significant main effect was not included in the table when it was overridden by a significant interaction term involving that variable, and a significant two-way interaction was likewise not included when it was overridden by the highly significant highest-order interaction term. In this sense, it is this highest-order (three-way) interaction that is particularly noteworthy in Table 4.5. That is to say, the three highest-order interactions at the bottom of Table 4.5 indicate that each of the three independent variables involved differentially affected the accuracy of article use, and that such differential effects further varied for Chinese and Japanese learners of English. This will be presented graphically later in this section.

Each of the F statistics in Table 4.5 can be interpreted as the amount of change in the model fit when the full model is compared against another model without the term of interest. For example, the row for an interaction with L1 represents how much improvement the inclusion of that interaction term makes in terms of model fit when all other terms are already included in the model. Considering that all the independent variables are categorical, this is in principle identical to a factorial ANOVA with type III sums of squares. In the following sections, the significant predictors are further investigated graphically and statistically.

For each graph, the dots (or other shapes, such as triangles and squares) represent the means of the deviation score at a given level of the variable of interest. Because all other variables not shown on the graph are held constant, the differences between such means represent the marginal effects of the variables of interest. Error bars represent 95% confidence intervals. The value on the y-axis is always the deviation score, whereas the x-axis and the legend show the predictor variables that constitute the interaction term of interest (as no main effects are analyzed here, every graph has at least two predictor variables).
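As an illustration of the plotting logic, the sketch below computes raw cell means of the deviation score and 95% confidence intervals for one example grouping (FORM by NOUN COUNT, faceted by L1) and plots them with error bars. The dplyr and ggplot2 packages and all object and column names are assumptions here, and the actual figures are based on model-derived marginal effects rather than raw means, so this is only an approximation.

library(dplyr)
library(ggplot2)

# Raw cell means and 95% CIs of the deviation score (dev) by FORM x NOUN_COUNT x L1
cell_means <- nns_data %>%
  group_by(L1, FORM, NOUN_COUNT) %>%
  summarise(mean_dev = mean(dev),
            ci = qt(.975, n() - 1) * sd(dev) / sqrt(n()),
            .groups = "drop")

ggplot(cell_means, aes(x = FORM, y = mean_dev, colour = NOUN_COUNT)) +
  geom_point(position = position_dodge(width = .4)) +
  geom_errorbar(aes(ymin = mean_dev - ci, ymax = mean_dev + ci),
                width = .15, position = position_dodge(width = .4)) +
  facet_wrap(~ L1) +
  labs(y = "Deviation score")

Model-adjusted marginal means that hold the other predictors constant could instead be obtained from R2 with a package such as emmeans.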
For the statistical analyses, because an everything-to-everything comparison would lead to an overly severe alpha-level adjustment, every level of each predictor variable was compared only against the reference level, which has already been discussed above. This means that for each predictor variable with k levels, (k - 1) tests were conducted. The alpha level was adjusted accordingly based on the Bonferroni correction, a simple and conservative type of adjustment, by dividing it by the number of tests (k - 1). This calculation is a simplification of a more complex formula, but it can be used satisfactorily in most cases (Walters, 2016). For two-way interaction terms, comparisons were conducted on the second predictor within a given level of the first predictor. For example, when analyzing the interactions in Figure 4.1, comparisons were conducted on the second predictor (i.e., NOUN COUNT: countable vs. uncountable) within a given level of the first predictor (i.e., FORM), which is, say, DA. Because the second predictor has only two levels and hence only one comparison, the alpha level was not adjusted for this particular example.

4.1.2.1 Marginal Effect of the Interaction Term

Figure 4.2 shows that, regardless of noun countability, both Chinese and Japanese learners had problems using DA accurately, and this inaccuracy of DA was more obvious with uncountable nouns. This difference becomes much more pronounced for the use of IA; learners had great difficulty using IA with uncountable nouns, whereas their use of IA with countable nouns was almost nativelike. Most strikingly, the relative ease of countable nouns does not hold true for ZA; that is to say, the use of ZA with uncountable nouns was more nativelike, whereas the use of ZA with countable nouns was more deviant. Figure 4.3 shows the marginal effect of the interaction term

4.1.2.2

5

5.1

5.2

Throughout the post-hoc graphical analyses of each significant effect, it appeared that the use of IA was, in general, associated with lower deviation scores (i.e., more nativelikeness). However, it is important to note that this does not mean that Chinese and Japanese learners of English have a good understanding of the distribution of IA; rather, the low deviation scores for IA seem to be attributable to its overly restricted use by NNS. A closer look at the confusion matrices of the actual NNS article choices and the predicted NS article choices presented earlier makes it clear that the NS (as predicted by R1) is far more likely to use IA (19%) than the Chinese learners are (12%). This pattern also held for the Japanese data, in which the NS prediction of IA (18%) was much more frequent than the Japanese learners' use of IA (9%). It then follows naturally that the seeming nativelikeness of IA use by NNS was due to the high precision of IA use (Chinese: 68%, Japanese: 78%), while the overall infrequency of IA use resulted in a low recall of IA use (Chinese: 43%, Japanese: 37%). This difference between the precision score and the recall score of IA use is particularly large for Japanese. In other words, the problem lies in the fact that the marginal effect of FORM (and its interactions with the other independent variables) on the deviation score only considers the non-nativelikeness of the IA tokens that were actually used by NNS, and does not consider the non-nativelikeness due to the non-use of IA in an obligatory context (as predicted by R1). One possible fix for this problem is to define the variable FORM as what the NS prediction by R1 is, instead of what the actual NNS choice is.
By doing so, we could observe how the deviation score is differentially affected by the other variables in each of the three obligatory contexts for DA, IA, and ZA. However, this is merely an ad hoc fix for the problem described above, and it is not effective when learners overuse a certain target form (for the exact same reason for which TLU was proposed in place of SOC). This problem of defining the variable FORM is unique to a multinomial MuPDAR such as the present study.

Another aspect of this study that warrants further investigation is the calculation of the deviation score. Conceptually, it is reasonable to operationalize the deviation score as p - 0.5, where p stands for the probability of an NS not choosing the article chosen by an NNS. However, as has already been pointed out in Section 3, the deviation score defined in this way does not tell us whether the deviation is due to an overproduction or an underproduction of one level of the target structure. The deviation score for binary linguistic structures in the original MuPDAR (Gries & Deshors, 2014) contains more information, because it ranges from -0.5 to 0.5, with the absolute value and the ± sign indicating the magnitude and the direction of the deviation (underproduction vs. overproduction), respectively. Therefore, a way to operationalize the deviation score in a multinomial setting such that the direction of the deviation is also included in the score would advance the instrumental convenience of multinomial MuPDAR.

As to the validation of the assumption that R1 in fact makes a nativelike judgment on NNS data, the results remained inconclusive. This is mainly due to the small number of corrections of article errors available in the data used in this study. As has already been mentioned in Section 3, the number of occurrences of articles used for this validation was 461, of which 34 were error-corrected and 418 were error-free. Because the validation relies on the difference between R1's classification accuracy on the 461 occurrences before error correction and on the same 461 occurrences after error correction, the number of occurrences of articles that were actually error-corrected has to be large enough for the two classification accuracies to differ enough to validate the assumption. One way to address this problem would be to selectively extract essays with a large number of error corrections on articles from EFCAMDAT.

6

The present study is the first to (i) apply MuPDAR to a multinomial target structure and (ii) take a multifactorial approach to the investigation of article use by NNS. Conceptually, the results showed the relative importance of each of the relevant semantic, syntactic, and morphological factors governing the use of English articles. Methodologically, the first attempt to extend MuPDAR to a multinomial linguistic structure was not without problems, but it potentially opens up MuPDAR to a wider range of linguistic phenomena, as it is no longer restricted to those that involve binary choices.
Table A.1. List of Topics Extracted from EFCAMDAT
Level | Unit | Title | Topic
10 | 1 | Extreme activities | Helping a friend find a job
10 | 2 | Gender differences | Doing a survey about discrimination
10 | 3 | The cost of living | Requesting a bank loan
10 | 4 | Health and fitness | Applying to be a fitness trainer
10 | 5 | Lifestyles | Finding a home for a wealthy client
10 | 6 | Telling stories | Describing a terrifying experience
10 | 7 | Presenting information | Presenting trends
10 | 8 | Competition and cooperation | Giving feedback about a colleague
11 | 1 | Talking about films | Writing a movie review
11 | 2 | Fears and phobias | Helping a coworker deal with a phobia
11 | 3 | Technology | Writing an advertising blurb
11 | 4 | Beliefs and convictions | Writing up survey findings
11 | 5 | Career paths | Reviewing a self-help book
11 | 6 | Computers and the Internet | Setting rules for social networking
11 | 7 | Law and order | Dealing with a breach of contract
11 | 8 | Listening skills | Improving your study skills
12 | 1 | Manners and etiquette | Turning down an invitation
12 | 2 | Books and stories | Entering a writing competition
12 | 3 | Mysterious phenomena | Buying a painting for a friend
12 | 4 | Corporate culture | Writing a report on staff satisfaction
12 | 5 | World English | Proofreading an article
12 | 6 | Leadership qualities | Attending a leadership course
12 | 7 | Soft skills | Conducting a performance appraisal
12 | 8 | Awkward situations | Writing an apology note
13 | 1 | Politics | Writing a campaign speech
13 | 2 | Home design | Renting out a room
13 | 3 | Market research | Comparing two demographic groups
13 | 4 | Fair trade | Giving advice about budgeting
13 | 5 | Contributing to society | Writing about a disaster relief effort
13 | 6 | Art and design | Writing a brochure for a museum
13 | 7 | Mother nature | Making an educational product for kids
13 | 8 | Reaching your potential | Reaching your potential
14 | 1 | Advertising | Writing advertising copy
14 | 2 | The environment | Choosing a renewable energy source
14 | 3 | Good and bad news | Writing a rejection letter
14 | 4 | Health and well-being | Attending a seminar on stress reduction
14 | 5 | Taking a risk | Talking a friend out of a risky action
14 | 6 | Education and training | Applying for sponsorship
14 | 7 | Making a speech | Writing a wedding toast
14 | 8 | Jokes and humor | Delivering a punch line
15 | 1 | In the news | Covering a news story
15 | 2 | Communication | Hosting a group of foreign buyers
15 | 3 | The power of the mind | Writing an article about NLP techniques
15 | 4 | The entertainment industry | Making a movie
15 | 5 | E-commerce | Comparing two online retailers
15 | 6 | Urban issues | Writing an article about a superstore
15 | 7 | Quality of life | Writing about future lifestyles
15 | 8 | Meaning and symbols | Interpreting a prophecy
16 | 1 | Science and technology | Attending a robotics conference
16 | 2 | National identity | Writing about a symbol of your country
16 | 3 | Tough choices | Following a code of ethics
16 | 4 | Fame and fortune | Criticizing a celebrity
16 | 5 | Creative thinking | Using creative writing techniques
16 | 6 | Financial planning | Applying for a home loan
16 | 7 | Dealing with stress | Writing a visualization script
16 | 8 | Doing research | Researching a legendary creature

DEFINITENESS 9
Tag levels | Examples | Conflated Tag Levels
Unique_Physical_Copresence | John here is an investment banker. | Unique Hearer Old (uniq_hear_old)
Unique_Larger_Situation | In the days since Hillary Clinton unburdened herself in an | Unique Hearer Old (uniq_hear_old)
Unique_Predicative_Identity | Clark Kent is Superman. | Unique Hearer Old (uniq_hear_old)
Unique_Hearer_New | A restaurant chain named | Unique Hearer New (uniq_hear_new)
NonUnique_Physical_Copresence | The podium is too high. | Non-Unique Hearer Old (nonuni_hear_old)
NonUnique_Larger_Situation | The chair (at a conference) / today | Non-Unique Hearer Old (nonuni_hear_old)
NonUnique_Predicative_Identity | He is the manager. | Non-Unique Hearer Old (nonuni_hear_old)
NonUnique_Hearer_New_Spec | I am looking for a nurse. Her name is Sara. | Non-Unique Hearer New (nonuni_hear_new)
NonUnique_NonSpec | I am looking for a nurse [any nurse would do]. | Non-Unique Non-Specific (nonuni_nonspe)
Generic_Kind_Level | Dinosaurs are extinct. | Generic (generic)
Generic_Individual_Level | Cats have fur. | Generic (generic)
Same_Head | a true story. | Basic Anaphora (bas_anaph)
Different_Head | I adopted a cat this weekend. The animal is so cute. | Basic Anaphora (bas_anaph)
Bridging_Nominal | I looked at an apartment yesterday. The kitchen was really large. | Extended Anaphora (ext_anaph)
Bridging_Event | got married this weekend. The bride looked beautiful. | Extended Anaphora (ext_anaph)
Bridging_Restrictive_Modifier | the house next door / daughter | Extended Anaphora (ext_anaph)
Bridging_Subtype_Instance | I collect coins. I have a 1943 steel penny. | Extended Anaphora (ext_anaph)
Bridging_Other_Context | I want to focus on what many of you have said you would like me to elaborate on. What can you do about the climate crisis? | Extended Anaphora (ext_anaph)
Pleonastic | It is raining. | Miscellaneous (misc)
Quantified | All the people / no motorcade | Miscellaneous (misc)
Predicative_Equative_Role | a teacher. / This is an opportunity. | Miscellaneous (misc)
Part_Of_Noncompositional_MWE | the bucket today. | Miscellaneous (misc)
Measure_Nonreferential | Hours later / miles away | Miscellaneous (misc)
Other_Nonreferential | Global warming / concern / the topic of energy | Miscellaneous (misc)

8 The excerpts were re-organized into bullet points, and quotation marks were not used for readability.
9 The excerpts were re-organized into bullet points, and quotation marks were not used for readability.