Semantic role labeling of implicit arguments for nominal predicates
Natural language is routinely used to express the occurrence of an event and existence of entities that participate in the event. The entities involved are not haphazardly related to the event; rather, they play specific roles in the event and relate to each other in systematic ways with respect to the event. This basic semantic scaffolding permits construction of the rich event descriptions encountered in spoken and written language. Semantic role labeling (SRL) is a method of automatically identifying events, their participants, and the existing relations within textual expressions of language. Traditionally, SRL research has focused on the analysis of verbs due to their strong connection with event descriptions. In contrast, this dissertation focuses on emerging topics in noun-based (or nominal) SRL.
One key difference between verbal and nominal SRL is that nominal event descriptions often lack participating entities in the words that immediately surround the predicate (i.e., the word denoting an event). Participants (or arguments) found at longer distances in the text are referred to as implicit. Implicit arguments are relatively uncommon for verbal predicates, which typically require their arguments to appear in the immediate vicinity. In contrast, implicit arguments are quite common for nominal predicates. Previous research has not systematically investigated implicit argumentation, whether for verbal or nominal predicates. This dissertation shows that implicit argumentation presents a significant challenge to nominal SRL systems: after introducing implicit argumentation into the evaluation, the state-of-the-art nominal SRL system presented in this dissertation suffers a performance degradation of more than 8%.
Motivated by these observations, this dissertation focuses specifically on implicit argumentation in nominal SRL. Experiments in this dissertation show that the aforementioned performance degradation can be reduced by a discriminative classifier capable of filtering out nominals whose arguments are implicit. The approach improves performance substantially for many frequent predicates - an encouraging result, but one that leaves much to be desired. In particular, the filter-based nominal SRL system makes no attempt to identify implicit arguments, despite the fact that they exist in nearly all textual discourses.
As a first step toward the goal of identifying implicit arguments, this dissertation presents a manually annotated corpus in which nominal predicates have been linked to implicit arguments within the containing documents. This corpus has a number of unique properties that distinguish it from preexisting resources, of which few address implicit arguments directly. Analysis of this corpus shows that implicit arguments are frequent and often occur within a few sentences of the nominal predicate.
Using the implicit argument corpus, this dissertation develops and evaluates a novel model capable of recovering implicit arguments. The model relies on a variety of information sources that have not been used in prior SRL research. The relative importance of these information sources is assessed and particularly troubling error types are discussed. This model is an important step forward because it unifies work on traditional verbal and nominal SRL systems. The model extracts semantic structures that cannot be recovered by applying the systems independently.
Building on the implicit argument model, this dissertation then develops a preliminary joint model of implicit arguments. The joint model is motivated by the fact that semantic arguments do not exist independently of each other. The presence of a particular argument can promote or inhibit the presence of another. Argument dependency is modeled by using the TextRunner information extraction system to gather general purpose knowledge from millions of Internet webpages. Results for the joint model are mixed; however, a number of interesting insights are drawn from the study.
- In Collections
- Electronic Theses & Dissertations
- Copyright Status
- In Copyright
- Material Type
- Theses
- Authors
- Gerber, Matthew Steven
- Thesis Advisors
- Chai, Joyce
- Committee Members
- Jin, Rong; Jain, Anil; Hale, John
- Date Published
- 2011
- Subjects
- Grammar, Comparative and general--Verb phrase; Natural language processing (Computer science)
- Program of Study
- Computer Science
- Degree Level
- Doctoral
- Language
- English
- Pages
- viii, 155 pages
- ISBN
- 9781124792361; 1124792368
- Permalink
- https://doi.org/doi:10.25335/fgn1-fk54