Lexical Entailment and its Extraction from Wikipedia
Master's thesis, Bar Ilan University.
This work investigates the lexical entailment relation and develops a Wikipedia-based resource of lexical entailment rules. Lexical entailment is a semantic relation that holds between lexical elements when the meaning of one element can be inferred from the meaning of the other. This relation can be represented as a rule in which the left-hand side (LHS) entails the right-hand side (RHS), denoted LHS --> RHS. Such rules are useful in many Natural Language Processing (NLP) applications. For instance, a Question Answering system may use the lexical entailment rule cruiser --> ship to find the answer to the question “What is the name of the Russian ship that sank at Port Arthur?” in the sentence “Russian cruiser Pallada sank at Port Arthur”. Yet even though NLP systems apply lexical semantic inference, there is no common definition of this relation, and no resource dedicated to lexical entailment rules has been built.
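The question-answering use of a rule base described above can be sketched as follows; the rule store, helper names, and rules are all hypothetical toy data, not the resource developed in this work:

```python
# A minimal sketch (toy data) of how a lexical entailment rule base could
# support answer matching: a term in the text matches a question term either
# directly or via a rule LHS --> RHS whose RHS is the question term.

# Hypothetical rule base: maps an LHS term to the set of RHS terms it entails.
RULES = {
    "cruiser": {"ship", "warship"},
    "Pallada": {"cruiser"},
}

def entails(lhs: str, rhs: str) -> bool:
    """True if lhs entails rhs via a single rule (or trivially, lhs == rhs)."""
    return lhs == rhs or rhs in RULES.get(lhs, set())

def matches(question_term: str, text_terms: list[str]) -> list[str]:
    """Return the text terms that entail the given question term."""
    return [t for t in text_terms if entails(t, question_term)]

text = ["Russian", "cruiser", "Pallada", "sank", "at", "Port", "Arthur"]
print(matches("ship", text))  # → ['cruiser']
```

Here "cruiser" matches the question term "ship" only because of the rule cruiser --> ship; plain string matching would miss it.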
We suggest deriving the missing definition from the framework of Textual Entailment, a recent paradigm for semantic inference. We present two definitions that bridge the gap between textual entailment and lexical entailment. These definitions help clarify the intended meaning of the lexical entailment relation and decide which rules should be included in a lexical entailment rule base. We utilize these definitions to investigate the utility of several state-of-the-art lexical semantic resources as potential sources of lexical entailment rules. In the process we reveal the strengths and weaknesses of these resources with respect to the task of recognizing lexical entailment, as well as the different types of lexical entailment relation that each resource covers.
A second contribution of this work is the development of a large-scale lexical entailment rule base, extracted from Wikipedia, the first resource designed specifically to contain lexical entailment rules. We present extraction methods geared to cover the broad range of the lexical entailment relation and evaluate them under this target criterion. We conduct both an internal evaluation, comparing results to human judgments, and an external evaluation, using our rule base within a real NLP task. By filtering rules according to the type of method that extracted them, one can choose among different recall-precision trade-offs, ranging from a precision of 0.87 for almost 2 million rules to a precision of 0.66 for 8 million rules. In the text categorization evaluation, our resource outperforms previous automatically created resources and performs comparably to WordNet, a lexicon in which relations between terms were manually crafted by experts, even though the rules in our resource were automatically extracted from texts written for human consumption.