Applied Textual Entailment
PhD thesis, Bar-Ilan University.
This thesis introduces the applied notion of textual entailment as a generic empirical
task that captures major semantic inferences across many applications. Textual
entailment addresses semantic inference as a direct mapping between language expressions
and abstracts the common semantic inferences needed by text-based Natural
Language Processing applications. We define the task and describe the creation of a
benchmark dataset for textual entailment along with proposed evaluation measures.
This dataset was the basis for the PASCAL Recognising Textual Entailment (RTE)
Challenge. We further describe how textual entailment can be approximated and
modeled at the lexical level and propose a lexical reference subtask and a correspondingly
derived dataset.
The thesis further proposes a general probabilistic setting that casts the applied
notion of textual entailment in probabilistic terms. We suggest that the proposed
setting may provide a unifying framework for modeling uncertain semantic inferences
from texts. In addition, we describe two lexical models demonstrating the applicability
of the probabilistic setting. Although our proposed models are relatively simple,
as they do not rely on syntactic or other deeper analysis, they nevertheless achieved
competitive results on the PASCAL RTE Challenge.
Finally, the thesis presents a novel acquisition algorithm that identifies lexical entailment
relations from a single corpus, focusing on the extraction of verb paraphrases.
Most previous approaches detect individual paraphrase instances within a pair (or
set) of comparable corpora, each of them containing roughly the same information,
and rely on the substantial level of correspondence between such corpora. We present
a novel method that successfully detects isolated paraphrase instances within a single
corpus without relying on any a priori structure or information. Our instance-based
approach seems to address some of the drawbacks of distributional-similarity-based
methods, in particular by providing a consistent scoring scale across different words.