PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Applied Textual Entailment
Oren Glickman
(2006) PhD thesis, Bar-Ilan University.


This thesis introduces the applied notion of textual entailment as a generic empirical task that captures major semantic inferences across many applications. Textual entailment addresses semantic inference as a direct mapping between language expressions and abstracts the common semantic inferences needed by text-based Natural Language Processing applications. We define the task and describe the creation of a benchmark dataset for textual entailment, along with proposed evaluation measures. This dataset was the basis for the PASCAL Recognising Textual Entailment (RTE) Challenge. We further describe how textual entailment can be approximated and modeled at the lexical level, and propose a lexical reference subtask with a correspondingly derived dataset. The thesis then proposes a general probabilistic setting that casts the applied notion of textual entailment in probabilistic terms; we suggest that this setting may provide a unifying framework for modeling uncertain semantic inferences from texts. In addition, we describe two lexical models that demonstrate the applicability of the probabilistic setting. Although these models are relatively simple, since they do not rely on syntactic or other deeper analysis, they nevertheless achieved competitive results in the PASCAL RTE Challenge. Finally, the thesis presents a novel acquisition algorithm that identifies lexical entailment relations from a single corpus, focusing on the extraction of verb paraphrases. Most previous approaches detect individual paraphrase instances within a pair (or set) of comparable corpora, each containing roughly the same information, and rely on the substantial correspondence that such corpora provide. We present a novel method that successfully detects isolated paraphrase instances within a single corpus without relying on any a priori structure or information.
Our instance-based approach seems to address some of the drawbacks of methods based on distributional similarity, in particular by providing a consistent scoring scale across different words.
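The evaluation measures are not named in this abstract. The first PASCAL RTE Challenge reported accuracy together with a confidence-weighted score (CWS, also known as average precision): a system ranks the test pairs by decreasing confidence, and CWS = (1/n) Σ_{i=1..n} (number correct up to rank i) / i. The sketch below assumes that definition; the function names and the sample judgments are hypothetical and are not taken from the thesis.

```python
def accuracy(judgments):
    """Fraction of test pairs whose entailment judgment is correct."""
    return sum(1 for _, correct in judgments if correct) / len(judgments)


def confidence_weighted_score(judgments):
    """CWS: average over ranks i of the precision of the i most confident judgments.

    judgments is a list of (confidence, is_correct) pairs, one per test pair.
    Ties in confidence keep their input order (Python's sort is stable,
    including with reverse=True).
    """
    ranked = sorted(judgments, key=lambda j: j[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += 1 if is_correct else 0
        total += correct_so_far / i  # precision of the top-i judgments
    return total / len(ranked)


if __name__ == "__main__":
    # Hypothetical system output: (confidence, was the judgment correct?)
    sample = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
    print(f"accuracy = {accuracy(sample):.3f}")                   # 0.750
    print(f"CWS      = {confidence_weighted_score(sample):.3f}")  # (1 + 1/2 + 2/3 + 3/4) / 4 = 0.729
```

Here the single wrong judgment sits at rank 2, which pulls CWS below accuracy; had it been the least confident judgment, CWS would rise to (1 + 1 + 1 + 3/4) / 4 ≈ 0.94 with the same accuracy. This is what makes CWS reward systems whose confidence is well calibrated.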

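As an illustration of how a lexical model can instantiate the probabilistic setting, the sketch below assumes (rather than reproduces) one simple form: the probability that a hypothesis h is true given a text t is the product, over hypothesis words u, of the probability that u is true given its best-matching text word v, i.e. P(h | t) ≈ ∏_{u∈h} max_{v∈t} P(u | v). The independence assumption, the max-alignment, the toy probability table, and the 0.01 floor are all hypothetical; in practice the word-level probabilities would have to be estimated, for example from co-occurrence statistics.

```python
from math import prod

# Hypothetical word-level probabilities P(u | v): hypothesis word u given text word v.
TOY_LEX_PROB = {
    ("buy", "acquire"): 0.6,
    ("company", "firm"): 0.8,
}


def toy_lex_prob(u, v):
    """Illustrative estimate of P(u is true | v); identical words entail each other."""
    if u == v:
        return 1.0
    return TOY_LEX_PROB.get((u, v), 0.01)  # assumed small floor for unrelated words


def lexical_entailment_prob(text_tokens, hyp_tokens, lex_prob=toy_lex_prob):
    """Approximate P(h | t) as a product over hypothesis words of their best text-word match.

    Assumes hypothesis words are entailed independently and that each one is
    aligned to the single text word giving it the highest probability.
    text_tokens must be non-empty.
    """
    return prod(max(lex_prob(u, v) for v in text_tokens) for u in hyp_tokens)


if __name__ == "__main__":
    text = ["google", "acquire", "firm"]        # lemmatized text (hypothetical)
    hypothesis = ["google", "buy", "company"]   # lemmatized hypothesis (hypothetical)
    p = lexical_entailment_prob(text, hypothesis)
    print(f"P(h | t) ~ {p:.2f}")  # prints P(h | t) ~ 0.48, i.e. 1.0 * 0.6 * 0.8
```

Because the model uses only word-level probabilities, it needs no syntactic analysis; a decision threshold on P(h | t) (for instance 0.5, again an assumption) turns the score into a binary entailment judgment, and the score itself can serve as the confidence used for ranking.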
EPrint Type: Thesis (PhD)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 2761
Deposited By: Oren Glickman
Deposited On: 22 November 2006