PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
Linlin Li, Benjamin Roth and Caroline Sporleder
In: ACL 2010, 11-16 July 2010, Uppsala, Sweden.

Abstract

This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems, either numerically or by a statistically significant margin.
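
The decomposition mentioned in the abstract can be illustrated with a small sketch. It assumes the simplest possible form, p(sense | context) = sum over topics z of p(sense | z) * p(z | context), with the sense treated as conditionally independent of the context given the topic. This form is an assumption made for illustration; the three instantiations proposed in the paper may differ in detail. All function names, topic labels, and probabilities below are hypothetical toy values, not taken from the paper.

```python
# Toy sketch of scoring senses through latent topics.
# Assumption: p(sense | context) = sum_z p(sense | z) * p(z | context).
# All names and numbers are hypothetical, chosen only for illustration.


def sense_scores(p_sense_given_topic, p_topic_given_context):
    """Marginalize over latent topics to score each candidate sense."""
    scores = {}
    for sense, per_topic in p_sense_given_topic.items():
        scores[sense] = sum(
            per_topic[topic] * p_topic
            for topic, p_topic in p_topic_given_context.items()
        )
    return scores


def best_sense(p_sense_given_topic, p_topic_given_context):
    """Return the sense with the highest conditional probability."""
    scores = sense_scores(p_sense_given_topic, p_topic_given_context)
    return max(scores, key=scores.get)


# Hypothetical topic distribution inferred for a context about money.
P_TOPIC_GIVEN_CONTEXT = {"finance": 0.8, "nature": 0.2}

# Hypothetical probability of each sense paraphrase under each topic.
P_SENSE_GIVEN_TOPIC = {
    "bank/institution": {"finance": 0.9, "nature": 0.1},
    "bank/riverside": {"finance": 0.1, "nature": 0.9},
}

if __name__ == "__main__":
    print(best_sense(P_SENSE_GIVEN_TOPIC, P_TOPIC_GIVEN_CONTEXT))
```

In this toy setting the context is dominated by the "finance" topic, so the institution sense receives the higher score. In a real system both probability tables would be estimated by a trained topic model rather than written by hand.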

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 7112
Deposited By: Caroline Sporleder
Deposited On: 04 March 2011