Unsupervised and Supervised Exploitation of Semantic Domains in Lexical Disambiguation
Alfio Gliozzo, Carlo Strapparava and Ido Dagan
Computer Speech and Language
Domains are common areas of human discussion, such as economics, politics, law,
science, etc., which demonstrate lexical coherence. This paper explores the dual role
of domains in word sense disambiguation (WSD). On the one hand, domain information
provides generalized features at the paradigmatic level that are useful to discriminate
among many word senses. On the other hand, domain distinctions constitute a
useful level of coarse-grained sense distinctions, which lends itself to more accurate
disambiguation with lower amounts of knowledge.
In this paper we extend and ground the modeling of domains and the exploitation
of WordNet Domains, an extension of WordNet in which each synset is labeled
with domain information. We propose a novel unsupervised probabilistic method for
the critical step of estimating domain relevance for contexts, and suggest utilizing
it within unsupervised Domain Driven Disambiguation (DDD) for word senses, as
well as within a traditional supervised approach.
The paper presents empirical assessments of the potential utilization of domains
in WSD across a wide range of comparative settings, both supervised and unsupervised.
Following the dual role of domains we report experiments that evaluate both the
extent to which domain information provides effective features for WSD, as well
as the accuracy obtained by WSD at domain-level sense granularity. Furthermore,
we demonstrate the potential for either avoiding or minimizing manual annotation
thanks to the generalized level of information provided by domains.