PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Feature Vector Quality and Distributional Similarity
Maayan Geffet and Ido Dagan
In Proceedings of the Coling-04 Conference, 2004.

Abstract

We suggest a new goal and evaluation criterion for word similarity measures. The new criterion, meaning-entailing substitutability, fits the needs of semantic-oriented NLP applications and can be evaluated directly (independent of an application) at a good level of human agreement. Motivated by this semantic criterion, we analyze the empirical quality of distributional word feature vectors and its impact on word similarity results, proposing an objective measure for evaluating feature vector quality. Finally, a novel feature weighting and selection function is presented, which yields superior feature vectors and better word similarity performance.

EPrint Type: Article
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Natural Language Processing
ID Code: 350
Deposited By: Maayan Geffet
Deposited On: 16 December 2004