Efficient induction of probabilistic word classes with LDA
In: IJCNLP 2011, Chiang Mai, Thailand (2011).
Word classes automatically induced from distributional evidence have proved useful for many NLP tasks, including Named Entity Recognition, parsing and sentence retrieval. The Brown hard clustering algorithm is commonly used for this purpose.
Here we propose using Latent Dirichlet Allocation (LDA) to induce soft, probabilistic word classes, and we compare the efficiency of this approach against Brown clustering. We also compare the usefulness of the induced Brown and LDA word
classes for the semi-supervised learning of
three NLP tasks: fine-grained Named Entity Recognition, Morphological Analysis
and semantic Relation Classification. We
show that using LDA for word class induction scales better with the number of
classes than the Brown algorithm and the
resulting classes outperform Brown on the