PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
Phil Blunsom and Trevor Cohn
In: ACL 2011, 19-24 Jun 2011, Portland, OR, USA.

Abstract

Part-of-speech induction has long been a central challenge in computational linguistics, and despite at least two decades of research there has been little progress. Our work aims to address this problem by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor process prior, which provides an elegant and principled means of incorporating lexical features. Central to our approach is a new sampling algorithm that enforces a one-tag-per-word-type constraint, which we show mixes very quickly and produces high-quality output. We show in empirical evaluation that our model consistently outperforms the current state of the art across 14 different languages.

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 8135
Deposited By: Trevor Cohn
Deposited On: 29 April 2011