A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
Part-of-speech induction has long been a central challenge in computational linguistics, and despite at least two decades of research there has been little progress. Our work aims to address this problem by bringing together several strands of research into a single model. We develop a novel hidden Markov model that incorporates sophisticated smoothing using a hierarchical Pitman-Yor process prior, which provides an elegant and principled means of incorporating lexical features. Central to our approach is a new sampling algorithm that enforces a one-tag-per-word-type constraint; we show that it mixes very quickly and produces high-quality output. In an empirical evaluation across 14 different languages, our model consistently outperforms the current state of the art.
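To make the idea of hierarchical Pitman-Yor smoothing concrete, the sketch below computes the standard Pitman-Yor posterior predictive probability in the Chinese restaurant representation, where each restaurant backs off to a coarser distribution as its base measure. It is an illustrative sketch only, not the model or sampler described in this paper. The function name `pyp_predictive`, the three-tag set, the customer and table counts, and the discount and concentration values are all hypothetical choices for a toy unigram/bigram tag hierarchy.

```python
def pyp_predictive(w, customers, tables, discount, concentration, base_prob):
    """Posterior predictive probability of item w under a Pitman-Yor process.

    The seating arrangement is summarised by per-item customer counts c_w and
    table counts t_w (Chinese restaurant representation):

        p(w) = (c_w - d * t_w + (a + d * T) * p0(w)) / (c + a)

    where c is the total number of customers, T the total number of tables,
    d the discount, a the concentration and p0 the base distribution.
    """
    c = sum(customers.values())
    total_tables = sum(tables.values())
    seated = customers.get(w, 0) - discount * tables.get(w, 0)
    new_table = (concentration + discount * total_tables) * base_prob(w)
    return (seated + new_table) / (c + concentration)


# Hypothetical toy example: three tags and illustrative hyperparameters.
TAGS = ["N", "V", "D"]
DISCOUNT, CONCENTRATION = 0.5, 1.0


def uniform(tag):
    # Top of the hierarchy: a uniform base distribution over tags.
    return 1.0 / len(TAGS)


# Unigram restaurant (hypothetical counts), backing off to the uniform base.
uni_customers = {"N": 3, "V": 2, "D": 1}
uni_tables = {"N": 2, "V": 1, "D": 1}


def p_unigram(tag):
    return pyp_predictive(tag, uni_customers, uni_tables,
                          DISCOUNT, CONCENTRATION, uniform)


# Bigram restaurant for the context "previous tag is D" (hypothetical counts),
# whose base distribution is the unigram Pitman-Yor predictive above.
bi_customers = {"N": 2}
bi_tables = {"N": 1}


def p_bigram_after_D(tag):
    return pyp_predictive(tag, bi_customers, bi_tables,
                          DISCOUNT, CONCENTRATION, p_unigram)


for t in TAGS:
    print(t, round(p_unigram(t), 4), round(p_bigram_after_D(t), 4))
```

In this toy setting the unigram probability of `N` is 3/7, while in the bigram context it rises to 5/7; tags never observed in that context (`V`, `D`) still receive probability mass through the backoff term, which is the smoothing effect a hierarchical prior provides.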