PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation
Frank Wood and Yee Whye Teh
In: AISTATS 2009, 16-18 Apr 2009, Florida, USA.

Abstract

In this paper we present a doubly hierarchical Pitman-Yor process language model. Its bottom layer of hierarchy consists of multiple hierarchical Pitman-Yor process language models, one for each of several domains. The novel top layer of hierarchy is a mechanism that couples these language models together so that they share statistical strength. Intuitively, this sharing results in the "adaptation" of a latent shared language model to each domain. We introduce a general formalism, which we call the graphical Pitman-Yor process, that is capable of describing the overall model, and we explain how to perform Bayesian inference in it. We present encouraging language model domain adaptation results that both illustrate the potential benefits of our new model and suggest new avenues of inquiry.
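The central idea in the abstract is that per-domain language models back off to a shared latent model, so that counts observed in one domain can inform predictions in another. The following is a minimal Python sketch of that back-off and is not the paper's graphical Pitman-Yor process or its inference procedure. It assumes a unigram (context-free) setting, uses the standard hierarchical Pitman-Yor predictive rule with a fixed seating approximation of one table per observed word type (the paper performs Bayesian inference rather than fixing the seating), and uses an arbitrary discount and concentration. The names `PYRestaurant`, `VOCAB`, `shared`, `domain_a` and `domain_b` are hypothetical, as is the toy data.

```python
from collections import Counter

# Hypothetical toy vocabulary; the root restaurant's base measure is uniform over it.
VOCAB = ["a", "b", "c", "d"]


class PYRestaurant:
    """A Pitman-Yor restaurant under a fixed seating approximation (assumption):
    every observed word type occupies exactly one table."""

    def __init__(self, discount, concentration, parent=None):
        self.d = discount            # discount d, assumed 0 <= d < 1
        self.theta = concentration   # concentration theta, assumed > 0 in this sketch
        self.parent = parent         # parent restaurant, or None for the root
        self.counts = Counter()      # number of customers per word type

    def base(self, w):
        # Base measure: the parent's predictive probability, or uniform at the root.
        if self.parent is not None:
            return self.parent.prob(w)
        return 1.0 / len(VOCAB)

    def add(self, w):
        # A word type seen for the first time opens a new table, and each new
        # table sends one customer up to the parent restaurant.
        if self.counts[w] == 0 and self.parent is not None:
            self.parent.add(w)
        self.counts[w] += 1

    def prob(self, w):
        # P(w) = (c_w - d * t_w + (theta + d * t) * P_parent(w)) / (theta + c)
        c_total = sum(self.counts.values())
        t_total = len(self.counts)   # one table per observed word type
        c_w = self.counts[w]         # Counter returns 0 for unseen types
        t_w = 1 if c_w > 0 else 0
        numerator = (c_w - self.d * t_w) + (self.theta + self.d * t_total) * self.base(w)
        return numerator / (self.theta + c_total)


# Two domain models coupled through one latent shared model.
shared = PYRestaurant(0.5, 1.0)
domain_a = PYRestaurant(0.5, 1.0, parent=shared)
domain_b = PYRestaurant(0.5, 1.0, parent=shared)

for w in ["a", "a", "b", "c"]:
    domain_a.add(w)
domain_b.add("a")

# "b" never occurs in domain B, but domain A's data, passed up through the
# shared parent, makes it more probable in domain B than the unseen "d".
print(domain_b.prob("b"), domain_b.prob("d"))
```

With an isolated domain B model (no shared parent, uniform base), "b" and "d" would receive equal probability. The shared parent is what lets domain A's evidence shift domain B's predictions, which is the kind of statistical strength sharing the abstract describes.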

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Theory & Algorithms
ID Code: 5105
Deposited By: Yee Whye Teh
Deposited On: 24 March 2009