Sampling Table Configurations for the Hierarchical Poisson-Dirichlet Process
Changyou Chen, L Du and Wray Buntine
In: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Athens, Greece (2011).
Hierarchical modeling and reasoning are fundamental in machine intelligence, and for this the two-parameter Poisson-Dirichlet Process (PDP) plays an important role. The most popular MCMC sampling algorithm for the hierarchical PDP and the hierarchical Dirichlet Process conducts incremental sampling based on the Chinese restaurant metaphor, which originates from the Chinese restaurant process (CRP). In this paper, using the same metaphor, we propose a new table representation for hierarchical PDPs by introducing an auxiliary latent variable, called the table indicator, to record which customers take responsibility for starting a new table. The new representation allows full exchangeability, which is an essential condition for a correct Gibbs sampler. Based on this representation, we develop a block Gibbs sampling algorithm that jointly samples a data item and its table contribution. We test the algorithm on the hierarchical Dirichlet process variant of latent Dirichlet allocation (HDP-LDA) developed by Teh, Jordan, Beal and Blei. Experimental results show that the proposed algorithm outperforms their “posterior sampling by direct assignment” algorithm in both out-of-sample perplexity and convergence speed. The representation can be used with many other hierarchical PDP models.
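To make the Chinese restaurant metaphor concrete, the following is a minimal sketch of the standard seating rule for a single two-parameter CRP, recording for each customer whether that customer opened a new table. It is an illustration under stated assumptions, not the paper's block Gibbs sampler: it covers one restaurant only (no hierarchy), and the names `sample_crp_seating`, `discount`, `concentration`, and `opened_table` are hypothetical choices made for this example.

```python
import random


def sample_crp_seating(n_customers, discount=0.5, concentration=1.0, seed=0):
    """Seat customers one at a time under a two-parameter CRP.

    Assumes 0 <= discount < 1 and concentration > 0 (a simplification).
    Returns (table_sizes, opened_table), where opened_table[i] is 1 if
    customer i started a new table and 0 otherwise.
    """
    rng = random.Random(seed)
    table_sizes = []   # number of customers at each table
    opened_table = []  # per-customer flag: did this customer open a table?

    for _ in range(n_customers):
        k = len(table_sizes)
        # Existing table t is chosen with weight (size_t - discount);
        # a new table is chosen with weight (concentration + discount * k).
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]

        if choice == k:
            table_sizes.append(1)
            opened_table.append(1)
        else:
            table_sizes[choice] += 1
            opened_table.append(0)

    return table_sizes, opened_table


if __name__ == "__main__":
    sizes, flags = sample_crp_seating(50)
    print("tables:", len(sizes), "customers:", sum(sizes))
```

In this sketch the per-customer flags play a role loosely analogous to the table indicators described in the abstract: the number of tables is recovered as the sum of the flags rather than being tracked through explicit seating assignments.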