## Abstract

We describe a flexible nonparametric approach to latent variable modelling in which the number of latent variables is unbounded. This approach is based on a probability distribution over equivalence classes of binary matrices with a finite number of rows, corresponding to the data points, and an unbounded number of columns, corresponding to the latent variables. Each data point can be associated with a subset of the possible latent variables, which we refer to as the latent features of that data point. The binary variables in the matrix indicate which latent features each data point possesses, and the array of features is potentially infinite. We derive the distribution over unbounded binary matrices by taking the limit of a distribution over N × K binary matrices as K → ∞, a strategy, inspired by the derivation of the Chinese restaurant process (Aldous, 1985; Pitman, 2002), that preserves exchangeability of the rows. We define a simple generative process for this distribution, which we call the Indian buffet process (IBP; Griffiths and Ghahramani, 2005, 2006). The IBP has a single hyperparameter, which controls both the expected number of latent features possessed by each data point and the total number of features. We describe a two-parameter generalization of the IBP with additional flexibility, independently controlling the expected number of features per data point and the total number of features. We illustrate the use of this distribution as a prior in an infinite latent feature model and describe Markov chain Monte Carlo algorithms for inference.
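To make the generative process concrete, here is a minimal sketch of sequential IBP sampling, assuming the standard "customers and dishes" form: row i takes each existing feature k with probability m_k / (beta + i - 1), where m_k counts earlier rows with that feature, and then adds Poisson(alpha * beta / (beta + i - 1)) new features. Setting beta = 1 recovers the one-parameter IBP (probability m_k / i, Poisson(alpha / i) new features). The function names `sample_ibp` and `sample_poisson` are hypothetical helpers for illustration, and the two-parameter form shown is an assumption about the generalization rather than a transcription of the paper's text.

```python
import math
import random
from typing import List, Optional


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_rows: int, alpha: float, beta: float = 1.0,
               seed: Optional[int] = None) -> List[List[int]]:
    """Sample a binary matrix Z: rows are data points, columns are latent features.

    beta = 1 is the one-parameter IBP; other beta > 0 is the assumed
    two-parameter form. Columns appear in order of first use.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    rng = random.Random(seed)
    counts: List[int] = []      # m_k: number of earlier rows possessing feature k
    rows: List[List[int]] = []
    for i in range(1, num_rows + 1):
        denom = beta + i - 1
        # Revisit each existing feature with probability m_k / (beta + i - 1).
        row = [1 if rng.random() < m_k / denom else 0 for m_k in counts]
        # Add a Poisson number of brand-new features for this row.
        num_new = sample_poisson(rng, alpha * beta / denom)
        for k, z in enumerate(row):
            counts[k] += z
        row.extend([1] * num_new)
        counts.extend([1] * num_new)
        rows.append(row)
    # Pad earlier rows with zeros for features introduced after them.
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in rows]


Z = sample_ibp(10, alpha=2.0, seed=0)
for row in Z:
    print("".join("X" if z else "." for z in row))
```

In both variants each row has, in expectation, alpha features; larger beta spreads those features over more distinct columns, so the total number of features grows while the per-row count stays the same.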