Generalization in Clustering with Unobserved Features
Eyal Krupka and Naftali Tishby
Advances in Neural Information Processing Systems (NIPS)
We argue that when objects are characterized by many attributes, clustering
them on the basis of a relatively small random subset of these
attributes can capture information on the unobserved attributes as well.
Moreover, we show that under mild technical conditions, clustering the
objects on the basis of such a random subset performs almost as well as
clustering with the full attribute set. We prove finite sample generalization
theorems for this novel learning scheme that extend analogous
results from the supervised learning setting. The scheme is demonstrated
for collaborative filtering of users with movie ratings as attributes.
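
The idea of clustering on a random attribute subset and checking the result on held-out attributes can be illustrated with a small sketch. This is not the authors' algorithm or evaluation measure: it assumes synthetic binary attributes drawn from a few latent groups, uses plain k-means as the clustering step, and scores the clustering by the squared error of predicting each unobserved attribute from its cluster mean. All function names and parameters (`kmeans`, `within_cluster_error`, `demo`, `n_observed`, and so on) are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each non-empty cluster's center becomes its mean.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def within_cluster_error(data, labels, features):
    """Mean squared error of predicting each listed feature by its cluster mean."""
    total, count = 0.0, 0
    for c in set(labels):
        members = [row for row, l in zip(data, labels) if l == c]
        for j in features:
            mean = sum(r[j] for r in members) / len(members)
            total += sum((r[j] - mean) ** 2 for r in members)
            count += len(members)
    return total / count


def demo(n_objects=200, n_features=100, n_observed=20, k=2, seed=0):
    """Cluster on a random feature subset, then score on the unobserved features."""
    rng = random.Random(seed)
    # Assumed generative model: each latent group has its own per-feature
    # probability of the attribute being 1.
    probs = [[rng.random() for _ in range(n_features)] for _ in range(k)]
    data = [
        [1.0 if rng.random() < probs[i % k][j] else 0.0
         for j in range(n_features)]
        for i in range(n_objects)
    ]
    observed = rng.sample(range(n_features), n_observed)
    observed_set = set(observed)
    unobserved = [j for j in range(n_features) if j not in observed_set]
    # The clustering step sees only the observed columns.
    labels = kmeans([[row[j] for j in observed] for row in data], k, seed=seed)
    # Baseline: a single cluster, i.e. predict every feature by its global mean.
    baseline = within_cluster_error(data, [0] * n_objects, unobserved)
    clustered = within_cluster_error(data, labels, unobserved)
    return baseline, clustered


if __name__ == "__main__":
    baseline, clustered = demo()
    print("error on unobserved features, no clustering:", baseline)
    print("error on unobserved features, clustered on subset:", clustered)
```

On the unobserved attributes, the clustered error can never exceed the single-cluster baseline, since within-cluster variance is at most total variance. The size of the gap indicates how much the partition found from the observed subset says about attributes it never saw.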