Constraint-based subspace clustering
Elisa Fromont, Céline Robardet and Adriana Prado
In: SIAM DM 2009, 30 Apr - 02 May 2009, Sparks, Nevada, USA.
In high-dimensional data, the performance of traditional
clustering algorithms degrades. This is partly because
the similarity criterion used by these algorithms becomes
inadequate in high-dimensional space. Another reason is
that some dimensions are likely to be irrelevant or contain
noisy data, thus hiding a possible clustering. To overcome
these problems, subspace clustering techniques, which can
automatically find clusters in relevant subsets of dimensions,
have been developed. However, due to the huge number of
subspaces to consider, these techniques often lack efficiency.
In this paper, we propose to extend the framework of bottom-up
subspace clustering algorithms by integrating background
knowledge and, in particular, instance-level constraints to
speed up the enumeration of subspaces. We show how this
new framework can be applied to both density-based and distance-based
bottom-up subspace clustering techniques. Our experiments
on real datasets show that instance-level constraints
can increase not only the efficiency of the clustering process
but also the accuracy of the resulting clustering.
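The abstract does not state how constraints prune the subspace search, so the following is only a hypothetical illustration, not the authors' algorithm. It is a CLIQUE-style bottom-up enumeration under these assumptions: data scaled to [0, 1], `xi` equal-width intervals per dimension, a density threshold `tau`, and instance-level constraints taken to be must-link pairs, where a subspace is kept only if at least `min_ml` must-link pairs share a grid cell in it. Both tests are anti-monotone: any cell of a larger subspace projects onto a cell of each smaller subspace that holds at least the same points. Apriori-style pruning therefore remains sound. All function and parameter names are illustrative.

```python
from itertools import combinations


def cell(point, dims, xi):
    """Grid cell of `point` in subspace `dims` (assumes values in [0, 1])."""
    return tuple(min(int(point[d] * xi), xi - 1) for d in dims)


def dense_units(data, dims, xi, tau):
    """Cells of subspace `dims` that contain at least `tau` points."""
    counts = {}
    for p in data:
        c = cell(p, dims, xi)
        counts[c] = counts.get(c, 0) + 1
    return {c for c, n in counts.items() if n >= tau}


def must_links_kept(data, dims, must_links, xi):
    """Number of (hypothetical) must-link pairs whose points share a cell in `dims`."""
    return sum(cell(data[a], dims, xi) == cell(data[b], dims, xi)
               for a, b in must_links)


def bottom_up(data, xi, tau, must_links, min_ml):
    """Enumerate subspaces bottom-up; returns {subspace: set of dense cells}."""
    n_dims = len(data[0])

    def admissible(dims):
        # Both conditions can only get worse as dimensions are added,
        # so a failing subspace rules out all of its supersets.
        return (must_links_kept(data, dims, must_links, xi) >= min_ml
                and bool(dense_units(data, dims, xi, tau)))

    level = [(d,) for d in range(n_dims) if admissible((d,))]
    result = {}
    while level:
        for dims in level:
            result[dims] = dense_units(data, dims, xi, tau)
        kept = set(level)
        candidates = set()
        for s, t in combinations(sorted(level), 2):
            if s[:-1] == t[:-1]:  # Apriori join on a shared (k-1)-prefix
                cand = s + (t[-1],)
                # Keep only candidates whose every k-subset survived.
                if all(sub in kept for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        level = sorted(c for c in candidates if admissible(c))
    return result


# Toy usage: dimension 2 separates the must-link pair (0, 1), so it is pruned
# along with every subspace that contains it.
toy = [(0.1, 0.1, 0.9), (0.12, 0.15, 0.1), (0.15, 0.12, 0.5),
       (0.9, 0.9, 0.2), (0.85, 0.95, 0.6), (0.88, 0.92, 0.95)]
print(sorted(bottom_up(toy, xi=2, tau=2, must_links=[(0, 1)], min_ml=1)))
# → [(0,), (0, 1), (1,)]
```

On this toy data, running without constraints (`must_links=[]`, `min_ml=0`) enumerates all 7 non-empty subspaces of the 3 dimensions. The single must-link pair cuts this to 3, which illustrates how constraints can shrink the search space.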