## Abstract

Clustering can be defined as a data assignment problem where the goal is to partition the data into non-hierarchical groups of items. In our previous work, we suggested an information-theoretic criterion, based on the minimum description length (MDL) principle, for defining the goodness of a clustering of data. The basic idea behind this framework is to optimize the total code length over the data by encoding together data items belonging to the same cluster. In this setting, efficient coding is possible only by exploiting underlying regularities that are common to the members of a cluster, which means that this approach produces an implicitly defined similarity metric between the data items. Formally, the global code length criterion to be optimized is defined by using the intuitively appealing universal normalized maximum likelihood (NML) code, which has been shown to produce optimal code lengths in the worst-case sense. In this paper, we focus on the optimization aspect of the clustering problem, and study five algorithms that can be used for efficiently searching the exponentially-sized clustering space. As the suggested NML clustering criterion can be used for comparing clusterings with different numbers of clusters, the number of clusters is not known beforehand and determining it is part of the optimization process. In the empirical part of the paper, we compare the performance of the suggested algorithms in the task of optimizing the NML clustering criterion on several real-world datasets.
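To illustrate the idea of clustering by minimizing a total code length, the following is a minimal toy sketch. It does not implement the paper's NML criterion; instead it uses a simpler stand-in cost (the maximum-likelihood code length of each cluster plus a BIC-style parameter penalty) together with a greedy label-reassignment search. All function names, the penalty term, and the search strategy here are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

def cluster_code_length(items):
    # Stand-in for an NML code length: ML (empirical-entropy) code
    # length of the cluster's symbols plus a BIC-style parameter cost.
    n = len(items)
    if n == 0:
        return 0.0
    counts = Counter(items)
    ml_bits = sum(c * math.log2(n / c) for c in counts.values())
    penalty = (len(counts) - 1) / 2 * math.log2(n) if n > 1 else 0.0
    return ml_bits + penalty

def total_code_length(data, labels):
    # Items sharing a cluster label are encoded together, so the
    # total cost is the sum of per-cluster code lengths.
    clusters = {}
    for x, g in zip(data, labels):
        clusters.setdefault(g, []).append(x)
    return sum(cluster_code_length(c) for c in clusters.values())

def greedy_search(data, k_init=4, max_rounds=50):
    # Greedily move single items between labels while the total
    # code length decreases; empty labels simply vanish, so the
    # effective number of clusters is found during the search.
    labels = [i % k_init for i in range(len(data))]
    best = total_code_length(data, labels)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(data)):
            old = labels[i]
            for g in range(k_init):
                if g == old:
                    continue
                labels[i] = g
                cost = total_code_length(data, labels)
                if cost < best:
                    best, old, improved = cost, g, True
                else:
                    labels[i] = old
            labels[i] = old
        if not improved:
            break
    return labels, best
```

Note how the cost function alone induces a similarity notion: putting identical symbols into the same cluster makes each cluster more compressible, so pure clusters get a shorter total code length than mixed ones.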