An information-theoretic approach to finding informative noisy tiles in binary databases
The task of finding informative recurring patterns in data has been central to data mining research since frequent itemset mining was introduced in [1,2,14]. In these seminal papers, the informativeness of a recurring itemset in a binary database was formalized by its support in the database. However, it is now widely recognized that an itemset's support is not the best measure of its informativeness. Furthermore, recent work has highlighted that the support of an itemset is highly susceptible to noise, so that it may be appropriate to search for itemsets that recur approximately. In this paper, we present a new measure of informativeness for noisy itemsets in binary databases within the formalism of tiles. We demonstrate the benefits of our new measure by means of experiments on artificial and real-life data, allowing for objective and subjective evaluation.
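To make the sensitivity of support to noise concrete, the following toy sketch counts the support of an itemset in a small binary database before and after two entries are flipped. It is an illustration only and not the measure proposed in this paper: the function names `support` and `approximate_support`, the `max_missing` tolerance, and the example data are all assumptions introduced here.

```python
def support(database, itemset):
    """Count the rows (transactions) that contain every item in `itemset`."""
    return sum(1 for row in database if all(row[i] == 1 for i in itemset))


def approximate_support(database, itemset, max_missing=1):
    """Count the rows that miss at most `max_missing` items of `itemset`.

    Hypothetical relaxation used only for illustration.
    """
    return sum(
        1
        for row in database
        if sum(1 for i in itemset if row[i] == 0) <= max_missing
    )


# Rows are transactions, columns are items (toy data, assumed).
clean = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

# Flip two entries from 1 to 0 to simulate noise.
noisy = [row[:] for row in clean]
noisy[0][1] = 0
noisy[2][2] = 0

itemset = [0, 1, 2]

print(support(clean, itemset))                             # 3
print(support(noisy, itemset))                             # 1
print(approximate_support(noisy, itemset, max_missing=1))  # 3
```

Two flipped entries reduce the exact support from 3 to 1, while a count that tolerates one missing item per row still recovers all three rows. This is the intuition behind searching for itemsets that recur only approximately.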