PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

An information-theoretic approach to finding informative noisy tiles in binary databases
Kleanthis Kontonasios and Tijl De Bie
In: SIAM SDM 2010, 29 April - 1 May, Columbus, Ohio, USA.

Abstract

The task of finding informative recurring patterns in data has been central to data mining research since the introduction of the task of frequent itemset mining in [1,2,14]. In these seminal papers, the informativeness of a recurring itemset in a binary database was formalized by its support in the database. However, it is now widely recognized that an itemset's support is not the best measure of its informativeness. Furthermore, recent work has highlighted that the support of an itemset is highly susceptible to noise, such that it may be appropriate to search for items that recur approximately. In this paper, we present a new measure of informativeness for noisy itemsets in binary databases within the formalism of tiles [6]. We demonstrate the benefits of our new measure by means of experiments on artificial and real-life data, allowing for objective and subjective evaluation.
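
As a rough illustration of why exact support is sensitive to noise, the sketch below counts the support of an itemset in a small binary database and shows how a single flipped entry lowers it. This is not the informativeness measure proposed in the paper. The toy database, the `support` helper, and the density computation are all assumptions made for illustration, with "density" used here only as one informal way to think about a noisy tile (a mostly-ones submatrix).

```python
import numpy as np

def support(db, itemset):
    """Number of rows (transactions) that contain every item in `itemset`."""
    return int(db[:, itemset].all(axis=1).sum())

# Hypothetical database: 6 transactions (rows) over 4 items (columns).
# Items 0, 1 and 2 co-occur in the first four transactions.
db = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
], dtype=bool)

itemset = [0, 1, 2]
clean_support = support(db, itemset)

# Simulate noise: a single entry inside the pattern is flipped to 0.
noisy = db.copy()
noisy[1, 2] = False
noisy_support = support(noisy, itemset)

# The same block of rows and columns is still almost entirely ones,
# which is the intuition behind searching for approximate recurrence.
tile_rows = [0, 1, 2, 3]
density = noisy[np.ix_(tile_rows, itemset)].mean()

print(clean_support, noisy_support, density)
```

One flipped bit removes a whole transaction from the exact support count, while the fraction of ones in the corresponding block of rows and columns barely changes.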

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 5647
Deposited By: Kleanthis-Nikolaos Kontonasios
Deposited On: 08 March 2010