PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Separating Structure from Interestingness
Taneli Mielikäinen
In: PAKDD 2004, 26-28 May 2004, Sydney, Australia.


Condensed representations of pattern collections are recognized as important building blocks of inductive databases, a promising theoretical framework for data mining, and they have recently been studied actively. However, there has been little research on how condensed representations should actually be represented. In this paper we propose a general approach to building condensed representations of pattern collections. The approach is based on separating the structure of the pattern collection from the interestingness values of the patterns. We also study the concrete case of representing the frequent sets and their (approximate) frequencies following this approach. We discuss the trade-offs in representing the frequent sets by the maximal frequent sets, the minimal infrequent sets, and their combinations. We also investigate the problem of approximating the frequencies from samples, giving new upper bounds on sample complexity based on frequent closed sets and describing how convex optimization can be used to improve and score the obtained samples.
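
As an illustration of the terms used in the abstract, the following minimal Python sketch computes the frequent sets of a toy transaction database, then derives the maximal frequent sets and the minimal infrequent sets. It is not the paper's algorithm: the brute-force enumeration, the function names, the toy data and the threshold of 0.5 are all assumptions made for this example. It shows only that the structure (which sets are frequent) can be recovered from the maximal frequent sets alone, without storing any frequency values.

```python
from itertools import combinations


def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def all_itemsets(transactions):
    """Yield every non-empty subset of the item universe (toy data only)."""
    items = sorted(set().union(*transactions))
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def frequent_sets(transactions, min_freq):
    """Map each frequent set to its frequency, by brute-force enumeration."""
    result = {}
    for s in all_itemsets(transactions):
        f = frequency(s, transactions)
        if f >= min_freq:
            result[s] = f
    return result


def maximal_sets(frequent):
    """Frequent sets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


def minimal_infrequent_sets(transactions, frequent):
    """Infrequent sets all of whose proper subsets are frequent."""
    minimal = set()
    for s in all_itemsets(transactions):
        if s in frequent:
            continue
        # Checking the subsets one item smaller suffices; the empty set
        # is treated as frequent, so infrequent singletons are minimal.
        if len(s) == 1 or all((s - {x}) in frequent for x in s):
            minimal.add(s)
    return minimal


def is_frequent(itemset, maximal):
    """Decide frequency from the structure alone: subset of a maximal set."""
    return any(itemset <= m for m in maximal)


# Hypothetical toy database with five transactions over the items a, b, c.
transactions = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abc")]
freq = frequent_sets(transactions, min_freq=0.5)
maximal = maximal_sets(freq)
minimal_infrequent = minimal_infrequent_sets(transactions, freq)
```

In this toy database the maximal frequent sets are the three pairs {a, b}, {a, c} and {b, c}, and the single minimal infrequent set is {a, b, c}. Either collection describes the same family of frequent sets, but here the second is smaller, which is the kind of trade-off the abstract refers to.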

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 153
Deposited By: Taneli Mielikäinen
Deposited On: 23 November 2004