Separating Structure from Interestingness
In: PAKDD 2004, 26-28 May 2004, Sydney, Australia.
Condensed representations of pattern collections have been recognized
to be important building blocks of inductive databases, a promising
theoretical framework for data mining, and they have recently been
studied actively. However, there has been little research on how
condensed representations should actually be represented.
In this paper we propose a general approach to build condensed
representations of pattern collections. The approach is based on
separating the structure of the pattern collection from the
interestingness values of the patterns. We also study the concrete
case of representing the frequent sets and their (approximate)
frequencies following this approach: we discuss the trade-offs in
representing the frequent sets by the maximal frequent sets, the
minimal infrequent sets and their combinations, and investigate the
problem of approximating the frequencies from samples by giving new upper
bounds on sample complexity based on frequent closed sets and
describing how convex optimization can be used to improve and score
the obtained samples.
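
As a small illustration of the structural side of this separation, the
following Python sketch computes, by brute force, the maximal frequent
sets and the minimal infrequent sets of a toy transaction database, and
shows that either collection alone determines which sets are frequent.
The transactions, the threshold 0.5, and all function names are
assumptions made for this example; they are not taken from the paper,
and the exhaustive enumeration is only suitable for very small item
universes.

```python
from itertools import combinations


def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def borders(transactions, min_freq):
    """Brute-force frequent sets, maximal frequent sets, minimal infrequent sets."""
    items = sorted(set().union(*transactions))
    # Enumerate every subset of the item universe (exponential; toy use only).
    subsets = [frozenset(c)
               for k in range(len(items) + 1)
               for c in combinations(items, k)]
    frequent = {s for s in subsets if frequency(s, transactions) >= min_freq}
    # Maximal frequent: frequent sets with no frequent proper superset.
    maximal = {s for s in frequent if not any(s < t for t in frequent)}
    # Minimal infrequent: infrequent sets all of whose (|s|-1)-subsets are frequent.
    minimal_infrequent = {s for s in subsets
                          if s not in frequent
                          and all((s - {i}) in frequent for i in s)}
    return frequent, maximal, minimal_infrequent


def is_frequent_by_maximal(s, maximal):
    """A set is frequent iff it is contained in some maximal frequent set."""
    return any(s <= m for m in maximal)


def is_frequent_by_minimal_infrequent(s, minimal_infrequent):
    """A set is frequent iff it contains no minimal infrequent set."""
    return not any(m <= s for m in minimal_infrequent)


# Hypothetical toy database over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
frequent, maximal, minimal_infrequent = borders(transactions, min_freq=0.5)

print(sorted("".join(sorted(s)) for s in maximal))
print(sorted("".join(sorted(s)) for s in minimal_infrequent))
```

In this toy case the two maximal frequent sets or the single minimal
infrequent set describe the structure of the collection, whichever is
smaller; neither carries the frequencies themselves, which would have
to be stored or approximated separately.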