PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Finding interesting itemsets using a probabilistic model for binary databases
Tijl De Bie
(2009) Technical Report. University of Bristol.


A good formalization of interestingness of a pattern should satisfy two criteria: it should conform well to intuition, and it should be computationally tractable to use. The focus has long been on the latter, with the development of frequent pattern mining methods. However, it is now recognized that more appropriate measures than frequency are required. In this paper we report results in this direction for itemset mining in binary databases. In particular, we introduce a probabilistic model that can be fitted efficiently to any binary database, and that has a compact and explicit representation. We then show how this model enables the formalization of an intuitive and tractable interestingness measure for itemsets, relying on concepts from information theory. Our probabilistic model is closely related to the uniform distribution over all databases that can be obtained by means of swap randomization [8]. However, in contrast to the swap randomization model, our model is explicit, which is key to its use for defining practical interestingness measures.
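For readers unfamiliar with the swap randomization the abstract refers to, the following is a minimal sketch of the general technique, not the model introduced in the report. It assumes the common formulation in which a swap takes two 1-entries at (i, j) and (k, l), with 0-entries at (i, l) and (k, j), and exchanges them, so that every row sum and column sum of the binary database is preserved. The function name `swap_randomize`, its parameters, and the toy database are hypothetical, chosen only for illustration.

```python
import random


def swap_randomize(db, num_attempts, seed=0):
    """Return a copy of a binary database after a number of attempted swaps.

    Illustrative sketch only: `db` is a list of equal-length rows of 0/1 values.
    Each successful swap keeps all row sums and column sums unchanged.
    """
    rng = random.Random(seed)
    rows = [list(r) for r in db]  # work on a copy, leave the input untouched
    # Positions of all 1-entries, kept up to date as swaps are applied.
    ones = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v == 1]
    if len(ones) < 2:
        return rows
    for _ in range(num_attempts):
        a, b = rng.sample(range(len(ones)), 2)
        i, j = ones[a]
        k, l = ones[b]
        # A swap is valid only if the two "opposite corners" are currently 0.
        if i != k and j != l and rows[i][l] == 0 and rows[k][j] == 0:
            rows[i][j] = rows[k][l] = 0
            rows[i][l] = rows[k][j] = 1
            ones[a] = (i, l)
            ones[b] = (k, j)
    return rows


# Toy database (hypothetical): 4 transactions over 4 items.
db = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
randomized = swap_randomize(db, num_attempts=200, seed=1)

print("row sums:   ", [sum(r) for r in db], "->", [sum(r) for r in randomized])
print("column sums:", [sum(c) for c in zip(*db)], "->", [sum(c) for c in zip(*randomized)])
```

The sketch only shows which quantities the swaps leave invariant. How many attempted swaps are needed before the result is close to a uniform sample is not addressed here, and the distribution such a procedure samples from has no closed form, which is the contrast with an explicit model that the abstract draws.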

EPrint Type: Monograph (Technical Report)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 5934
Deposited By: Tijl De Bie
Deposited On: 08 March 2010