PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Association rule interestingness: measure and statistical validation
Olivier Teytaud, Stéphane Lallich and Elie Prudhomme
In: Quality Measures in Data Mining (2006) Springer, pp. 251-275.

Abstract

The search for interesting Boolean association rules is an important topic in knowledge discovery in databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, these algorithms may produce a large number of rules, many of which are uninteresting. One has to resolve a two-tier problem: choosing the measures best suited to the problem at hand, then validating the interesting rules against the selected measures. First, the usual measures suggested in the literature are reviewed, and criteria for assessing the qualities of these measures are proposed. Statistical validation of the most interesting rules requires performing a large number of tests, so controlling for false discoveries (type I errors) is of prime importance. An original bootstrap-based validation method is proposed which controls, for a given level, the number of false discoveries. The value of this method for the selection of interesting association rules is illustrated by several examples.
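
The support and confidence thresholds mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions (support as the fraction of transactions containing an itemset, confidence as the support of the whole rule divided by the support of its antecedent) and is not taken from the chapter. The function names, the toy transactions and the threshold values are assumptions made for illustration only.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(transactions, a)
    if support_a == 0:
        return 0.0  # rule never applies; treat its confidence as 0
    return support(transactions, both) / support_a


def is_admissible(transactions: List[FrozenSet[str]],
                  antecedent: Iterable[str],
                  consequent: Iterable[str],
                  min_support: float,
                  min_confidence: float) -> bool:
    """True if the rule meets both (hypothetical) thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Toy data, invented for this example.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})         # 2 of 4 transactions
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 2 of the 3 with bread
```

A rule that passes such thresholds is only admissible. Whether it is interesting, and whether it survives statistical validation across many simultaneous tests, is the question the chapter addresses.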

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 3198
Deposited By: Olivier Teytaud
Deposited On: 20 January 2008