Association rule interestingness:
measure and statistical validation
Abstract. The search for interesting Boolean association rules is an important topic in knowledge discovery in databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, these algorithms may produce a large number of rules, many of which are uninteresting. One has to resolve a two-tier problem: choosing the measures best suited to the problem at hand, then validating the interesting rules against the selected measures. First, the usual measures suggested in the literature are reviewed, and criteria for assessing the qualities of these measures are proposed. Statistical validation of the most interesting rules requires performing a large number of tests, so controlling false discoveries (type I errors) is of prime importance. An original bootstrap-based validation method is proposed that controls the number of false discoveries at a given level. The value of this method for selecting interesting association rules is illustrated by several examples.
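To make the support and confidence thresholds mentioned above concrete, the following is a minimal Python sketch using the standard definitions of the two quantities. The toy transactions and the function names `support` and `confidence` are illustrative assumptions, not taken from the paper, and the sketch does not attempt to reproduce the bootstrap-based validation method.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    supp_a = support(transactions, a)
    if supp_a == 0:
        return 0.0  # rule never applies; treated as zero confidence here
    return support(transactions, both) / supp_a


# Hypothetical toy data: four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})

# A rule is admissible when both values reach the chosen thresholds
# (the threshold values below are arbitrary, for illustration only).
min_support, min_confidence = 0.4, 0.6
admissible = rule_support >= min_support and rule_confidence >= min_confidence
```

With these toy transactions, the rule bread → milk appears in 2 of 4 transactions and holds in 2 of the 3 transactions that contain bread, so it passes both illustrative thresholds. Passing the thresholds says nothing about whether the rule is interesting or statistically valid, which is the question the abstract raises.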