PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Mining probabilistic automata: a statistical view of sequential pattern mining
Stéphanie Jacquemont, François Jacquenet and Marc Sebban
Machine Learning Volume 75, Number 1, pp. 91-127, 2009. ISSN 0885-6125

Abstract

During the past decade, sequential pattern mining has been the core of numerous research efforts. It is now possible to efficiently extract knowledge of users’ behavior from a huge set of sequences collected over time. This has applications in various domains such as purchases in supermarkets, Web site visits, etc. However, sequence mining algorithms do little to control the risks of extracting false discoveries or overlooking true knowledge. In this paper, the theoretical conditions to achieve a relevant sequence mining process are examined. Then, the article offers a statistical view of sequence mining which has the following advantages: First, it uses a compact and generalized representation of the original sequences in the form of a probabilistic automaton. Second, it integrates statistical constraints to guarantee the extraction of significant patterns. Finally, it provides an interesting solution in a privacy preserving context in order to respect individuals’ information. An application in car flow modeling is presented, showing the ability of our algorithm (acsm) to discover frequent routes without any private information. Comparisons with a classical sequence mining algorithm (spam) are made, showing the effectiveness of our approach.

EPrint Type:Article
Project Keyword:Project Keyword UNSPECIFIED
Subjects:Computational, Information-Theoretic Learning with Statistics
ID Code:4399
Deposited By:Stéphanie Jacquemont
Deposited On:13 March 2009