Mining Local Correlation Patterns in Sets of Sequences
Antti Ukkonen
In: Discovery Science, 12th International Conference, 3-5 Oct 2009, Porto, Portugal.

## Abstract

Given a set of (possibly infinite) sequences, we consider the problem of detecting events where a subset of the sequences is correlated for a short period. In other words, we want to find cases where a number of the sequences output exactly the same substring at the same time. Such a substring, together with the sequences in which it is contained, forms a *local correlation pattern*. In practice we only want to find patterns that are longer than $\gamma$ and appear in at least $\sigma$ sequences. Our main contribution is an algorithm for mining such patterns in the online setting, where the sequences are read in parallel one symbol at a time (no random access) and the patterns must be reported as soon as they occur. We conduct experiments on both artificial and real data. The results show that the proposed algorithm scales well as the number of sequences increases. We also conduct a case study using a public EEG dataset. We show that the local correlation patterns capture essential features that can be used to automatically distinguish subjects diagnosed with a genetic predisposition to alcoholism from a control group.
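
To make the problem statement concrete, the following is a minimal sketch of a naive sliding-window baseline. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes the sequences arrive as an iterable of tuples, one symbol per sequence per time step, and it reports only fixed-length windows of $\gamma + 1$ symbols rather than maximal patterns. The function name `naive_local_correlations` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


def naive_local_correlations(stream, gamma, sigma):
    """Yield (end_time, substring, sequence_ids) for each window match.

    stream: iterable of tuples, one symbol per sequence per time step
            (an assumed input format, read strictly left to right).
    gamma:  patterns must be longer than gamma, so the window holds
            gamma + 1 symbols.
    sigma:  minimum number of sequences that must share the window.
    """
    width = gamma + 1
    windows = None  # one bounded buffer per sequence, created lazily

    for t, symbols in enumerate(stream):
        if windows is None:
            windows = [deque(maxlen=width) for _ in symbols]

        # Group sequence indices by the contents of their current window.
        groups = defaultdict(list)
        for i, (window, symbol) in enumerate(zip(windows, symbols)):
            window.append(symbol)
            if len(window) == width:
                groups[tuple(window)].append(i)

        # Report every group that is large enough, as soon as it occurs.
        for substring, ids in groups.items():
            if len(ids) >= sigma:
                yield t, substring, ids
```

For example, with the three sequences `abcx`, `abcy` and `zbcx` passed as `zip("abcx", "abcy", "zbcx")`, `gamma=1` and `sigma=2`, the sketch reports `('a', 'b')` in sequences 0 and 1 at time 1, `('b', 'c')` in all three at time 2, and `('c', 'x')` in sequences 0 and 2 at time 3. Each step costs time proportional to the number of sequences times the window width, so this baseline says nothing about the scalability reported for the proposed algorithm.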

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Theory & Algorithms
ID Code: 6207
Deposited By: Antti Ukkonen
Deposited On: 08 March 2010