PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Positional Oligomer Importance Matrices
Alexander Zien, Sören Sonnenburg, Petra Philips and Gunnar Raetsch
In: NIPS workshop on Machine Learning in Computational Biology, 7-8 Dec 2007, Whistler, BC, Canada.

Abstract

At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences, above all DNA and proteins. In many cases, the most accurate classifiers are obtained by training SVMs with complex sequence kernels, for instance for transcription starts or splice sites. However, an often-criticized downside of SVMs with complex kernels is that it is very hard for humans to understand the learned decision rules and to derive biological insights from them. To close this gap, we introduce the concept of positional oligomer importance matrices (POIMs) and develop an efficient algorithm for their computation. We demonstrate how they overcome the limitations of sequence logos and how they can be used to find relevant motifs for different biological phenomena in a straightforward way. Note that the concept of POIMs is not limited to interpreting SVMs but is applicable to general k-mer-based scoring systems.

EPrint Type: Conference or Workshop Item (Talk)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
ID Code: 3105
Deposited By: Alexander Zien
Deposited On: 19 December 2007