PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

An Automated Combination of Kernels for Predicting Protein Subcellular Localization
Alexander Zien and Cheng Soon Ong
In: NIPS workshop on Machine Learning in Computational Biology, 7-8 Dec 2007, Whistler, BC, Canada.


Protein subcellular localization is a crucial ingredient in many important inferences about cellular processes, including the prediction of protein function and protein interactions. We propose a new class of protein sequence kernels that considers all motifs, including motifs with gaps. This class of kernels allows pairwise amino acid distances to be included in the kernel computation. We use an extension of the multiclass support vector machine (SVM) method that solves the subcellular localization problem directly, without resorting to the common approach of splitting it into several binary classification problems. To search automatically over families of possible amino acid motifs, we optimize over multiple kernels simultaneously. We compare our automated approach to four other predictors on three different datasets and show that it performs better than the current state of the art. Furthermore, our method provides insight into which features are most useful for determining subcellular localization, and these features agree with biological reasoning.
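
As a rough illustration of the idea of combining several motif kernels, the sketch below counts amino acid pairs separated by a fixed gap and forms a weighted sum of the resulting kernels. It is not the authors' kernel or their optimization: the names `gapped_pairs`, `gap_kernel` and `combined_kernel` are hypothetical, the motifs are restricted to pairs, and the weights are fixed by hand here, whereas the abstract describes optimizing over the kernels automatically.

```python
from collections import Counter


def gapped_pairs(seq, gap):
    """Count residue pairs (a, b) with exactly `gap` residues between them."""
    return Counter((seq[i], seq[i + gap + 1]) for i in range(len(seq) - gap - 1))


def gap_kernel(x, y, gap):
    """Inner product of the gapped-pair count vectors of sequences x and y."""
    fx, fy = gapped_pairs(x, gap), gapped_pairs(y, gap)
    return sum(count * fy[pair] for pair, count in fx.items())


def combined_kernel(x, y, weights):
    """Weighted sum of gap kernels; `weights` maps gap length to a weight >= 0.

    Assumption: the weights are supplied by hand. Learning them from data is
    outside the scope of this sketch.
    """
    return sum(w * gap_kernel(x, y, gap) for gap, w in weights.items())


if __name__ == "__main__":
    weights = {0: 0.5, 1: 0.5}  # hypothetical, hand-picked weights
    print(combined_kernel("ACA", "ACA", weights))
```

A nonnegative weighted sum of valid kernels is itself a valid kernel, so the combined values could be assembled into a Gram matrix for any kernel method that accepts precomputed kernels.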

EPrint Type: Conference or Workshop Item (Talk)
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
ID Code: 3103
Deposited By: Alexander Zien
Deposited On: 19 December 2007