PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

An efficient alternative to SVM based recursive feature elimination with applications in natural language processing and bioinformatics
Justin Bedo, Conrad Sanderson and Adam Kowalczyk
In: AI 2006: Advances in Artificial Intelligence. Lecture Notes in Computer Science, 4304/2006. (2006) Springer Berlin / Heidelberg, pp. 170-180. ISBN 978-3-540-49787-5


The SVM-based Recursive Feature Elimination (RFE-SVM) algorithm is a popular technique for feature selection, used in natural language processing and bioinformatics. Recently it was demonstrated that a small regularisation constant C can considerably improve the performance of RFE-SVM on microarray datasets. In this paper we show that further improvements are possible if the explicitly computable limit C -> 0 is used. We prove that in this limit most forms of SVM and ridge regression classifiers, scaled by the factor 1/C, converge to a centroid classifier. As this classifier can be used directly for feature ranking, in the limit we can avoid the computationally demanding recursion and convex optimisation in RFE-SVM. Comparisons on two text-based author verification tasks and on three genomic microarray classification tasks indicate that this straightforward method can, surprisingly, obtain comparable (at times superior) performance while being about an order of magnitude faster.
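
As a rough illustration of the idea in the abstract, the sketch below ranks features with a centroid classifier in a single pass, with no recursion and no convex optimisation. It is not taken from the paper: the function names `centroid_weights` and `rank_features` are hypothetical, labels are assumed to be +1/-1, the weight vector is assumed to be the difference of the two class means, and ranking by absolute weight is an assumption borrowed from the usual RFE convention.

```python
import numpy as np

def centroid_weights(X, y):
    """Weight vector of a centroid classifier (assumed form:
    mean of the +1 class minus mean of the -1 class)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos_centroid = X[y == 1].mean(axis=0)
    neg_centroid = X[y == -1].mean(axis=0)
    return pos_centroid - neg_centroid

def rank_features(X, y):
    """Return feature indices ordered from most to least important,
    scoring each feature by the absolute value of its centroid weight
    (an assumption, not a detail confirmed by the abstract)."""
    w = centroid_weights(X, y)
    return np.argsort(-np.abs(w), kind="stable")

# Toy data: feature 0 separates the classes, features 1 and 2 barely differ.
X = np.array([[ 2.0,  0.1, 1.0],
              [ 2.2, -0.1, 1.1],
              [-2.0,  0.0, 0.9],
              [-2.1,  0.2, 1.2]])
y = np.array([1, 1, -1, -1])

ranking = rank_features(X, y)
print(ranking)
```

Because the weights come from two class means, the cost is one pass over the data, which is consistent with the speed advantage over RFE-SVM that the abstract reports.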

EPrint Type: Book Section
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Theory & Algorithms
ID Code: 2984
Deposited By: Conrad Sanderson
Deposited On: 21 April 2007