An efficient alternative to SVM based recursive feature elimination with applications in natural language processing and bioinformatics
Justin Bedo, Conrad Sanderson and Adam Kowalczyk
AI 2006: Advances in Artificial Intelligence
Lecture Notes in Computer Science
Springer Berlin / Heidelberg
The SVM-based Recursive Feature Elimination (RFE-SVM) algorithm is a popular feature selection technique, used in natural language processing and bioinformatics. It was recently demonstrated that a small regularisation constant C can considerably improve the performance of RFE-SVM on microarray datasets. In this paper we show that further improvements are possible if the explicitly computable limit C -> 0 is used. We prove that in this limit most forms of SVM and ridge regression classifiers, scaled by the factor 1/C, converge to a centroid classifier. Since this classifier can be used directly for feature ranking, taking the limit avoids the computationally demanding recursion and convex optimisation of RFE-SVM. Comparisons on two text-based author verification tasks and three genomic microarray classification tasks indicate that this straightforward method obtains comparable (at times superior) performance while being about an order of magnitude faster.
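The centroid classifier mentioned in the abstract can be sketched in a few lines: its weight vector is simply the difference between the class means, and features can be ranked by the magnitude of the corresponding weights without any recursion or convex optimisation. The following is a minimal illustration, assuming a binary dataset with labels in {+1, -1}; the function names and the toy data are hypothetical, not from the paper.

```python
import numpy as np

def centroid_weights(X, y):
    """Weight vector of a centroid classifier: the difference
    between the mean of the positive class and the mean of the
    negative class.

    X : (n_samples, n_features) data matrix
    y : label array with entries in {+1, -1}
    """
    return X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)

def rank_features(X, y):
    """Rank feature indices by the magnitude of the centroid
    weights, largest first; a single pass over the data suffices,
    with no recursive elimination step."""
    w = centroid_weights(X, y)
    return np.argsort(-np.abs(w))

# Toy example (hypothetical data): feature 0 separates the two
# classes, feature 1 is constant and uninformative.
X = np.array([[3.0, 1.0], [2.5, 1.0], [-3.0, 1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])
print(rank_features(X, y))  # feature 0 ranks first
```

A single sort over the per-feature weights replaces the repeated SVM training and re-ranking of RFE-SVM, which is where the order-of-magnitude speed-up reported in the abstract comes from.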