Efficient Sparse Kernel Feature Extraction Based on Partial Least Squares
The presence of irrelevant features in training data is a significant obstacle for many machine learning tasks. One approach to this problem is to extract appropriate features, and often one selects a feature extraction method based on the inference algorithm. Here we formalise a general framework for feature extraction, based on Partial Least Squares (PLS), in which one can select a user-defined criterion to compute projection directions. The framework draws together a number of existing results and provides additional insights into several popular feature extraction methods. Two new sparse kernel feature extraction methods are derived under the framework, called Sparse Maximal Alignment (SMA) and Sparse Maximal Covariance (SMC), respectively. Key advantages of these approaches include a simple implementation and a training time that scales linearly in the number of examples. Furthermore, one can project a new test example using only k kernel evaluations, where k is the output dimensionality. Computational results on several real-world datasets show that SMA and SMC extract features which are as predictive as those found using other popular feature extraction methods. Additionally, on large text retrieval and face detection datasets, they produce features which match the performance of the original ones in conjunction with a Support Vector Machine (SVM).
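To illustrate the sparsity property mentioned above, the sketch below shows a hypothetical, greatly simplified greedy variant in the spirit of SMC: at each step it selects the training example whose kernel column has the largest covariance with the labels, then deflates the kernel matrix. Because each extracted direction is anchored to a single training example, projecting a new test point requires only k kernel evaluations. The function names, the Gaussian kernel choice, and the exact deflation step are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Gaussian kernel between the rows of X and the rows of Y
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def smc_like_fit(K, y, k):
    """Hypothetical greedy sparse maximal-covariance sketch.

    Each iteration picks the training example whose kernel column has
    the largest absolute covariance with the (centred) labels, then
    deflates K by projecting out the chosen column's direction.
    Returns the k selected example indices.
    """
    Kd = K.copy()
    y = y - y.mean()
    idx = []
    for _ in range(k):
        cov = np.abs(Kd.T @ y)      # covariance of each column with y
        cov[idx] = -np.inf          # never reselect an example
        i = int(np.argmax(cov))
        idx.append(i)
        tau = Kd[:, i]
        # projection deflation: remove the component of Kd along tau
        Kd = Kd - np.outer(tau, tau) @ Kd / (tau @ tau)
    return idx

def project(X_test, X_train, idx, gamma=1.0):
    # Sparsity pays off here: each test point needs only
    # k = len(idx) kernel evaluations against the selected examples.
    return rbf(X_test, X_train[idx], gamma=gamma)
```

Note that the cost per test point is independent of the training set size, which is the practical point the abstract makes about projecting new examples.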