An introduction to feature extraction
Isabelle Guyon and Andre Elisseeff
Feature Extraction: Foundations and Applications
Studies in Fuzziness and Soft Computing
Feature extraction addresses the problem of finding the most compact
and informative set of features, to improve the efficiency of data
storage and processing. Defining feature vectors remains the most
common and convenient means of data representation for
classification and regression problems. Data can then be stored in
simple tables (rows representing "entries", "data points",
"samples", or "patterns", and columns representing "features").
Each feature results from a quantitative or qualitative measurement;
it is an "attribute" or a "variable". Modern feature extraction
methodology is driven by the size of the data tables, which is ever
increasing as data storage becomes more and more efficient.
After many years of parallel efforts, researchers in Soft-Computing,
Statistics, Machine Learning, and Knowledge Discovery who are
interested in predictive modeling are uniting their efforts to
make progress on the problem of feature extraction. The recent advances made
in both sensor technologies and machine learning techniques make it
possible to design recognition systems capable of
performing tasks that could not be performed in the past. Feature
extraction lies at the center of these advances with applications in
the pharmaco-medical industry, oil industry, industrial inspection
and diagnosis systems, speech recognition, biotechnology, Internet,
targeted marketing, and many other emerging applications.
This chapter introduces the book "Feature Extraction: Foundations and Applications", organized around the results of a benchmark held in 2003 (the website of the challenge is still active), which were discussed at the NIPS 2003 workshop on feature extraction. The book is a step towards validating, unifying, and formalizing approaches to feature extraction. This introductory chapter presents an overview of the field of feature extraction, the results presented in the book, and a research outlook.