Practical Feature Selection: from Correlation to Causality
Mining Massive Data Sets for Security
Feature selection encompasses a wide variety of methods for selecting a restricted number of input variables, or “features”, that are “relevant” to the problem at hand. In this report, we guide practitioners through the maze of methods that have recently appeared in the literature, particularly for supervised feature selection. Starting from the simplest methods of feature ranking with correlation coefficients, we branch out in various directions and explore topics including “conditional relevance”, “local relevance”, “multivariate selection”, and “causal relevance”. We make recommendations for assessment methods and stress the importance of matching the complexity of the method employed to the amount of training data available. Software and teaching material associated with this tutorial are available at http://clopinet.com/CLOP/.
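As a minimal illustration of the simplest starting point mentioned above, feature ranking with correlation coefficients, the following sketch scores each feature by the absolute value of its Pearson correlation with the target and sorts the features by that score. It assumes a numeric target (for example 0/1 class labels) and a data matrix given as a list of rows. The function names `pearson` and `rank_features` are hypothetical, chosen for this example; this is not code from the CLOP package. Because each feature is scored on its own, this kind of univariate ranking cannot capture feature interactions, which is one reason to move on to multivariate and conditional notions of relevance.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance (unnormalized) and standard deviations (unnormalized);
    # the 1/n factors cancel in the ratio.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature or target: treat as uncorrelated
    return cov / (sx * sy)


def rank_features(X: List[List[float]], y: Sequence[float]) -> List[Tuple[int, float]]:
    """Rank the columns of X by |Pearson correlation| with y, most relevant first.

    Returns a list of (feature_index, score) pairs sorted by decreasing score.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        # Use the absolute value: strong negative correlation is also informative.
        scores.append((j, abs(pearson(column, y))))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For instance, given one feature identical to the target, one only partially correlated with it, and one constant feature, `rank_features` returns them in that order; keeping the top k indices of the ranking gives a simple filter-style feature subset.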