Outlier Detection in Benchmark Classification Tasks
We present a new outlier detection method suited to classification problems. It combines an estimate of the overall probability density with a sequential ranking of the examples according to the observed change in classification performance on validation sets. We apply the method to ten widely used benchmark datasets and to a spam email filtering application. Across six popular machine learning methods, classification performance improves after the detected outliers are removed, compared with removing the same number of examples at random.
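The abstract names the two ingredients (a density estimate and a validation-driven sequential ranking) and the evaluation protocol (removal of outliers versus removal of the same number of random examples), but not the estimator, the classifier, or the ranking rule. The following is a minimal NumPy sketch of one way these pieces could fit together. It is not the method evaluated here. Every choice in it is our own assumption: a leave-one-out Gaussian kernel estimate of the joint density p(x, y), a candidate pool of the lowest-density examples, greedy one-at-a-time removal scored by 1-nearest-neighbour validation accuracy, and a single 1-NN classifier in place of the six learners. The function names (`joint_log_density`, `one_nn_accuracy`, `rank_outliers`, `compare_with_random`) and the parameters `bandwidth`, `n_candidates`, `n_trials`, and `seed` are hypothetical.

```python
import numpy as np


def joint_log_density(X, y, bandwidth=1.0):
    """Leave-one-out Gaussian KDE of the joint density p(x, y), in log space and
    up to an additive constant (hypothetical choice of estimator)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))
    K *= (y[:, None] == y[None, :])  # indicator kernel on labels: same class only
    np.fill_diagonal(K, 0.0)         # leave each example out of its own estimate
    return np.log(K.sum(axis=1) / max(n - 1, 1) + 1e-300)


def one_nn_accuracy(Xtr, ytr, Xev, yev):
    """Accuracy of a 1-nearest-neighbour classifier (assumed stand-in learner)."""
    d2 = ((Xev[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1)
    pred = ytr[np.argmin(d2, axis=1)]
    return float(np.mean(pred == yev))


def rank_outliers(X, y, Xva, yva, n_remove, n_candidates=None, bandwidth=1.0):
    """Greedy sequential ranking: among low-density candidates, repeatedly remove
    the example whose removal gives the highest validation accuracy."""
    n = len(y)
    dens = joint_log_density(X, y, bandwidth)
    if n_candidates is None:
        n_candidates = min(n, 3 * n_remove)  # hypothetical pool size
    # Candidates ordered by increasing density, so ties favour lower density.
    candidates = list(np.argsort(dens)[:n_candidates])
    keep = np.ones(n, dtype=bool)
    ranking = []
    for _ in range(min(n_remove, len(candidates))):
        best, best_acc = None, -1.0
        for i in candidates:
            keep[i] = False  # tentatively drop candidate i
            acc = one_nn_accuracy(X[keep], y[keep], Xva, yva)
            keep[i] = True
            if acc > best_acc:
                best, best_acc = i, acc
        keep[best] = False  # commit the best removal, then continue sequentially
        ranking.append(int(best))
        candidates.remove(best)
    return ranking


def compare_with_random(X, y, Xte, yte, ranking, n_trials=20, seed=0):
    """Held-out accuracy after removing the ranked outliers, versus the mean over
    n_trials of removing the same number of examples uniformly at random."""
    k = len(ranking)
    keep = np.ones(len(y), dtype=bool)
    keep[ranking] = False
    outlier_acc = one_nn_accuracy(X[keep], y[keep], Xte, yte)
    rng = np.random.default_rng(seed)
    random_accs = []
    for _ in range(n_trials):
        keep = np.ones(len(y), dtype=bool)
        keep[rng.choice(len(y), size=k, replace=False)] = False
        random_accs.append(one_nn_accuracy(X[keep], y[keep], Xte, yte))
    return outlier_acc, float(np.mean(random_accs))


if __name__ == "__main__":
    # Toy data: two clusters plus one mislabelled point at (0.5, 0.5), index 8.
    X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                  [5, 5], [6, 5], [5, 6], [6, 6], [0.5, 0.5]], dtype=float)
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
    Xva, yva = np.array([[0.4, 0.4], [0.6, 0.6], [5.5, 5.5]]), np.array([0, 0, 1])
    Xte, yte = np.array([[0.3, 0.7], [5.2, 5.8]]), np.array([0, 1])
    ranking = rank_outliers(X, y, Xva, yva, n_remove=1)
    print(ranking)  # [8]
    print(compare_with_random(X, y, Xte, yte, ranking))
```

In this toy case the mislabelled point has by far the lowest same-class density and is also the only removal that raises validation accuracy, so both criteria agree. The evaluation uses a separate held-out set (`Xte`, `yte`) because the ranking itself is tuned on the validation set, so reusing the validation set would bias the comparison against random removal.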