PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification
Nenad Tomašev, Miloš Radovanović, Dunja Mladenić and Mirjana Ivanović
In: MLDM 2011, 30 Aug - 03 Sep 2011, New York, USA.

Abstract

High-dimensional data are by their very nature often difficult to handle with conventional machine-learning algorithms, a difficulty usually characterized as an aspect of the curse of dimensionality. However, it has been shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier.
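
As an illustration of the hubness notion described in the abstract, the following minimal Python sketch counts how often each training point appears in the k-neighbor sets of other points, split by the class of the point whose neighborhood it appears in, and turns those counts into soft class votes. The function names (knn_indices, class_hubness, fuzzy_votes), the use of Euclidean distance, and the additive smoothing constant are assumptions made for this example; they are not taken from the paper and do not reproduce its specific fuzzy measures.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)       # squared Euclidean distance (assumed metric)
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def class_hubness(X, y, k, n_classes):
    """counts[j, c] = number of points of class c that have j among their k neighbors."""
    neighbors = knn_indices(X, k)
    counts = np.zeros((len(X), n_classes))
    for i, row in enumerate(neighbors):
        for j in row:
            counts[j, y[i]] += 1
    return counts

def fuzzy_votes(counts, smoothing=1.0):
    """Soft class votes per point; additive smoothing is an assumption of this sketch."""
    n_classes = counts.shape[1]
    total = counts.sum(axis=1, keepdims=True)   # overall k-occurrence count of each point
    return (counts + smoothing) / (total + n_classes * smoothing)

# Tiny hypothetical data set: four 1-D points, two classes.
X = np.array([[0.0], [1.0], [2.5], [10.0]])
y = np.array([0, 0, 1, 1])
counts = class_hubness(X, y, k=1, n_classes=2)
votes = fuzzy_votes(counts)
```

In this toy data the second point appears in the neighbor sets of one point from each class, so its vote is split evenly, while a point that never occurs as a neighbor falls back to a uniform vote under the assumed smoothing.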

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics
Information Retrieval & Textual Information Access
ID Code: 8710
Deposited By: Jan Rupnik
Deposited On: 21 February 2012