Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification
High-dimensional data are by their very nature often difficult for conventional machine-learning algorithms to handle, a difficulty usually characterized as an aspect of the curse of dimensionality. However, it has been shown that some of the arising high-dimensional phenomena can be exploited to increase algorithm accuracy. One such phenomenon is hubness, which refers to the emergence of hubs in high-dimensional spaces, where hubs are influential points included in many k-neighbor sets of other points in the data. This phenomenon was previously used to devise a crisp weighted voting scheme for the k-nearest neighbor classifier. In this paper we go a step further by embracing the soft approach and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express the fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier.
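To make the notion of hubness concrete, the following minimal sketch counts how often each point appears in the k-neighbor sets of other points, and splits those counts by the class of the point whose neighborhood it appears in. It is an illustration under stated assumptions, not the measures proposed in the paper: the function names (`knn_indices`, `k_occurrences`, `class_occurrences`) are hypothetical, Euclidean distance is assumed, and the per-class counts are only one plausible raw ingredient for a hubness-based fuzzy measure.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array with each point's k nearest neighbors (self excluded)."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances (assumed metric for this sketch).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def k_occurrences(X, k):
    """For each point, count how many k-neighbor sets of other points contain it."""
    neighbors = knn_indices(X, k)
    return np.bincount(neighbors.ravel(), minlength=neighbors.shape[0])


def class_occurrences(X, y, k, n_classes):
    """counts[i, c]: number of class-c points that have point i among their k neighbors."""
    neighbors = knn_indices(X, k)
    counts = np.zeros((neighbors.shape[0], n_classes), dtype=int)
    # Row j of `neighbors` is the neighborhood of a point labeled y[j].
    labels = np.repeat(np.asarray(y), k)
    np.add.at(counts, (neighbors.ravel(), labels), 1)
    return counts


# Tiny one-dimensional toy data set (made up for illustration only).
X_toy = np.array([[0.0], [1.0], [2.5], [10.0]])
y_toy = np.array([0, 0, 1, 1])

occ = k_occurrences(X_toy, k=1)
per_class = class_occurrences(X_toy, y_toy, k=1, n_classes=2)
print(occ)        # [1 2 1 0]
print(per_class)  # rows: [1 0], [1 1], [0 1], [0 0]
```

In this toy example the second point is the closest thing to a hub: it occurs in two neighbor sets, once for a point of each class, whereas the last point occurs in none. Points whose occurrences are spread across several classes are the ones for which a soft, rather than crisp, vote is most informative.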