PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Enhanced context recognition by sensitivity pruned vocabularies
Rasmus Elsborg Madsen, Sigurdur Sigurdsson and Lars Kai Hansen
In: 17th International Conference on Pattern Recognition, 23-26 Aug 2004, Cambridge, United Kingdom.

Abstract

Language-independent 'bag-of-words' (BOW) representations are surprisingly effective for text classification. The generic BOW approach is based on a high-dimensional vocabulary, which may reduce the generalization performance of subsequent classifiers, e.g., those based on ill-posed principal component transformations. In this communication our aim is to study the effect of sensitivity-based pruning of the bag-of-words representation. We consider neural-network-based sensitivity maps for determining term relevancy when pruning the vocabularies. With reduced vocabularies, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Pruning the vocabularies to approximately 20% of the original size, we find consistent context recognition enhancement for two mid-size data sets for a range of training set sizes. We also study the applicability of the sensitivity measure for automated keyword generation.
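
The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact sensitivity definition or network architecture, so this sketch assumes a plain softmax linear classifier as a stand-in for the neural network, and it assumes the sensitivity of a term is the mean absolute derivative of the class posteriors with respect to that term's input. All function names (`train_softmax`, `sensitivity_map`, `prune_vocabulary`, `lsi_project`) and the synthetic data are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=1.0, epochs=300):
    """Gradient descent on cross-entropy (assumed stand-in for the neural network)."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]              # one-hot targets
    for _ in range(epochs):
        W -= lr * X.T @ (softmax(X @ W) - Y) / n
    return W

def sensitivity_map(X, W):
    """Assumed sensitivity: mean |dP(class k | x) / dx_j| over documents and classes."""
    P = softmax(X @ W)                    # (n_docs, n_classes)
    avg_w = P @ W.T                       # (n_docs, n_terms): sum_c P_c * W[j, c]
    # Softmax derivative: dP_k/dx_j = P_k * (W[j, k] - sum_c P_c * W[j, c])
    grad = P[:, None, :] * (W[None, :, :] - avg_w[:, :, None])
    return np.abs(grad).mean(axis=(0, 2)) # one score per term

def prune_vocabulary(X, scores, keep_fraction=0.2):
    """Keep the highest-scoring fraction of terms (columns of X)."""
    k = max(1, int(round(keep_fraction * X.shape[1])))
    keep = np.sort(np.argsort(scores)[-k:])
    return X[:, keep], keep

def lsi_project(X, n_components):
    """Latent semantic indexing via truncated SVD of the document-term matrix."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

# Synthetic toy corpus (not data from the paper): 200 documents, 50 terms,
# where the first 5 terms are made more frequent in class 1.
rng = np.random.default_rng(0)
n_docs, n_terms = 200, 50
y = rng.integers(0, 2, size=n_docs)
counts = rng.poisson(1.0, size=(n_docs, n_terms)).astype(float)
counts[:, :5] += rng.poisson(4.0, size=(n_docs, 5)) * y[:, None]
X = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)  # term frequencies

W = train_softmax(X, y, n_classes=2)
scores = sensitivity_map(X, W)
X_pruned, keep = prune_vocabulary(X, scores, keep_fraction=0.2)  # roughly 20% of terms
Z = lsi_project(X_pruned, n_components=3)  # reduced representation for a classifier
```

Here `keep` holds the indices of the retained terms, and `Z` is the low-dimensional representation that a downstream classifier would consume. Sorting the vocabulary by `scores` also gives a simple ranked list of candidate keywords, in the spirit of the keyword-generation use mentioned above.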

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 842
Deposited By: Rasmus Elsborg Madsen
Deposited On: 01 January 2005