PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Flexible length phrases in document classification
Daniel Radosevic, Jasminka Dobsa and Dunja Mladenić
In: 28th International Conference on Information Technology Interfaces, 19-22 Jun 2006, Cavtat, Dubrovnik, Croatia.


In this paper we investigate the possibility of using phrases of flexible length in the classification of textual documents, as an extension to the classic bag-of-words document representation, in which documents are represented using single words as index terms. The investigation is conducted on a collection of articles from Večernji list. It is shown that the use of flexible-length phrases improves the precision of automatic document classification, and there are indications that such an approach could be used for genre classification.
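
As an illustration of the general idea only, the following minimal Python sketch extends a bag-of-words vocabulary with word sequences of varying length. The abstract does not describe how the authors select phrases, so everything here is an assumption made for the example: the function names (`tokenize`, `ngrams`, `build_vocabulary`, `vectorize`), the whitespace tokenizer, the maximum phrase length `max_len`, and the document-frequency threshold `min_doc_freq` are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen only for brevity.
    return text.lower().split()


def ngrams(tokens, max_len):
    # Yield every contiguous word sequence of length 1..max_len.
    # Length 1 gives the classic single-word index terms.
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def build_vocabulary(docs, max_len=3, min_doc_freq=2):
    # Keep terms (words and phrases) that occur in at least
    # min_doc_freq documents. This filter is a hypothetical
    # stand-in, not the selection method used in the paper.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(ngrams(tokenize(doc), max_len)))
    return sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)


def vectorize(doc, vocabulary, max_len=3):
    # Represent a document as term counts over the vocabulary.
    counts = Counter(ngrams(tokenize(doc), max_len))
    return [counts[t] for t in vocabulary]


if __name__ == "__main__":
    docs = [
        "the prime minister spoke",
        "the prime minister left",
        "a minister spoke",
    ]
    vocabulary = build_vocabulary(docs)
    print(vocabulary)
    print(vectorize("a minister spoke", vocabulary))
```

The resulting count vectors could be passed to any standard text classifier. With `max_len=1` the sketch reduces to a plain bag-of-words representation, which makes it easy to compare the two representations on the same data.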

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Information Retrieval & Textual Information Access
ID Code: 2333
Deposited By: Dunja Mladenić
Deposited On: 22 November 2006