PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Investigating Unsupervised Learning for Text Categorization Bootstrapping
Alfio Gliozzo, Carlo Strapparava and Ido Dagan
In: HLT/EMNLP 2005, 6-8 Oct 2005, Vancouver, B.C., Canada.


We propose a generalized bootstrapping algorithm in which categories are described by relevant seed features. Our method introduces two unsupervised steps that improve the initial categorization step of the bootstrapping scheme: (i) using a Latent Semantic space to obtain a generalized similarity measure between instances and features, and (ii) using the Gaussian Mixture algorithm to obtain uniform classification probabilities for unlabeled examples. The algorithm was evaluated on two Text Categorization tasks and obtained state-of-the-art performance using only the category names as initial seeds.
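
The two unsupervised steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the documents, the function names `lsa_similarity` and `gmm_posterior`, the choice of two latent dimensions, and the two-component one-dimensional mixture are all assumptions made for illustration. The first function compares documents with category-name seeds in a latent space obtained by truncated SVD; the second fits a Gaussian mixture to the resulting scores by EM and returns the posterior of the high-similarity component.

```python
import numpy as np

def lsa_similarity(doc_term, seed_vectors, k):
    """Cosine similarity between documents and seeds in a rank-k latent space."""
    # Truncated SVD of the document-term matrix; rows of vt are latent term directions.
    _, _, vt = np.linalg.svd(doc_term, full_matrices=False)
    basis = vt[:k].T                       # shape: (n_terms, k)
    docs = doc_term @ basis                # documents in the latent space
    seeds = seed_vectors @ basis           # seed features in the same space
    docs = docs / (np.linalg.norm(docs, axis=1, keepdims=True) + 1e-12)
    seeds = seeds / (np.linalg.norm(seeds, axis=1, keepdims=True) + 1e-12)
    return docs @ seeds.T                  # shape: (n_docs, n_seeds)

def gmm_posterior(scores, iters=50, min_sigma=1e-3):
    """Fit a 2-component 1-D Gaussian mixture by EM; return P(high component | score)."""
    x = np.asarray(scores, dtype=float)
    mu = np.array([x.min(), x.max()])      # start one component at each extreme
    sigma = np.full(2, x.std() + min_sigma)
    pi = np.array([0.5, 0.5])

    def responsibilities():
        z = (x[:, None] - mu) / sigma
        dens = pi * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

    for _ in range(iters):
        resp = responsibilities()          # E-step
        nk = resp.sum(axis=0) + 1e-12      # M-step: weights, means, deviations
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(var) + min_sigma   # floor keeps components from collapsing
        pi = nk / len(x)
    return responsibilities()[:, np.argmax(mu)]

# Hypothetical toy corpus; columns: sport, game, team, market, stock, bank
doc_term = np.array([
    [0, 1, 1, 0, 0, 0],   # sports text that never mentions "sport"
    [1, 1, 1, 0, 0, 0],
    [0, 2, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],   # finance text that never mentions "market"
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

# Category names used as the only seeds: "sport" and "market"
seeds = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
], dtype=float)

sim = lsa_similarity(doc_term, seeds, k=2)
p_sport = gmm_posterior(sim[:, 0])
print(np.round(sim, 2))
print(np.round(p_sport, 2))
```

In this toy setting the first document shares no term with the seed "sport", so a plain term-overlap similarity would score it zero; the latent space relates it to the seed through co-occurring terms. The mixture step then turns raw similarity scores into probabilities that are comparable across categories.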

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Computational, Information-Theoretic Learning with Statistics
Natural Language Processing
Information Retrieval & Textual Information Access
ID Code: 1768
Deposited By: Ido Dagan
Deposited On: 28 November 2005