PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

The use of machine translation tools for cross-lingual text mining
Blaz Fortuna and John Shawe-Taylor
In: Learning With Multiple Views, workshop at ICML 2005, 11 Aug 2005, Bonn, Germany.


Eigen-analysis methods such as latent semantic indexing (LSI) and kernel canonical correlation analysis (KCCA) have already been applied successfully to cross-lingual information retrieval. These approaches share a weakness: they need an aligned training set of documents. In this paper we address this weakness and show that it can be avoided by using machine translation. We show that performance is similar on domains where human-generated training sets are available. However, for other domains, artificial training sets can be generated that significantly outperform human-generated ones obtained from a different domain.
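
The idea of replacing a human-aligned corpus with a machine-translated one can be illustrated with a small sketch. It assumes cross-lingual LSI on concatenated document pairs (one of the eigen-analysis methods named above; KCCA is not shown), a hypothetical toy dictionary (TOY_EN_TO_DE) standing in for a real machine translation system, and toy data. All function names (translate, train_cl_lsi, project, and so on) are hypothetical and not taken from the paper. A monolingual English corpus is "translated" to build an artificial aligned set, an SVD is taken via power iteration on A A^T, and documents in either language are mapped into the shared latent space.

```python
import math
import random

# Hypothetical stand-in for a machine translation system (assumption):
# a toy word-for-word English-to-German dictionary.
TOY_EN_TO_DE = {
    "cat": "katze", "dog": "hund", "pet": "haustier",
    "house": "haus", "garden": "garten", "roof": "dach",
}


def translate(doc):
    """'Translate' a tokenized English document with the toy dictionary."""
    return [TOY_EN_TO_DE.get(w, w) for w in doc]


def term_vector(doc, lang, index):
    """Term counts over the joint vocabulary; each term is tagged with its language."""
    vec = [0.0] * len(index)
    for w in doc:
        key = lang + ":" + w
        if key in index:
            vec[index[key]] += 1.0
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigvecs(sym, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    m = [row[:] for row in sym]
    n = len(m)
    vecs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        vecs.append(v)
        # Deflate so the next pass converges to the next eigenvector.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs


def train_cl_lsi(english_docs, k=2):
    """Build an artificial aligned corpus by (toy) machine translation, then run LSI on it."""
    pairs = [(doc, translate(doc)) for doc in english_docs]
    terms = sorted({"en:" + w for en, _ in pairs for w in en}
                   | {"de:" + w for _, de in pairs for w in de})
    index = {t: i for i, t in enumerate(terms)}
    # Each column of A stacks the English and German term counts of one pair.
    cols = [[a + b for a, b in zip(term_vector(en, "en", index),
                                   term_vector(de, "de", index))]
            for en, de in pairs]
    # A A^T (terms x terms); its top eigenvectors are the left singular vectors of A.
    aat = [[sum(c[i] * c[j] for c in cols) for j in range(len(terms))]
           for i in range(len(terms))]
    return index, top_eigvecs(aat, k)


def project(doc, lang, index, basis):
    """Map a monolingual document into the shared latent space."""
    x = term_vector(doc, lang, index)
    return [dot(u, x) for u in basis]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


# Usage: train on English-only documents, then compare across languages.
corpus = [["cat", "dog", "pet"], ["dog", "pet"],
          ["house", "garden", "roof"], ["house", "roof"]]
index, basis = train_cl_lsi(corpus, k=2)
query = project(["cat", "dog"], "en", index, basis)
match = project(translate(["cat", "dog"]), "de", index, basis)
other = project(["haus", "garten"], "de", index, basis)
print(cosine(query, match), cosine(query, other))
```

On this toy data the English query and its German translation land at essentially the same latent vector (similarity near 1.0), while the unrelated German document is near-orthogonal (similarity near 0.0). With a real MT system the translated side is noisier, which is where the domain effects reported in the abstract come into play.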

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Learning/Statistics & Optimisation; Information Retrieval & Textual Information Access
ID Code: 1208
Deposited By: Blaz Fortuna
Deposited On: 24 November 2005