PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Low-Rank Approximations for Large, Multi-lingual Data
Jan Rupnik, Andrej Muhič and Primož Škraba
In: NIPS 2011, 16-17 Dec 2011, Sierra Nevada, Spain.


In this paper we compare low-rank approximation methods for data with a particular structure: documents in multiple languages. Rather than looking at only two languages at a time, we examine the structure in up to 21 languages. The algorithms we compare are k-means, cross-lingual latent semantic indexing (CL-LSI), and multi-view canonical correlation analysis (mCCA). We test these methods on the European Parliament Proceedings Parallel Corpus.

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: Project Keyword UNSPECIFIED
Subjects: Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 8733
Deposited By: Jan Rupnik
Deposited On: 21 February 2012