Low-Rank Approximations for Large, Multi-lingual
In this paper we compare low-rank approximation methods for data with a particular structure: documents in multiple languages. Rather than looking at only two languages at a time, we examine the structure in up to 21 languages. The algorithms we compare are k-means, cross-lingual latent semantic indexing (CL-LSI), and multi-view canonical correlation analysis (mCCA). We test these methods on the European Parliament Proceedings Parallel Corpus.
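To make "low-rank approximation of multilingual data" concrete, here is a minimal sketch of the generic cross-lingual LSI idea. It assumes the common formulation in which the term-document matrices of aligned (parallel) documents are stacked vertically, one vocabulary block per language, and a rank-k truncated SVD gives a shared space into which a monolingual document can be folded. It is not the implementation evaluated in this paper. The function names (`fit_cl_lsi`, `fold_in`, `cosine`), the raw-count weighting, and the two-language toy corpus are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_cl_lsi(blocks, k):
    """Cross-lingual LSI sketch (hypothetical helper): stack aligned
    term-document matrices and keep a rank-k SVD.

    blocks: list of (n_terms_in_language, n_docs) count matrices, one per
            language; column j must be the same parallel document in every language.
    Returns the top-k left singular vectors, the top-k singular values, and the
    row offset of each language's vocabulary inside the stacked matrix.
    """
    X = np.vstack(blocks)  # rows: vocabularies of all languages; columns: parallel documents
    U, s, _ = np.linalg.svd(X, full_matrices=False)  # singular values come back in descending order
    offsets = np.cumsum([0] + [b.shape[0] for b in blocks])
    return U[:, :k], s[:k], offsets


def fold_in(term_vec, lang, U_k, s_k, offsets):
    """Project a monolingual term vector into the shared k-dimensional space."""
    full = np.zeros(U_k.shape[0])
    full[offsets[lang]:offsets[lang + 1]] = term_vec  # other languages' terms stay zero
    return (U_k.T @ full) / s_k


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy parallel corpus: 2 languages, 3 terms each, 4 aligned documents.
# Language 0 terms: cat, dog, tax; language 1 terms: chat, chien, impot.
A = np.array([[2, 0, 0, 1],
              [0, 2, 0, 1],
              [0, 0, 3, 0]], dtype=float)
B = A.copy()  # idealized assumption: a perfect word-for-word translation

U_k, s_k, offsets = fit_cl_lsi([A, B], k=2)
cat = fold_in(np.array([1.0, 0.0, 0.0]), 0, U_k, s_k, offsets)
chat = fold_in(np.array([1.0, 0.0, 0.0]), 1, U_k, s_k, offsets)
impot = fold_in(np.array([0.0, 0.0, 1.0]), 1, U_k, s_k, offsets)
print(round(cosine(cat, chat), 6), round(cosine(cat, impot), 6))
```

Because `B` is an exact copy of `A` in this toy setup, each left singular vector splits evenly across the two vocabulary blocks, so a term and its translation fold in to the same point while unrelated terms stay orthogonal. Real parallel corpora such as the one used here are far noisier, and practical systems usually apply a term weighting before the SVD.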