Compression of multilingual aligned texts
Ehud S. Conley and Shmuel T. Klein
In: DCC'06, 28-30 Mar 2006, Snowbird, Utah.
In countries such as Canada, Belgium and Switzerland, where speakers of two or more languages live side by side, all official texts have to be published in multilingual form. Similarly, all official texts of the European Union are translated into the languages of all member states. As a result, there is a growing corpus of important texts, large parts of which are highly redundant: they carry no information content of their own and are merely transformed copies, namely translations, of other parts of the text collection.