PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Unsupervised decomposition of words for speech recognition and retrieval
Mikko Kurimo, Teemu Hirsimäki, Ville Turunen, Sami Virpioja and Niklas Raatikainen
In: 13th International Conference Speech and Computer, SPECOM 2009 (2009).


Many tasks in speech and language technology require a statistical language model that covers a very large vocabulary. Such tasks include automatic speech recognition, audio indexing and retrieval, and speech translation. A large-vocabulary model is typically needed to predict the probability of words and word sequences occurring in different contexts. However, in morphologically complex languages, the high degree of word inflection, compounding and derivation increases the number of distinct word forms so much that constructing a sufficient vocabulary becomes prohibitive. Recently, several effective methods have been developed to define suitable sub-word units for building these models. To find out which methods are most effective in practice, we maintain an evaluation framework called Morpho Challenge, in which different morpheme analysis methods can be compared in various state-of-the-art practical evaluation tasks.
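To make the vocabulary argument concrete, the following Python sketch uses a few Finnish noun forms with hand-written segmentations. The data are hypothetical and illustrative only; they are not the output of any method evaluated in Morpho Challenge. Eight distinct word forms reduce to five sub-word units. A deliberately naive greedy longest-match segmenter (an assumed scheme, not the authors' method) can then cover a form such as "taloista" ("from the houses") that never occurs as a whole word in the vocabulary.

```python
# Toy illustration: word-form vocabulary vs. sub-word-unit vocabulary.
# ASSUMPTION: the segmentations are hand-written for illustration only and
# are not produced by any particular unsupervised segmentation method.
SEGMENTATIONS = {
    "talo": ["talo"],                  # house
    "talossa": ["talo", "ssa"],        # in the house
    "talosta": ["talo", "sta"],        # from the house
    "taloissa": ["talo", "i", "ssa"],  # in the houses
    "auto": ["auto"],                  # car
    "autossa": ["auto", "ssa"],        # in the car
    "autosta": ["auto", "sta"],        # from the car
    "autoissa": ["auto", "i", "ssa"],  # in the cars
}

# The set of all sub-word units appearing in the segmentations above.
UNITS = {unit for units in SEGMENTATIONS.values() for unit in units}


def vocabulary_sizes(segmentations):
    """Return (number of distinct word forms, number of distinct sub-word units)."""
    unit_vocab = {unit for units in segmentations.values() for unit in units}
    return len(segmentations), len(unit_vocab)


def segment_greedy(word, units):
    """Split `word` into known units by greedy longest match from the left.

    A hypothetical, naive scheme: it never backtracks, so it can fail on
    words that a smarter search could still cover. Returns None when no
    known unit starts at some position.
    """
    max_len = max(len(u) for u in units)
    pieces, pos = [], 0
    while pos < len(word):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(word) - pos), 0, -1):
            candidate = word[pos:pos + length]
            if candidate in units:
                pieces.append(candidate)
                pos += length
                break
        else:
            return None  # no known unit starts at this position
    return pieces


if __name__ == "__main__":
    print(vocabulary_sizes(SEGMENTATIONS))   # (8, 5)
    print(segment_greedy("taloista", UNITS))  # ['talo', 'i', 'sta']
    print(segment_greedy("kissa", UNITS))     # None
```

The unseen form "taloista" is covered only because its parts were seen in other words. This is the property that lets a sub-word vocabulary generalise where a word vocabulary would have to list every inflected form explicitly.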

EPrint Type: Conference or Workshop Item (Invited Talk)
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Information Retrieval & Textual Information Access
ID Code: 6053
Deposited By: Mikko Kurimo
Deposited On: 08 March 2010