Unsupervised decomposition of words for speech recognition and retrieval

Abstract

Many tasks in speech and language technology require a statistical language model that covers a very large vocabulary. Such tasks include automatic speech recognition, audio indexing and retrieval, and speech translation. A large-vocabulary model is typically needed to predict the probability of occurrence of words and word sequences in different contexts. However, in morphologically complex languages, the high level of word inflection, composition and derivation increases the number of distinct word forms so much that constructing a sufficient vocabulary becomes prohibitive. Recently, several effective methods have been developed to define suitable sub-word units for building the models. To find out which methods are most effective in practice, we maintain an evaluation framework called Morpho Challenge, where different morphemes can be compared in various state-of-the-art practical evaluation tasks.