PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Morfessor and VariKN machine learning tools for speech and language technology
Vesa Siivola, Mathias Creutz and Mikko Kurimo
In: Interspeech 2007, 27-31 Aug 2007, Antwerp, Belgium.

Abstract

This paper introduces two recent open source software packages developed for unsupervised natural language modeling. The Morfessor program segments words automatically into morpheme-like units without relying on any rule-based morphological analyzer. The VariKN toolkit trains language models, producing a compact set of high-order n-grams using state-of-the-art Kneser-Ney smoothing. As an example, this paper shows how to construct a language model for speech recognition in multiple languages using only a minimal amount of linguistic resources. Morfessor and VariKN also have other applications in text understanding, information retrieval and machine translation. Unsupervised machine learning techniques are particularly well suited for the development of systems for less-resourced languages, because they do not depend on manually designed morphological or syntactic analyzers or on annotated data.

EPrint Type: Conference or Workshop Item (Poster)
Subjects: Learning/Statistics & Optimisation; Natural Language Processing; Speech; Information Retrieval & Textual Information Access
ID Code: 3717
Deposited By: Mikko Kurimo
Deposited On: 14 February 2008