Within and Across Sentence Boundary Language Model
Saeedeh Momtazi, Friedrich Faubel and Dietrich Klakow
In: Interspeech 2010, 27 Sept - 30 Sept 2010, Makuhari, Japan.
In this paper, we propose two language modeling approaches,
namely a skip trigram model and an across sentence boundary model,
to capture long-range dependencies. The skip trigram model covers
more predecessor words of the current word than a standard
trigram while requiring the same amount of memory. The across
sentence boundary model uses the word distribution of the previous
sentences to calculate a unigram probability, which is then used
as the emission probability in both the word and the class model
frameworks. Our experiments on the Penn Treebank show that each of
the proposed models, as well as their combination, significantly
outperforms the baseline for the word model, the class model, and
their linear interpolation.
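The two kinds of history described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on stated assumptions: the skip trigram is taken to keep a two-word context shifted back by a hypothetical `skip` parameter (with `skip=1`, w_i is predicted from w_{i-3} and w_{i-2}), and the across sentence boundary unigram uses add-alpha smoothing over a fixed vocabulary, which is our choice rather than something the abstract specifies. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One training event: ((context word 1, context word 2), predicted word).
Event = Tuple[Tuple[str, str], str]


def trigram_events(sentence: List[str]) -> List[Event]:
    """Standard trigram events: w_i predicted from (w_{i-2}, w_{i-1})."""
    padded = ["<s>", "<s>"] + sentence
    return [((padded[i - 2], padded[i - 1]), padded[i]) for i in range(2, len(padded))]


def skip_trigram_events(sentence: List[str], skip: int = 1) -> List[Event]:
    """Hypothetical skip trigram: w_i predicted from (w_{i-2-skip}, w_{i-1-skip}).

    The context is still a pair of words, so counts are indexed exactly like
    trigram counts, but the pair sits `skip` positions further back.
    """
    pad = 2 + skip
    padded = ["<s>"] * pad + sentence
    return [
        ((padded[i - 2 - skip], padded[i - 1 - skip]), padded[i])
        for i in range(pad, len(padded))
    ]


def across_sentence_unigram(
    previous_sentences: List[List[str]], vocab: List[str], alpha: float = 1.0
) -> Dict[str, float]:
    """Unigram distribution estimated from the preceding sentences.

    Add-alpha smoothing over a fixed vocabulary (an assumption) keeps every
    probability non-zero; tokens outside the vocabulary are ignored, so the
    returned values sum to 1.
    """
    vocab_set = set(vocab)
    counts = Counter(w for s in previous_sentences for w in s if w in vocab_set)
    denom = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}


if __name__ == "__main__":
    sent = ["the", "cat", "sat"]
    print(trigram_events(sent)[-1])       # (('the', 'cat'), 'sat')
    print(skip_trigram_events(sent)[-1])  # (('<s>', 'the'), 'sat')
    history = [["the", "cat"], ["the", "dog"]]
    # {'the': 0.375, 'cat': 0.25, 'dog': 0.25, 'sat': 0.125}
    print(across_sentence_unigram(history, ["the", "cat", "dog", "sat"]))
```

Both event functions key their counts on a pair of context words, which is the sense in which the skip model reaches further back without changing how the model is stored. In a class model, the unigram returned here would presumably still need to be mapped onto per-class emission probabilities; that step is omitted.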
Linearly interpolating the word and the class models with the
proposed skip trigram and across sentence boundary models yields
a perplexity of 118.4, whereas the best state-of-the-art language
model has a perplexity of 137.2 on the same dataset.
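The headline result combines several models by linear interpolation and is reported as perplexity. The sketch below shows both computations under simple assumptions: each component model is a hypothetical Python callable returning P(w | h), and the interpolation weights are fixed values that sum to one (the abstract does not give the actual weights or how they were tuned). Perplexity is computed as exp(-(1/N) * sum of log P(w_i | history)) with natural logarithms.

```python
import math
from typing import Callable, List, Sequence

# Any function returning P(word | history) counts as a language model here.
LM = Callable[[str, Sequence[str]], float]


def interpolate(models: List[LM], weights: List[float]) -> LM:
    """Linear interpolation: P(w | h) = sum_k lambda_k * P_k(w | h)."""
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    if any(lam < 0 for lam in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    def prob(word: str, history: Sequence[str]) -> float:
        return sum(lam * m(word, history) for lam, m in zip(weights, models))

    return prob


def perplexity(model: LM, words: Sequence[str]) -> float:
    """PP = exp(-(1/N) * sum_i log P(w_i | w_1 .. w_{i-1}))."""
    log_prob = sum(math.log(model(w, words[:i])) for i, w in enumerate(words))
    return math.exp(-log_prob / len(words))


if __name__ == "__main__":
    # Toy stand-ins for a word model and a class model (hypothetical constants).
    word_model: LM = lambda w, h: 0.5
    class_model: LM = lambda w, h: 0.1
    mixed = interpolate([word_model, class_model], [0.5, 0.5])
    print(mixed("cat", []))                   # about 0.3 (0.5*0.5 + 0.5*0.1)
    print(perplexity(mixed, ["the", "cat"]))  # about 3.33, i.e. 1 / 0.3
```

For reference, the reported drop from 137.2 to 118.4 is a relative perplexity reduction of about 13.7%.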