PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

The GERMANA database
Daniel Pérez, Lionel Tarazón, Nicolás Serrano, Francisco-Manuel Castro, Oriol Ramos-Terrades and Alfons Juan
In: 10th International Conference on Document Analysis and Recognition, 26-29 July 2009, Spain.

Abstract

A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages contain only near-calligraphic text written on ruled sheets with well-separated lines. To our knowledge, it is the first publicly available database for handwriting research that is written mostly in Spanish and is comparable in size to standard databases. Due to its sequential book structure, it is also well suited to realistic assessment of interactive handwriting recognition systems. To provide baseline results for reference in future studies, empirical results are also reported, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Natural Language Processing
ID Code: 5664
Deposited By: Alfons Juan
Deposited On: 08 March 2010