PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Representing Languages by Learnable Rewriting Systems
Rémi Eyraud, Colin de la Higuera and Jean-Christophe Janodet
In: ICGI 2004, 11-13 October 2004, Athens, Greece.

Abstract

Powerful methods and algorithms are known for learning regular languages. Aiming to extend them to more complex grammars, we choose to change the way we represent these languages. Among the formalisms that allow one to define classes of languages, string-rewriting systems (SRS) have outstanding properties. Indeed, SRS are expressive enough to define, in a uniform way, a noteworthy and nontrivial class of languages that contains all the regular languages, \{a^n b^n : n \geq 0\}, \{w \in \{a,b\}^* : |w|_a = |w|_b\}, the parenthesis languages of Dyck, the language of Łukasiewicz, and many others. Moreover, SRS constitute an efficient (often linear) parsing device for strings, and are thus promising and challenging candidates in forthcoming applications of Grammatical Inference. In this paper, we pioneer the problem of their learnability. We propose a novel and sound algorithm that identifies them in polynomial time. We illustrate the execution of our algorithm through a large number of examples and finally raise some open questions and research directions.
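
To make the idea of a rewriting system acting as a parsing device concrete, here is a minimal sketch in Python. It is an illustration only, not the algorithm or the exact formalism of the paper. The rule sets, the boundary markers "$" and "#", and the function names normal_form, in_anbn and in_equal_ab are all assumptions chosen for this example. A string is accepted when rewriting reduces it to a fixed target word. The naive loop below rescans the string after every step, so it does not achieve the linear-time parsing mentioned in the abstract.

```python
def normal_form(s, rules):
    """Apply the first matching rule at its leftmost occurrence until no rule applies.

    Assumes every rule is length-reducing, so the loop always terminates.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            i = s.find(lhs)
            if i != -1:
                s = s[:i] + rhs + s[i + len(lhs):]
                changed = True
                break
    return s


# Hypothetical rule set for { a^n b^n : n >= 0 }.
# "$" and "#" are assumed boundary markers that never occur in the input.
ANBN_RULES = [("aabb", "ab"), ("$ab#", "$#")]

# Hypothetical rule set for { w in {a,b}* : |w|_a = |w|_b }.
EQUAL_RULES = [("ab", ""), ("ba", "")]


def in_anbn(w):
    # Accept when the delimited word rewrites to the delimited empty word.
    return normal_form("$" + w + "#", ANBN_RULES) == "$#"


def in_equal_ab(w):
    # Accept when the word rewrites to the empty word.
    return normal_form(w, EQUAL_RULES) == ""


if __name__ == "__main__":
    for w in ["", "ab", "aaabbb", "abab", "aab"]:
        print(repr(w), in_anbn(w), in_equal_ab(w))
```

In this sketch the boundary markers let the second rule of ANBN_RULES fire only when "ab" is the whole remaining word, which is what separates a^n b^n from the Dyck-style words such as "abab" that the single rule ab -> (empty) would also erase.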

EPrint Type: Conference or Workshop Item (Paper)
Subjects: Theory & Algorithms
ID Code: 157
Deposited By: Rémi Eyraud
Deposited On: 05 June 2004