PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

LARS: A Learning Algorithm for Rewriting Systems
Rémi Eyraud, Colin de la Higuera and Jean-Christophe Janodet
Machine Learning, Volume 66, Number 1, pp. 7-31, 2007.


Whereas there are a number of methods and algorithms for learning regular languages, moving up the Chomsky hierarchy is proving to be a challenging task. Indeed, several theoretical barriers make the class of context-free languages hard to learn. To tackle these barriers, we choose to change the way we represent these languages. Among the formalisms that allow the definition of classes of languages, string-rewriting systems (SRSs) have outstanding properties. We introduce a new type of SRS, called Delimited SRS (DSRS), that is expressive enough to define, in a uniform way, a noteworthy and nontrivial class of languages that contains all the regular languages, $\{ a^nb^n: n \geq 0 \}$, $\{w\in\{a,b\}^*:|w|_a=|w|_b\}$, the parenthesis languages of Dyck, the language of Łukasiewicz, and many others. Moreover, DSRSs constitute an efficient (often linear) parsing device for strings, and are thus promising candidates for forthcoming applications of grammatical inference. In this paper, we pioneer the problem of their learnability. We propose a novel and sound algorithm (called LARS) which identifies a large subclass of them in polynomial time (but not from polynomial data). We illustrate the execution of our algorithm through several examples, discuss the position of the class in the Chomsky hierarchy, and finally raise some open questions and research directions.
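
To make the idea of a rewriting system used as a parsing device concrete, here is a minimal Python sketch for $\{ a^nb^n: n \geq 0 \}$. It is an illustration only, not the LARS algorithm and not necessarily the representation used in the paper: the delimiter symbols `$` and `#`, the two rules, and the function names `normalize` and `accepts` are all assumptions made for this example. A word is accepted when, once wrapped in the delimiters, it rewrites to the empty delimited string.

```python
# Hypothetical delimiters marking the start and end of a word (an assumption).
LEFT, RIGHT = "$", "#"

# Illustrative rules, chosen for this sketch rather than taken from the paper:
#   aabb  -> ab   (peel one a/b pair from the centre of the word)
#   $ab#  -> $#   (erase the last pair, only when it is the whole word)
RULES = [
    ("aabb", "ab"),
    (LEFT + "ab" + RIGHT, LEFT + RIGHT),
]


def normalize(word, rules=RULES):
    """Rewrite the delimited word until no rule applies; return the result.

    Every rule strictly shortens the string, so the loop terminates.
    Input words are assumed to use only the letters 'a' and 'b'.
    """
    s = LEFT + word + RIGHT
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            i = s.find(lhs)  # leftmost occurrence of the left-hand side
            if i != -1:
                s = s[:i] + rhs + s[i + len(lhs):]
                changed = True
                break
    return s


def accepts(word):
    """Accept a word iff it reduces to the empty delimited string."""
    return normalize(word) == LEFT + RIGHT
```

With these rules, `accepts("aaabbb")` holds while `accepts("abab")` does not: the second rule is anchored to both delimiters, so it cannot erase an `ab` in the middle of a longer word. That anchoring is what separates this language from the Dyck-like language an undelimited rule `ab -> λ` would recognize.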

EPrint Type: Article
Subjects: Natural Language Processing; Theory & Algorithms
ID Code: 4588
Deposited By: Jean-Christophe Janodet
Deposited On: 13 March 2009