Inférence grammaticale de langages hors-contextes (Grammatical inference of context-free languages)
PhD thesis, University of Saint-Etienne.
The purpose of grammatical inference is to study the learnability of classes of formal languages. Until recently, research had focused mainly on the regular languages.
However, the problem becomes much harder for the next class in the Chomsky hierarchy, the context-free languages. Indeed, significant theoretical barriers exist: most known results are negative and show that the whole class cannot be learned (for instance, the context-free languages are not identifiable in the limit from positive examples alone).
In this thesis, after analyzing the difficulties inherent to this inference problem and reviewing the solutions proposed in previous work, we propose three approaches.
The first approach modifies a compression algorithm so that it structures the learning examples; we then apply an existing algorithm that can learn the whole class from structured examples.
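To give an intuition of how a compression step can impose structure on raw strings, here is a minimal sketch that repeatedly groups the most frequent adjacent pair of symbols, in the style of Re-Pair, and renders each group as a bracket. The source does not say which compression algorithm the thesis modifies; Re-Pair-style digram replacement is swapped in here purely for brevity, and the function names `structure_by_digrams` and `to_brackets` are hypothetical. The resulting bracketings are the kind of structured examples a learner working from structured data would consume.

```python
from collections import Counter


def structure_by_digrams(samples, max_rounds=100):
    """Bracket samples by repeatedly grouping the most frequent adjacent pair.

    Hypothetical Re-Pair-style illustration, not the thesis's algorithm.
    Each grouped pair becomes a tuple node; the top level stays unbracketed.
    """
    seqs = [list(s) for s in samples]
    for _ in range(max_rounds):
        # Count adjacent pairs across all samples (overlapping occurrences
        # such as in "aaa" are counted twice, a simplification of Re-Pair).
        counts = Counter(p for seq in seqs for p in zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # no repeated pattern left to factor out
            break
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                    out.append(pair)  # replace the pair by a single tuple node
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return ["".join(to_brackets(node) for node in seq) for seq in seqs]


def to_brackets(node):
    """Render a node as text: terminals as-is, grouped pairs as [..]."""
    if isinstance(node, str):
        return node
    return "[" + "".join(to_brackets(child) for child in node) + "]"


if __name__ == "__main__":
    print(structure_by_digrams(["ab", "aabb", "aaabbb"]))
    # ['[ab]', '[[a[ab]]b]', 'a[[a[ab]]b]b']
```

Note that frequency-driven grouping does not necessarily recover the "intended" structure (here it does not produce centre-embedded brackets for aaabbb), which is one reason a compression algorithm would need to be adapted before its output is used as structured learning data.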
The second approach relies on a change of representation: we use string rewriting systems to represent and handle context-free languages. We introduce hybrid, almost non-overlapping delimited string rewriting systems and prove an identification-in-the-limit theorem for them: they can be learned in polynomial time from positive and negative examples. We also study the class of languages learned and report some experiments.
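As a rough illustration of how a delimited string rewriting system defines a language, the sketch below assumes the usual convention: a word w belongs to the language when the delimited word (left marker, w, right marker) rewrites to the two markers alone. The delimiter symbols `$` and `#`, the leftmost-first reduction strategy, and the names `normal_form` and `in_language` are assumptions made for this example; the "hybrid" and "almost non-overlapping" constraints from the thesis are not checked here.

```python
LEFT, RIGHT = "$", "#"  # hypothetical delimiters, assumed absent from the alphabet


def normal_form(s, rules):
    """Apply the leftmost applicable rule until none applies.

    Ties at the same position go to the earlier rule in `rules` (an assumed
    strategy). Length-reducing rules guarantee that rewriting terminates.
    """
    assert all(len(rhs) < len(lhs) for lhs, rhs in rules), "rules must shrink"
    while True:
        best = None  # (position, rule index) of the leftmost match
        for idx, (lhs, _) in enumerate(rules):
            pos = s.find(lhs)
            if pos != -1 and (best is None or pos < best[0]):
                best = (pos, idx)
        if best is None:
            return s
        pos, idx = best
        lhs, rhs = rules[idx]
        s = s[:pos] + rhs + s[pos + len(lhs):]


def in_language(word, rules):
    """A word is accepted when its delimited form reduces to the bare markers."""
    return normal_form(LEFT + word + RIGHT, rules) == LEFT + RIGHT


# Dyck language over {a, b}: erase matched pairs "ab".
DYCK = [("ab", "")]
# {a^n b^n : n >= 0}: shrink the centre, then erase "ab" only when delimited.
ANBN = [("$ab#", "$#"), ("aabb", "ab")]

if __name__ == "__main__":
    print(in_language("aabb", DYCK), in_language("ba", DYCK))      # True False
    print(in_language("aaabbb", ANBN), in_language("abab", ANBN))  # True False
```

The delimiters are what let a rule such as `$ab#` apply only to a whole word, which is how the second example rejects `abab` while the Dyck system accepts it.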
The third approach restricts the problem to a subclass: we define the class of substitutable languages and propose an algorithm that identifies it in the limit from positive examples only, with polynomial bounds on time and data.
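A language L is substitutable when any two non-empty strings u and v that occur in one common context (l u r and l v r both in L) occur in exactly the same contexts. The sketch below illustrates how this property can drive learning from positive examples alone: substrings of the sample that share a context are merged into one nonterminal (using union-find), binary rules are read off every split of every substring, and membership is tested with CYK. It is a minimal sketch under these assumptions, not necessarily the exact algorithm of the thesis; the names `learn_substitutable` and `accepts` are hypothetical, and the empty string is ignored.

```python
def substrings(w):
    """Yield every non-empty substring of w with its context (left, right)."""
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            yield w[i:j], (w[:i], w[j:])


def learn_substitutable(sample):
    """Build a binary CFG from non-empty positive examples.

    Returns (start, lexical, rules): nonterminals are class representatives
    (strings), `lexical` maps each letter to its class, and `rules` holds
    productions Z -> X Y as (Z, X, Y) triples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen_context = {}  # context -> first substring observed in it
    for w in sample:
        for u, ctx in substrings(w):
            find(u)  # register u as a potential nonterminal
            if ctx in seen_context:
                union(u, seen_context[ctx])  # shared context: same class
            else:
                seen_context[ctx] = u

    cls = {s: find(s) for s in list(parent)}
    lexical = {s: c for s, c in cls.items() if len(s) == 1}
    rules = {(cls[s], cls[s[:k]], cls[s[k:]]) for s in cls for k in range(1, len(s))}
    return cls[sample[0]], lexical, rules


def accepts(grammar, w):
    """CYK membership test for the binary grammar built above."""
    start, lexical, rules = grammar
    n = len(w)
    if n == 0 or any(ch not in lexical for ch in w):
        return False
    table = {(i, i + 1): {lexical[ch]} for i, ch in enumerate(w)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            table[i, j] = {
                z
                for k in range(i + 1, j)
                for z, x, y in rules
                if x in table[i, k] and y in table[k, j]
            }
    return start in table[0, n]


if __name__ == "__main__":
    g = learn_substitutable(["c", "acb"])
    print(accepts(g, "aacbb"), accepts(g, "aacb"))  # True False
```

From the two examples `c` and `acb`, the strings `c` and `acb` share the empty context, so they are merged, and the induced grammar generates exactly the strings a^n c b^n. By contrast, {a^n b^n} is not substitutable (a and aab share the context (λ, b) but not (λ, abb)), so this kind of merging would overgeneralize on it.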