Learning Context-Free Languages
Colin de la Higuera and Jose Oncina
Language learning is often referred to as grammatical inference or grammar induction. Whereas the problem of learning or inferring regular languages (usually represented by deterministic finite state automata) has been well studied, that of learning context-free languages has received less attention and is recognised to be harder. In this survey we present a number of results, some well known and others less so, concerning all the important learning tasks related to the class of context-free languages: learning from text and from an informant, learning with queries or with mistakes, and learning with additional help, which can be either partial knowledge of the structure or the hypothesis that the actual distribution is modelled by a stochastic context-free grammar. We show that the state of the art consists mainly of negative results. Nevertheless, because these languages have important practical applications, many specific heuristics have been proposed for learning them. We explore some of these heuristics and propose some research directions.