Partially Distribution-Free Learning of Regular Languages from Positive Samples
Alexander Clark and Franck Thollard
In: COLING 2004, 23-27 August 2004, Geneva, Switzerland.
Regular languages are widely used in NLP today in spite of their shortcomings.
Efficient algorithms are needed that can reliably learn these languages and that,
as realistic applications require, use only positive samples.
These languages are not learnable under traditional distribution-free criteria.
We claim that an appropriate learning framework is PAC learning where the distributions are constrained
to be generated by a class of stochastic automata with support equal to the target concept.
We discuss how this is related to other learning paradigms.
We then present a simple learning algorithm for regular languages,
and a self-contained proof that it learns according to this partially
distribution-free criterion.
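
The distributional assumption described above (samples drawn from a stochastic automaton whose support equals the target concept) can be illustrated with a minimal sketch. It is not taken from the paper and is not the paper's learning algorithm: the toy automaton, the target language a*b, and the names `PDFA`, `sample_string`, and `in_target` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy stochastic automaton over {a, b}; its support is a*b.
# Each state maps to a list of (symbol, next_state, probability).
# A symbol of None means "stop and emit the string built so far".
PDFA = {
    0: [("a", 0, 0.5), ("b", 1, 0.5)],
    1: [(None, None, 1.0)],
}

def sample_string(pdfa, rng, start=0):
    """Draw one string by walking the automaton until a stop transition fires."""
    state, out = start, []
    while True:
        symbols, targets, probs = zip(*pdfa[state])
        i = rng.choices(range(len(probs)), weights=probs)[0]
        if symbols[i] is None:
            return "".join(out)
        out.append(symbols[i])
        state = targets[i]

def in_target(s):
    """Membership test for the assumed target concept a*b."""
    return s.endswith("b") and set(s[:-1]) <= {"a"}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_string(PDFA, rng) for _ in range(1000)]
```

Every string drawn this way is a positive example of the target language, and every string of the language has non-zero probability, which is the sense in which the distribution's support equals the target concept. A learner in this setting sees only such samples.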