Correction of Uniformly Noisy Distributions to Improve Probabilistic Grammatical Inference Algorithms
Amaury Habrard, Marc Bernard and Marc Sebban
In: FLAIRS 2005, 16-18 May 2005, USA.
In this paper, we aim at correcting distributions of noisy samples in order to improve the inference of probabilistic automata. Rather than definitively removing corrupted examples before the learning process, we propose a technique, based on statistical estimates and linear regression, that allows us to correct the probabilistic prefix tree automaton (PPTA).
Our approach requires human expertise to correct a small sample of data, randomly drawn from the whole learning set, in order to estimate the noise level. This statistical information makes it possible to automatically correct the PPTA and thus to infer models with better generalization.
After a theoretical analysis of the impact of noise, we show experimentally that our technique improves the quality of the inferred models regardless of the noise level.
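The pipeline sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch: the PPTA is represented as a map from string prefixes to counts, the noise level is estimated as the fraction of an expert-checked subsample flagged as corrupted, and a simple multiplicative rescaling stands in for the paper's actual linear-regression-based correction, whose details are not given here.

```python
from collections import defaultdict

def build_ppta(strings):
    """Build a probabilistic prefix tree automaton as a map prefix -> count.

    Each string contributes one count to every one of its prefixes
    (including the empty prefix, i.e. the root, and the full string).
    """
    counts = defaultdict(int)
    for s in strings:
        for i in range(len(s) + 1):
            counts[s[:i]] += 1
    return dict(counts)

def estimate_noise(expert_sample):
    """Estimate the noise level from a small human-checked sample.

    expert_sample: list of (string, is_corrupted) pairs, where the
    boolean is the expert's judgment. Returns the corrupted fraction.
    """
    flagged = sum(1 for _, corrupted in expert_sample if corrupted)
    return flagged / len(expert_sample)

def correct_ppta(counts, noise_rate):
    """Rescale PPTA counts by the estimated clean fraction.

    Illustrative uniform correction only; the paper instead fits a
    linear regression to correct the distribution.
    """
    return {prefix: c * (1.0 - noise_rate) for prefix, c in counts.items()}
```

For example, `build_ppta(["ab", "a"])` yields counts 2 for the prefixes `""` and `"a"` and 1 for `"ab"`, and an expert sample with one corrupted string out of four gives an estimated noise level of 0.25.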
Keywords. probabilistic grammatical inference, noisy data, distribution correction