PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Correction of Uniformly Noisy Distributions to Improve Probabilistic Grammatical Inference Algorithms
Amaury Habrard, Marc Bernard and Marc Sebban
In: FLAIRS 2005, 16-18 May 2005, USA.

Abstract

In this paper, we aim to correct the distributions of noisy samples in order to improve the inference of probabilistic automata. Rather than definitively removing corrupted examples before the learning process, we propose a technique, based on statistical estimates and linear regression, that corrects the probabilistic prefix tree automaton (PPTA). Our approach requires a human expert to correct a small sample of data, randomly drawn from the whole learning set, in order to estimate the noise level. This statistical information makes it possible to correct the PPTA automatically and then to infer models that generalize better. After a theoretical analysis of the impact of noise, we show experimentally that our technique improves the quality of the inferred models whatever the level of noise.

Keywords: probabilistic grammatical inference, noisy data, distribution correcting

EPrint Type: Conference or Workshop Item (Paper)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 91
Deposited By: Amaury Habrard
Deposited On: 18 May 2004