PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Using PCA for Probabilistic Grammatical Inference on Trees
François Denis, Raphael Bailly, Edouard Gilbert and Amaury Habrard
In: NIPS 2009 Workshop on Grammar Induction, Representation of Language and Language Learning, Whistler, Canada (2009).

Abstract

We focus on the classical grammatical inference problem of learning stochastic tree languages from finite samples of trees drawn independently according to a fixed, unknown distribution. We consider the class of stochastic tree languages that can be computed by rational tree series, which can be viewed as a strict generalization of probabilistic tree automata. The class of rational stochastic tree languages has an algebraic characterization: all the residuals of such a stochastic language lie in a finite-dimensional vector subspace. We propose a principle based on Principal Component Analysis to identify this vector subspace. This approach lets us define a global solution to the problem instead of building an automaton iteratively, as standard probabilistic grammatical inference algorithms do. It thereby addresses the main drawback of those iterative approaches: they use statistical tests that rely on fewer and fewer examples as the structure grows. We provide an algorithm that computes an estimate of the target vector subspace and builds a linear representation of a tree series that gives an estimate of the target distribution. Notably, we show that in the case of tree languages, the dual vector subspace must be considered to build the representation.
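
The subspace-identification principle can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the empirical residuals have already been arranged as the rows of a numeric matrix, that the target dimension is known, and it ignores trees, contexts, and the dual-space construction entirely. The names `estimate_subspace` and `project` and the synthetic data are hypothetical, introduced only for illustration. Because a vector subspace passes through the origin, the sketch uses an uncentred PCA, that is, a truncated SVD.

```python
import numpy as np

def estimate_subspace(residuals, rank):
    """Return an orthonormal basis (as columns) of the rank-`rank` subspace
    that best fits the rows of `residuals` in the least-squares sense."""
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    return vt[:rank].T

def project(basis, v):
    """Orthogonally project v onto the span of the orthonormal columns of basis."""
    return basis @ (basis.T @ v)

# Synthetic stand-in for empirical residual vectors (an assumption, not real data):
# 50 noisy samples drawn from a 2-dimensional subspace of R^6.
rng = np.random.default_rng(0)
true_basis, _ = np.linalg.qr(rng.normal(size=(6, 2)))
clean = rng.normal(size=(50, 2)) @ true_basis.T
noisy = clean + 1e-3 * rng.normal(size=clean.shape)

basis = estimate_subspace(noisy, rank=2)

# Largest distance between a true basis vector and its projection
# onto the estimated subspace; small when the subspace is recovered.
max_error = max(np.linalg.norm(project(basis, b) - b) for b in true_basis.T)
```

Working from a single global decomposition of all the data, rather than a sequence of local statistical tests, is what the abstract contrasts with iterative automaton construction.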

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation; Theory & Algorithms
ID Code: 5658
Deposited By: Amaury Habrard
Deposited On: 08 March 2010