PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A Statistical Learning Perspective of Genetic Programming
Olivier Teytaud, Christian Gagné, Marc Schoenauer, Sylvain Gelly, Nicolas Bredeche and Nur Merve Amil
In: EuroGP 2009 (2009).


This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency. That is, minimizing the empirical error converges to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of imposing a hard limit on program size still yields programs whose size grows without bound as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the generalization error also leads to bloat. A more elaborate modification of the fitness is therefore proposed in order to avoid unnecessary bloat while still preserving universal consistency.
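
As a rough illustration of what a parsimonious fitness can look like, the sketch below adds a size penalty to the empirical error of a candidate program. It is only a minimal sketch under stated assumptions: the penalty form (program size divided by the square root of the number of test cases), the names `empirical_error`, `parsimonious_fitness` and `penalty_weight`, and the data format are all hypothetical and are not taken from the paper, whose actual fitness modification is not given in this abstract.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical data format: (input value, binary label).
Example = Tuple[float, int]


def empirical_error(program: Callable[[float], int],
                    examples: Sequence[Example]) -> float:
    """Fraction of test cases that the program misclassifies."""
    mistakes = sum(1 for x, y in examples if program(x) != y)
    return mistakes / len(examples)


def parsimonious_fitness(program: Callable[[float], int],
                         size: int,
                         examples: Sequence[Example],
                         penalty_weight: float = 1.0) -> float:
    """Empirical error plus a size penalty (lower is better).

    ASSUMPTION: the penalty size / sqrt(n) is an illustrative choice
    that vanishes as the number of test cases n grows; it is not the
    penalty analysed in the paper.
    """
    n = len(examples)
    return empirical_error(program, examples) + penalty_weight * size / n ** 0.5


# Two programs with identical behaviour but different sizes.
examples = [(0.0, 0), (1.0, 1), (2.0, 1), (3.0, 1)]
small = lambda x: int(x >= 0.5)   # pretend this tree has 3 nodes
large = lambda x: int(x >= 0.5)   # pretend this tree has 30 nodes

small_fitness = parsimonious_fitness(small, 3, examples)
large_fitness = parsimonious_fitness(large, 30, examples)
```

With equal empirical error, the smaller program receives the better (lower) fitness, which is the basic mechanism by which a size penalty discourages bloat.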

EPrint Type: Conference or Workshop Item (Talk)
Project Keyword: UNSPECIFIED
Subjects: Learning/Statistics & Optimisation
ID Code: 6884
Deposited By: Olivier Teytaud
Deposited On: 09 April 2010