PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Bagging stabilization in decision trees
Yves Grandvalet
In: XLIII Scientific Meeting of the Italian Statistical Society, June 2006, Torino.

Abstract

Bagging is a simple ensemble technique in which an estimator is produced by averaging predictors fitted to bootstrap samples. Bagged decision trees almost always improve on the original predictor, and it is widely believed that bagging owes its effectiveness to the variance reduction that comes from averaging predictors. We provide a counter-example here, and give experimental evidence that bagging stabilizes prediction by equalizing the influence of training examples. The influence of near-boundary points is increased when they take part in defining the split location at any node, while highly influential examples, which carry a high weight in deciding the split direction at the root node, are down-weighted because they are absent from some of the bootstrap samples. Recent analyses relating stability to generalization error are tested empirically to see whether they account for bagging's success. We quantify hypothesis stability on several benchmarks and conclude that this influence-equalization process significantly improves stability, which in turn may improve generalization performance. Our experiments further suggest that the generalization bounds derived from the stability analysis are quite tight for both unbagged and bagged decision trees.
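The sketch below is not the paper's experimental code; it only illustrates the two quantities the abstract discusses: a bagged decision-tree predictor built by majority vote over trees fitted to bootstrap samples, and a rough empirical estimate of hypothesis stability, measured as the average change in the 0/1 loss at held-out points when one training example is removed. The synthetic dataset, tree settings, ensemble size, and number of leave-one-out deletions are all illustrative placeholders.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# One synthetic problem, split into a training half and a held-out half.
X_all, y_all = make_classification(n_samples=400, n_features=10, random_state=0)
X, X_test = X_all[:200], X_all[200:]
y, y_test = y_all[:200], y_all[200:]

def bagged_predict(X_tr, y_tr, X_eval, n_estimators=25):
    """Majority vote over decision trees fitted to bootstrap samples."""
    votes = np.zeros(len(X_eval))
    for _ in range(n_estimators):
        # Bootstrap sample: n points drawn with replacement from the training set.
        idx = rng.integers(0, len(X_tr), size=len(X_tr))
        tree = DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx])
        votes += tree.predict(X_eval)
    return (votes / n_estimators >= 0.5).astype(int)

def hypothesis_stability(X_tr, y_tr, X_eval, y_eval, n_deletions=20):
    """Average absolute change in 0/1 loss on held-out points when one
    training example is deleted (a subsample of leave-one-out deletions)."""
    loss_full = (bagged_predict(X_tr, y_tr, X_eval) != y_eval).astype(float)
    diffs = []
    for i in range(n_deletions):
        mask = np.arange(len(X_tr)) != i
        loss_loo = (bagged_predict(X_tr[mask], y_tr[mask], X_eval) != y_eval).astype(float)
        diffs.append(np.mean(np.abs(loss_full - loss_loo)))
    return np.mean(diffs)

print("estimated hypothesis stability:", hypothesis_stability(X, y, X_test, y_test))

Because bagging is itself randomized, a careful comparison with an unbagged tree would also average over the bootstrap randomness; the sketch above only gives a rough estimate of the stability quantity.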

EPrint Type: Conference or Workshop Item (Invited Talk)
Project Keyword: UNSPECIFIED
Subjects: Computational, Information-Theoretic Learning with Statistics; Theory & Algorithms
ID Code: 2590
Deposited By: Yves Grandvalet
Deposited On: 22 November 2006