Bagging stabilization in decision trees
In: XLIII Scientific Meeting of the Italian Statistical Society, June 2006, Torino.
Bagging is a simple ensemble technique in which an estimator is produced by averaging predictors fitted to bootstrap samples. Bagged decision trees almost always improve on the original predictor, and it is widely believed that bagging is effective because averaging predictors reduces variance. We provide a counter-example here, and we give experimental evidence that bagging stabilizes prediction by equalizing the influence of training examples. The influence of near-boundary points is increased when they participate in defining the split location of a node. Highly influential examples, which carry a high weight in deciding the split direction at the root node, are down-weighted owing to their absence from some of the bootstrap samples. Recent analyses relating stability to generalization error are empirically tested to see whether they account for bagging's success. We quantify hypothesis stability on several benchmarks and conclude that the influence-equalization process significantly improves stability, which in turn may increase generalization performance. Our experiments further suggest that the stability-based bounds on generalization performance are quite tight for both unbagged and bagged decision trees.
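To make the procedure concrete, the following is a minimal from-scratch sketch of bagging, assuming NumPy and using decision stumps (one-level trees) as base predictors on synthetic data; it illustrates the bootstrap-and-average scheme described above, not the experimental setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Fit a one-level tree: pick the feature/threshold minimizing 0-1 error."""
    best = None  # (error, feature index, threshold, direction)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            for direct, p in ((True, pred), (False, 1 - pred)):
                err = np.mean(p != y)
                if best is None or err < best[0]:
                    best = (err, j, t, direct)
    return best[1:]

def predict_stump(stump, X):
    j, t, direct = stump
    p = (X[:, j] > t).astype(int)
    return p if direct else 1 - p

def bagged_predict(stumps, X):
    """Average the individual predictions, then threshold (majority vote)."""
    votes = np.mean([predict_stump(s, X) for s in stumps], axis=0)
    return (votes >= 0.5).astype(int)

# Toy two-class data with a noisy boundary along the first feature.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

# Bagging: each stump is fitted to a bootstrap resample of the training set,
# so any single (possibly highly influential) example is absent from roughly
# a fraction 1/e of the resamples.
stumps = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))  # sample n points with replacement
    stumps.append(fit_stump(X[idx], y[idx]))

acc = np.mean(bagged_predict(stumps, X) == y)
```

Because each bootstrap sample omits a given training point with probability about 1/e, no single example can dominate every fitted split, which is the influence-equalization effect the abstract refers to.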