PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A tree-based regressor that adapts to intrinsic dimension.
Samory Kpotufe and Sanjoy Dasgupta
Journal of Computer and System Sciences 2011.

Abstract

We consider the problem of nonparametric regression, which consists of learning an arbitrary mapping f : X → Y from a data set of (x, y) pairs in which the y values are corrupted by zero-mean noise. This statistical task is known to be subject to a severe curse of dimensionality: if X ⊂ R^D, and if the only smoothness assumption on f is that it satisfies a Lipschitz condition, then any estimator based on n data points will have an error rate (risk) of Ω(n^{-2/(2+D)}). Here we present a tree-based regressor whose risk depends only on the doubling dimension of X, not on D. This notion of dimension generalizes two cases of contemporary interest: when X is a low-dimensional manifold, and when X is sparse. The tree is built using random hyperplanes as splitting criteria, building upon recent work of Dasgupta and Freund [DF08], and we show that axis-parallel splits cannot achieve the same finite-sample rate of convergence.
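To make the idea of a tree built from random hyperplane splits concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a plain median split along a random unit direction and a simple leaf-size stopping rule; the split rule and stopping criterion analysed in the paper differ. The names `build_tree`, `predict_one`, and `min_leaf` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_tree(X, y, rng, min_leaf=5):
    """Recursively partition (X, y) using random hyperplane splits.

    Simplifying assumption: split at the median of the projections onto
    a random unit direction, and stop when a cell holds few points.
    """
    n, D = X.shape
    leaf = {"leaf": True, "value": float(np.mean(y))}
    if n < 2 * min_leaf:
        return leaf

    # Draw a random direction uniformly on the unit sphere in R^D.
    direction = rng.standard_normal(D)
    direction /= np.linalg.norm(direction)

    # Project the points onto the direction and split at the median.
    proj = X @ direction
    threshold = float(np.median(proj))
    go_left = proj <= threshold

    # Degenerate split (e.g. duplicated points): fall back to a leaf.
    if go_left.all() or not go_left.any():
        return leaf

    return {
        "leaf": False,
        "direction": direction,
        "threshold": threshold,
        "left": build_tree(X[go_left], y[go_left], rng, min_leaf),
        "right": build_tree(X[~go_left], y[~go_left], rng, min_leaf),
    }


def predict_one(tree, x):
    """Route x down to a leaf and return the mean response stored there."""
    node = tree
    while not node["leaf"]:
        if x @ node["direction"] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]
```

The estimate at a query point is the average of the y values falling in the same cell. Because the split directions are random rather than axis-parallel, the cells are not tied to the D ambient coordinates, which is the property the abstract contrasts with axis-parallel splits.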

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 7770
Deposited By: Samory Kpotufe
Deposited On: 17 March 2011