PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

A tree-based distance between distributions: application to classification of neurons
Riwal Lefort and Francois Fleuret
In: IEEE International Conference on Acoustics, Speech and Signal Processing (2012).


The usual strategy for computing a distance between two distributions is to model each distribution in feature space and then compute the distance between the models. We propose here to model the distributions of points with unsupervised trees. Our main contribution is a tree-based approximation of the Kullback-Leibler (KL) divergence for very large feature spaces, from which we derive a symmetric distance. To compute the tree-based KL divergence, we first build a balanced tree for each set of samples. Then, for any pair of sets of samples, we compute the KL divergence between the empirical distribution at the leaves of the set used to build the tree and the empirical distribution at the leaves of the other set. We show experimentally on synthetic data that this quantity is consistent with the exact KL divergence, and we demonstrate its efficiency for both unsupervised and supervised classification on multiple standard real-world datasets. Our main application is the characterization of abnormal neuron development.
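The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the balanced tree is built by median splits along cycling coordinates (as in a k-d tree), leaf counts are smoothed with a small constant eps so that empty leaves do not produce an infinite divergence, and the symmetric distance is taken as the sum of the two directed tree-based divergences. All function names (build_tree, leaf_of, leaves, leaf_distribution, tree_kl, symmetric_tree_distance) are hypothetical.

```python
import math
import random


def build_tree(points, depth, axis=0):
    """Balanced tree via median splits on cycling axes (an assumed split rule).

    A node is (axis, threshold, left, right); a leaf is None.
    """
    if depth == 0 or len(points) < 2:
        return None
    dim = len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    threshold = (pts[mid - 1][axis] + pts[mid][axis]) / 2.0
    nxt = (axis + 1) % dim
    return (axis, threshold,
            build_tree(pts[:mid], depth - 1, nxt),
            build_tree(pts[mid:], depth - 1, nxt))


def leaf_of(tree, x):
    """Return the leaf reached by x, identified by its path of 0/1 branches."""
    path = []
    node = tree
    while node is not None:
        axis, threshold, left, right = node
        if x[axis] <= threshold:
            path.append(0)
            node = left
        else:
            path.append(1)
            node = right
    return tuple(path)


def leaves(tree, path=()):
    """List all leaf identifiers of the tree in left-to-right order."""
    if tree is None:
        return [path]
    _, _, left, right = tree
    return leaves(left, path + (0,)) + leaves(right, path + (1,))


def leaf_distribution(tree, points, eps=1e-3):
    """Empirical distribution over leaves, smoothed by eps (an assumption)."""
    keys = leaves(tree)
    counts = {k: eps for k in keys}
    for x in points:
        counts[leaf_of(tree, x)] += 1
    total = sum(counts.values())
    return [counts[k] / total for k in keys]


def tree_kl(build_set, other_set, depth):
    """KL divergence at the leaves of the tree built on build_set."""
    tree = build_tree(build_set, depth)
    p = leaf_distribution(tree, build_set)
    q = leaf_distribution(tree, other_set)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_tree_distance(a, b, depth):
    """Sum of both directed divergences (one possible symmetrization)."""
    return tree_kl(a, b, depth) + tree_kl(b, a, depth)


# Usage: two samples from the same Gaussian and one from a shifted Gaussian.
random.seed(0)
a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
b = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
c = [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(200)]
print(symmetric_tree_distance(a, b, 4), symmetric_tree_distance(a, c, 4))
```

In this sketch each tree is built once per sample set and the other set is only routed through it, so comparing two sets costs a sort-based construction plus one tree descent per point. The depth parameter trades resolution against the number of samples per leaf.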

EPrint Type: Conference or Workshop Item (Poster)
Project Keyword: UNSPECIFIED
Subjects: Machine Vision; Theory & Algorithms
ID Code: 9362
Deposited By: Francois Fleuret
Deposited On: 16 March 2012