PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Classification of sparse high-dimensional vectors.
Yu. Ingster, C. Pouet and A.B. Tsybakov
Philosophical Transactions of the Royal Society A, Volume 367, pp. 4427-4448, 2009.


We study the problem of classifying d-dimensional vectors into two classes (one of which is pure noise) based on a training sample of size m. The main specific feature is that the dimension d can be very large. We suppose that the population distribution differs from the noise distribution only by a shift, and that this shift is a sparse vector. For Gaussian noise, fixed sample size m, and dimension d tending to infinity, we obtain the sharp classification boundary, i.e., necessary and sufficient conditions under which successful classification is possible. We propose classifiers attaining this boundary. We also extend the result to the case where the sample size m depends on d and to the case of non-Gaussian noise satisfying the Cramér condition.
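To make the setting concrete, here is a minimal Python sketch under stated assumptions. The population is taken to be N(θ, I_d) with a sparse shift θ, the noise class is N(0, I_d), and a training sample of size m is drawn from the population. The classifier shown works in two steps: it hard-thresholds the training mean at sqrt(2 log d / m), then applies the plug-in linear rule for two Gaussians with identity covariance. This is a generic illustration chosen here. It is not the boundary-attaining classifier proposed in the paper. All function names and parameter values (`d`, `s`, `amplitude`, `m`) are hypothetical.

```python
import math
import random


def sample(d, theta, rng):
    """Draw one observation: N(theta, I_d), or N(0, I_d) (pure noise) if theta is None."""
    if theta is None:
        return [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [t + rng.gauss(0.0, 1.0) for t in theta]


def fit_thresholded_mean(train):
    """Average the m training vectors and hard-threshold at sqrt(2 log d / m).

    Each coordinate of the mean has variance 1/m under the assumed model, so
    coordinates with |mean| below the threshold are set to zero, giving a
    sparse estimate of the shift.
    """
    m, d = len(train), len(train[0])
    thresh = math.sqrt(2.0 * math.log(d) / m)
    mean = [sum(x[j] for x in train) / m for j in range(d)]
    return [v if abs(v) > thresh else 0.0 for v in mean]


def classify(theta_hat, y):
    """Plug-in linear rule: 1 (population) if <theta_hat, y> > ||theta_hat||^2 / 2, else 0 (noise)."""
    score = sum(t * v for t, v in zip(theta_hat, y) if t != 0.0)
    half_norm = 0.5 * sum(t * t for t in theta_hat)
    return 1 if score > half_norm else 0


def demo(d=2000, s=20, amplitude=3.0, m=5, n_test=100, seed=0):
    """Simulate the hypothetical setting and return the test accuracy."""
    rng = random.Random(seed)
    theta = [0.0] * d
    for j in rng.sample(range(d), s):  # s randomly placed nonzero coordinates
        theta[j] = amplitude
    train = [sample(d, theta, rng) for _ in range(m)]
    theta_hat = fit_thresholded_mean(train)
    correct = 0
    for i in range(n_test):
        label = i % 2  # alternate population (1) and noise (0) test points
        y = sample(d, theta if label == 1 else None, rng)
        correct += classify(theta_hat, y) == label
    return correct / n_test


if __name__ == "__main__":
    print(demo())
```

With the default parameters, all 20 shifted coordinates sit well above the threshold, so the plug-in rule should separate the classes easily. Lowering `amplitude` toward the threshold makes support recovery, and hence classification, less reliable.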

EPrint Type: Article
Subjects: Computational, Information-Theoretic Learning with Statistics; Learning/Statistics & Optimisation
ID Code: 6921
Deposited By: Alexandre Tsybakov
Deposited On: 16 April 2010