# Classification of sparse high-dimensional vectors

## Abstract

We study the problem of classifying d-dimensional vectors into two classes (one of which is pure noise) based on a training sample of size m. The distinguishing feature of the problem is that the dimension d can be very large. We suppose that the distribution of the population differs from that of the noise only by a shift, which is a sparse vector. For Gaussian noise, fixed sample size m, and dimension d tending to infinity, we obtain the sharp classification boundary, i.e., the necessary and sufficient conditions under which successful classification is possible. We propose classifiers that attain this boundary. We also extend the result to the case where the sample size m depends on d and to the case of non-Gaussian noise satisfying the Cramér condition.
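To make the setup concrete, here is a minimal simulation sketch in Python. It is not the classifier proposed in the paper: it swaps in a generic plug-in rule that thresholds the coordinate-wise training means and then applies a linear discriminant on the selected coordinates. It assumes that the training sample is drawn from the shifted population and that the noise is standard Gaussian. The function names, the threshold sqrt(2 log(d) / m), and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def sparse_shift(d, k, amplitude, rng):
    """Hypothetical shift: k of the d coordinates equal `amplitude`, the rest are 0."""
    s = [0.0] * d
    for j in rng.sample(range(d), k):
        s[j] = amplitude
    return s


def draw(shift, rng):
    """Draw one vector: the shift plus independent N(0, 1) noise per coordinate."""
    return [v + rng.gauss(0.0, 1.0) for v in shift]


def fit(training, d):
    """Estimate the shift by keeping only coordinates with a large training mean."""
    m = len(training)
    means = [sum(x[j] for x in training) / m for j in range(d)]
    # Assumed threshold: roughly the largest of d pure-noise means of variance 1/m.
    t = math.sqrt(2.0 * math.log(d) / m)
    return {j: means[j] for j in range(d) if abs(means[j]) > t}


def classify(z, selected):
    """Return 1 for the shifted population and 0 for pure noise."""
    if not selected:
        return 0
    # Linear discriminant on the selected coordinates:
    # project z on the estimated shift and compare with half its squared norm.
    score = sum(mu * z[j] for j, mu in selected.items())
    half_norm = 0.5 * sum(mu * mu for mu in selected.values())
    return 1 if score > half_norm else 0


if __name__ == "__main__":
    rng = random.Random(0)
    d, m, k, amplitude = 2000, 5, 10, 5.0
    s = sparse_shift(d, k, amplitude, rng)
    selected = fit([draw(s, rng) for _ in range(m)], d)
    print(classify(draw(s, rng), selected))          # vector from the population
    print(classify(draw([0.0] * d, rng), selected))  # pure-noise vector
```

With a shift this strong the two classes are easy to separate. The regime studied in the abstract is the one where the amplitude and sparsity sit close to the boundary between possible and impossible classification, and a naive rule like this one need not attain that boundary.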