PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Principal Component Analysis for Sparse High-Dimensional Data
Tapani Raiko, Alexander Ilin and Juha Karhunen
In: International Conference on Neural Information Processing (ICONIP 2007), November 13-16, 2007, Kitakyushu, Japan.


Principal component analysis (PCA) is a widely used technique for data analysis and dimensionality reduction. Eigenvalue decomposition is the standard algorithm for solving PCA, but a number of other algorithms have been proposed. For instance, the EM algorithm is much more efficient when the dimensionality is high and only a small number of principal components is needed. We study the case where the data are high-dimensional and a majority of the values are missing. In this case, both of these algorithms turn out to be inadequate. We propose using a gradient descent algorithm inspired by Oja's rule and speeding it up with an approximate Newton's method. The computational complexity of the proposed method is linear in the number of observed values in the data and in the number of principal components. In experiments with the Netflix data, the proposed algorithm is about ten times faster than any of the four comparison methods.
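
The idea of fitting PCA by gradient descent on the observed values only can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain gradient descent on the squared reconstruction error, without the Oja-inspired update or the approximate Newton speed-up described in the abstract, and it omits the mean (bias) term. The function name `pca_missing_gd`, its parameters, the step size and the synthetic data are all assumptions made for illustration. Each iteration touches only the observed entries, so its cost is proportional to the number of observed values times the number of components.

```python
import numpy as np


def pca_missing_gd(rows, cols, vals, shape, n_components,
                   lr=0.005, n_iter=2000, seed=0):
    """Fit X ~= A @ S.T using only the observed entries (rows, cols, vals).

    Hypothetical helper for illustration: plain gradient descent on the
    sum of squared errors over observed entries.
    """
    d, n = shape
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((d, n_components))  # loadings, one row per variable
    S = 0.1 * rng.standard_normal((n, n_components))  # scores, one row per sample
    costs = []
    for _ in range(n_iter):
        # Reconstruction error at the observed positions only.
        err = vals - np.sum(A[rows] * S[cols], axis=1)
        costs.append(float(np.mean(err ** 2)))
        # Accumulate gradients of the squared error; np.add.at handles
        # repeated row/column indices correctly.
        grad_A = np.zeros_like(A)
        grad_S = np.zeros_like(S)
        np.add.at(grad_A, rows, -2.0 * err[:, None] * S[cols])
        np.add.at(grad_S, cols, -2.0 * err[:, None] * A[rows])
        A -= lr * grad_A
        S -= lr * grad_S
    return A, S, costs


# Synthetic rank-2 data with roughly half of the entries missing (assumed setup).
rng = np.random.default_rng(1)
d, n, c = 20, 30, 2
X = rng.standard_normal((d, c)) @ rng.standard_normal((c, n))
mask = rng.random((d, n)) < 0.5           # True where a value is observed
rows, cols = np.nonzero(mask)
vals = X[rows, cols]

A, S, costs = pca_missing_gd(rows, cols, vals, (d, n), n_components=c)
print("initial cost:", costs[0], "final cost:", costs[-1])
```

Storing the data as index and value arrays rather than a dense matrix is what keeps the per-iteration work tied to the observed entries; for data as sparse as the Netflix ratings, a dense representation would be dominated by missing values.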

EPrint Type: Conference or Workshop Item (Oral)
Subjects: Computational, Information-Theoretic Learning with Statistics
Learning/Statistics & Optimisation
Theory & Algorithms
ID Code: 3359
Deposited By: Tapani Raiko
Deposited On: 09 February 2008