PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Sparse Canonical Correlation Analysis
David Hardoon and John Shawe-Taylor
Journal of Machine Learning Research 2007.

Abstract

In this paper we present a novel method for solving Canonical Correlation Analysis (CCA) in a sparse convex framework using a least squares approach. The presented method focuses on the scenario in which one is interested in (or limited to) a primal representation for the first view while having a dual representation for the second view. Sparse CCA (SCCA) minimises the number of features used in both the primal and dual projections while maximising the correlation between the two views. The method is demonstrated on two paired corpora, English-French and English-Spanish, for mate-retrieval and word generation tasks. In mate-retrieval, we observe that when the number of original features is large, SCCA outperforms Kernel CCA (KCCA), learning the common semantic space from a sparse set of features. We also show that SCCA can be used as a word generation technique to produce a sparse set of words from the training corpus for a new document query.
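
The mate-retrieval task named in the abstract can be illustrated with a minimal sketch. It assumes the two views have already been projected into a common space (by SCCA, KCCA, or any other method) and that row i of each matrix belongs to the same paired document. The function name, the cosine-similarity ranking, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def mate_retrieval_accuracy(queries, candidates):
    """Fraction of queries whose true mate is the top-ranked candidate.

    Assumption: row i of `queries` and row i of `candidates` are the two
    views of the same document, already projected into a shared space.
    Ranking uses cosine similarity (an illustrative choice).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    similarities = q @ c.T                # entry (i, j): query i vs candidate j
    best = similarities.argmax(axis=1)    # index of the top-ranked candidate
    return float(np.mean(best == np.arange(len(queries))))


# Toy data (hypothetical): two noisy copies of the same latent vectors
# stand in for projected English and French documents.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 5))
view_a = latent + 0.01 * rng.normal(size=latent.shape)
view_b = latent + 0.01 * rng.normal(size=latent.shape)
accuracy = mate_retrieval_accuracy(view_a, view_b)
```

With real data, `view_a` and `view_b` would be held-out paired documents after projection, and the score would be compared across methods.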

EPrint Type: Article
Subjects: Multimodal Integration; Theory & Algorithms; Information Retrieval & Textual Information Access
ID Code: 3249
Deposited By: David Hardoon
Deposited On: 30 January 2008