PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning

Using KCCA for Japanese-English cross-language information retrieval and classification
Yaoyong Li and John Shawe-Taylor
Journal of Intelligent Information Systems Volume tba, Number tba, 2005. ISSN Print: 0925-9902 Electronic: 1573-7675


Kernel Canonical Correlation Analysis (KCCA) is a method for finding correlated linear relationships between two variables in a kernel-defined feature space. A machine learning algorithm based on KCCA is studied for cross-language information retrieval. We apply the algorithm to Japanese-English cross-language information retrieval. The results are encouraging and are significantly better than those obtained by other state-of-the-art methods. Computational complexity is an important issue when applying KCCA to large datasets, as in information retrieval. We experimentally evaluate several methods to alleviate this problem. We also investigate cross-language document classification using KCCA as well as other methods. Our results show that it is feasible to use a classifier learned in one language to classify documents in other languages.
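
For readers unfamiliar with KCCA, the following is a minimal sketch of the commonly used regularized formulation, in which the dual weights for the first view solve (Kx + kI)^-1 Ky (Ky + kI)^-1 Kx a = r^2 a. It is an illustration only and is not taken from the paper: the function names, the regularization value kappa, the linear kernels, and the toy data are all assumptions, and the exact variant and the large-scale approximations the authors evaluate may differ.

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized KCCA on two centered n x n kernel matrices.

    Returns dual weights alpha (view x), beta (view y), and the
    regularized canonical correlations, largest first.
    kappa is a hypothetical default, not a value from the paper.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    A = np.linalg.solve(Kx + kappa * I, Ky)  # (Kx + kappa I)^-1 Ky
    B = np.linalg.solve(Ky + kappa * I, Kx)  # (Ky + kappa I)^-1 Kx
    vals, vecs = np.linalg.eig(A @ B)        # eigenvalues are r^2
    order = np.argsort(-vals.real)[:n_components]
    corr = np.sqrt(np.clip(vals.real[order], 0.0, None))
    alpha = vecs[:, order].real
    # beta = (Ky + kappa I)^-1 Kx alpha / r, guarding against r = 0
    beta = (B @ alpha) / np.maximum(corr, 1e-12)
    return alpha, beta, corr

# Toy paired data standing in for documents in two languages (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
Y = X @ rng.normal(size=(3, 3)) + 0.01 * rng.normal(size=(40, 3))
Kx = center_kernel(X @ X.T)  # linear kernels, for simplicity
Ky = center_kernel(Y @ Y.T)
alpha, beta, corr = kcca(Kx, Ky)
```

In a retrieval setting, a new document would be mapped into the shared space by multiplying its vector of kernel values against the training documents by alpha or beta, depending on its language. The dense eigendecomposition here costs on the order of n^3 operations, which illustrates why computational complexity matters for large collections.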

EPrint Type: Article
Subjects: Natural Language Processing; Theory & Algorithms
ID Code: 973
Deposited By: John Shawe-Taylor
Deposited On: 21 May 2005